The serverless backend for AI apps.

Everything you need to ship — model inference, memory, jobs, data — in one platform, not nine. OpenAI-compatible. Built to run inside the country it's serving.

Python
# Keep your SDK. Change one line.
from openai import OpenAI

# Before: client = OpenAI(base_url="https://api.openai.com/v1")
client = OpenAI(base_url="https://infer.manara.ai/v1")

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "..."}],
)
# ✓ tokens billed in local currency · ~30ms p50 · stays in-country
$50 free credit · No card required · OpenAI SDK drop-in · ~30ms p50 in-region
1.0 What it unlocks

Six primitives. One platform.

Everything an AI application needs, on the same backend. Inference is the headline. The other five are what stops you from re-inventing the backend on a Tuesday.

01 · Core · Inference

OpenAI-compatible API. Open-source models. Local GPU.

Managed vLLM endpoints. Drop-in replacement for the OpenAI SDK — chat, embeddings, function calling, streaming, batch. Pay per token, billed in local currency.

Llama 3.3 70B · Qwen 2.5 72B · DeepSeek V3 · Jais 30B · Mistral Large · Phi-4 · + LoRA fine-tunes
~30ms p50 · 99.95% SLA · OpenAI SDK
02 · Memory · Vector

RAG without round trips.

Managed pgvector with hybrid search and reranking. Embeddings co-located with inference — your retrieval, rerank, and model call hit the same region. Single-digit-millisecond internal latency. Pure Postgres underneath, so vectors and business data live in one SQL query.

pgvector + HNSW · hybrid (BM25 + dense) · cross-encoder rerank
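One way to picture the hybrid (BM25 + dense) step is rank fusion: each retriever returns its own ordered list, and the lists are merged before reranking. The sketch below uses reciprocal rank fusion, a common technique that is assumed here for illustration; the page does not say which fusion method Manara uses. The function name, the constant `k=60`, and the document ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) query and a dense (embedding) query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why a fused list is usually a better input to a cross-encoder reranker than either list alone.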
03 · Work · Queues

Durable jobs. Real agents.

Long-running tasks, schedules, retries, dead-letter queues. Typed step functions that resume after failure.

checkpointed steps · resumes on failure
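The idea behind checkpointed steps can be sketched in a few lines: every step records its result under a name, and a re-run skips any step that already has a recorded result. This is a conceptual illustration only, not Manara's queue API; `StepRunner`, the step names, and the in-memory dict standing in for durable storage are all hypothetical.

```python
class StepRunner:
    """Runs named steps and remembers their results in a checkpoint store."""

    def __init__(self, checkpoints):
        # A plain dict stands in for durable storage in this sketch.
        self.checkpoints = checkpoints

    def step(self, name, fn, *args):
        if name in self.checkpoints:
            # Already completed in an earlier attempt: reuse the stored result.
            return self.checkpoints[name]
        result = fn(*args)
        self.checkpoints[name] = result
        return result


calls = []  # records which step bodies actually executed


def extract(doc):
    calls.append("extract")
    return doc.upper()


def summarize(text, fail):
    calls.append("summarize")
    if fail:
        raise RuntimeError("simulated transient failure")
    return text[:5]


def pipeline(runner, doc, fail):
    text = runner.step("extract", extract, doc)
    return runner.step("summarize", summarize, text, fail)


store = {}
try:
    pipeline(StepRunner(store), "invoice 42", fail=True)  # first attempt fails midway
except RuntimeError:
    pass

result = pipeline(StepRunner(store), "invoice 42", fail=False)  # retry resumes
print(result, calls)
```

On the retry, `extract` is not executed again because its result was checkpointed; only the failed step re-runs.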
04 · Data · Postgres + Storage

Serverless Postgres. S3-compatible storage.

Branching, PITR, read replicas, scale-to-zero. Object storage with in-region durability. The boring parts, done right.

Postgres 16 · S3-compatible
05 · Control · Gateway

One key. Many models.

Route across local and frontier models. Per-team spend caps. Semantic caching. Fallback chains. Content policy at the gateway, not in your app.

routing + fallback · semantic cache
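A fallback chain is simple to reason about as code: try each model in order and return the first successful reply. The sketch below shows the pattern in application code purely for illustration; the page describes this logic as living at the gateway. `complete_with_fallback`, `fake_call_model`, and the model id `qwen-2.5-72b` are hypothetical, and the simulated timeout stands in for a real provider error.

```python
class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed."""


def complete_with_fallback(prompt, chain, call_model):
    """Try each model in order; return (model, reply) from the first that succeeds."""
    errors = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


def fake_call_model(model, prompt):
    # Stand-in for a real inference call; the first model always times out.
    if model == "llama-3.3-70b":
        raise TimeoutError("simulated timeout")
    return f"[{model}] echo: {prompt}"


used, reply = complete_with_fallback(
    "hello",
    ["llama-3.3-70b", "qwen-2.5-72b"],
    fake_call_model,
)
print(used, reply)
```

Collecting the per-model errors before raising keeps the failure message useful when the whole chain is exhausted.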
06 · Trust · Governance

Trace every prompt. Mask every PII field. Audit every call.

The governance layer regulated buyers can't ship without — built into the same platform as your inference and vector. Three capabilities, one boundary, everything stays in-country.

Traces & evals: Token-level cost, latency, model attribution. Eval harness with built-in and custom metrics.
PII detection & masking: Pre-call masking on the way in to the model. Post-retrieval masking on the way out of vector. 40+ entity types in English and Arabic.
Audit log: Tamper-evident, hash-chained log of every privileged action. Exportable for regulators.
2.0 How it works
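For readers unfamiliar with the term, a hash-chained log stores, in every entry, a hash that covers both the entry and the hash of the entry before it, so editing any past record breaks every later link. The sketch below is a minimal illustration of that idea using SHA-256; the entry layout, field names, and helper functions are assumptions and do not describe Manara's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, action):
    # Canonical JSON (sorted keys) so the same content always hashes the same way.
    payload = json.dumps({"prev": prev_hash, "action": action}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, action):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"action": action, "prev": prev_hash, "hash": _digest(prev_hash, action)})


def verify(log):
    """Return True only if every entry links to its predecessor and matches its hash."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["action"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"actor": "admin", "op": "rotate_key"})
append_entry(log, {"actor": "admin", "op": "export_traces"})
intact = verify(log)

log[0]["action"]["op"] = "delete_key"  # simulate someone rewriting history
tampered_detected = not verify(log)
print(intact, tampered_detected)
```

An exported log of this shape can be re-verified by a third party without access to the system that produced it, which is the property that makes it useful to hand to a regulator.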

A backend, not a federation of vendors.

Manara is one platform with one control plane, one billing engine, and one boundary across every region we run in. A request from your app touches inference, vector, and Postgres without leaving the datacenter, let alone the country.

Your app
OpenAI SDK · Postgres client · S3 SDK · + whatever else you already use
↓ one base_url
infer.manara.ai · db.manara.ai · s3.manara.ai
Manara backend
Inference · Vector · Queues · Postgres · Storage · Governance
Your region
In-country GPU · Local invoice · Tamper-evident audit · data never crosses a border
3.0 Use cases

What people are actually building.

The teams onboarding to Manara today are building production AI features that don't fit a hyperscaler. Each of these is a real customer profile; none of them is a fit for "just call OpenAI."

01
Customer-support copilot for a regulated bank
Agent answers customer queries with retrieval over the bank's KB. PII can't leave the country; PCI applies. Built on inference + vector + governance, fully in-region.
inference · vector · governance · audit
02
Document extraction at telco scale
Process millions of invoices and contracts per month. Long-running, durable jobs with checkpoints. Multi-step extraction pipeline that resumes after failure.
inference · queues · storage · postgres
03
Multilingual healthcare triage
Arabic + English voice + text triage for a hospital network. Jais and Llama for language, vector for clinical guidelines, PII masking on every call.
inference · vector · PII layer
04
Government-grade RAG over local statutes
A ministry deploys an internal assistant trained on its own regulations and case law. Sovereign control plane. Air-gap option. Every prompt audited.
inference · vector · governance · sovereign tier
05
Founder shipping a B2C AI product on a weekend
Migrated off Vercel + OpenAI + Pinecone + Inngest in an afternoon. Same DX. Half the latency to her users. Local-currency invoice she can hand to her accountant.
inference · vector · queues · postgres
4.0 Why Manara

You shouldn't have to choose between speed and sovereignty.

Frontier inference APIs are fast to ship on but slow to defend in front of a compliance review. Hyperscaler regions are defensible but expensive, slow to set up, and missing the AI layer entirely. DIY on Kubernetes is technically possible and a year of your life. Here's how Manara compares.

Manara
Frontier APIs
Hyperscaler
DIY on K8s
OpenAI-compatible API
you build it
Inference, vector, queues, DB in one platform
multi-vendor
you build it
Runs inside your country
limited regions
Billed in local currency
USD by default
you build it
PII masking + governance built in
you build it
Audit log compliance team can hand to a regulator
partial
you build it
Time to first deploy: Manara minutes · Frontier APIs minutes · Hyperscaler weeks · DIY on K8s months
5.0 Enterprise & partners

Built for who buys regional infrastructure.

Two audiences buy at the top end: regulated enterprises who need the platform delivered into their compliance perimeter, and operators who want to run it under their own brand. We've designed for both.

Enterprise

Sovereign control plane in your perimeter.

For banks, telcos, healthcare, and government. Manara deployed inside your network with optional air-gap mode. Source escrow. SOC 2 + PDPL ready. Your auditor signs once; your engineers stop praying.

Talk to enterprise →
Deployment: In your DC / VPC
Compliance: PDPL · NCA · TRA
Source escrow: Available
Local support: Yes
Partners

Your GPUs. Your fiber. Our platform.

For carriers, datacenter operators, and national clouds. Federate with the network, white-label the experience, monetize the infrastructure you already own. Three program tiers; partnership applications are open.

See the partner program →
Tier 1: Resell · 30–90 days
Tier 2: Co-Cloud · 90–180 days
Tier 3: Sovereign · scoped
Customer ownership: 100% yours
6.0 Pricing

Pay per token. Pay per second. No minimums.

Free tier on every account. Per-token inference, per-second compute, per-GB storage. Billed in your local currency. No "contact sales for pricing" theatre.

Free

Get started

For prototypes, internal tools, and the first weekend of a real product.

$50 credit · no card
Scale

Pay as you go

Per-token inference. Per-second compute and Postgres. Per-GB vector and storage.

$0.60 / 1M tokens · Llama 70B
Enterprise

Custom

Committed-use discounts, sovereign control plane, dedicated support, custom SLAs.

Talk to us
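To make the per-token rate concrete, here is a rough cost estimate using only the two figures shown above: $0.60 per 1M tokens for Llama 70B and the $50 free credit. The sketch assumes a single flat rate for input and output tokens and that the credit is spent entirely on this one model; neither assumption is confirmed by the page, and the 25M-token workload is a made-up example.

```python
from decimal import Decimal

PRICE_PER_MILLION = Decimal("0.60")  # per 1M tokens, Llama 70B (Scale tier figure)
FREE_CREDIT = Decimal("50")          # free credit (Free tier figure)


def cost(tokens):
    """Cost of a token count at the flat per-million rate (assumed flat)."""
    return Decimal(tokens) / Decimal(1_000_000) * PRICE_PER_MILLION


def tokens_covered(credit):
    """Whole number of tokens a given credit pays for at that rate."""
    return int(credit / PRICE_PER_MILLION * 1_000_000)


monthly = cost(25_000_000)                 # hypothetical workload: 25M tokens a month
free_tokens = tokens_covered(FREE_CREDIT)  # how far the free credit stretches
print(monthly, free_tokens)
```

Under these assumptions the free credit covers roughly 83 million tokens, and a 25M-token month costs 15 in the billing unit shown.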
7.0 FAQ

Frequently asked.

Why can't I just use OpenAI directly?
You can — and a lot of teams do, until their compliance officer reads the OpenAI terms or the regulator emails them. For anything touching PII, financial records, healthcare, or government data in most MEA jurisdictions, sending payloads to a US-region endpoint is a problem you'll eventually have to solve. Manara is what you migrate to when "we'll deal with it later" stops being acceptable. Change one base_url and the migration is done.
Is the inference quality actually comparable?
For most production workloads, yes. Llama 3.3 70B, Qwen 2.5 72B, and DeepSeek V3 land within striking distance of GPT-4o and Claude 3.5 on most benchmarks — and they outperform on certain Arabic-language tasks, especially with Jais. Frontier models still lead on the hardest reasoning tasks. If you need those, our AI Gateway routes to them too, with the same key. You don't have to choose.
What about cost — am I paying a regional premium?
No. Per-token rates are competitive with Together and Fireworks, sometimes cheaper, often a fraction of OpenAI for comparable-class models. The regional infrastructure isn't a tax — it's the point. You also save the egress fees and FX losses that come with cross-border inference.
Do I have to use all six primitives?
No. Most teams start with just inference (one base_url change). They add vector when they build RAG, queues when they ship agents, governance when their compliance officer asks. Use what you need; the rest costs nothing until you turn it on.
Is my data really staying in-country?
Yes. Inference happens on GPUs physically located in your selected region. Vector indexes, Postgres, and object storage are all in-region. Traces, audit logs, and PII detection run in-region. Cross-region replication is opt-in only and policy-controlled. There is no telemetry, no "model training on your data," and no third-party processors in the data path.
What if my country isn't in your region list yet?
Two options. You can use the nearest region today (Riyadh, Dubai, Karachi, Cairo, or Lagos), with full transparency about where your data lives. Or — if you have meaningful workload — talk to us about bringing a region to your country through our partner program. We've designed for exactly this case.
How is this different from running OSS models on AWS Bedrock?
Bedrock is single-vendor inference. It doesn't ship a vector layer that's co-located with your model. It doesn't include the queues, governance, or AI gateway. Cross-region billing is in USD, and most MEA regions don't have GPU inventory yet. Bedrock is a fine product; it's just not the same product.
What do you actually want out of this?
To rebuild the cloud as a federation of sovereign regions, starting where the demand is most acute and the alternatives are weakest. We need engineers shipping on us, we need operators federating with us, we need regulators trusting us. We're transparent about that. The product is the wedge; the federation is the company.

Change one base_url. Ship the rest.

$50 free credit. No card. Llama, Qwen, Jais, DeepSeek — running where your users are, billed in your currency, governed by their laws.