The serverless backend for AI apps

1.0What it unlocks

Six primitives. One platform.

Everything an AI application needs, on the same backend. Inference is the headline. The other five are what stops you from re-inventing the backend on a Tuesday.

01 · Core · Inference

OpenAI-compatible API. Open-source models. Local GPU.

Managed vLLM endpoints. Drop-in replacement for the OpenAI SDK — chat, embeddings, function calling, streaming, batch. Pay per token, billed in local currency.

Llama 3.3 70B Qwen 2.5 72B DeepSeek V3 Jais 30B Mistral Large Phi-4 + LoRA fine-tunes

~30ms p50 99.95% SLA OpenAI SDK

02 · Memory · Vector

RAG without round trips.

Managed pgvector with hybrid search and reranking. Embeddings co-located with inference — your retrieval, rerank, and model call hit the same region. Single-digit-millisecond internal latency. Pure Postgres underneath, so vectors and business data live in one SQL query.

pgvector + HNSW hybrid (BM25 + dense) cross-encoder rerank

03 · Work · Queues

Durable jobs. Real agents.

Long-running tasks, schedules, retries, dead-letter queues. Typed step functions that resume after failure.

checkpointed steps resumes on failure

04 · Data · Postgres + Storage

Serverless Postgres. S3-compatible storage.

Branching, PITR, read replicas, scale-to-zero. Object storage with in-region durability. The boring parts, done right.

Postgres 16 S3-compatible

05 · Control · Gateway

One key. Many models.

Route across local and frontier models. Per-team spend caps. Semantic caching. Fallback chains. Content policy at the gateway, not in your app.

routing + fallback semantic cache

06 · Trust · Governance

Trace every prompt. Mask every PII field. Audit every call.

The governance layer regulated buyers can't ship without — built into the same platform as your inference and vector. Three capabilities, one boundary, everything stays in-country.

Traces & evals Token-level cost, latency, model attribution. Eval harness with built-in and custom metrics.

PII detection & masking Pre-call masking on the way in to the model. Post-retrieval masking on the way out of vector. 40+ entity types in English and Arabic.

Audit log Tamper-evident, hash-chained log of every privileged action. Exportable for regulators.

2.0How it works

A backend, not a federation of vendors.

Manara is one platform with one control plane, one billing engine, and one boundary across every region we run in. A request from your app touches inference, vector, and Postgres without leaving the same datacenter — let alone the same country.

Your app

OpenAI SDK Postgres client S3 SDK + whatever else you already use

↓ one base_url

infer.manara.ai · db.manara.ai · s3.manara.ai

Manara backend

Inference Vector Queues Postgres Storage Governance

Your region

In-country GPU Local invoice Tamper-evident audit data never crosses a border

3.0Use cases

What people are actually building.

The teams onboarding to Manara today are building production AI features that don't fit a hyperscaler. Each of these is a real shape of customer; none of them is a fit for "just call OpenAI."

Customer-support copilot for a regulated bank

Agent answers customer queries with retrieval over the bank's KB. PII can't leave the country; PCI applies. Built on inference + vector + governance, fully in-region.

inference · vector · governance · audit

Document extraction at telco scale

Process millions of invoices and contracts per month. Long-running, durable jobs with checkpoints. Multi-step extraction pipeline that resumes after failure.

inference · queues · storage · postgres

Multilingual healthcare triage

Arabic + English voice + text triage for a hospital network. Jais and Llama for language, vector for clinical guidelines, PII masking on every call.

inference · vector · PII layer

Government-grade RAG over local statutes

A ministry deploys an internal assistant trained on its own regulations and case law. Sovereign control plane. Air-gap option. Every prompt audited.

inference · vector · governance · sovereign tier

Founder shipping a B2C AI product on a weekend

Migrated off Vercel + OpenAI + Pinecone + Inngest in an afternoon. Same DX. Half the latency to her users. Local-currency invoice she can hand to her accountant.

inference · vector · queues · postgres

4.0Why Manara

You shouldn't have to choose between speed and sovereignty.

Frontier inference APIs are fast to ship on but slow to defend in front of a compliance review. Hyperscaler regions are defensible but expensive, slow to set up, and missing the AI layer entirely. DIY on Kubernetes is technically possible and a year of your life. Here's how Manara compares.

Manara

Frontier APIs

Hyperscaler

DIY on K8s

OpenAI-compatible API

you build it

Inference, vector, queues, DB in one platform

multi-vendor

you build it

Runs inside your country

limited regions

Billed in local currency

USD by default

you build it

PII masking + governance built in

you build it

Audit log compliance team can hand to a regulator

partial

you build it

Time to first deploy

minutes

weeks

months

5.0Enterprise & partners

Built for who buys regional infrastructure.

Two audiences buy at the top end: regulated enterprises who need the platform delivered into their compliance perimeter, and operators who want to run it under their own brand. We've designed for both.

Enterprise

Sovereign control plane in your perimeter.

For banks, telcos, healthcare, and government. Manara deployed inside your network with optional air-gap mode. Source escrow. SOC2 + PDPL ready. Your auditor signs once; your engineers stop praying.

Talk to enterprise →

DeploymentIn your DC / VPC

CompliancePDPL · NCA · TRA

Source escrowAvailable

Local supportYes

Partners

Your GPUs. Your fiber. Our platform.

For carriers, datacenter operators, and national clouds. Federate with the network, white-label the experience, monetize the infrastructure you already own. Three program tiers, partnership applications open.

See the partner program →

Tier 1Resell · 30–90 days

Tier 2Co-Cloud · 90–180 days

Tier 3Sovereign · scoped

Customer ownership100% yours

6.0Pricing

Pay per token. Pay per second. No minimums.

Free tier on every account. Per-token inference, per-second compute, per-GB storage. Billed in your local currency. No "contact sales for pricing" theatre.

Free

Get started

For prototypes, internal tools, and the first weekend of a real product.

$50credit · no card

Scale

Pay as you go

Per-token inference. Per-second compute and Postgres. Per-GB vector and storage.

$0.60/ 1M tokens · Llama 70B

Enterprise

Custom

Committed-use discounts, sovereign control plane, dedicated support, custom SLAs.

Talk to us

7.0FAQ

Frequently asked.

Why can't I just use OpenAI directly?

You can — and a lot of teams do, until their compliance officer reads the OpenAI terms or the regulator emails them. For anything touching PII, financial records, healthcare, or government data in most MEA jurisdictions, sending payloads to a US-region endpoint is a problem you'll eventually have to solve. Manara is what you migrate to when "we'll deal with it later" stops being acceptable. Change one base_url and the migration is done.

Is the inference quality actually comparable?

For most production workloads, yes. Llama 3.3 70B, Qwen 2.5 72B, and DeepSeek V3 land within striking distance of GPT-4o and Claude 3.5 on most benchmarks — and they outperform on certain Arabic-language tasks, especially with Jais. Frontier models still lead on the hardest reasoning tasks. If you need those, our AI Gateway routes to them too, with the same key. You don't have to choose.

What about cost — am I paying a regional premium?

No. Per-token rates are competitive with Together and Fireworks, sometimes cheaper, often a fraction of OpenAI for comparable-class models. The regional infrastructure isn't a tax — it's the point. You also save the egress fees and FX losses that come with cross-border inference.

Do I have to use all six primitives?

No. Most teams start with just inference (one base_url change). They add vector when they build RAG, queues when they ship agents, governance when their compliance officer asks. Use what you need; the rest costs nothing until you turn it on.

Is my data really staying in-country?

Yes. Inference happens on GPUs physically located in your selected region. Vector indexes, Postgres, and object storage are all in-region. Traces, audit logs, and PII detection run in-region. Cross-region replication is opt-in only and policy-controlled. There is no telemetry, no "model training on your data," and no third-party processors in the data path.

What if my country isn't in your region list yet?

Two options. You can use the nearest region today (Riyadh, Dubai, Karachi, Cairo, or Lagos), with full transparency about where your data lives. Or — if you have meaningful workload — talk to us about bringing a region to your country through our partner program. We've designed for exactly this case.

How is this different from running OSS models on AWS Bedrock?

Bedrock is single-vendor inference. It doesn't ship a vector layer that's co-located with your model. It doesn't include the queues, governance, or AI gateway. Cross-region billing is in USD, and most MEA regions don't have GPU inventory yet. Bedrock is a fine product; it's just not the same product.

What do you actually want out of this?

To rebuild the cloud as a federation of sovereign regions, starting where the demand is most acute and the alternatives are weakest. We need engineers shipping on us, we need operators federating with us, we need regulators trusting us. We're transparent about that. The product is the wedge; the federation is the company.

The serverless backend for AI apps.

Six primitives. One platform.

OpenAI-compatible API. Open-source models. Local GPU.

RAG without round trips.

Durable jobs. Real agents.

Serverless Postgres. S3-compatible storage.

One key. Many models.

Trace every prompt. Mask every PII field. Audit every call.

A backend, not a federation of vendors.

What people are actually building.

You shouldn't have to choose between speed and sovereignty.

Built for who buys regional infrastructure.

Sovereign control plane in your perimeter.

Your GPUs. Your fiber. Our platform.

Pay per token. Pay per second. No minimums.

Get started

Pay as you go

Custom

Frequently asked.

Change one base_url. Ship the rest.