By Kashif Ullah · Published March 9, 2026 · 9 min read

#fastapi
#aws
#lambda
#python
#deployment

Deploying FastAPI on AWS Lambda — A Minimal, Real-World Setup

The shortest path from a working FastAPI service to a public HTTPS endpoint on Lambda, with cold start mitigations and real cost numbers.

FastAPI on Lambda is a great fit for most internal APIs and side projects. It’s cheap, it scales to zero, and once it’s wired up, you forget it exists. The first deploy is where people get stuck — not because it’s hard, but because the ecosystem offers too many options and most tutorials add unnecessary complexity. Here’s the minimal path I use after deploying dozens of FastAPI services for clients.

Why FastAPI on Lambda Makes Sense

Before getting into the how, let’s address the why. Lambda’s pricing model charges per invocation and per millisecond of compute. For APIs that handle sporadic traffic — internal tools, webhook receivers, prototype backends, scheduled data pipelines — this means you pay almost nothing during quiet periods instead of maintaining an EC2 instance or Fargate task that sits idle 23 hours a day.

FastAPI specifically pairs well with Lambda because it’s lightweight, starts fast, and its async capabilities aren’t wasted (Mangum handles the ASGI translation). Unlike Flask, you get automatic request validation, OpenAPI documentation, and type checking — features that matter when your API grows beyond a single endpoint.

Here’s the whole path in one diagram, from request to response, so you can see how few moving parts there really are:

   client
     │  HTTPS
     ▼
 ┌────────────────────┐        (optional)
 │ Lambda Function URL│◀── or ─ API Gateway HTTP API ─ custom domain
 └─────────┬──────────┘
           ▼
   ┌───────────────┐
   │  Mangum       │  ASGI ⇄ Lambda event translation
   └──────┬────────┘
          ▼
   ┌───────────────┐   module-level init runs once per cold start
   │  FastAPI app  │   (DB pool, settings via lru_cache)
   └──────┬────────┘
          ▼
     CloudWatch Logs  ← every print/exception lands here
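
To make the Mangum box in the diagram less abstract, here is a minimal sketch of the kind of translation it performs. This is not Mangum’s actual code: event_to_scope is a hypothetical helper, and the input is assumed to be a payload format 2.0 event, the shape that Function URLs and HTTP APIs send.

```python
import base64
from typing import Any


def event_to_scope(event: dict[str, Any]) -> tuple[dict[str, Any], bytes]:
    """Turn a payload-format-2.0 Lambda event into an ASGI HTTP scope and body.

    Illustration only: a real adapter such as Mangum also handles cookies,
    the client address, the response direction, and other event sources.
    """
    http = event["requestContext"]["http"]

    # Bodies arrive as text, or as base64 when the payload is binary.
    raw_body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        body = base64.b64decode(raw_body)
    else:
        body = raw_body.encode("utf-8")

    scope = {
        "type": "http",
        "http_version": "1.1",
        "method": http["method"],
        "path": event["rawPath"],
        "query_string": event.get("rawQueryString", "").encode("utf-8"),
        # ASGI wants headers as (name, value) byte pairs with lowercase names.
        "headers": [
            (name.lower().encode("latin-1"), value.encode("latin-1"))
            for name, value in event.get("headers", {}).items()
        ],
    }
    return scope, body


# A trimmed-down example event (real events carry many more fields).
sample_event = {
    "version": "2.0",
    "rawPath": "/items",
    "rawQueryString": "limit=10",
    "headers": {"Accept": "application/json"},
    "requestContext": {"http": {"method": "GET", "path": "/items"}},
    "isBase64Encoded": False,
}
scope, body = event_to_scope(sample_event)
```

FastAPI only ever sees the scope and body, which is why the same app runs unchanged under Uvicorn locally and under Mangum on Lambda.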

What You Actually Need

The minimal setup requires four things:

A FastAPI app (any version, though I recommend 0.110+).
Mangum — the ASGI adapter that translates Lambda’s event format into HTTP that FastAPI understands.
A Function URL — skip API Gateway for your first deploy. Function URLs give you a public HTTPS endpoint directly on the Lambda function, with zero configuration and no additional cost. API Gateway adds routing, rate limiting, and custom domains, but it’s also the source of half the confusion in Lambda tutorials.
A container image — for anything beyond stdlib dependencies, package your app as a Docker image. Lambda’s zip deployment works for tiny functions, but the moment you pip install pandas, numpy, or any ML library, you’ll hit the 250 MB unzipped size limit.

That’s it. No Serverless Framework, no SAM, no Terraform, no CDK — those are great for managing infrastructure at scale, but they add three days of yak-shaving when you just want a URL that returns JSON.

I want to be explicit about one choice here: for the first deploy I pick Function URLs over API Gateway, even though API Gateway is what most “production” tutorials reach for. My reasoning is that API Gateway buys you custom domains, rate limiting, and request transformation — none of which you need on day one, and all of which are the source of most “why is my Lambda returning 502” threads. Function URLs give you HTTPS and nothing else, which is exactly right until you have a reason to want more. I add API Gateway later, on purpose, the moment the client actually needs a custom domain or throttling — not before. Defer the complexity until a real requirement pulls it in.

The Handler

from fastapi import FastAPI
from mangum import Mangum

app = FastAPI()

@app.get("/")
def root():
    return {"ok": True}

handler = Mangum(app, lifespan="off")

The lifespan="off" parameter matters. Lambda has no long-running server process, so ASGI lifespan events (startup/shutdown hooks) don’t behave the way they do under a long-running server: Mangum runs them around each invocation rather than once per process. If you leave lifespan enabled, you can hit subtle bugs: startup work repeated on every request, connection pools torn down between invocations, and shutdown hooks that fire at surprising times. Turning lifespan off avoids this entire class of issues.

If you need initialization logic (like setting up a database connection), do it at module level — Lambda reuses the execution environment across invocations, so module-level code runs once per cold start and persists across warm invocations.
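
Here is a minimal sketch of that module-level pattern. It uses an in-memory sqlite3 connection purely as a stand-in for a real database pool, and the names create_connection, handle_request, and INIT_COUNT are hypothetical. The counter exists only to show that setup runs once while the handler runs many times.

```python
import sqlite3

INIT_COUNT = 0  # how many times the expensive setup has run in this environment


def create_connection() -> sqlite3.Connection:
    """Stand-in for an expensive setup step such as opening a DB pool."""
    global INIT_COUNT
    INIT_COUNT += 1
    # check_same_thread=False because FastAPI may run sync routes in a worker thread.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE hits (n INTEGER)")
    return conn


# Module level: executed once, when Lambda imports the module on a cold start.
CONNECTION = create_connection()


def handle_request() -> int:
    """Body of a route: reuses the connection created at import time."""
    CONNECTION.execute("INSERT INTO hits VALUES (1)")
    return CONNECTION.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
```

Calling handle_request() repeatedly in the same process keeps reusing CONNECTION, which mirrors what happens across warm invocations. A new execution environment imports the module again and gets its own connection.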

Packaging with Docker

For anything beyond stdlib, use a container image. Here’s the Dockerfile I start with:

FROM public.ecr.aws/lambda/python:3.12

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app/ ${LAMBDA_TASK_ROOT}/app/

CMD ["app.main.handler"]

The base image public.ecr.aws/lambda/python:3.12 includes the Lambda runtime interface client and is optimized for fast cold starts. Build the image, push to ECR, point your Lambda function at the image URI. Once Docker is authenticated against your ECR registry, that takes three commands:

docker build -t my-api .
docker tag my-api:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:latest

Then update the Lambda function to use the new image. I automate this with a 10-line shell script rather than a full CI/CD framework for early-stage projects. A real run of that script looks like this:

$ ./deploy.sh
[1/4] Building image...           done (8.2s)
[2/4] Pushing to ECR...           sha256:9f3c… pushed (11.4s)
[3/4] Updating function code...   LastUpdateStatus: Successful
[4/4] Smoke test:
      $ curl -s https://abc123.lambda-url.us-east-1.on.aws/
      {"ok":true}
      cold start: 1,142 ms  ·  warm: 38 ms
Deployed my-api in 23s.

That cold-vs-warm gap (1,142 ms → 38 ms) is the whole cold-start story in two numbers — and the reason the mitigations below matter only for user-facing routes.
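
The deploy script itself is shell and isn’t reproduced here. As a rough sketch of the same build, push, and update steps, the helper below assembles the commands in Python. build_deploy_commands is a hypothetical name, the account ID, region, and function name are placeholders, and the smoke test step is left out.

```python
import shlex


def build_deploy_commands(
    account_id: str,
    region: str,
    repo: str,
    function_name: str,
    tag: str = "latest",
) -> list[str]:
    """Return the shell command lines for one deploy, in order."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    local_image = f"{repo}:{tag}"
    image_uri = f"{registry}/{repo}:{tag}"
    q = shlex.quote  # guard against spaces or shell metacharacters in inputs
    return [
        # Authenticate Docker against the private ECR registry.
        f"aws ecr get-login-password --region {q(region)} "
        f"| docker login --username AWS --password-stdin {q(registry)}",
        # Build, tag for ECR, and push.
        f"docker build -t {q(local_image)} .",
        f"docker tag {q(local_image)} {q(image_uri)}",
        f"docker push {q(image_uri)}",
        # Point the function at the freshly pushed image.
        f"aws lambda update-function-code --function-name {q(function_name)} "
        f"--image-uri {q(image_uri)} --region {q(region)}",
    ]


commands = build_deploy_commands("123456789012", "us-east-1", "my-api", "my-api")
```

Each entry can be handed to subprocess.run(command, shell=True, check=True) so the deploy stops at the first failing step.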

Cold Starts: Understanding and Mitigating Them

Cold starts on Python Lambdas typically take 800–1,500 ms, depending on your package size and memory allocation. A cold start happens when Lambda creates a new execution environment: on the first invocation, after a period of inactivity (usually 5–15 minutes), or when scaling up to handle concurrent requests.

Three mitigations, in order of effectiveness:

1. Provisioned Concurrency

This tells AWS to keep N execution environments warm at all times. It costs money (~$0.015 per GB-hour of provisioned concurrency), but it completely eliminates cold starts for up to N concurrent requests. Use this for user-facing APIs where a 1-second delay on the first request is unacceptable.
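
To put that rate in monthly terms, here is a small worked calculation. It assumes the approximate $0.015 per GB-hour figure above and a 730-hour month. The function name is hypothetical, and the result covers only the charge for keeping environments warm, not the requests they serve.

```python
def provisioned_concurrency_monthly_cost(
    environments: int,
    memory_mb: int,
    rate_per_gb_hour: float = 0.015,  # the approximate figure quoted above
    hours_per_month: float = 730.0,   # assumption: average month length
) -> float:
    """Approximate monthly charge for keeping `environments` instances warm."""
    memory_gb = memory_mb / 1024
    return environments * memory_gb * hours_per_month * rate_per_gb_hour


# Two warm 512 MB environments, kept on all month.
two_warm = provisioned_concurrency_monthly_cost(environments=2, memory_mb=512)
```

Two warm 512 MB environments come to roughly $11 a month under these assumptions. That is small in absolute terms, but it is a fixed cost that no longer scales to zero.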

2. Smaller Deploy Package

Every megabyte in your container image adds milliseconds to the cold start. Practical steps:

Use --no-cache-dir when installing pip packages (saves 20–40% on many packages).
Don’t install dev dependencies in the production image.
If you’re using numpy or pandas, consider numpy compiled for Lambda’s architecture rather than the generic wheel.
Remove .pyc files and test directories from installed packages.
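
The last step can be scripted. The sketch below is one possible way to do it, not a standard tool: prune_site_packages is a hypothetical helper, and the set of directory names it deletes is an assumption you should check against your own dependencies, since a few packages import from their tests directory at runtime.

```python
import shutil
from pathlib import Path

# Directory names treated as safe to drop. This is an assumption: verify it
# against the packages you actually install.
PRUNE_DIR_NAMES = {"tests", "__pycache__"}


def dir_size(root: Path) -> int:
    """Total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def prune_site_packages(root: Path) -> int:
    """Delete test directories, bytecode caches, and stray .pyc files.

    Returns the number of bytes freed.
    """
    before = dir_size(root)
    # Collect first so nothing is deleted while the tree is being walked.
    doomed_dirs = [
        p for p in root.rglob("*") if p.is_dir() and p.name in PRUNE_DIR_NAMES
    ]
    for directory in doomed_dirs:
        shutil.rmtree(directory, ignore_errors=True)
    for pyc in list(root.rglob("*.pyc")):
        pyc.unlink(missing_ok=True)
    return before - dir_size(root)
```

In a container build this would run as a step after pip install, pointed at the directory the packages were installed into.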

3. Lazy Imports

Defer heavy imports until the request handler actually needs them. Instead of importing your ML model at module level, import it inside the route function. The first request will be slower, but subsequent requests on the same execution environment won’t pay the import cost again (because Python caches modules after the first import).

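@app.post("/predict")
def predict(data: InputSchema):
    # Deferred import: paid on the first call only, then served from sys.modules.
    from myapp.model import load_model
    # load_model() still runs on every request, so it should cache the loaded model.
    model = load_model()
    return model.predict(data)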

For most APIs, option 3 plus accepting the occasional 1-second cold start is fine. If your users are internal and understand that the first request after idle time takes a beat longer, you don’t need provisioned concurrency.

Real Cost Numbers

Here’s what Lambda actually costs for typical workloads, including the free tier offset:

Workload	Requests/month	Avg duration	Memory	Monthly cost
Internal CRUD API	100,000	100 ms	256 MB	~$1.20
Webhook receiver	50,000	200 ms	512 MB	~$2.80
ML inference API	30,000	800 ms	1024 MB	~$8.50
Data pipeline trigger	10,000	2,000 ms	2048 MB	~$6.40

The first time you see these numbers after years of paying $30–80/month for an idle EC2 instance, you’ll be a convert. The savings are even more dramatic for staging environments that receive a handful of requests per day.
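
To estimate your own workload, the arithmetic behind a Lambda bill is short enough to script. The sketch below is an assumption-laden estimator, not a pricing reference: the default prices ($0.20 per million requests and $0.0000166667 per GB-second) are the x86 list prices as I understand them, so check the current pricing page for your region and architecture. It covers only Lambda’s own request and duration charges. Logging, image storage, data transfer, and anything placed in front of the function are not modeled, so a real bill will be higher.

```python
def lambda_monthly_cost(
    requests: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float = 0.20,   # assumed list price
    price_per_gb_second: float = 0.0000166667,  # assumed x86 list price
    free_requests: int = 0,                     # set to your free tier allowance
    free_gb_seconds: float = 0.0,               # set to your free tier allowance
) -> float:
    """Estimate Lambda request plus duration charges for one month, in USD."""
    # Compute is billed in GB-seconds: seconds of runtime times memory in GB.
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    billable_requests = max(0, requests - free_requests)
    billable_gb_seconds = max(0.0, gb_seconds - free_gb_seconds)
    return (
        billable_requests / 1_000_000 * price_per_million_requests
        + billable_gb_seconds * price_per_gb_second
    )


# Example: 100,000 requests a month at 100 ms and 256 MB, no free tier applied.
crud_estimate = lambda_monthly_cost(100_000, 100, 256)
```

Plug in the request count, duration, and memory from any row above to see the Lambda-only part of the bill, and pass your free tier allowances to see how much of it is offset.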

Adding a Custom Domain

Function URLs give you a working endpoint, but the URL looks like https://abc123.lambda-url.us-east-1.on.aws/ — not ideal for production APIs. To add a custom domain:

Set up API Gateway HTTP API (not REST API — HTTP API is simpler and cheaper).
Create a custom domain mapping in API Gateway.
Point your DNS (Route 53 or Cloudflare) to the API Gateway domain.
API Gateway terminates TLS for you, using an ACM certificate attached to the custom domain.

This adds about $1 per million requests through API Gateway, plus the domain cost. For production FastAPI backends, I consider this a required step.

Environment Variables and Secrets

Never hardcode secrets in your Lambda function code. Use Lambda environment variables for non-sensitive configuration (API base URLs, feature flags) and AWS Secrets Manager or SSM Parameter Store for sensitive values (database passwords, API keys).

import os
from functools import lru_cache

@lru_cache
def get_settings():
    return {
        "db_url": os.environ["DATABASE_URL"],
        "debug": os.environ.get("DEBUG", "false") == "true",
    }

The lru_cache decorator ensures settings are loaded once per execution environment and reused across invocations.
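
A slightly more structured variant, if you prefer attribute access over dictionary keys. This is a sketch, and the Settings dataclass is a hypothetical name. The caching behaviour is the same: the environment is read on the first call and the same object is returned afterwards.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    db_url: str
    debug: bool


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Read the environment once; later calls return the cached object.
    return Settings(
        db_url=os.environ["DATABASE_URL"],  # KeyError here means misconfiguration
        debug=os.environ.get("DEBUG", "false").lower() == "true",
    )
```

Because the result is cached, changing an environment variable has no effect until the next cold start. In tests, get_settings.cache_clear() forces a re-read.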

When NOT to Use Lambda

Lambda is the wrong choice for:

Long-running responses (>15 minutes, the hard timeout) — use Fargate or EC2
WebSockets or Server-Sent Events — Lambda doesn’t support persistent connections (use API Gateway WebSocket API or a long-running server)
High-throughput steady-state traffic (>100 requests/second sustained) — cheaper on Fargate or EC2
GPU workloads — Lambda doesn’t offer GPU instances
Large model files exceeding the 10 GB container image limit
Sub-100ms latency requirements with zero tolerance for cold starts — provisioned concurrency helps, but a warm server is still faster

For everything else — webhooks, internal CRUD APIs, prototypes, ML inference under a few seconds, scheduled tasks, AI agent backends — Lambda is the right default.

Monitoring and Debugging

Lambda integrates with CloudWatch out of the box. Every print() statement and every exception traceback appears in CloudWatch Logs. For structured logging, I use Python’s built-in logging module configured to output JSON:

import logging
import json

logger = logging.getLogger()
logger.setLevel(logging.INFO)

@app.get("/health")
def health():
    logger.info(json.dumps({"event": "health_check", "status": "ok"}))
    return {"status": "healthy"}

Set up a CloudWatch alarm on the Errors metric for your function. With an SNS email subscription as the alarm action, it emails you when your API starts failing, and a single alarm fits within CloudWatch’s free tier.
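
If you would rather not call json.dumps at every log site, a formatter can do the serialization once. The sketch below uses only the standard library. JsonFormatter and configure_json_logging are hypothetical names, and passing extra fields under a "fields" key is a convention of this example, not something the logging module defines.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} (sketch convention).
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def configure_json_logging(level: int = logging.INFO) -> logging.Logger:
    """Route all root-logger output through JsonFormatter."""
    root = logging.getLogger()
    root.setLevel(level)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    # Replace existing handlers so records are not emitted twice.
    root.handlers = [handler]
    return root
```

A route would then log with logger.info("health_check", extra={"fields": {"status": "ok"}}), and each line in CloudWatch Logs becomes a JSON object you can filter on by field.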

Frequently Asked Questions

Can I use FastAPI’s background tasks on Lambda?

FastAPI’s BackgroundTasks feature works on Lambda, but with a caveat: the background task must complete before the Lambda handler returns. Lambda freezes the execution environment after the response is sent, so any incomplete background work may or may not finish. For reliable background processing, use SQS or Step Functions instead.

How do I handle database connections on Lambda?

Use a connection pool with a small pool size (1–3 connections), since each Lambda execution environment handles only one request at a time. For PostgreSQL, I recommend RDS Proxy, which pools connections across all Lambda instances and prevents the “too many connections” problem that hits when Lambda scales up quickly.

Should I use API Gateway REST API or HTTP API?

HTTP API in almost all cases. It’s cheaper (by ~70%), faster (lower latency), and simpler to configure. REST API offers additional features like request transformation, caching, and usage plans, but most FastAPI applications don’t need these because FastAPI handles validation and documentation natively.

How do I run database migrations on Lambda?

Don’t run migrations as part of the Lambda handler. Instead, use a separate Lambda function triggered manually or by CI/CD that runs Alembic migrations. This keeps your API function focused and prevents migration timeouts from blocking API requests.

Can I deploy multiple FastAPI services to one Lambda function?

Yes — your single FastAPI app can have as many routers and endpoints as you need. I typically organize with FastAPI’s APIRouter and include all routers in the main app. One Lambda function per microservice boundary, not per endpoint.

Want this set up properly the first time? I build FastAPI services on AWS.