TL;DR
Deploying a LangGraph agent to AWS Lambda is a good fit when traffic is bursty, you want pay-per-call billing, and your graph completes in under 15 minutes. Steps:
- Package the agent code + dependencies as a container image (the LangGraph + LangChain stack typically exceeds the 250 MB unzipped limit for zip deployments).
- Set Lambda memory to 1024–3008 MB — Lambda allocates CPU in proportion to memory, so more memory means faster graph execution.
- Set timeout to your graph’s longest expected run (max 15 min).
- Pull secrets from AWS Secrets Manager / Parameter Store at cold start, cache in module scope.
- Use Provisioned Concurrency on the entry handler if 1.5–3 second cold starts are unacceptable.
When Lambda is the right fit
Lambda fits LangGraph agents when:
- Traffic is bursty (chat, webhook, API Gateway-driven)
- You need to scale from 0 to 1000 concurrent without infrastructure work
- Your graph’s worst-case path completes in under 15 minutes
- You don’t need persistent in-memory state across invocations
Lambda is not the right fit when:
- The graph runs for > 15 minutes (use ECS Fargate or Step Functions instead)
- You need WebSocket-style streaming back to the client (use API Gateway WebSockets or App Runner)
- The graph maintains a long-lived connection to a vector DB you can’t quickly re-establish
Packaging
Use a container image, not a zip archive. LangGraph plus LangChain plus your model SDK plus a vector-store client easily exceed the 250 MB unzipped limit for zip deployments. Container images can be up to 10 GB.
A minimal Dockerfile:
# AWS Lambda Python base image; includes the Lambda runtime interface client
FROM public.ecr.aws/lambda/python:3.12
# Install dependencies first so Docker layer caching skips them on code-only rebuilds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ${LAMBDA_TASK_ROOT}/app/
# Points at lambda_handler in app/handler.py
CMD ["app.handler.lambda_handler"]
Cold start mitigation
Cold starts on a 2 GB-memory Lambda with LangGraph + LangChain are typically 1.5–3 seconds. To mitigate:
- Provisioned Concurrency — pre-warm a fixed number of instances (costs ~$0.000004 per GB-second to provision; quickly pays for itself if you serve more than ~50 invocations/hour)
- Module-scope imports — keep heavy imports at module level so they’re amortized across invocations within the same warm container
- One-time graph construction — compile the LangGraph state machine once at module load, not per invocation, so every warm invocation reuses it (see the sketch below)
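A minimal handler sketch combining the last two points. The module layout (app/handler.py), the State schema, and build_graph are illustrative assumptions; only the langgraph imports are the library's real API:

# app/handler.py: heavy imports and graph compilation run once per container
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    answer: str

def answer_node(state: State) -> dict:
    # Placeholder node: a real agent would call the model and tools here
    return {"answer": f"echo: {state['question']}"}

def build_graph():
    g = StateGraph(State)
    g.add_node("answer", answer_node)
    g.add_edge(START, "answer")
    g.add_edge("answer", END)
    return g.compile()

graph = build_graph()  # module scope: built on cold start, reused while warm

def lambda_handler(event, context):
    result = graph.invoke({"question": event["question"], "answer": ""})
    return {"statusCode": 200, "body": result["answer"]}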
Secrets
Don’t bake API keys into the image. Use Secrets Manager or Parameter Store:
import boto3
import json

ssm = boto3.client("ssm")  # client created once, at module import

_secrets = None

def get_secrets():
    global _secrets
    if _secrets is None:  # only true on a cold start
        _secrets = json.loads(
            ssm.get_parameter(
                Name="/myagent/secrets", WithDecryption=True
            )["Parameter"]["Value"]
        )
    return _secrets
Cache at module scope (top-level _secrets = None) so the SSM call only happens on cold start.
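The same caching pattern works if the secret lives in Secrets Manager instead of Parameter Store; the secret name below is a placeholder:

import boto3
import json

secretsmanager = boto3.client("secretsmanager")

_secrets = None

def get_secrets():
    global _secrets
    if _secrets is None:
        resp = secretsmanager.get_secret_value(SecretId="myagent/secrets")  # placeholder name
        _secrets = json.loads(resp["SecretString"])
    return _secrets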
IAM: scope tool permissions tightly
Give the Lambda an IAM role that only allows the AWS APIs the agent’s tools need. If your agent reads from S3, grant s3:GetObject on one prefix only — never s3:*. The agent will eventually try to do something it shouldn’t; least-privilege is your safety net.
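An illustrative least-privilege policy statement; the bucket and prefix names are placeholders:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-agent-bucket/agent-docs/*"
    }
  ]
}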
Cost math
For an agent invocation taking 4 seconds at 2 GB:
- Lambda compute: 4 × 2 = 8 GB-seconds × $0.0000167 = $0.000133 per call
- API Gateway (REST API pricing): $0.0000035 per call
- Token cost (4K input + 2K output tokens, GPT-4 class): ~$0.10
- Total: $0.10–$0.15 per agent invocation
Token cost dominates. Lambda overhead is rounding error.
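A quick sanity check of that arithmetic, with the prices above treated as assumptions (adjust to your region and model):

GB_SECOND = 0.0000167   # Lambda compute price per GB-second (assumed)
API_GW = 0.0000035      # API Gateway price per request (assumed)
TOKENS = 0.10           # ~4K input + 2K output, GPT-4-class pricing (assumed)

duration_s, memory_gb = 4, 2
lambda_cost = duration_s * memory_gb * GB_SECOND  # 8 GB-seconds -> ~$0.000134
infra = lambda_cost + API_GW
print(f"infra: ${infra:.6f}, tokens: ${TOKENS:.2f}, total: ${infra + TOKENS:.2f}")
# infra is roughly 0.1% of the total, so token cost dominates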