TL;DR
Deploying a LangGraph agent to AWS Lambda is a good fit when traffic is bursty, you want pay-per-call billing, and your graph completes in under 15 minutes. Steps:
- Package the agent code + dependencies as a container image (the LangGraph + LangChain stack typically exceeds the 250 MB unzipped limit for zip deployments).
- Set Lambda memory to 1024–3008 MB — Lambda allocates CPU in proportion to memory, so more memory means faster graph execution.
- Set timeout to your graph’s longest expected run (max 15 min).
- Pull secrets from AWS Secrets Manager / Parameter Store at cold start, cache in module scope.
- Use Provisioned Concurrency on the entry handler if 1.5–3 second cold starts are unacceptable.
When Lambda is the right fit
Lambda fits LangGraph agents when:
- Traffic is bursty (chat, webhook, API Gateway-driven)
- You need to scale from 0 to 1000 concurrent without infrastructure work
- Your graph’s worst-case path completes in under 15 minutes
- You don’t need persistent in-memory state across invocations
Lambda is not the right fit when:
- The graph runs for > 15 minutes (use ECS Fargate or Step Functions instead)
- You need WebSocket-style streaming back to the client (use API Gateway WebSockets or App Runner)
- The graph maintains a long-lived connection to a vector DB you can’t quickly re-establish
Packaging
Use a container image, not a zip archive. LangGraph plus LangChain plus your model SDK plus a vector-store client easily exceed the 250 MB unzipped limit for zip deployments. Container images can be up to 10 GB.
A minimal Dockerfile:
# AWS Lambda Python base image; includes the Lambda runtime interface client
FROM public.ecr.aws/lambda/python:3.12
# Install dependencies first so Docker layer caching skips them on code-only rebuilds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ${LAMBDA_TASK_ROOT}/app/
# Points at lambda_handler in app/handler.py
CMD ["app.handler.lambda_handler"]
Cold start mitigation
Cold starts on a 2 GB-memory Lambda with LangGraph + LangChain are typically 1.5–3 seconds. To mitigate:
- Provisioned Concurrency — pre-warm a fixed number of instances (costs ~$0.000004 per GB-second to provision; quickly pays for itself if you serve more than ~50 invocations/hour)
- Module-scope imports — keep heavy imports at module level so they’re amortized across invocations within the same warm container
- One-time graph construction — compile the LangGraph state machine once at module load, not per invocation, so every warm invocation reuses it (see the sketch below)
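A minimal handler sketch combining the last two points. The module layout (app/handler.py), the State schema, and build_graph are illustrative assumptions; only the langgraph imports are the library's real API:

# app/handler.py: heavy imports and graph compilation run once per container
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    answer: str

def answer_node(state: State) -> dict:
    # Placeholder node: a real agent would call the model and tools here
    return {"answer": f"echo: {state['question']}"}

def build_graph():
    g = StateGraph(State)
    g.add_node("answer", answer_node)
    g.add_edge(START, "answer")
    g.add_edge("answer", END)
    return g.compile()

graph = build_graph()  # module scope: built on cold start, reused while warm

def lambda_handler(event, context):
    result = graph.invoke({"question": event["question"], "answer": ""})
    return {"statusCode": 200, "body": result["answer"]}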
Secrets
Don’t bake API keys into the image. Use Secrets Manager or Parameter Store:
import boto3
import json

ssm = boto3.client("ssm")  # client created once, at module import

_secrets = None

def get_secrets():
    global _secrets
    if _secrets is None:  # only true on a cold start
        _secrets = json.loads(
            ssm.get_parameter(
                Name="/myagent/secrets", WithDecryption=True
            )["Parameter"]["Value"]
        )
    return _secrets
Cache at module scope (top-level _secrets = None) so the SSM call only happens on cold start.
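The same caching pattern works if the secret lives in Secrets Manager instead of Parameter Store; the secret name below is a placeholder:

import boto3
import json

secretsmanager = boto3.client("secretsmanager")

_secrets = None

def get_secrets():
    global _secrets
    if _secrets is None:
        resp = secretsmanager.get_secret_value(SecretId="myagent/secrets")  # placeholder name
        _secrets = json.loads(resp["SecretString"])
    return _secrets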
IAM: scope tool permissions tightly
Give the Lambda an IAM role that only allows the AWS APIs the agent’s tools need. If your agent reads from S3, grant s3:GetObject on one prefix only — never s3:*. The agent will eventually try to do something it shouldn’t; least-privilege is your safety net.
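An illustrative least-privilege policy statement; the bucket and prefix names are placeholders:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-agent-bucket/agent-docs/*"
    }
  ]
}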
Cost math
For an agent invocation taking 4 seconds at 2 GB:
- Lambda compute: 4 × 2 = 8 GB-seconds × $0.0000167 = $0.000133 per call
- API Gateway (REST API pricing): $0.0000035 per call
- Token cost (4K input + 2K output tokens, GPT-4 class): ~$0.10
- Total: $0.10–$0.15 per agent invocation
Token cost dominates. Lambda overhead is rounding error.
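A quick sanity check of that arithmetic, with the prices above treated as assumptions (adjust to your region and model):

GB_SECOND = 0.0000167   # Lambda compute price per GB-second (assumed)
API_GW = 0.0000035      # API Gateway price per request (assumed)
TOKENS = 0.10           # ~4K input + 2K output, GPT-4-class pricing (assumed)

duration_s, memory_gb = 4, 2
lambda_cost = duration_s * memory_gb * GB_SECOND  # 8 GB-seconds -> ~$0.000134
infra = lambda_cost + API_GW
print(f"infra: ${infra:.6f}, tokens: ${TOKENS:.2f}, total: ${infra + TOKENS:.2f}")
# infra is roughly 0.1% of the total, so token cost dominates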