Beyond Hello World: Building Production-Ready AWS Lambda Functions
A "Hello World" Lambda is about five lines of code. A Lambda you can trust in production is a different animal: it has to survive retries without doing the same work twice, scale without falling over, fail loudly enough that you actually notice, and cost roughly what you expected.
This is the checklist I wish I'd had the first time I shipped one. Everything below is grounded in the official AWS and HashiCorp documentation — I've linked the sources inline so you can verify each point rather than take my word for it. Examples are in Python and Terraform.
Idempotency, retries, timeouts, and dead-letter queues
Start here, because it's the part people skip and regret.
Lambda's delivery is at-least-once, not exactly-once. Event source mappings (SQS, Kinesis, DynamoDB Streams) process each event at least once, and asynchronous invocations retry on failure (AWS Lambda best practices). Duplicates aren't an edge case — they're guaranteed eventually. If your handler charges a card, writes a row, or sends an email, it has to be idempotent.
The cleanest way I've found to do this in Python is the Powertools idempotency utility. It hashes the event (or a chosen subset of fields) into an idempotency key, stores it in DynamoDB, and returns the cached result if the same request arrives again within an expiry window (default one hour):
import os

from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent,
)

persistence = DynamoDBPersistenceLayer(table_name=os.environ["IDEMPOTENCY_TABLE"])
config = IdempotencyConfig(
    event_key_jmespath='["user_id", "product_id"]',  # what makes two requests "the same"
    expires_after_seconds=3600,
    use_local_cache=True,
)

@idempotent(config=config, persistence_store=persistence)
def lambda_handler(event, context):
    # create_payment is your own business logic (not shown here)
    payment_id = create_payment(event["user_id"], event["product_id"], event["amount"])
    return {"payment_id": payment_id, "status": "success"}
The first call runs and caches the result; a retry with the same key returns it without re-executing the body. The DynamoDB table needs a partition key id and a TTL attribute, and one table can back every idempotent function you own (Powertools idempotency).
Two things that bit me here:
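- The SQS visibility timeout must exceed your function's timeout, or SQS will redeliver the message while you're still processing it, which guarantees a duplicate (best practices). AWS's SQS event source guidance goes further and recommends a visibility timeout of at least six times the function timeout, to leave room for retries.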
- DynamoDB TTL deletion can lag by up to 48 hours, so Powertools validates expiry itself instead of trusting TTL to hand you a fresh slate.
Retries and timeouts. For asynchronous invocations, Lambda retries a failed event at most two times (you can lower that to one or zero, but not raise it) and discards events that have waited longer than the maximum event age. That age defaults to its ceiling of 6 hours (21,600 seconds) and can be lowered to as little as 60 seconds (invoking asynchronously). Set your timeout deliberately: the default of 3 seconds is almost always too low for real work, and the maximum is 900 seconds. Size it from load testing against your slowest downstream call, and remember it also has to cover cold starts.
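A timeout that fires mid-work kills the invocation wherever it happens to be. One complement to a well-sized timeout is to check the remaining time yourself and stop cleanly. The sketch below relies on `context.get_remaining_time_in_millis()`, which the Lambda context object provides. The helper name, the 2-second safety margin, and the `FakeContext` class are my own assumptions for illustration.

```python
def process_until_deadline(items, context, handle, safety_margin_ms=2000):
    """Process items in order, stopping while there is still time to return.

    Returns (done, remaining) so the caller can re-queue unprocessed items.
    """
    done = []
    for index, item in enumerate(items):
        # Provided by the real Lambda context object.
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            return done, list(items[index:])
        done.append(handle(item))
    return done, []


class FakeContext:
    """Hypothetical stand-in for the Lambda context, for local runs only."""

    def __init__(self, budget_ms, cost_per_check_ms):
        self._remaining = budget_ms
        self._cost = cost_per_check_ms

    def get_remaining_time_in_millis(self):
        remaining = self._remaining
        self._remaining -= self._cost  # pretend each item burns a fixed slice
        return remaining
```

What you do with the leftover items (re-queue them, report them as batch failures, or return a continuation token) depends on the event source.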
Dead-letter queues vs. on-failure destinations. When retries are exhausted, the failed event needs to go somewhere you can inspect:
- A dead-letter queue (SQS standard or SNS standard; FIFO is not supported) receives the event as-is plus three attributes: RequestID, ErrorCode, and ErrorMessage (capturing records).
- On-failure destinations (SQS, SNS, S3, Lambda, or EventBridge) receive a richer JSON record with the request context, the response, and approximateInvokeCount, which is much better for debugging (capturing records).
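As a rough illustration of why the destination record is easier to debug, here is a sketch that pulls out the fields I usually want during triage. The field names follow the record shape shown in the AWS docs as I understand it. The function name and the sample values are hypothetical, so check a real record from your own destination before relying on any of them.

```python
def summarize_failure(record):
    """Extract the triage-relevant fields from an on-failure destination record."""
    request_context = record.get("requestContext", {})
    response_context = record.get("responseContext", {})
    return {
        "request_id": request_context.get("requestId"),
        "condition": request_context.get("condition"),
        "attempts": request_context.get("approximateInvokeCount"),
        "function_error": response_context.get("functionError"),
        "original_event": record.get("requestPayload"),
        "error": record.get("responsePayload"),
    }


# Hypothetical sample record, trimmed to the fields used above.
sample = {
    "requestContext": {
        "requestId": "example-request-id",
        "condition": "RetriesExhausted",
        "approximateInvokeCount": 3,
    },
    "requestPayload": {"order_id": "o-123"},
    "responseContext": {"statusCode": 200, "functionError": "Unhandled"},
    "responsePayload": {"errorMessage": "boom", "errorType": "ValueError"},
}
summary = summarize_failure(sample)
```

Because the original event travels in the record, replaying a failure is a matter of feeding `original_event` back into the function once the bug is fixed.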
For queue and stream sources the failure model is different: by default, an error reprocesses the entire batch, not a single record. Turn on partial batch response (add ReportBatchItemFailures to the event source mapping's function response types) so only the failed records are retried:
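def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(record)  # your per-record logic (not shown)
        except Exception:
            # messageId identifies an SQS record; stream records use the sequence number
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}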
Cold starts and dependency size
A cold start is the INIT phase: Lambda spins up an execution environment, downloads your code, starts the runtime, and runs everything outside your handler, once per environment rather than once per request (execution environment lifecycle). It happens on the first request, when traffic scales past your warm environments, or after an environment has sat idle long enough for Lambda to reclaim it (AWS doesn't publish a fixed idle window, so don't design around a specific number). It typically adds anywhere from under 100 ms to over a second.
Two levers matter most.
1. Keep the package small. The limits are 50 MB zipped and 250 MB unzipped including layers (Lambda quotas), but you want to be well under them — Lambda downloads and decompresses your code on every cold start, and even a 50 MB zip can add hundreds of milliseconds. Strip test and build artifacts, prune unused dependencies, and reach for a layer only when something heavy is genuinely shared.
2. Initialize once, lazy-load the rest. Create SDK clients and connections at module scope so warm invocations reuse them, and defer expensive, rarely-used imports until a code path actually needs them:
import json

import boto3

# INIT phase: runs once per environment, reused on warm invocations
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")

_heavy = None  # loaded lazily, only when needed

def lambda_handler(event, context):
    global _heavy
    if event.get("needs_heavy"):
        if _heavy is None:
            import expensive_library as _heavy  # placeholder for a slow, rarely-used import
        # ... use _heavy
    item = table.get_item(Key={"id": event["id"]}).get("Item")
    # default=str because DynamoDB returns numbers as Decimal, which json can't serialize
    return {"statusCode": 200, "body": json.dumps(item, default=str)}
If cold starts are genuinely hurting a latency-sensitive path, the official options are Provisioned Concurrency (pre-initialized environments, double-digit-millisecond starts, extra cost — more on this next) or SnapStart, which restores a snapshot of the initialized environment and is available for Java, Python, and .NET (SnapStart). One catch with SnapStart: if your init code generates anything unique — a UUID, a random seed, a cached secret — it gets reused across restored environments, so regenerate it after restore.
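And a VPC note worth knowing: the old advice that attaching a function to a VPC adds ENI-attachment time to every cold start is out of date. Since AWS reworked Lambda's VPC networking in 2019, the network interfaces are set up when you create the function or change its VPC configuration, not at invocation time. Still, only attach a function to a VPC when you actually need private-network access, because it then needs a NAT gateway or VPC endpoints to reach the internet and most AWS APIs.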
Reserved concurrency and scaling
Lambda scales by running more execution environments, but there are limits worth knowing before your traffic finds them for you (scaling and concurrency):
- The default account limit is 1,000 concurrent executions per Region (raise it through Service Quotas).
- A single function scales at up to 1,000 new environments every 10 seconds.
- There's also a requests-per-second ceiling of 10× your concurrency — a very fast function can hit the RPS wall before the concurrency wall.
Concurrency isn't requests per second; it's average RPS × average duration in seconds. 100 requests/sec at 100ms each is only about 10 concurrent.
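The arithmetic is simple enough to keep in a helper. This sketch applies the formula above together with the 10× requests-per-second ceiling. The function names are mine, and the 1,000 default is just the account default mentioned in the list, so substitute your own limit.

```python
def required_concurrency(requests_per_second, avg_duration_ms):
    """Average concurrency = average RPS x average duration in seconds."""
    return requests_per_second * avg_duration_ms / 1000


def throttle_reasons(requests_per_second, avg_duration_ms, concurrency_limit=1000):
    """Return which ceiling(s) a steady load would exceed, if any."""
    reasons = []
    if required_concurrency(requests_per_second, avg_duration_ms) > concurrency_limit:
        reasons.append("concurrency")
    if requests_per_second > 10 * concurrency_limit:
        reasons.append("requests-per-second")
    return reasons
```

For example, `throttle_reasons(12000, 50)` needs only 600 concurrent environments but exceeds the 10,000 requests-per-second ceiling, which is the fast-function case described above. These are averages, so bursty traffic will need more headroom than the formula suggests.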
You have two controls:
- Reserved concurrency caps a function's maximum concurrency (and guarantees it that slice). It costs nothing, isolates the function from noisy neighbors, and is perfect for protecting a fragile downstream dependency — but it does not remove cold starts.
- Provisioned concurrency pre-warms a set number of environments so requests skip the cold start entirely. It costs money (even while idle), and it must be attached to a version or alias, not $LATEST (provisioned concurrency).
When concurrency is exhausted, Lambda throttles with a 429. A common production setup pairs reserved concurrency as a guardrail with provisioned concurrency (often driven by Application Auto Scaling) for the predictable baseline.
aws lambda put-function-concurrency \
  --function-name orders \
  --reserved-concurrent-executions 100
Structured logging, metrics, tracing, and alarms
You can't operate what you can't see.
Structured logs. Log JSON, not strings. CloudWatch Logs Insights auto-discovers fields from JSON, so you can query processingTimeMs > 500 without writing regex (sending logs to CloudWatch). Powertools' Logger gives you JSON formatting, the request ID, and cold-start flags for free, and the basic logging permissions come from the AWSLambdaBasicExecutionRole managed policy.
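If you'd rather see what "log JSON" means without pulling in a library, here is a minimal sketch using only the standard library. It is not Powertools' Logger and it omits the request ID and cold-start flag. The `JsonFormatter` and `build_logger` names, and the convention of passing fields under `extra={"fields": ...}`, are my own assumptions.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Fields passed via extra={"fields": {...}} become top-level JSON keys.
        entry.update(getattr(record, "fields", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry, default=str)


def build_logger(name="orders", stream=sys.stdout):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines from the root logger
    return logger
```

A call such as `build_logger().info("order processed", extra={"fields": {"orderId": "o-123", "processingTimeMs": 612}})` produces a line whose `processingTimeMs` field can be filtered on directly in Logs Insights.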
Metrics. Lambda automatically emits Invocations, Errors, Duration, Throttles, ConcurrentExecutions, DeadLetterErrors, and more to CloudWatch at one-minute resolution (metric types). Two facts to internalize: Duration excludes cold start (init time is reported separately), and throttled requests don't increment Invocations — so a naive error rate quietly misses them. For custom metrics, prefer Embedded Metric Format (EMF): you emit a structured log line and CloudWatch extracts the metric, with no extra API call or added latency (Embedded Metric Format).
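For a feel of what an EMF line looks like, here is a hand-rolled sketch that builds one record with a single metric and prints it. The `_aws` envelope follows the Embedded Metric Format specification as I understand it. The helper names are made up, and in practice a library such as Powertools' Metrics utility will build this for you.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit, dimensions):
    """Build one EMF log record carrying a single metric."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions)],  # one set of dimension names
                    "Metrics": [{"Name": metric_name, "Unit": unit}],
                }
            ],
        },
        metric_name: value,  # the metric value lives at the top level
    }
    record.update(dimensions)  # so do the dimension values
    return record


def emit_metric(namespace, metric_name, value, unit="Count", **dimensions):
    """Print the record; in Lambda, stdout goes to CloudWatch Logs."""
    print(json.dumps(emf_record(namespace, metric_name, value, unit, dimensions)))
```

Calling `emit_metric("OrdersApp", "ProcessingTimeMs", 612, unit="Milliseconds", Service="orders")` writes one log line, where `OrdersApp` and `Service` are placeholder names. Keep dimension values low-cardinality, since each unique combination becomes its own metric.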
Tracing. Turn on X-Ray Active tracing to see where time goes across services; the execution role needs xray:PutTraceSegments and xray:PutTelemetryRecords (the AWSXRayDaemonWriteAccess policy) (tracing with X-Ray). Sampling is automatic and not configurable.
Alarms. AWS's guidance is to set thresholds from your function's own baseline rather than guessing, and to use a multi-period evaluation window to filter transient noise (recommended alarms). The ones I always create:
- Errors (Sum): divide by Invocations for a rate.
- Throttles (Sum > 0): a separate alarm, since throttles don't show up in the error count.
- Duration at p99 (not average), to catch tail regressions, especially if you're running near the timeout.
- DeadLetterErrors for async functions, to catch failed DLQ deliveries.
- IteratorAge for stream sources, to catch a growing backlog.
aws cloudwatch put-metric-alarm \
  --alarm-name orders-errors \
  --namespace AWS/Lambda --metric-name Errors \
  --dimensions Name=FunctionName,Value=orders \
  --statistic Sum --period 60 \
  --evaluation-periods 5 --datapoints-to-alarm 5 \
  --threshold 10 --comparison-operator GreaterThanThreshold
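To see how much a naive error rate can hide, here is the arithmetic as a small sketch. The function names and the sample numbers are hypothetical. In CloudWatch you would express the same thing with metric math rather than application code.

```python
def error_rate(errors, invocations):
    """Errors as a fraction of the invocations that actually ran."""
    return errors / invocations if invocations else 0.0


def failed_request_rate(errors, invocations, throttles):
    """Fraction of all attempted requests that errored or were throttled."""
    attempted = invocations + throttles
    return (errors + throttles) / attempted if attempted else 0.0
```

With 900 invocations, 9 errors, and 100 throttles, `error_rate` reports 1% while `failed_request_rate` reports 10.9%, which is why the Throttles alarm is worth having on its own.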
Deploying with Terraform
Click-ops doesn't survive contact with production. Here's an aws_lambda_function that wires together the pieces above — an execution role, sane memory and timeout, X-Ray, reserved concurrency, and a dead-letter queue (Terraform aws_lambda_function):
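resource "aws_iam_role" "lambda" {
  name = "orders-lambda"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "basic" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Active tracing needs X-Ray write permissions on the execution role
resource "aws_iam_role_policy_attachment" "xray" {
  role       = aws_iam_role.lambda.name
  policy_arn = "arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess"
}

resource "aws_sqs_queue" "dlq" {
  name                      = "orders-dlq"
  message_retention_seconds = 1209600 # 14 days
}

# Lets Lambda deliver failed events to the DLQ
resource "aws_iam_role_policy" "dlq_send" {
  name = "orders-dlq-send"
  role = aws_iam_role.lambda.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action   = "sqs:SendMessage"
      Effect   = "Allow"
      Resource = aws_sqs_queue.dlq.arn
    }]
  })
}

resource "aws_lambda_function" "orders" {
  function_name = "orders"
  role          = aws_iam_role.lambda.arn
  filename      = "build/orders.zip"
  handler       = "app.lambda_handler"
  runtime       = "python3.12"
  memory_size   = 512
  timeout       = 30

  environment {
    variables = {
      LOG_LEVEL         = "INFO"
      IDEMPOTENCY_TABLE = aws_dynamodb_table.idempotency.name # table defined elsewhere (not shown)
    }
  }

  tracing_config {
    mode = "Active"
  }

  reserved_concurrent_executions = 100

  dead_letter_config {
    target_arn = aws_sqs_queue.dlq.arn
  }

  # Make sure the DLQ permission exists before the function is created
  depends_on = [aws_iam_role_policy.dlq_send]
}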
A few production notes straight from the docs:
- The default 3-second timeout is too low for most real workloads; Python and Java functions often need 10–30 seconds, more with large dependencies (common config options).
- Environment variables are not a secrets store — they're plain key-value config. Put credentials and tokens in AWS Secrets Manager (environment variables).
- The DLQ must exist and the role must have sqs:SendMessage before a failure happens, or the failed event has nowhere to land.
- reserved_concurrent_executions counts against your account's 1,000 limit, so leave headroom (reserved concurrency).
- Get the handler format right: for Python it's filename.function_name; a wrong handler fails the function immediately.
Cost and performance optimization
Lambda bills on requests plus GB-seconds (memory × duration), and here's the counterintuitive part: memory also sets CPU — you get a full vCPU at 1,769 MB (configure memory). So raising memory can make a CPU-bound function finish faster and cost less overall, even though the per-millisecond rate is higher.
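Here is the billing arithmetic as a sketch, so you can plug in your own numbers. The two prices are assumptions (x86 list prices I've seen for us-east-1), so check the current pricing page for your Region. The sketch ignores the free tier and pricing tiers, and the durations in the usage note are invented to illustrate the effect, not measurements.

```python
# Assumed prices; verify against the current Lambda pricing page.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


def monthly_cost(invocations, avg_duration_ms, memory_mb,
                 price_per_gb_second=PRICE_PER_GB_SECOND):
    """Estimate cost as GB-seconds x rate plus the per-request charge."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second + invocations * PRICE_PER_REQUEST
```

Suppose a CPU-bound function takes 800 ms at 512 MB but 350 ms at 1,024 MB. At 10 million invocations, `monthly_cost(10_000_000, 800, 512)` comes to about $68.67, while `monthly_cost(10_000_000, 350, 1024)` comes to about $60.33, so the larger setting is both faster and cheaper. If doubling memory doesn't at least halve the duration, the larger setting costs more.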
Don't guess at the sweet spot; measure it. The open-source AWS Lambda Power Tuning tool, which the AWS docs recommend, runs your function across memory settings with real invocations and shows you the cost/speed trade-off (profiling functions). For x86 functions, Compute Optimizer can also recommend a size from historical metrics (it doesn't cover arm64 yet).
Finally, consider arm64 / Graviton2: AWS prices it roughly 20% cheaper per duration with strong price-performance, and Python workloads tend to benefit more than Node (instruction set architecture). In Terraform it's one line:
resource "aws_lambda_function" "orders" {
  # ...
  architectures = ["arm64"]
}
A/B it behind an alias before you flip everything over.
The short version
Going from Hello World to production is mostly about assuming things will go wrong and deciding, in advance, what happens when they do:
- Make handlers idempotent — duplicates are guaranteed, not hypothetical.
- Give failures a home (a DLQ or on-failure destination) and a sensible timeout.
- Know your concurrency limits, and protect fragile downstreams with reserved concurrency.
- Log JSON, alarm on the right metrics, and trace the slow paths.
- Define all of it in Terraform, and keep secrets out of environment variables.
- Right-size memory (it's also CPU) and try arm64.
Every claim here links back to the official AWS or HashiCorp documentation — bookmark those, because they change, and the version that's current when you read this is the one to trust.