Lambda Task Timed Out: Fixing VPC, Timeout, and Memory Configuration

2026-04-15 · 10 min read

Your monitoring just fired this alert:

{
  "errorType": "Task timed out after 30.03 seconds",
  "requestId": "a1b2c3d4-5678-90ab-cdef-EXAMPLE"
}

Or you see this in CloudWatch Logs:

REPORT RequestId: a1b2c3d4-5678-90ab-cdef-EXAMPLE
Duration: 900015.67 ms
Billed Duration: 900000 ms
Memory Size: 128 MB
Max Memory Used: 125 MB
Status: timeout

Lambda timeouts are one of the most misleading errors in AWS because the first instinct is always to look at the code. But in my experience, the majority of Lambda timeout issues have nothing to do with code performance. They are caused by VPC network misconfigurations, insufficient memory leading to CPU throttling, or downstream services that are unreachable.

Here are the five root causes I encounter most often, along with the diagnostic steps and CLI commands to identify and fix each one.

Root Cause 1: Lambda in VPC Without NAT Gateway

This is the single most common cause of Lambda timeouts, and I see it at least once a month during client engagements. When you place a Lambda function in a VPC, it creates an Elastic Network Interface (ENI) in the specified subnets. If those subnets are private subnets without a route to a NAT Gateway, the function has no internet access — and it cannot reach any AWS service endpoint outside the VPC.

The function does not fail immediately. It tries to connect, waits for the TCP timeout, and eventually exceeds the Lambda timeout. The error message says nothing about networking.

Diagnosis

First, check the Lambda function's VPC configuration:

aws lambda get-function-configuration \
  --function-name my-function \
  --query '{
    VpcConfig: VpcConfig,
    Timeout: Timeout,
    MemorySize: MemorySize
  }'

Then check the route table for the subnets the function uses:

# Get the subnet's route table
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-0abc123" \
  --query 'RouteTables[0].Routes[*].{
    Destination: DestinationCidrBlock,
    Target: GatewayId || NatGatewayId || TransitGatewayId,
    State: State
  }' \
  --output table

If the only route is the local VPC route (10.0.0.0/16 -> local), the subnet has no internet access. You need a route to a NAT Gateway for the 0.0.0.0/0 destination.

Also verify the NAT Gateway exists and is in an available state:

aws ec2 describe-nat-gateways \
  --filter "Name=vpc-id,Values=vpc-0abc123" \
  --query 'NatGateways[*].{
    ID: NatGatewayId,
    State: State,
    SubnetId: SubnetId,
    PublicIp: NatGatewayAddresses[0].PublicIp
  }' \
  --output table
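Eyeballing route tables across several subnets gets tedious, so the check can be scripted. This is a grep-level sketch, not a full JSON parse -- it assumes you have saved the raw JSON from the describe-route-tables call above to a file, and it only looks for the two fields that matter:

```shell
# check_default_route: pass the path to a file holding the JSON output of
# the describe-route-tables call above, e.g.
#   aws ec2 describe-route-tables \
#     --filters "Name=association.subnet-id,Values=subnet-0abc123" > routes.json
#   check_default_route routes.json
check_default_route() {
  # A route to 0.0.0.0/0 AND a NAT Gateway target must both be present.
  if grep -q '"DestinationCidrBlock": "0.0.0.0/0"' "$1" \
      && grep -q '"NatGatewayId"' "$1"; then
    echo "OK: default route via NAT Gateway"
  else
    echo "MISSING: no 0.0.0.0/0 route via NAT Gateway -- Lambda has no egress"
  fi
}
```

Because it greps the two fields independently, it can be fooled by unusual route tables; treat it as a first pass before reading the table output yourself.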

Solution

Create a NAT Gateway in a public subnet and add a route from the private subnet:

# Allocate an Elastic IP
ALLOC_ID=$(aws ec2 allocate-address --domain vpc --query 'AllocationId' --output text)

# Create NAT Gateway in a public subnet
NAT_ID=$(aws ec2 create-nat-gateway \
  --subnet-id subnet-public123 \
  --allocation-id "$ALLOC_ID" \
  --query 'NatGateway.NatGatewayId' \
  --output text)

echo "Waiting for NAT Gateway to become available..."
aws ec2 wait nat-gateway-available --nat-gateway-ids "$NAT_ID"

# Get the private subnet's route table
RT_ID=$(aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-private123" \
  --query 'RouteTables[0].RouteTableId' \
  --output text)

# Add the default route through the NAT Gateway
aws ec2 create-route \
  --route-table-id "$RT_ID" \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id "$NAT_ID"

If your function only needs to access AWS services (S3, DynamoDB, SQS, etc.) and not the public internet, use VPC endpoints instead of a NAT Gateway — they are cheaper and have lower latency.

Root Cause 2: Insufficient Memory Causing CPU Throttling

Lambda allocates CPU power proportionally to memory. A function with 128 MB of memory gets a fraction of a vCPU, while a function with 1769 MB gets one full vCPU. If your function performs any CPU-intensive work — JSON parsing, data transformation, cryptographic operations — low memory settings will throttle the CPU and cause the function to run much slower than expected.

Diagnosis

Check the memory configuration and actual memory usage:

aws lambda get-function-configuration \
  --function-name my-function \
  --query '{MemorySize: MemorySize, Timeout: Timeout}'

Then query CloudWatch Logs Insights for memory utilization patterns:

filter @type = "REPORT"
| stats avg(@maxMemoryUsed/@memorySize * 100) as avg_mem_pct,
        max(@maxMemoryUsed/@memorySize * 100) as max_mem_pct,
        avg(@duration) as avg_duration_ms,
        max(@duration) as max_duration_ms
by bin(1h) as hour
| sort hour desc
| limit 24

If avg_mem_pct is above 70%, the function is close to its memory ceiling. If memory usage is low but duration is still high relative to what the code should take, the small memory allocation is throttling the CPU. Either way, raising the memory setting is the fix.
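If you only have a single REPORT line handy rather than Logs Insights access, the same utilization figure can be computed locally (a sketch that assumes the standard REPORT field layout shown earlier):

```shell
# report_mem_pct: print Max Memory Used as a percentage of Memory Size,
# extracted from one REPORT log line.
report_mem_pct() {
  echo "$1" | awk '{
    for (i = 1; i <= NF; i++) {
      if ($i == "Size:") size = $(i + 1)   # value after "Memory Size:"
      if ($i == "Used:") used = $(i + 1)   # value after "Max Memory Used:"
    }
    printf "%.1f\n", used / size * 100
  }'
}

report_mem_pct "REPORT RequestId: abc Duration: 9015.67 ms Billed Duration: 9016 ms Memory Size: 128 MB Max Memory Used: 125 MB"
# 125 / 128 -> 97.7 (percent)
```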

Solution

Use AWS Lambda Power Tuning, an open-source tool, to find the optimal memory setting. As a quick fix, double the memory and observe the impact:

aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 512

Compare durations before and after:

filter @type = "REPORT"
| stats avg(@duration) as avg_ms,
        pct(@duration, 95) as p95_ms,
        avg(@maxMemoryUsed / 1000000) as avg_mem_mb
by datefloor(@timestamp, 1h) as hour
| sort hour desc
| limit 48

Often, increasing memory from 128 MB to 512 MB cuts execution time by 60-70%. Because you are billed in GB-seconds, the shorter duration offsets most of the extra memory cost, and for heavily CPU-bound functions the speedup can be large enough to lower the bill outright.
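The billing math is easy to sanity-check: GB-seconds are memory in GB times billed duration in seconds, and the per-GB-second rate cancels out of any comparison. At 4x the memory, compute cost breaks even when duration drops by 75%:

```shell
# gb_seconds: memory_mb * duration_s -> GB-seconds, the unit Lambda bills on
gb_seconds() {
  awk -v mb="$1" -v sec="$2" 'BEGIN { printf "%.2f\n", mb / 1024 * sec }'
}

gb_seconds 128 10     # 1.25 GB-s: 128 MB for 10 s
gb_seconds 512 2.5    # 1.25 GB-s: 512 MB at a 75% speedup -- the breakeven
gb_seconds 512 3.5    # 1.75 GB-s: 512 MB at a 65% speedup
```

Run the numbers for your own before/after durations before committing to a memory setting.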

Root Cause 3: Security Group Blocking Outbound Traffic

Lambda functions in a VPC use the specified security group for their ENIs. If the security group does not allow outbound traffic on the required ports, the function cannot reach downstream services.

Diagnosis

Check the security group's outbound rules:

aws ec2 describe-security-groups \
  --group-ids sg-0abc123 \
  --query 'SecurityGroups[0].{
    GroupId: GroupId,
    GroupName: GroupName,
    IngressRules: IpPermissions[*].{
      Protocol: IpProtocol,
      FromPort: FromPort,
      ToPort: ToPort,
      Sources: IpRanges[*].CidrIp
    },
    EgressRules: IpPermissionsEgress[*].{
      Protocol: IpProtocol,
      FromPort: FromPort,
      ToPort: ToPort,
      Destinations: IpRanges[*].CidrIp
    }
  }'

If the egress rules do not include port 443 (for HTTPS to AWS services) or the port your downstream service listens on, that is the problem.

Solution

Add the required outbound rules:

# Allow HTTPS outbound (for AWS API calls)
aws ec2 authorize-security-group-egress \
  --group-id sg-0abc123 \
  --ip-permissions IpProtocol=tcp,FromPort=443,ToPort=443,IpRanges='[{CidrIp=0.0.0.0/0}]'

# Allow database port outbound (e.g., PostgreSQL)
aws ec2 authorize-security-group-egress \
  --group-id sg-0abc123 \
  --ip-permissions IpProtocol=tcp,FromPort=5432,ToPort=5432,IpRanges='[{CidrIp=10.0.0.0/16}]'

Root Cause 4: Missing VPC Endpoints for AWS Services

If your Lambda function is in a VPC and calls AWS services like DynamoDB, S3, SQS, or Secrets Manager, it needs a network path to those services. Without a NAT Gateway or VPC endpoint, these calls will time out.

VPC endpoints are preferable to NAT Gateways for AWS service access because they keep traffic within the AWS network, are cheaper, and offer lower latency.

Diagnosis

List existing VPC endpoints in your VPC:

aws ec2 describe-vpc-endpoints \
  --filters "Name=vpc-id,Values=vpc-0abc123" \
  --query 'VpcEndpoints[*].{
    ID: VpcEndpointId,
    Service: ServiceName,
    Type: VpcEndpointType,
    State: State
  }' \
  --output table

If your function calls DynamoDB but there is no com.amazonaws.us-east-1.dynamodb endpoint, you need to create one.
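A quick way to diff what the function calls against what exists (a sketch: the service names, region, and endpoint list below are placeholders -- substitute your own and the output of the describe-vpc-endpoints query above):

```shell
# missing_endpoints: print each required service with no matching VPC endpoint.
# $1 = space-separated service short names the function calls
# $2 = newline-separated existing endpoint service names
missing_endpoints() {
  for svc in $1; do
    # endpoint service names end in ".<short-name>", e.g. ...us-east-1.sqs
    printf '%s\n' "$2" | grep -q "\.${svc}\$" || echo "missing endpoint: $svc"
  done
}

existing="com.amazonaws.us-east-1.dynamodb
com.amazonaws.us-east-1.s3"
missing_endpoints "dynamodb sqs secretsmanager" "$existing"
# prints: missing endpoint: sqs, then missing endpoint: secretsmanager
```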

Solution

Create the required VPC endpoints. For gateway endpoints (S3, DynamoDB):

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --service-name com.amazonaws.us-east-1.dynamodb \
  --route-table-ids rtb-private123

For interface endpoints (SQS, Secrets Manager, SSM, etc.):

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc123 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.secretsmanager \
  --subnet-ids subnet-private123 subnet-private456 \
  --security-group-ids sg-0abc123 \
  --private-dns-enabled

Root Cause 5: Timeout Set Too Low for Cold Starts

Lambda cold starts add latency to the first invocation after a period of inactivity. Functions in a VPC historically had much longer cold starts because an ENI was created per execution environment; since the 2019 Hyperplane update the ENI is shared, so the penalty is far smaller but still measurable. If the function also initializes heavy SDKs, database connection pools, or ML models during the init phase, the cold start can consume a significant portion of the timeout.

Diagnosis

Identify cold start durations with CloudWatch Logs Insights:

filter @type = "REPORT"
| fields @initDuration as cold_start_ms, @duration as exec_ms, @timestamp
| filter ispresent(@initDuration)
| stats avg(cold_start_ms) as avg_cold_ms,
        max(cold_start_ms) as max_cold_ms,
        avg(exec_ms) as avg_exec_ms,
        count(*) as cold_starts
by bin(1d) as day
| sort day desc
| limit 14

If max_cold_ms + avg_exec_ms is close to or exceeds your timeout, cold starts are causing timeouts.
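That rule of thumb can be written down directly (a sketch; plug in the values the query above reports for your function):

```shell
# timeout_headroom: compare worst-case cold start + average execution
# against the configured timeout. args: timeout_ms max_cold_ms avg_exec_ms
timeout_headroom() {
  awk -v t="$1" -v c="$2" -v e="$3" 'BEGIN {
    worst = c + e
    if (worst >= t) printf "at risk: worst case %d ms >= timeout %d ms\n", worst, t
    else            printf "ok: %d ms headroom\n", t - worst
  }'
}

timeout_headroom 30000 8000 24000   # a 32000 ms worst case vs a 30 s timeout
```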

Solution

Increase the timeout to account for cold starts, and consider using Provisioned Concurrency for latency-sensitive functions:

# Increase timeout
aws lambda update-function-configuration \
  --function-name my-function \
  --timeout 60

# Enable Provisioned Concurrency (keeps instances warm)
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 5

Also optimize the initialization code. Move SDK client creation and database connections to the handler module scope (outside the handler function) so they are reused across invocations.

Advanced Debugging: Lambda Insights and X-Ray

When the root cause is not obvious from the basic checks, enable Lambda Insights and X-Ray tracing for detailed performance data.

Enable Lambda Insights

aws lambda update-function-configuration \
  --function-name my-function \
  --layers arn:aws:lambda:us-east-1:580247275435:layer:LambdaInsightsExtension:52

Lambda Insights provides CPU usage, memory utilization, network throughput, and disk I/O metrics per invocation. This is essential for distinguishing between CPU throttling and network issues.

Enable X-Ray Tracing

aws lambda update-function-configuration \
  --function-name my-function \
  --tracing-config Mode=Active

X-Ray shows you exactly where time is spent during each invocation — DNS resolution, TCP connection, TLS handshake, and response wait time for each downstream call. If the trace shows a long TCP connection time to a specific endpoint, the issue is network connectivity, not code.

Diagnostic Decision Tree

Here is the sequence I follow when debugging Lambda timeouts:

# 1. Get function config
aws lambda get-function-configuration \
  --function-name FUNCTION_NAME \
  --query '{VPC:VpcConfig,Timeout:Timeout,Memory:MemorySize}'

# 2. If in VPC, check route tables for NAT Gateway route
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=SUBNET_ID" \
  --query 'RouteTables[0].Routes'

# 3. Check security group egress rules
aws ec2 describe-security-groups \
  --group-ids SG_ID \
  --query 'SecurityGroups[0].IpPermissionsEgress'

# 4. Check VPC endpoints
aws ec2 describe-vpc-endpoints \
  --filters "Name=vpc-id,Values=VPC_ID" \
  --query 'VpcEndpoints[*].ServiceName'

# 5. Check memory utilization in CloudWatch
# (use Logs Insights query from Root Cause 2)

# 6. Check cold start durations
# (use Logs Insights query from Root Cause 5)

Prevention Strategies

Do Not Put Lambda in a VPC Unless Necessary

This is the most effective prevention strategy. If your function does not need to access resources inside a VPC (like an RDS database or ElastiCache cluster), do not put it in a VPC. Functions outside a VPC have internet access by default and never encounter ENI-related cold starts.

Use VPC Endpoints Instead of NAT Gateways When Possible

If your function only calls AWS services, VPC endpoints are cheaper and more reliable than NAT Gateways. A NAT Gateway costs approximately $32/month plus data processing charges. VPC interface endpoints cost approximately $7.30/month per AZ.
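Those figures fall straight out of the hourly rates (assumptions: roughly $0.045/hour for a NAT Gateway and $0.01/hour per AZ for an interface endpoint in us-east-1, data processing charges excluded):

```shell
# monthly_cost: hourly rate -> cost over a 730-hour month
monthly_cost() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 730 }'
}

monthly_cost 0.045   # NAT Gateway:        32.85
monthly_cost 0.01    # interface endpoint:  7.30 per AZ
```

Note that data processing charges move the comparison further in the endpoint's favor for AWS-service traffic, since gateway endpoints for S3 and DynamoDB are free.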

Set Timeout to At Least 2x Expected Duration

If your function normally completes in 5 seconds, set the timeout to at least 10-15 seconds to account for cold starts and temporary latency spikes. But do not set it to the maximum 900 seconds "just in case": API Gateway enforces its own 29-second integration timeout and will return a 504 while your Lambda keeps running and incurring cost.

Monitor with CloudWatch Alarms

Create alarms for duration anomalies so you catch timeout issues before users do:

aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-my-function-Duration" \
  --metric-name Duration \
  --namespace AWS/Lambda \
  --extended-statistic p95 \
  --period 300 \
  --threshold 10000 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --dimensions Name=FunctionName,Value=my-function \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
  --treat-missing-data notBreaching

Need Help Optimizing Your Lambda Architecture?

Lambda timeout issues are a symptom of deeper architectural decisions — VPC design, memory allocation strategy, and service connectivity patterns. We help teams design serverless architectures that avoid these pitfalls from the start. If your Lambda functions are timing out and you cannot identify the root cause, or if you are planning a serverless migration and want to get the architecture right, we can help.

Get in touch for a free AWS consultation