ALB 502 Bad Gateway: Fixing Target Group and Health Check Misconfigurations
2026-04-29 · 10 min read
Your monitoring dashboard lights up. Users are reporting intermittent errors. You check the Application Load Balancer metrics and see a spike in HTTPCode_ELB_502_Count. The ALB is returning 502 Bad Gateway responses to your users, which means the load balancer received the request but could not get a valid response from any backend target.
A 502 from an ALB is fundamentally different from a 5xx from your application. The ALB generates the 502 itself — it means the ALB tried to forward the request to a target but the connection failed, timed out, or returned an invalid response. Your application code may be perfectly fine. The problem is in the layer between the ALB and your targets.
Here is how to systematically diagnose and fix ALB 502 errors.
Step 1: Confirm the Error Source
First, distinguish between 502 errors generated by the ALB and 5xx errors returned by your application. The ALB tracks these in separate CloudWatch metrics.
# Check ALB-generated 502 errors vs target-generated 5xx errors
aws cloudwatch get-metric-statistics \
--namespace AWS/ApplicationELB \
--metric-name HTTPCode_ELB_502_Count \
--dimensions Name=LoadBalancer,Value=app/my-alb/abc123 \
--start-time 2026-04-28T00:00:00Z \
--end-time 2026-04-29T12:00:00Z \
--period 300 \
--statistics Sum \
--output table
# Compare with target-generated errors
aws cloudwatch get-metric-statistics \
--namespace AWS/ApplicationELB \
--metric-name HTTPCode_Target_5XX_Count \
--dimensions Name=LoadBalancer,Value=app/my-alb/abc123 \
--start-time 2026-04-28T00:00:00Z \
--end-time 2026-04-29T12:00:00Z \
--period 300 \
--statistics Sum \
--output table
If HTTPCode_ELB_502_Count is high but HTTPCode_Target_5XX_Count is low or zero, the ALB itself is generating the errors. The targets are either unreachable, unhealthy, or returning malformed responses.
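This rule of thumb is easy to capture in a small helper (a Python sketch; the noise_floor threshold is an arbitrary assumption, not an AWS value):

```python
def classify_502_source(elb_502_sum, target_5xx_sum, noise_floor=5):
    """Rough triage of where the errors originate, given the summed
    CloudWatch datapoints from the two queries above."""
    if elb_502_sum > noise_floor >= target_5xx_sum:
        return "alb"          # ALB could not get a valid response from targets
    if target_5xx_sum > noise_floor >= elb_502_sum:
        return "application"  # backends are returning their own 5xx
    if elb_502_sum > noise_floor and target_5xx_sum > noise_floor:
        return "both"
    return "neither"
```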
Step 2: Check Target Health
The single most common cause of ALB 502 errors is having no healthy targets in the target group. When every registered target is unhealthy, the ALB fails open and routes requests across all of them anyway; since those targets cannot produce valid responses, users see 502s. (A target group with no registered targets at all returns 503 instead.)
# Check health status of all targets
aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--query 'TargetHealthDescriptions[*].{
Target: Target.Id,
Port: Target.Port,
Health: TargetHealth.State,
Reason: TargetHealth.Reason,
Description: TargetHealth.Description
}' \
--output table
The Reason field tells you why a target is unhealthy:
- Elb.RegistrationInProgress — the target was recently registered and health checks have not completed yet
- Target.ResponseCodeMismatch — the health check received an HTTP response but not the expected status code
- Target.Timeout — the health check timed out waiting for a response
- Target.FailedHealthChecks — the target failed the required number of consecutive health checks
- Elb.InternalError — the ALB itself had an error checking the target (rare, usually transient)
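For a large target group, tallying these reasons from the describe-target-health JSON gives a quicker read than scanning the table (a Python sketch over the response shape shown above; healthy targets carry no Reason field):

```python
from collections import Counter

def summarize_reasons(target_health_descriptions):
    """Count targets by TargetHealth.Reason. Entries without a Reason
    (healthy targets) are bucketed under their State instead."""
    return Counter(
        d["TargetHealth"].get("Reason", d["TargetHealth"]["State"])
        for d in target_health_descriptions
    )
```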
Root Cause 1: Health Check Path Returns Non-200
The ALB health check sends an HTTP request to a specific path and expects a specific response code. If your application returns 301 (redirect), 404 (not found), or any non-matching code, the health check fails.
# Check current health check configuration
aws elbv2 describe-target-groups \
--target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--query 'TargetGroups[0].{
HealthCheckPath: HealthCheckPath,
HealthCheckPort: HealthCheckPort,
HealthCheckProtocol: HealthCheckProtocol,
Matcher: Matcher,
HealthCheckIntervalSeconds: HealthCheckIntervalSeconds,
HealthCheckTimeoutSeconds: HealthCheckTimeoutSeconds,
HealthyThresholdCount: HealthyThresholdCount,
UnhealthyThresholdCount: UnhealthyThresholdCount
}'
A common mistake: setting the health check path to / when the application redirects / to /login or /dashboard. The health check receives a 302 redirect, which does not match the expected 200, and marks the target as unhealthy.
The fix: Either point the health check to a dedicated health endpoint that always returns 200:
aws elbv2 modify-target-group \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--health-check-path "/health"
Or expand the matcher to accept multiple response codes:
aws elbv2 modify-target-group \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--matcher '{"HttpCode": "200-399"}'
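On the application side, a dedicated health endpoint does not need to touch the rest of your routing. A minimal sketch using Python's standard library (your framework will have its own idiom):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

class HealthHandler(BaseHTTPRequestHandler):
    """Minimal handler: /health always returns 200, everything else 404."""
    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep health check noise out of the logs

def start_server(port=0):
    """Start the server on an ephemeral port; returns (server, bound_port)."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

Keep the endpoint cheap and dependency-free: a health check that queries the database can mark the whole fleet unhealthy the moment the database blips.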
Root Cause 2: Health Check Port Mismatch
By default, the ALB performs health checks on the same port used for traffic routing. But if you override the health check port and it does not match the port your application actually listens on, every health check will fail.
This is especially common in ECS with dynamic port mapping. The container might be assigned port 32768, but the health check is configured for port 8080.
# For ECS services, check the actual registered port
aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--query 'TargetHealthDescriptions[*].{Id: Target.Id, Port: Target.Port}'
The fix: Set the health check port to traffic-port (the default) to ensure it always matches:
aws elbv2 modify-target-group \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--health-check-port traffic-port
Root Cause 3: Security Group Blocking ALB-to-Target Traffic
The ALB and targets need security groups that allow traffic between them. The ALB's security group needs outbound access to the targets, and the targets' security group needs inbound access from the ALB.
This is one of the most commonly missed configurations. Teams add the ALB to a public security group and their instances to a private security group, but forget to create the inbound rule that allows traffic from the ALB's security group.
# Get the ALB's security groups
aws elbv2 describe-load-balancers \
--names my-alb \
--query 'LoadBalancers[0].SecurityGroups'
# Get the targets' security groups (for EC2 instances)
aws ec2 describe-instances \
--instance-ids i-0abc123 \
--query 'Reservations[0].Instances[0].SecurityGroups'
# Check if the target security group allows traffic from the ALB security group
aws ec2 describe-security-groups \
--group-ids sg-target123 \
--query 'SecurityGroups[0].IpPermissions[*].{
Protocol: IpProtocol,
FromPort: FromPort,
ToPort: ToPort,
Sources: UserIdGroupPairs[*].GroupId
}'
The fix: Add an inbound rule to the target security group that allows traffic from the ALB security group on the application port:
aws ec2 authorize-security-group-ingress \
--group-id sg-target123 \
--protocol tcp \
--port 8080 \
--source-group sg-alb456
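If you want to assert this rule programmatically, the IpPermissions structure returned by describe-security-groups is easy to scan. A Python sketch (field names follow the EC2 API response shape):

```python
def allows_source_group(ip_permissions, source_group_id, port):
    """Return True if any rule admits `source_group_id` on `port`.

    `ip_permissions` is the IpPermissions list from
    `aws ec2 describe-security-groups`. A protocol of "-1" means all
    traffic, and missing FromPort/ToPort are treated as all ports.
    """
    for rule in ip_permissions:
        sources = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
        if source_group_id not in sources:
            continue
        if rule.get("IpProtocol") == "-1":
            return True  # all protocols, all ports
        lo = rule.get("FromPort", 0)
        hi = rule.get("ToPort", 65535)
        if lo <= port <= hi:
            return True
    return False
```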
Root Cause 4: Target Group Protocol Mismatch
If your target group is configured with protocol HTTP but your application only listens on HTTPS (or vice versa), the ALB cannot establish a connection to the target.
The symptom is a health check that times out or fails immediately: with an HTTP target group pointed at an HTTPS-only listener, the target cannot parse the plaintext request, and in the reverse case the TLS handshake fails against the plaintext port. Either way, no valid response ever comes back.
# Check target group protocol
aws elbv2 describe-target-groups \
--target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--query 'TargetGroups[0].{Protocol: Protocol, Port: Port}'
The fix: You cannot change the protocol of an existing target group. You need to create a new target group with the correct protocol and update the listener rule to point to it:
# Create a new target group with the correct protocol
aws elbv2 create-target-group \
--name my-tg-https \
--protocol HTTPS \
--port 443 \
--vpc-id vpc-0abc123 \
--health-check-protocol HTTPS \
--health-check-path "/health" \
--target-type instance
Root Cause 5: Slow-Starting Containers Failing Health Checks
During deployments, new containers need time to start before they can respond to health checks. If the ALB marks them unhealthy before they finish initialization, you get a burst of 502 errors during every deployment.
The deployment pattern looks like this:
- New task starts
- Target is registered with the ALB
- ALB starts health checking immediately
- Application is still initializing (loading config, warming caches, establishing database connections)
- Health checks fail
- ALB marks target as unhealthy
- 502 errors once the old targets finish draining and only the not-yet-ready targets remain
Check the deregistration delay and health check grace period:
# Check target group attributes
aws elbv2 describe-target-group-attributes \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--query 'Attributes[*].{Key: Key, Value: Value}' \
--output table
Look for deregistration_delay.timeout_seconds and slow_start.duration_seconds.
The fix: Enable slow start mode. This gives new targets a warm-up period during which the ALB gradually increases the share of requests sent to them:
aws elbv2 modify-target-group-attributes \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--attributes Key=slow_start.duration_seconds,Value=120
Also increase the health check interval and unhealthy threshold to give targets more time:
aws elbv2 modify-target-group \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--health-check-interval-seconds 30 \
--unhealthy-threshold-count 5 \
--healthy-threshold-count 2 \
--health-check-timeout-seconds 10
With these settings, a target has 150 seconds (5 checks at 30-second intervals) before being marked unhealthy, plus a 120-second slow start period.
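The arithmetic is worth encoding so any proposed settings can be sanity-checked (a trivial Python sketch; slow start is a separate traffic-ramp window, not extra health check grace):

```python
def seconds_until_marked_unhealthy(interval_s, unhealthy_threshold):
    """Time for the configured number of consecutive failed health
    checks before the ALB marks a target unhealthy."""
    return interval_s * unhealthy_threshold
```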
Root Cause 6: Keep-Alive Timeout Shorter Than the ALB Idle Timeout
The ALB keeps idle connections to targets open for its idle timeout (default 60 seconds) and reuses them for later requests. If your application's keep-alive timeout is shorter, the application closes idle connections first. The ALB can then forward a request over a connection the target has already closed, and that failed request surfaces as a 502. (A target that simply takes too long to respond produces a 504 Gateway Timeout, not a 502, so chronically slow endpoints are a separate problem.)
This race shows up as intermittent, low-rate 502s under steady traffic. It often appears after someone raises the ALB idle timeout for long-running API calls, file uploads, or report generation without touching the application's keep-alive settings.
# Check the ALB idle timeout
aws elbv2 describe-load-balancer-attributes \
--load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789:loadbalancer/app/my-alb/abc123 \
--query 'Attributes[?Key==`idle_timeout.timeout_seconds`]'
The fix: For genuinely slow endpoints, raise the ALB idle timeout:
aws elbv2 modify-load-balancer-attributes \
--load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789:loadbalancer/app/my-alb/abc123 \
--attributes Key=idle_timeout.timeout_seconds,Value=300
Important: the application's keep-alive timeout must exceed the ALB idle timeout. If the application closes an idle connection first, the ALB can reuse the dead connection and return a 502. For example, if the ALB idle timeout is 300 seconds, set your application's keep-alive timeout to 310 seconds or more.
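That invariant is cheap to encode in a deployment-time sanity check (a Python sketch; the 10-second margin is an arbitrary assumption):

```python
def keepalive_is_safe(alb_idle_timeout_s, app_keepalive_timeout_s, margin_s=10):
    """The application must hold idle connections open longer than the
    ALB does, or the ALB may reuse a connection the app already closed."""
    return app_keepalive_timeout_s >= alb_idle_timeout_s + margin_s
```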
Root Cause 7: Deregistration Delay During Deployments
During rolling deployments, old targets are deregistered from the target group. The ALB continues sending in-flight requests to these targets during the deregistration delay period (default 300 seconds). If the old targets shut down before the deregistration delay completes, those in-flight requests get 502 errors.
The fix: Ensure your application handles SIGTERM gracefully by finishing in-flight requests before shutting down. For ECS, set the stopTimeout in the task definition to be longer than the deregistration delay:
{
"containerDefinitions": [{
"name": "my-app",
"stopTimeout": 120
}]
}
And set the deregistration delay to match your application's drain time:
aws elbv2 modify-target-group-attributes \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/my-tg/abc123 \
--attributes Key=deregistration_delay.timeout_seconds,Value=60
Enabling ALB Access Logs for Deep Diagnosis
If the above checks do not reveal the issue, enable ALB access logs to see the full details of every request, including which target was selected, the response code from the target, and timing information:
aws elbv2 modify-load-balancer-attributes \
--load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789:loadbalancer/app/my-alb/abc123 \
--attributes Key=access_logs.s3.enabled,Value=true \
Key=access_logs.s3.bucket,Value=my-alb-logs \
Key=access_logs.s3.prefix,Value=alb
The access logs contain fields like target_status_code, elb_status_code, target_processing_time, and request_processing_time that pinpoint exactly where the 502 is generated.
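To slice the downloaded logs, the leading fields of each entry can be split with shlex, which handles the quoted request field. A Python sketch (field order per the published ALB access log format; only the leading fields are mapped here):

```python
import shlex

# Leading field positions in an ALB access log entry; the full format
# has more trailing fields, which zip() simply ignores here.
FIELDS = ["type", "time", "elb", "client", "target",
          "request_processing_time", "target_processing_time",
          "response_processing_time", "elb_status_code",
          "target_status_code", "received_bytes", "sent_bytes", "request"]

def parse_entry(line):
    """Parse the leading fields of one access log line into a dict."""
    return dict(zip(FIELDS, shlex.split(line)))
```

An ALB-generated 502 typically shows target_status_code as "-" and target_processing_time as -1, while a target-generated 5xx carries the target's actual status code.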
The Diagnostic Checklist
When you see ALB 502 errors, work through this checklist:
- (1 min) Check HTTPCode_ELB_502_Count vs HTTPCode_Target_5XX_Count to confirm the error source
- (2 min) Run describe-target-health to check if any targets are healthy
- (2 min) Check health check configuration — path, port, protocol, timeout, threshold
- (3 min) Verify security groups allow ALB-to-target traffic
- (2 min) Check target group protocol matches application protocol
- (1 min) Check ALB idle timeout for long-running requests
- (2 min) Check deregistration delay and slow start settings for deployment-related 502s
In my experience, roughly 40% of ALB 502 errors are caused by health check misconfigurations (wrong path, wrong port, thresholds too aggressive), 25% are security group issues, 20% are deployment-related (slow start, deregistration delay), and 15% are timeout or protocol mismatches. Start with target health and work outward.
Building Resilient Load Balancing
ALB 502 errors are almost always configuration issues, not infrastructure failures. But getting the configuration right — especially across multiple environments with different ECS services, security groups, and deployment strategies — requires careful architecture.
We help teams design load balancing configurations that handle deployments gracefully, implement proper health check endpoints, and build monitoring that catches 502 spikes before users notice. Contact us for a free AWS consultation and let us review your ALB setup.
Need help with your AWS infrastructure?
Book a free 30-minute consultation to discuss your challenges.