AWS ThrottlingException: Understanding and Fixing API Rate Limits

2026-03-29 · 9 min read

Your deployment pipeline just ground to a halt. The CI/CD system is logging errors, auto-scaling has stopped responding, and your monitoring dashboards are blank. The logs show this:

An error occurred (Throttling) when calling the DescribeInstances operation
(reached max retries: 4): Rate exceeded

Or from a Lambda function:

TooManyRequestsException: Rate exceeded

Or from CloudFormation:

API rate limit exceeded for account 123456789012.

The ThrottlingException means you have exceeded the API rate limit for an AWS service. Unlike DynamoDB throughput limits, these are limits on the AWS control plane API itself — the calls you make to manage AWS resources. Every AWS service has rate limits, and they are often lower than you might expect.

Here is the systematic approach to diagnosing, fixing, and preventing API throttling.

Step 1: Identify Which API Is Being Throttled

The error message usually tells you the operation, but not always the specific limit you hit. Start by checking CloudTrail for throttled events:

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=DescribeInstances \
  --start-time "2026-03-29T10:00:00Z" \
  --end-time "2026-03-29T11:00:00Z" \
  --query 'Events[?contains(CloudTrailEvent, `ThrottlingException`)].{
    Time: EventTime,
    Event: EventName,
    Source: EventSource
  }' \
  --output table

For a broader view of all throttled calls across services:

# Using CloudTrail Lake, if enabled (substitute your event data store ID for cloudtrail_events)
aws cloudtrail start-query \
  --query-statement "SELECT eventTime, eventSource, eventName, errorCode,
    userIdentity.arn, sourceIPAddress
    FROM cloudtrail_events
    WHERE errorCode IN ('Throttling', 'ThrottlingException',
      'TooManyRequestsException', 'RequestLimitExceeded')
    AND eventTime > '2026-03-29 10:00:00'
    ORDER BY eventTime DESC"

Step 2: Check Your Current Service Quotas

AWS Service Quotas lets you view and request increases for rate limits. Check the current limits for the service causing problems:

# List all quotas for EC2
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --query 'Quotas[?contains(QuotaName, `Rate`)].{
    Name: QuotaName,
    Value: Value,
    Adjustable: Adjustable
  }' \
  --output table
# Check a specific quota
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-0E3CBDE5 \
  --query 'Quota.{Name: QuotaName, Value: Value, Adjustable: Adjustable}'

Some common default API rate limits that catch teams off guard:

Service          Operation              Default Limit
EC2              DescribeInstances      100 calls/sec
EC2              RunInstances           5 calls/sec
Lambda           Invoke                 500-3,000/sec (varies by region)
S3               PUT/COPY/POST/DELETE   3,500/sec per prefix
S3               GET/HEAD               5,500/sec per prefix
CloudFormation   CreateStack            1 call/sec
IAM              Most operations        15 calls/sec
STS              AssumeRole             500 calls/sec
CloudWatch       PutMetricData          500 calls/sec
SSM              GetParameter           40 calls/sec

Root Cause 1: Polling Loops Without Backoff

The most common cause of API throttling is code that polls an AWS API in a tight loop. Classic examples:

  • A script that calls DescribeInstances every second to check if an instance is running
  • A monitoring system that calls GetMetricData for hundreds of metrics in rapid succession
  • A deployment tool that calls DescribeStackEvents in a loop waiting for CloudFormation to complete

Here is the wrong way:

# WRONG: Tight polling loop
import boto3
import time

ec2 = boto3.client('ec2')

while True:
    response = ec2.describe_instances(InstanceIds=['i-1234567890abcdef0'])
    state = response['Reservations'][0]['Instances'][0]['State']['Name']
    if state == 'running':
        break
    time.sleep(1)  # 1 second is too aggressive for most APIs

And the right way:

# CORRECT: Use waiters (built-in exponential backoff)
import boto3

ec2 = boto3.client('ec2')
waiter = ec2.get_waiter('instance_running')
waiter.wait(
    InstanceIds=['i-1234567890abcdef0'],
    WaiterConfig={
        'Delay': 15,       # Wait 15 seconds between polls
        'MaxAttempts': 40   # Give up after 10 minutes
    }
)

AWS SDKs provide waiters for most common polling scenarios. Always use them instead of writing your own polling loops.
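When no waiter exists for the thing you are polling, a hand-rolled loop should still use exponential backoff with full jitter rather than a fixed sleep. A minimal sketch (the function names here are mine, not an AWS API):

```python
import random
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def poll_with_backoff(check_fn, max_attempts=10):
    """Call check_fn until it returns truthy, sleeping a jittered delay between attempts."""
    for attempt in range(max_attempts):
        if check_fn():
            return True
        time.sleep(backoff_delay(attempt))
    return False
```

The same pattern works for any describe/list call that lacks a waiter: pass a closure that makes the API call and returns whether the target state has been reached.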

Root Cause 2: Concurrent Infrastructure Automation

Multiple automation tools running simultaneously against the same account can collectively exceed rate limits even if each individual tool stays within bounds. This is especially common when:

  • CI/CD pipelines run in parallel
  • Auto-scaling triggers multiple CloudFormation updates
  • Multiple microservices share a monitoring account and push metrics simultaneously
  • Terraform plans run concurrently with CloudFormation deployments

Check which principals are making the most API calls:

# Find the top API callers in the last hour using CloudTrail
aws cloudtrail lookup-events \
  --start-time "2026-03-29T09:00:00Z" \
  --end-time "2026-03-29T10:00:00Z" \
  --max-results 1000 \
  --query 'Events[].{Source: Username, Event: EventName}' \
  --output json | jq 'group_by(.Source) | map({user: .[0].Source, count: length}) | sort_by(.count) | reverse | .[0:10]'

Root Cause 3: SDK Default Retry Behavior Amplifying Throttling

When an API call is throttled, the SDK retries it. If many clients are retrying simultaneously, the retries themselves can cause more throttling — a classic thundering herd problem.

The default (legacy) retry configuration in most AWS SDKs uses exponential backoff with minimal jitter, so multiple clients tend to back off in lockstep and retry at almost the same moment. Configure the SDK properly:

import boto3
from botocore.config import Config

# Use adaptive retry mode with jitter
config = Config(
    retries={
        'max_attempts': 10,
        'mode': 'adaptive'
    },
    max_pool_connections=25
)

client = boto3.client('ec2', config=config)

For the AWS CLI, set retry configuration in your config file:

# ~/.aws/config
[profile my-profile]
retry_mode = adaptive
max_attempts = 10

The three retry modes available:

  • legacy (default): Exponential backoff, limited jitter
  • standard: Exponential backoff with full jitter, respects retry-after headers
  • adaptive: Like standard but also adjusts request rate based on throttling feedback

Always use adaptive mode in production.

Root Cause 4: Shared Rate Limits Across Services

Some rate limits are shared across all operations in a service or even across accounts in an organization. For example:

  • EC2 API rate limits apply to all EC2 API calls combined from an account in a region
  • Organizations API calls from any member account count against the management account's limits
  • Cross-account API calls from IAM roles count against both accounts

This means one team's automation can throttle another team's production workload if they share an account.

# Check if your account is part of an organization with consolidated limits
aws organizations describe-organization \
  --query 'Organization.{Id: Id, MasterAccountId: MasterAccountId}' 2>/dev/null

Root Cause 5: Missing Request Batching

Many AWS APIs support batch operations that accomplish in one call what would otherwise require hundreds. Not using them wastes your rate limit budget:

# WRONG: 100 individual calls (wastes rate limit)
for id in $(cat instance-ids.txt); do
  aws ec2 describe-instances --instance-ids "$id"
done

# CORRECT: 1 batch call (uses rate limit efficiently)
aws ec2 describe-instances \
  --instance-ids $(cat instance-ids.txt | tr '\n' ' ')
# WRONG: Individual SSM parameter lookups
aws ssm get-parameter --name /app/db-host
aws ssm get-parameter --name /app/db-port
aws ssm get-parameter --name /app/db-name

# CORRECT: Batch lookup
aws ssm get-parameters \
  --names /app/db-host /app/db-port /app/db-name \
  --query 'Parameters[*].[Name,Value]' \
  --output table
# WRONG: Individual tag lookups
for instance in i-abc123 i-def456 i-ghi789; do
  aws ec2 describe-tags --filters "Name=resource-id,Values=$instance"
done

# CORRECT: Batch filter
aws ec2 describe-tags \
  --filters "Name=resource-id,Values=i-abc123,i-def456,i-ghi789" \
  --query 'Tags[*].[ResourceId,Key,Value]' \
  --output table
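The same batching principle applies in application code, with one caveat: many batch APIs cap the batch size (GetParameters, for example, accepts at most 10 names per call), so larger lists must be chunked. A sketch, assuming boto3 and configured credentials; `chunked` and `get_parameters_batched` are my names, not AWS APIs:

```python
def chunked(seq, size):
    """Yield successive slices of seq, each at most `size` items long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def get_parameters_batched(names, chunk_size=10):
    """Fetch SSM parameters in batches; GetParameters accepts at most 10 names per call."""
    import boto3  # imported here so the chunking helper works without boto3 installed
    ssm = boto3.client('ssm')
    values = {}
    for batch in chunked(names, chunk_size):
        resp = ssm.get_parameters(Names=list(batch))
        for param in resp['Parameters']:
            values[param['Name']] = param['Value']
    return values
```

A list of 25 parameter names becomes 3 API calls instead of 25, a roughly 8x saving against the GetParameter rate limit.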

Requesting a Limit Increase

If your usage is legitimate and you simply need higher limits, request an increase through Service Quotas:

# Request a quota increase
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-0E3CBDE5 \
  --desired-value 200
# Check the status of your request
aws service-quotas list-requested-service-quota-changes-by-service \
  --service-code ec2 \
  --query 'RequestedQuotas[*].{
    QuotaName: QuotaName,
    DesiredValue: DesiredValue,
    Status: Status,
    Created: Created
  }' \
  --output table

Not all limits are adjustable. Some are hard limits that cannot be increased. Check the Adjustable field in the quota description.

Implementing a Client-Side Rate Limiter

For critical applications, implement a client-side rate limiter to stay below the API limits proactively rather than reacting to throttling:

import time
import threading

class TokenBucketRateLimiter:
    def __init__(self, rate_per_second, burst_size=None):
        self.rate = rate_per_second
        self.burst_size = burst_size or rate_per_second
        self.tokens = self.burst_size
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            self.tokens = min(
                self.burst_size,
                self.tokens + elapsed * self.rate
            )
            self.last_refill = now

            if self.tokens >= 1:
                self.tokens -= 1
                return True

            # Not enough tokens: sleep until one accrues. Sleeping while
            # holding the lock deliberately serializes waiting callers.
            wait_time = (1 - self.tokens) / self.rate
            time.sleep(wait_time)
            self.tokens = 0
            self.last_refill = time.monotonic()
            return True

# Usage: limit EC2 API calls to 80/sec (leaving headroom below the 100/sec limit)
import boto3

ec2_client = boto3.client('ec2')
ec2_limiter = TokenBucketRateLimiter(rate_per_second=80)

def safe_describe_instances(**kwargs):
    ec2_limiter.acquire()
    return ec2_client.describe_instances(**kwargs)

Prevention Best Practices

  1. Use caching for AWS API responses. If you call DescribeInstances every 10 seconds, cache the result and only refresh when needed. Tools like AWS Config can provide near-real-time resource inventory without polling.

  2. Implement circuit breakers. If an API starts throttling, back off completely for a period rather than retrying immediately.

  3. Separate accounts for separate workloads. API rate limits are per-account. Using separate accounts for production, staging, and CI/CD prevents one from throttling the others.

  4. Use CloudWatch Events/EventBridge instead of polling. Instead of polling for state changes, subscribe to events:

# Create an EventBridge rule for EC2 state changes
aws events put-rule \
  --name "ec2-state-change" \
  --event-pattern '{
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {
      "state": ["running", "stopped", "terminated"]
    }
  }'

  5. Monitor API usage with CloudTrail Insights. Enable Insights to automatically detect unusual API call volumes:

aws cloudtrail put-insight-selectors \
  --trail-name my-trail \
  --insight-selectors '[{"InsightType": "ApiCallRateInsight"}, {"InsightType": "ApiErrorRateInsight"}]'

  6. Set up CloudWatch alarms for throttling metrics across critical services so you know when throttling starts, not after it causes an outage.
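The circuit-breaker idea from point 2 fits in a few lines: after a run of throttled calls, stop calling entirely for a cooldown period, then let a trial request through. A sketch with names of my choosing, not a library API:

```python
import time

class CircuitBreaker:
    """Open the circuit after `threshold` consecutive throttles; stay open for `cooldown` seconds."""
    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (calls allowed)

    def allow(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Half-open: let one trial call through after the cooldown
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_throttle(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

    def record_success(self):
        self.failures = 0
        self.opened_at = None
```

Wrap each control-plane call with `allow()` and report the outcome with `record_throttle()` or `record_success()`; during the cooldown, serve cached data or fail fast instead of piling on retries.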

When Throttling Signals an Architecture Problem

If you are hitting API rate limits regularly, it usually means your architecture is too tightly coupled to the AWS control plane. Production workloads should rarely make control plane API calls — they should use data plane APIs (like S3 GetObject, DynamoDB PutItem) which have much higher limits.

If your application architecture depends on high-frequency control plane calls, it is time to rethink the design. Event-driven patterns, caching layers, and proper resource management eliminate most throttling issues at the architectural level.

We help teams redesign their AWS architectures to avoid rate limiting and build resilient systems that handle AWS API constraints gracefully. If API throttling is disrupting your deployments or production workloads, contact us for a free consultation — we will review your API usage patterns and recommend concrete changes to eliminate the throttling.

Need help with your AWS infrastructure?

Book a free 30-minute consultation to discuss your challenges.