DynamoDB Hot Partitions: How to Design Around the Most Common Performance Problem
2026-04-06 · 8 min read
You are running DynamoDB in production and things are going well until one day your application starts throwing this error:
ProvisionedThroughputExceededException: The level of configured provisioned
throughput for the table was exceeded. Consider increasing your provisioning
level with the UpdateTable API.
Your first instinct is to increase the provisioned capacity. So you double it. The errors stop for a day, then come back. You double it again. The errors persist. You are now paying four times your original cost and the problem is not solved.
This is the classic hot partition problem, and throwing more capacity at it does not help because DynamoDB distributes provisioned capacity evenly across partitions. If you have 10,000 WCU provisioned across 10 partitions, each partition gets 1,000 WCU. If 80% of your writes target a single partition, that partition needs 8,000 WCU but only has 1,000. The other 9 partitions sit idle with unused capacity.
How DynamoDB Partitions Work
DynamoDB stores data across multiple partitions. Each item is assigned to a partition based on the hash of its partition key. The key points:
- Each partition supports up to 3,000 RCU and 1,000 WCU (these are hard limits)
- Each partition stores up to 10 GB of data
- DynamoDB automatically splits partitions when these limits are exceeded
- Provisioned capacity is divided evenly across all partitions
When you provision 10,000 WCU and your table has 10 partitions, each partition gets 1,000 WCU. DynamoDB has adaptive capacity that can temporarily boost a hot partition, but it cannot exceed the per-partition maximum of 1,000 WCU, and it is not a substitute for good key design.
Identifying Hot Partitions
Before redesigning your table, confirm that you actually have a hot partition problem.
CloudWatch Contributor Insights
This is the single most useful tool for diagnosing DynamoDB partition issues:
# Enable Contributor Insights on your table
aws dynamodb update-contributor-insights \
--table-name MyTable \
--contributor-insights-action ENABLE
Once enabled, Contributor Insights shows you the most accessed partition keys and sort keys over time. If a single partition key accounts for more than 10% of total traffic, you have a hot partition.
CloudWatch Metrics
Check for throttling events:
aws cloudwatch get-metric-statistics \
--namespace AWS/DynamoDB \
--metric-name WriteThrottleEvents \
--dimensions Name=TableName,Value=MyTable \
--start-time 2026-04-05T00:00:00Z \
--end-time 2026-04-06T00:00:00Z \
--period 300 \
--statistics Sum
Also check ConsumedWriteCapacityUnits and compare it to ProvisionedWriteCapacityUnits. If consumed capacity is well below provisioned capacity but you are still seeing throttles, that is the definitive signal of a hot partition — there is plenty of capacity in aggregate, but it is concentrated on a few partitions.
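That diagnostic rule (throttling while aggregate utilization stays low) can be expressed as a small predicate. A sketch; the function name and the 80% utilization threshold are my own, so tune the threshold for your workload:

```python
def likely_hot_partition(throttle_events: int,
                         consumed_wcu: float,
                         provisioned_wcu: float,
                         utilization_threshold: float = 0.8) -> bool:
    """Heuristic: throttling despite low aggregate utilization suggests
    capacity exists in total but is concentrated on a few partitions."""
    if throttle_events == 0:
        return False  # no throttling, nothing to diagnose
    return consumed_wcu < utilization_threshold * provisioned_wcu
```

Feed it the Sum of WriteThrottleEvents plus the average consumed and provisioned WCU from the CloudWatch queries above.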
The Five Most Common Hot Partition Patterns
Pattern 1: Date-Based Partition Keys
This is the most common mistake I see. Using a date string like 2026-04-06 as the partition key means every write in a given day hits the same partition:
PK: "2026-04-06" SK: "event#00001"
PK: "2026-04-06" SK: "event#00002"
PK: "2026-04-06" SK: "event#00003"
... (all 500,000 daily events on one partition)
The fix: Prepend a shard identifier to distribute writes across partitions:
PK: "2026-04-06#shard-0" SK: "event#00001"
PK: "2026-04-06#shard-1" SK: "event#00002"
PK: "2026-04-06#shard-2" SK: "event#00003"
Use a deterministic sharding strategy — for example, hash the sort key modulo the number of shards. Use a stable hash such as MD5 rather than a language-level hash like Python's built-in hash(), which is randomized per process and would assign the same event to different shards on different hosts:
shard_id = stable_hash(event_id) % NUM_SHARDS
PK = f"{date}#shard-{shard_id}"
Choose the number of shards based on your write throughput. If you need 5,000 WCU and each partition supports 1,000 WCU, use at least 5 shards. I typically use 10-20 to allow headroom.
The trade-off is that reads for a given date now require querying all shards and merging results. But if your read pattern is primarily by individual event ID (using the sort key), this is a non-issue because you can compute the shard from the event ID.
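Putting Pattern 1 together, here is a minimal sketch of the shard sizing, the write-side key construction, and the scatter-gather key list. NUM_SHARDS, the helper names, and the key format are illustrative assumptions; hashlib is used because Python's built-in hash() is randomized per process and unusable for deterministic sharding:

```python
import hashlib
import math

NUM_SHARDS = 10  # headroom above the computed minimum

def shards_needed(required_wcu: int, per_partition_wcu: int = 1000) -> int:
    # Minimum shard count so no single partition exceeds its WCU limit.
    return math.ceil(required_wcu / per_partition_wcu)

def shard_for(event_id: str) -> int:
    # Stable digest-based hash: the same event ID always maps to the
    # same shard, on every host and in every process.
    digest = hashlib.md5(event_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def partition_key(date: str, event_id: str) -> str:
    return f"{date}#shard-{shard_for(event_id)}"

def all_partition_keys(date: str) -> list[str]:
    # Reading a whole day means querying every shard and merging results.
    return [f"{date}#shard-{i}" for i in range(NUM_SHARDS)]
```

A point read by event ID recomputes the shard with shard_for(); a date-range read issues one Query per key from all_partition_keys() and merges the results.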
Pattern 2: Tenant ID with Skewed Distribution
In multi-tenant systems, using the tenant ID as the partition key seems natural. But if one tenant generates 90% of your traffic, their partition becomes the bottleneck:
PK: "tenant-large-corp" SK: "order#..." (90% of all writes)
PK: "tenant-small-biz-1" SK: "order#..." (2% of writes)
PK: "tenant-small-biz-2" SK: "order#..." (1% of writes)
The fix: Apply write sharding only for high-volume tenants. Maintain a tenant metadata table that tracks each tenant's shard count:
Tenant "large-corp": shard_count = 20
Tenant "small-biz-1": shard_count = 1
Tenant "small-biz-2": shard_count = 1
For the large tenant, writes go to large-corp#shard-0 through large-corp#shard-19. For small tenants, writes go directly to their tenant ID with no sharding overhead.
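A routing helper for this scheme might look like the following sketch. The in-memory map stands in for the tenant metadata table (in practice you would cache it with a TTL), and all names are illustrative:

```python
import hashlib

# Stand-in for the tenant metadata table; refreshed out of band in practice.
TENANT_SHARD_COUNT = {
    "large-corp": 20,
    "small-biz-1": 1,
    "small-biz-2": 1,
}

def tenant_partition_key(tenant_id: str, order_id: str) -> str:
    shard_count = TENANT_SHARD_COUNT.get(tenant_id, 1)
    if shard_count == 1:
        # Small tenants keep a plain tenant-ID key: no sharding overhead.
        return tenant_id
    # High-volume tenants get a deterministic shard suffix.
    digest = hashlib.md5(order_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{tenant_id}#shard-{shard}"
```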
Pattern 3: Status-Based Partition Keys
Using an item status as the partition key creates an extreme hot partition because most items are in the same status at any given time:
PK: "PENDING" SK: "order#123" (95% of items)
PK: "COMPLETED" SK: "order#456" (4% of items)
PK: "FAILED" SK: "order#789" (1% of items)
The fix: Never use low-cardinality attributes as partition keys. Instead, use a unique identifier (like order ID) as the partition key and create a Global Secondary Index (GSI) for status-based queries:
Table:
PK: "order#123" SK: "metadata" status: "PENDING"
PK: "order#456" SK: "metadata" status: "COMPLETED"
GSI (StatusIndex):
PK: "PENDING#shard-3" SK: "2026-04-06T10:30:00Z"
PK: "COMPLETED#shard-1" SK: "2026-04-06T09:15:00Z"
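Building such an item, with the sharded GSI key stored as ordinary attributes, can be sketched like this. The attribute names GSI1PK/GSI1SK, the shard count, and the helper name are assumptions, not fixed DynamoDB names:

```python
import hashlib

STATUS_SHARDS = 8  # assumption: sized for peak writes per status value

def build_order_item(order_id: str, status: str, created_at: str) -> dict:
    # Shard deterministically on the order ID so the same order always
    # lands on the same status shard.
    digest = hashlib.md5(order_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % STATUS_SHARDS
    return {
        "PK": f"order#{order_id}",
        "SK": "metadata",
        "status": status,
        # GSI key attributes: sharded status as the index partition key,
        # creation timestamp as the sort key for time-ordered queries.
        "GSI1PK": f"{status}#shard-{shard}",
        "GSI1SK": created_at,
    }
```

Querying all PENDING orders then fans out over the eight GSI1PK values, ordered by timestamp within each shard.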
Pattern 4: Sequential Numeric IDs
Auto-incrementing IDs are an anti-pattern, though not primarily because of how they distribute: DynamoDB hashes the partition key, so even sequential values spread across partitions. The real cost is that generating sequential IDs requires a centralized atomic counter, and that counter item becomes a hot item that every write must touch.
The fix: Use UUIDs or ULIDs instead. ULIDs are particularly useful because they are lexicographically sortable (encoding a timestamp) while still providing sufficient randomness for even partition distribution.
Pattern 5: GSI Hot Partitions
GSI partition keys follow the same rules as table partition keys. A common mistake is creating a GSI with a low-cardinality attribute as the partition key:
GSI on "country" attribute:
PK: "US" (70% of items)
PK: "DE" (15% of items)
PK: "UK" (10% of items)
This GSI will throttle, and GSI throttling propagates back to the base table — writes to the base table will be rejected if the GSI cannot keep up.
The fix: Apply the same sharding strategies to GSI partition keys, or reconsider whether a GSI is the right approach. Sometimes a separate table with a purpose-built key design is simpler and more performant than a heavily sharded GSI.
Single-Table Design: A Word of Caution
Single-table design — storing multiple entity types in one DynamoDB table — is a powerful pattern popularized by Rick Houlihan. It reduces the number of tables to manage and enables complex queries with a single request using composite sort keys and GSI overloading.
However, single-table design amplifies the hot partition problem. When all your entities share one table, a hot partition in one entity type affects read and write capacity for all entity types. If your Orders entity is hot, your Users and Products entities get throttled too.
My recommendation: use single-table design when your access patterns are well-understood and stable, and when the entities are genuinely related (they are queried together). For entities with vastly different traffic patterns or scaling requirements, separate tables give you independent scaling and isolation.
Designing for Uniform Distribution
The golden rule of DynamoDB partition key design is: choose a partition key with high cardinality that distributes writes uniformly.
Good partition keys:
- User ID — if you have many users with roughly equal activity
- Device ID — for IoT workloads with many devices
- UUID / ULID — for event-driven systems where each event is independent
- Composite key — combining two attributes (e.g., tenantId#orderId) to increase cardinality
Bad partition keys:
- Date — all traffic for a day hits one partition
- Status — low cardinality, skewed distribution
- Country or region — low cardinality, skewed distribution
- Boolean values — the worst possible cardinality (2 values)
Capacity Mode Considerations
If you are using Provisioned capacity mode and experiencing hot partitions, switching to On-Demand capacity mode can provide temporary relief. On-Demand mode allocates capacity per partition based on observed traffic patterns and can handle spiky workloads better than Provisioned mode with its even distribution.
However, On-Demand mode does not eliminate hot partitions — it just raises the ceiling before throttling occurs. If a single partition key receives sustained high traffic, you will eventually hit the per-partition limits even in On-Demand mode. And On-Demand remains several times more expensive per request than well-utilized Provisioned capacity.
The right approach is to fix the partition key design, then choose the capacity mode that matches your traffic pattern. Use On-Demand for unpredictable workloads and Provisioned with auto-scaling for steady-state workloads.
Monitoring and Prevention
After fixing your key design, set up ongoing monitoring:
# CloudWatch alarm for write throttling
aws cloudwatch put-metric-alarm \
--alarm-name "MyTable-WriteThrottles" \
--namespace AWS/DynamoDB \
--metric-name WriteThrottleEvents \
--dimensions Name=TableName,Value=MyTable \
--statistic Sum \
--period 300 \
--evaluation-periods 2 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789:alerts
Keep Contributor Insights enabled permanently. The cost is minimal ($0.000278 per 100 contributor items stored) and the diagnostic value during an incident is enormous. Review the top partition keys weekly as part of your operational review — traffic patterns shift over time, and a well-distributed key today can become a hot key tomorrow if usage patterns change.
DynamoDB is one of the most powerful databases on AWS, but it punishes poor key design mercilessly. Invest the time upfront to model your access patterns, test with realistic data volumes, and monitor partition distribution. The result is a database that scales to virtually any traffic level without operational overhead.
Need help with your AWS infrastructure?
Book a free 30-minute consultation to discuss your challenges.