Route53 SERVFAIL: Diagnosing DNS Delegation and Configuration Errors

Everything is down. Your application, your API, your CDN — all unreachable. The health checks are firing, the on-call engineer is paged, and when you try to resolve your domain, you get this:

$ dig example.com

;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 43521
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

SERVFAIL. No answer, no authority, nothing. The DNS resolver tried to find your records and failed. Your domain effectively does not exist on the internet.

DNS failures are uniquely painful because they affect everything at once. Unlike a single service going down, a DNS failure makes your entire infrastructure unreachable — every service, every API endpoint, every static asset. I have seen SERVFAIL responses cause multi-hour outages at companies that had redundant infrastructure across multiple regions, all because a single DNS misconfiguration took down name resolution.

Here is the systematic approach I use to diagnose Route53 SERVFAIL errors.

Step 1: Confirm the Failure and Its Scope

First, determine whether the failure is global or localized to specific resolvers:

# Test against Google's public DNS
dig @8.8.8.8 example.com

# Test against Cloudflare's DNS
dig @1.1.1.1 example.com

# Query one of your zone's authoritative Route53 name servers directly
dig @ns-1234.awsdns-12.org example.com

If all public resolvers return SERVFAIL but the Route53 name server returns the correct answer, your zone data is fine and the problem is in the delegation chain (or in DNSSEC validation, covered below). If the Route53 name server itself does not return the expected answer, the problem is in your hosted zone configuration.

Also check different record types:

dig example.com A
dig example.com AAAA
dig example.com MX
dig www.example.com CNAME

If only some record types fail, you may have a specific record misconfiguration rather than a delegation issue.
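
The checks above can be run as one matrix, so you see at a glance whether the failure follows the resolver or the record type. This is a minimal sketch: dig_status and status_matrix are hypothetical helper names, the resolver list is the one used above, and ns-1234.awsdns-12.org stands in for one of your own zone's name servers.

```shell
#!/usr/bin/env bash
# dig_status (hypothetical helper): read dig output on stdin and print the
# response code from the header line, e.g. NOERROR or SERVFAIL
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# status_matrix DOMAIN (hypothetical helper): query every resolver for every
# record type and print one line per combination
status_matrix() {
  local domain="$1" resolver rtype status
  # Replace ns-1234.awsdns-12.org with one of your hosted zone's name servers
  for resolver in 8.8.8.8 1.1.1.1 ns-1234.awsdns-12.org; do
    for rtype in A AAAA MX; do
      status=$(dig @"$resolver" "$domain" "$rtype" +time=2 +tries=1 | dig_status)
      # A timeout produces no header line, so report that explicitly
      printf '%-24s %-5s %s\n' "$resolver" "$rtype" "${status:-NO_RESPONSE}"
    done
  done
}

# Usage: status_matrix example.com
```

A column of SERVFAIL for the public resolvers next to NOERROR for the Route53 name server points at delegation or DNSSEC; SERVFAIL on a single record type points at that record.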

Step 2: Trace the Delegation Chain

DNS resolution is a chain of delegations from the root servers down to your nameservers. A break anywhere in this chain causes SERVFAIL. Trace it:

dig +trace example.com

This shows every step of the delegation. Look for the point where the trace breaks:

.                       518400  IN  NS  a.root-servers.net.
com.                    172800  IN  NS  a.gtld-servers.net.
example.com.            172800  IN  NS  ns-1234.awsdns-12.org.
example.com.            172800  IN  NS  ns-567.awsdns-34.net.
example.com.            172800  IN  NS  ns-890.awsdns-56.co.uk.
example.com.            172800  IN  NS  ns-123.awsdns-78.com.

If the NS records shown in the trace do not match the nameservers of your Route53 hosted zone, that is your problem.

Root Cause 1: NS Record Delegation Mismatch

This is the most common cause of Route53 SERVFAIL. The NS records at your domain registrar must exactly match the NS records in your Route53 hosted zone.

Check what your hosted zone expects:

aws route53 get-hosted-zone \
  --id Z1234567890ABC \
  --query 'DelegationSet.NameServers'

[
    "ns-1234.awsdns-12.org",
    "ns-567.awsdns-34.net",
    "ns-890.awsdns-56.co.uk",
    "ns-123.awsdns-78.com"
]

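Now check what the parent zone actually delegates to. Ask a .com TLD server directly rather than a recursive resolver, which may return SERVFAIL or a cached answer while the delegation is broken. The NS records appear in the AUTHORITY section of the reply (for other TLDs, use one of that TLD's servers from the dig +trace output):

dig +norecurse @a.gtld-servers.net example.com NS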

If these do not match, update the NS records at your registrar to match the Route53 hosted zone. This mismatch commonly happens when:

You deleted and recreated a hosted zone (the new zone is assigned a different set of name servers)
You migrated the hosted zone to another AWS account (the zone in the new account gets its own name servers)
The NS records were never updated after initial Route53 setup
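
Comparing the two lists by eye is error-prone, because the CLI prints names without a trailing dot, dig prints them with one, and the order differs. A minimal sketch of a mechanical comparison, assuming a .com domain (so a.gtld-servers.net is authoritative for the parent); normalize_ns and compare_ns are hypothetical helper names:

```shell
#!/usr/bin/env bash
# normalize_ns (hypothetical helper): split whitespace-separated name server
# names onto separate lines, drop trailing dots, lowercase, sort, deduplicate
normalize_ns() {
  tr -s '[:space:]' '\n' | sed 's/\.$//' | tr '[:upper:]' '[:lower:]' | grep -v '^$' | sort -u
}

# compare_ns EXPECTED ACTUAL (hypothetical helper): return 0 if both sets
# match; otherwise print both sets and return 1
compare_ns() {
  local expected actual
  expected=$(printf '%s\n' "$1" | normalize_ns)
  actual=$(printf '%s\n' "$2" | normalize_ns)
  if [ "$expected" = "$actual" ]; then
    echo "NS sets match"
    return 0
  fi
  echo "Hosted zone delegation set:"; printf '  %s\n' $expected
  echo "Parent zone delegates to:"; printf '  %s\n' $actual
  return 1
}

# Usage (assumes a .com domain and the hosted zone ID used above):
# expected=$(aws route53 get-hosted-zone --id Z1234567890ABC \
#   --query 'DelegationSet.NameServers' --output text)
# actual=$(dig +norecurse @a.gtld-servers.net example.com NS | awk '$4 == "NS" {print $5}')
# compare_ns "$expected" "$actual"
```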

Use the Route53 test-dns-answer API to verify what Route53 would return:

aws route53 test-dns-answer \
  --hosted-zone-id Z1234567890ABC \
  --record-name example.com \
  --record-type A

{
    "Nameserver": "ns-1234.awsdns-12.org",
    "RecordName": "example.com",
    "RecordType": "A",
    "RecordData": ["203.0.113.42"],
    "ResponseCode": "NOERROR",
    "Protocol": "UDP"
}

If this returns the correct answer but public resolvers do not, the zone data is fine and the problem is almost certainly in the delegation (or, for a signed zone, in DNSSEC validation).

Root Cause 2: Multiple Hosted Zones for the Same Domain

AWS allows creating multiple hosted zones for the same domain name. Each gets a different set of NS records. If you are editing records in one hosted zone but the registrar points to another, your changes have no effect.

List all hosted zones for your domain:

aws route53 list-hosted-zones-by-name \
  --dns-name example.com \
  --query 'HostedZones[?Name==`example.com.`].{
    Id: Id,
    Name: Name,
    RecordCount: ResourceRecordSetCount,
    Private: Config.PrivateZone
  }'

If you see multiple public hosted zones, identify which one the registrar NS records point to. The others are orphaned and should be cleaned up to avoid confusion.
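
With several public zones of the same name, the quickest way to find the live one is to compare each zone's delegation set with what the parent zone actually publishes. A minimal sketch, assuming a .com domain; live_zone_for and normalize_ns are hypothetical helpers, and zone IDs may be passed with or without the /hostedzone/ prefix that list-hosted-zones-by-name returns:

```shell
#!/usr/bin/env bash
# normalize_ns (hypothetical helper): one lowercase name per line, no trailing
# dots, sorted and deduplicated, so lists from aws and dig compare equal
normalize_ns() {
  tr -s '[:space:]' '\n' | sed 's/\.$//' | tr '[:upper:]' '[:lower:]' | grep -v '^$' | sort -u
}

# live_zone_for DELEGATED_NS ZONE_ID... (hypothetical helper): report which of
# the given hosted zones the parent zone actually delegates to
live_zone_for() {
  local delegated zone_id zone_ns
  delegated=$(printf '%s\n' "$1" | normalize_ns)
  shift
  for zone_id in "$@"; do
    zone_id=${zone_id##*/}   # strip a leading /hostedzone/ if present
    zone_ns=$(aws route53 get-hosted-zone --id "$zone_id" \
      --query 'DelegationSet.NameServers' --output text | normalize_ns)
    if [ "$zone_ns" = "$delegated" ]; then
      echo "$zone_id is live (matches the parent delegation)"
    else
      echo "$zone_id is NOT delegated to (edits here have no public effect)"
    fi
  done
}

# Usage:
# zones=$(aws route53 list-hosted-zones-by-name --dns-name example.com \
#   --query 'HostedZones[?Name==`example.com.` && Config.PrivateZone==`false`].Id' --output text)
# delegated=$(dig +norecurse @a.gtld-servers.net example.com NS | awk '$4 == "NS" {print $5}')
# live_zone_for "$delegated" $zones
```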

Check the records in each zone:

aws route53 list-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --query 'ResourceRecordSets[?Type==`A` || Type==`CNAME`]'

Root Cause 3: DNSSEC Validation Failures

If you enabled DNSSEC signing on your hosted zone, a mismatch between the DS record at the registrar and the DNSKEY in your zone causes SERVFAIL for all DNSSEC-validating resolvers.

Check if DNSSEC is enabled:

aws route53 get-dnssec \
  --hosted-zone-id Z1234567890ABC

Verify the DS record at the registrar matches:

# Check what DNSSEC records the parent zone has
dig example.com DS +short

# Check what your zone is signing with
dig example.com DNSKEY +short @ns-1234.awsdns-12.org
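
To compare the two sides mechanically, Route53 can report the DS record it expects for each key-signing key: in get-dnssec output, each entry in KeySigningKeys carries a DSRecord field. A minimal sketch, assuming a .com domain and assuming DSRecord uses the usual presentation format (key tag, algorithm, digest type, digest); normalize_ds and compare_ds are hypothetical helpers:

```shell
#!/usr/bin/env bash
# normalize_ds (hypothetical helper): read DS records ("keytag alg digesttype
# digest"), rejoin digests that dig splits with spaces, uppercase, sort
normalize_ds() {
  awk 'NF >= 4 {
         digest = ""
         for (i = 4; i <= NF; i++) digest = digest $i
         print $1, $2, $3, toupper(digest)
       }' | sort -u
}

# compare_ds ZONE_DS PARENT_DS (hypothetical helper): succeed if the parent
# publishes no DS (unsigned delegation) or at least one DS matches an active KSK
compare_ds() {
  local zone_ds parent_ds common
  zone_ds=$(printf '%s\n' "$1" | normalize_ds)
  parent_ds=$(printf '%s\n' "$2" | normalize_ds)
  if [ -z "$parent_ds" ]; then
    echo "No DS at the parent: resolvers treat the zone as unsigned"
    return 0
  fi
  # Both lists are deduplicated, so a repeated line must appear in both
  common=$(printf '%s\n%s\n' "$zone_ds" "$parent_ds" | sort | uniq -d)
  if [ -n "$common" ]; then
    echo "DS matches an active key: $common"
    return 0
  fi
  echo "DS MISMATCH: expect SERVFAIL from validating resolvers"
  return 1
}

# Usage:
# zone=$(aws route53 get-dnssec --hosted-zone-id Z1234567890ABC \
#   --query 'KeySigningKeys[?Status==`ACTIVE`].[DSRecord]' --output text)
# parent=$(dig +norecurse +short @a.gtld-servers.net example.com DS)
# compare_ds "$zone" "$parent"
```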

Common DNSSEC failure scenarios:

DS record was added at the registrar before DNSSEC was enabled in Route53
DNSSEC was disabled in Route53 but the DS record remains at the registrar
KSK (Key Signing Key) was rotated but the DS record was not updated
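
One quick way to tell a DNSSEC failure from a delegation failure is to repeat the query with validation switched off. dig's +cd flag sets the Checking Disabled bit, asking the resolver to return data without validating it; if the answer appears only with +cd, the signatures or the DS record are the problem. A minimal sketch; check_dnssec, classify_dnssec and dig_status are hypothetical helper names, and 8.8.8.8 is just an assumed default validating resolver:

```shell
#!/usr/bin/env bash
# dig_status (hypothetical helper): print the response code from dig output
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# classify_dnssec NORMAL UNCHECKED (hypothetical helper): interpret the status
# of a normal query and of the same query with checking disabled
classify_dnssec() {
  if [ "$1" = "SERVFAIL" ] && [ "$2" = "NOERROR" ]; then
    echo "DNSSEC validation failure: data exists but does not validate"
  elif [ "$1" = "SERVFAIL" ]; then
    echo "SERVFAIL even without validation: check delegation and the zone itself"
  else
    echo "Validation is not the cause (status ${1:-NO_RESPONSE})"
  fi
}

# check_dnssec DOMAIN [RESOLVER] (hypothetical helper)
check_dnssec() {
  local domain="$1" resolver="${2:-8.8.8.8}" normal unchecked
  normal=$(dig @"$resolver" "$domain" A | dig_status)
  unchecked=$(dig @"$resolver" "$domain" A +cd | dig_status)
  classify_dnssec "$normal" "$unchecked"
}

# Usage: check_dnssec example.com
```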

To disable DNSSEC if it is causing issues (emergency fix):

# First, remove the DS record at your registrar
# Then disable signing in Route53
aws route53 disable-hosted-zone-dnssec \
  --hosted-zone-id Z1234567890ABC

Always remove the DS record at the registrar, and wait for its TTL to expire in resolver caches, before disabling DNSSEC in Route53. Doing it in the wrong order causes SERVFAIL because resolvers still hold a DS record, expect valid DNSSEC signatures, and receive unsigned answers.
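
The ordering rule is easy to break under pressure, so it can help to make the disable step refuse to run while the parent still publishes a DS record. A minimal sketch, assuming a .com domain (a.gtld-servers.net is authoritative for the parent zone); safe_disable_dnssec is a hypothetical helper, and it cannot inspect resolver caches, so waiting out the old DS TTL is still up to you:

```shell
#!/usr/bin/env bash
# safe_disable_dnssec DOMAIN ZONE_ID [PARENT_SERVER] (hypothetical helper):
# refuse to disable signing while the parent zone still publishes a DS record
safe_disable_dnssec() {
  local domain="$1" zone_id="$2" parent_server="${3:-a.gtld-servers.net}" ds
  # Ask the parent's authoritative server directly, bypassing resolver caches;
  # any output (including a timeout message) blocks the disable
  ds=$(dig +norecurse +short @"$parent_server" "$domain" DS)
  if [ -n "$ds" ]; then
    echo "Parent still publishes DS for $domain; remove it at the registrar first:" >&2
    printf '%s\n' "$ds" >&2
    return 1
  fi
  # Run this only after the removed DS record's TTL has passed, because
  # resolvers may still hold the old DS in cache
  aws route53 disable-hosted-zone-dnssec --hosted-zone-id "$zone_id"
}

# Usage: safe_disable_dnssec example.com Z1234567890ABC
```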

Root Cause 4: Subdomain Delegation with Missing NS Records

If you delegate a subdomain (like api.example.com) to a different hosted zone, the parent zone must have NS records pointing to the child zone's nameservers.

Check the parent zone for delegation:

aws route53 list-resource-record-sets \
  --hosted-zone-id Z_PARENT_ZONE \
  --query 'ResourceRecordSets[?Name==`api.example.com.` && Type==`NS`]'

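If there are no NS records for the subdomain in the parent zone, the subdomain will not resolve. Create them, using the name servers from the child hosted zone's delegation set (get-hosted-zone on the child zone's ID returns them under DelegationSet.NameServers):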

aws route53 change-resource-record-sets \
  --hosted-zone-id Z_PARENT_ZONE \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "NS",
        "TTL": 300,
        "ResourceRecords": [
          {"Value": "ns-111.awsdns-11.org"},
          {"Value": "ns-222.awsdns-22.net"},
          {"Value": "ns-333.awsdns-33.co.uk"},
          {"Value": "ns-444.awsdns-44.com"}
        ]
      }
    }]
  }'
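
Typing the four name servers by hand invites typos. A minimal sketch that builds the same change batch from the child zone's delegation set; ns_delegation_batch is a hypothetical helper, and it uses UPSERT rather than CREATE so that rerunning it replaces an existing (possibly wrong) NS record instead of failing:

```shell
#!/usr/bin/env bash
# ns_delegation_batch NAME TTL NS... (hypothetical helper): print a Route53
# change batch that UPSERTs an NS record for NAME with the given name servers
ns_delegation_batch() {
  local name="$1" ttl="$2" records="" ns
  shift 2
  for ns in "$@"; do
    # Append a comma only when a previous record exists
    records="${records:+$records,}{\"Value\": \"$ns\"}"
  done
  printf '{"Changes": [{"Action": "UPSERT", "ResourceRecordSet": {"Name": "%s", "Type": "NS", "TTL": %s, "ResourceRecords": [%s]}}]}\n' \
    "$name" "$ttl" "$records"
}

# Usage ($child_ns is left unquoted on purpose so each name becomes an argument):
# child_ns=$(aws route53 get-hosted-zone --id Z_CHILD_ZONE \
#   --query 'DelegationSet.NameServers' --output text)
# aws route53 change-resource-record-sets --hosted-zone-id Z_PARENT_ZONE \
#   --change-batch "$(ns_delegation_batch api.example.com 300 $child_ns)"
```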

Root Cause 5: Private Hosted Zone Not Associated with VPC

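Private hosted zones only answer queries from VPCs that are explicitly associated with them, and those VPCs must have both the enableDnsSupport and enableDnsHostnames attributes set to true. If your application runs in a VPC that is not associated, the private zone is never consulted: the Route53 Resolver answers from public DNS instead, so you typically get NXDOMAIN or the public record rather than the private one.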

Check VPC associations:

aws route53 get-hosted-zone \
  --id Z_PRIVATE_ZONE \
  --query 'VPCs'

[
    {
        "VPCRegion": "us-east-1",
        "VPCId": "vpc-abc123"
    }
]

If your application's VPC is not in this list, associate it:

aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z_PRIVATE_ZONE \
  --vpc VPCRegion=us-east-1,VPCId=vpc-def456

For cross-account private hosted zone access, you need to create an authorization first:

# In the account that owns the hosted zone
aws route53 create-vpc-association-authorization \
  --hosted-zone-id Z_PRIVATE_ZONE \
  --vpc VPCRegion=us-east-1,VPCId=vpc-def456

# In the account that owns the VPC
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z_PRIVATE_ZONE \
  --vpc VPCRegion=us-east-1,VPCId=vpc-def456
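
A minimal sketch that checks both the association and the VPC's enableDnsSupport and enableDnsHostnames attributes, which private hosted zones also require. check_private_zone is a hypothetical helper; it assumes credentials that can read both the hosted zone and the VPC (in a cross-account setup, run the two halves in their respective accounts), a CLI region matching the VPC, and the AWS CLI's text output printing booleans as True/False:

```shell
#!/usr/bin/env bash
# check_private_zone ZONE_ID VPC_ID (hypothetical helper): report whether the
# VPC is associated with the private zone and whether its DNS attributes are on
check_private_zone() {
  local zone_id="$1" vpc_id="$2" associated value
  associated=$(aws route53 get-hosted-zone --id "$zone_id" \
    --query 'VPCs[].VPCId' --output text)
  if printf '%s\n' "$associated" | tr -s '[:space:]' '\n' | grep -Fqx "$vpc_id"; then
    echo "associated: yes"
  else
    echo "associated: no"
  fi
  # Both attributes must be true for private hosted zones to resolve
  value=$(aws ec2 describe-vpc-attribute --vpc-id "$vpc_id" \
    --attribute enableDnsSupport --query 'EnableDnsSupport.Value' --output text)
  echo "enableDnsSupport: $value"
  value=$(aws ec2 describe-vpc-attribute --vpc-id "$vpc_id" \
    --attribute enableDnsHostnames --query 'EnableDnsHostnames.Value' --output text)
  echo "enableDnsHostnames: $value"
}

# Usage: check_private_zone Z_PRIVATE_ZONE vpc-def456
```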

Root Cause 6: Health Check Based Routing with All Unhealthy Targets

If you use Route53 health checks with failover or weighted routing and every target is unhealthy, Route53 does not stop answering: it fails open and keeps returning records. The usual symptom is therefore answers that point at broken endpoints, not SERVFAIL.

Check your health checks:

aws route53 list-health-checks \
  --query 'HealthChecks[*].{
    Id: Id,
    Type: HealthCheckConfig.Type,
    FQDN: HealthCheckConfig.FullyQualifiedDomainName,
    IPAddress: HealthCheckConfig.IPAddress
  }'

# Get health check status
aws route53 get-health-check-status \
  --health-check-id abc-123-def \
  --query 'HealthCheckObservations[*].{
    Region: Region,
    StatusReport: StatusReport.Status
  }'
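
With many checker regions, a per-region list is hard to read during an incident. A minimal sketch that tallies the observations; health_summary is a hypothetical helper, it assumes each StatusReport.Status string begins with Success or Failure, and it applies Route53's documented rule as I understand it, that an endpoint counts as healthy when more than 18% of health checkers report it healthy:

```shell
#!/usr/bin/env bash
# health_summary (hypothetical helper): read one StatusReport.Status string per
# line and print the success/failure tally plus an overall verdict, treating
# the endpoint as healthy when more than 18% of checkers report success
health_summary() {
  awk '/^Success/ { ok++ }
       /^Failure/ { bad++ }
       END {
         total = ok + bad
         if (total == 0) verdict = "UNKNOWN"
         else if (ok * 100 > 18 * total) verdict = "HEALTHY"
         else verdict = "UNHEALTHY"
         printf "healthy=%d unhealthy=%d overall=%s\n", ok, bad, verdict
       }'
}

# Usage:
# aws route53 get-health-check-status --health-check-id abc-123-def \
#   --query 'HealthCheckObservations[*].[StatusReport.Status]' --output text | health_summary
```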

When all health checks are failing:

With failover routing, if both the primary and the secondary are unhealthy, Route53 returns the primary record (fail-open behavior)
With weighted routing, if every record in the group is unhealthy, Route53 treats them all as healthy and answers according to the weights, so clients still receive addresses, just not working ones

Review your routing policy:

aws route53 list-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --query 'ResourceRecordSets[?Name==`example.com.` && Type==`A`]'

Root Cause 7: Alias Record Pointing to Deleted Resource

Alias records that point to deleted resources (like a terminated ELB or a deleted CloudFront distribution) can cause resolution failures.

List alias records:

aws route53 list-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --query 'ResourceRecordSets[?AliasTarget!=`null`].{
    Name: Name,
    Type: Type,
    AliasTarget: AliasTarget.DNSName,
    HostedZoneId: AliasTarget.HostedZoneId
  }'

For each alias target, verify the resource still exists:

# Check if an ELB exists
dig dualstack.my-elb-1234567890.us-east-1.elb.amazonaws.com

# Check if a CloudFront distribution is active
aws cloudfront get-distribution \
  --id E1234567890ABC \
  --query 'Distribution.Status'

If an alias target has been deleted, either update the record to point to the new resource or remove it entirely.
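
With many alias records, checking targets one at a time is slow. A minimal sketch that resolves every alias target and flags the ones that no longer answer; check_alias_targets and dig_status are hypothetical helpers. A NOERROR only shows that the target name still exists, not that the resource behind it is serving traffic:

```shell
#!/usr/bin/env bash
# dig_status (hypothetical helper): print the response code from dig output
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# check_alias_targets (hypothetical helper): read "record<TAB>alias-target"
# lines on stdin, resolve each target, and print the status next to the pair
check_alias_targets() {
  local name target status tab
  tab=$(printf '\t')
  while IFS="$tab" read -r name target; do
    [ -n "$target" ] || continue
    status=$(dig "$target" A +time=2 +tries=1 </dev/null | dig_status)
    if [ "$status" = "NOERROR" ]; then
      echo "ok        $name -> $target"
    else
      # NXDOMAIN usually means the ELB or distribution behind it was deleted
      echo "${status:-NO_RESPONSE}  $name -> $target"
    fi
  done
}

# Usage:
# aws route53 list-resource-record-sets --hosted-zone-id Z1234567890ABC \
#   --query 'ResourceRecordSets[?AliasTarget].[Name, AliasTarget.DNSName]' --output text \
#   | check_alias_targets
```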

Root Cause 8: TTL Propagation Delays

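After fixing a DNS issue, some resolvers may keep failing for a while because of caching. SERVFAIL responses themselves are cached only briefly (RFC 2308 caps this at five minutes), but NXDOMAIN and NODATA answers are cached according to the zone's SOA record, and resolvers that cached an old delegation keep using it until its TTL expires (172800 seconds, or two days, for the .com NS records shown in Step 2).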

Check the SOA record for the negative cache TTL:

dig example.com SOA

The last field of the SOA record is MINIMUM. Per RFC 2308, resolvers cache negative answers (NXDOMAIN and NODATA) for the lesser of MINIMUM and the TTL of the SOA record itself:

example.com.  900  IN  SOA  ns-1234.awsdns-12.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

Here MINIMUM is 86400, but the SOA record's own TTL is 900, so negative answers are cached for at most 900 seconds (15 minutes), not 24 hours. After fixing the underlying issue, you may need to wait for cached entries to expire, or you can flush specific resolver caches:

# Google Public DNS cache flush
# Visit: https://developers.google.com/speed/public-dns/cache

# Cloudflare cache purge
# Visit: https://1.1.1.1/purge-cache/
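
Working out the effective negative-caching TTL means comparing two fields of the SOA answer. A minimal sketch that does it per RFC 2308 (the lesser of the SOA record's own TTL and its MINIMUM field); negative_ttl is a hypothetical helper that expects one answer line as printed by dig +noall +answer. Querying an authoritative name server shows the configured TTL rather than a cached, counting-down one:

```shell
#!/usr/bin/env bash
# negative_ttl (hypothetical helper): read an SOA answer line
#   name ttl IN SOA mname rname serial refresh retry expire minimum
# and print how long NXDOMAIN/NODATA answers may be cached, which per
# RFC 2308 is the lesser of the SOA record's TTL and its MINIMUM field
negative_ttl() {
  awk '$4 == "SOA" {
         ttl = $2 + 0; minimum = $NF + 0
         print (ttl < minimum ? ttl : minimum)
         exit
       }'
}

# Usage: dig @ns-1234.awsdns-12.org example.com SOA +noall +answer | negative_ttl
```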

Prevention and Best Practices

Never delete and recreate a hosted zone without updating the registrar NS records. If you must recreate a zone, immediately update the registrar to match the new NS records.

Monitor DNS resolution externally. Route53 health checks can monitor endpoints, but they do not monitor your own DNS resolution. Use an external monitoring service that resolves your domain from multiple locations.

Keep DNSSEC operations scripted and tested. DNSSEC key rotation is error-prone when done manually. Automate it and test the process in a non-production domain first.

Document your DNS architecture. Maintain a diagram showing which hosted zones exist, which are delegated, and which VPCs are associated with private zones:

# Quick audit: list all hosted zones
aws route53 list-hosted-zones \
  --query 'HostedZones[*].{
    Name: Name,
    Id: Id,
    Private: Config.PrivateZone,
    RecordCount: ResourceRecordSetCount
  }'
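
The same audit can flag the situation from Root Cause 2 automatically. A minimal sketch; duplicate_public_zones is a hypothetical helper that reads name/ID pairs and prints any public zone name that appears more than once. Private zones are filtered out in the query because a private and a public zone sharing a name (split-horizon DNS) is a legitimate setup:

```shell
#!/usr/bin/env bash
# duplicate_public_zones (hypothetical helper): read "name<TAB>id" lines and
# print each zone name that appears more than once, with all of its IDs
duplicate_public_zones() {
  awk -F '\t' '{ ids[$1] = ids[$1] " " $2; count[$1]++ }
               END { for (n in count) if (count[n] > 1) print n ":" ids[n] }' | sort
}

# Usage:
# aws route53 list-hosted-zones \
#   --query 'HostedZones[?Config.PrivateZone==`false`].[Name, Id]' --output text \
#   | duplicate_public_zones
```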

Set TTLs appropriately. Lower TTLs (60-300 seconds) for records that might need emergency changes. Higher TTLs (3600+) for stable records to reduce query volume.

Use Route53 health checks with failover routing for critical records so that unhealthy endpoints are automatically bypassed.

When to Call for Help

DNS issues that resist the above diagnosis usually involve complex delegation chains across multiple registrars and DNS providers, DNSSEC issues during domain transfers, or split-horizon DNS interactions between private and public hosted zones. Because DNS failures affect everything simultaneously, getting expert help quickly is critical. We help teams design, audit, and troubleshoot Route53 configurations. If your domain is experiencing resolution failures or you want to prevent them with a proper DNS architecture review, reach out for a free consultation. We will trace your entire delegation chain and identify every vulnerability.