AWS Cost Governance: Building the Systems That Keep Your Bill Under Control

Two weeks ago I wrapped up a cost optimization engagement for a Series B SaaS company in Frankfurt. Their AWS bill had crept from €18K to €31K over six months — not because of a single incident, but because there was no system to catch the drift. We found the usual culprits — oversized instances, forgotten dev environments, data transfer between regions — but the root cause was always the same: nobody owned the number. There was no tagging, no budget hierarchy, no monthly review. Cost was invisible until it wasn't.

Emergency triage is what you do after your bill spikes. Cost governance is what you build so the spike never happens. This post is about the second thing.

I have run FinOps programmes across accounts ranging from €5K/month startups to €500K/month enterprise platforms. The organizations that consistently spend 20-35% less than their peers share the same five practices. None of them are exotic. All of them require discipline to maintain.

1. Mandatory Tagging: Making Every Euro Attributable

You cannot optimize what you cannot measure, and you cannot measure what you cannot attribute. Tagging is the foundation of every cost governance programme I have ever implemented. Without it, Cost Explorer shows you a number with no story behind it.

The mistake most teams make is treating tagging as optional hygiene. The result is that a year later, 40-60% of their resources are untagged and the cost data is useless for accountability purposes.

The tag schema that works in practice:

After trying various schemes, I have settled on four mandatory tags for every resource in every client account:

Tag Key	Example Values	Purpose
`project`	`checkout-service`, `data-platform`	Cost by product area
`environment`	`prod`, `staging`, `dev`	Separate production from waste
`team`	`platform`, `growth`, `data-platform`	Internal chargeback
`owner`	`alice@example.com`	Who to call when something looks wrong

Enforce tagging with Tag Policies and AWS Config:

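Tag Policies at the AWS Organizations level standardize tag keys and values, and `enforced_for` blocks tagging operations that do not comply with the policy on the listed resource types. On their own they do not stop a resource from being created with no `project` tag at all; for that, pair them with a service control policy that denies creation requests missing the tag. The policy must also be attached to a root, OU, or account (`aws organizations attach-policy`) before it takes effect: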

# Create a tag policy that enforces the 'project' tag
aws organizations create-policy \
  --name "RequireProjectTag" \
  --type TAG_POLICY \
  --description "Require project tag on all taggable resources" \
  --content '{
    "tags": {
      "project": {
        "tag_key": {
          "@@assign": "project"
        },
        "enforced_for": {
          "@@assign": [
            "ec2:instance",
            "ec2:volume",
            "rds:db",
            "lambda:function",
            "ecs:service",
            "s3:bucket"
          ]
        }
      }
    }
  }'

For existing resources, use the AWS Config managed rule `required-tags` to find untagged resources (the commands below assume the rule is already deployed under that name):

# Check how many resources are missing required tags
aws configservice describe-compliance-by-config-rule \
  --config-rule-names required-tags \
  --query 'ComplianceByConfigRules[*].[ConfigRuleName,Compliance.ComplianceType]' \
  --output table

# Get the actual non-compliant resources
aws configservice get-compliance-details-by-config-rule \
  --config-rule-name required-tags \
  --compliance-types NON_COMPLIANT \
  --query 'EvaluationResults[*].EvaluationResultIdentifier.EvaluationResultQualifier.ResourceId' \
  --output table

Activate Cost Allocation Tags so they appear in Cost Explorer:

# List available cost allocation tags
aws ce list-cost-allocation-tags \
  --status Inactive \
  --query 'CostAllocationTags[*].[TagKey,Status]' \
  --output table

# Activate your key tags for cost allocation
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status '[
    {"TagKey":"project","Status":"Active"},
    {"TagKey":"environment","Status":"Active"},
    {"TagKey":"team","Status":"Active"}
  ]'

Once your tags are active, you can break down costs in Cost Explorer by any combination of these dimensions. That Frankfurt client I mentioned? Once we back-tagged their resources and activated cost allocation tags, we discovered that their dev and staging environments were costing €8,400/month, more than a quarter of their €31K bill, and nobody had realised because there was no environment tag to filter by.

The fix for that alone saved them €5,200/month: shutting down dev instances after 18:00 on weekdays and all weekend, and decommissioning three staging environments that had been running untouched for four months.

2. Three-Tier Budget Alerts: Knowing Before It's Too Late

A single budget alert at 100% of your monthly budget is the equivalent of a smoke alarm that only triggers when the house is already burning. By the time you receive that alert, the budget is already spent.

The budget architecture I implement for every client has three tiers:

Tier 1 (50% Threshold): Early Warning. On a steady spend rate this alert fires around mid-month. If it fires noticeably earlier, you are tracking to overspend, and you still have more than half the month left to course-correct.

Tier 2 (80% Threshold): Action Required. At 80%, you need to actively investigate. On pace, this fires in the last week of the month; earlier than that means an overage is likely. The response is to identify what is driving the overage and either stop non-critical workloads or approve the additional spend consciously.

Tier 3 (100% Threshold): Escalation. When you hit 100%, it goes to senior leadership, not as a blame exercise, but as a signal that the budget needs revising or a significant architectural decision needs making.
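
To make the pacing logic behind the tiers concrete, here is a minimal sketch that extrapolates spend to date linearly to month end. The function name `project_month_end` and the sample figures are illustrative assumptions, and a straight-line run rate is a simplification, not the model AWS Budgets uses for its own forecasts.

```shell
# Hypothetical helper: linear run-rate projection of month-end spend.
# Args: spend_to_date budget day_of_month days_in_month
project_month_end() {
  awk -v spend="$1" -v budget="$2" -v day="$3" -v days="$4" 'BEGIN {
    projected = spend / day * days          # straight-line extrapolation
    pct = projected / budget * 100          # projected spend as % of budget
    printf "projected=%.0f pct_of_budget=%.0f\n", projected, pct
  }'
}

# The 50% alert fires on day 10 of a 30-day month against a 30000 budget
project_month_end 15000 30000 10 30
```

If the 50% alert arrives on day 10, the run rate points to 150% of budget by month end; arriving on day 15, it points to exactly 100%.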

Setting up the three-tier budget in the CLI:

# Create the monthly budget (adjust Amount to your actual budget)
aws budgets create-budget \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget '{
    "BudgetName": "Monthly-AWS-Total",
    "BudgetLimit": {"Amount": "30000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
    "CostFilters": {}
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 50,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {"SubscriptionType": "EMAIL", "Address": "finops-team@example.com"}
      ]
    },
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {"SubscriptionType": "EMAIL", "Address": "finops-team@example.com"},
        {"SubscriptionType": "EMAIL", "Address": "engineering-manager@example.com"}
      ]
    },
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 100,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {"SubscriptionType": "EMAIL", "Address": "finops-team@example.com"},
        {"SubscriptionType": "EMAIL", "Address": "cto@example.com"}
      ]
    }
  ]'

Add a forecasted spend alert to catch anomalies before month end:

Forecasted alerts are often overlooked but extremely valuable. They fire when AWS projects that your month-end spend will exceed the threshold — based on your current spend rate — even if you haven't crossed it yet.

# Add a forecasted alert at 110% — catches spend acceleration early
aws budgets create-budget \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget '{
    "BudgetName": "Monthly-AWS-Forecasted",
    "BudgetLimit": {"Amount": "30000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "FORECASTED",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 110,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {"SubscriptionType": "EMAIL", "Address": "finops-team@example.com"}
      ]
    }
  ]'

Per-team and per-environment budgets:

In addition to the account-level budget, create per-team budgets using Cost Allocation Tag filters. This puts accountability directly on the team that is spending:

# Budget for the data platform team only
aws budgets create-budget \
  --account-id $(aws sts get-caller-identity --query Account --output text) \
  --budget '{
    "BudgetName": "Team-DataPlatform",
    "BudgetLimit": {"Amount": "8000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
    "CostFilters": {
      "TagKeyValue": ["user:team$data-platform"]
    }
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {"SubscriptionType": "EMAIL", "Address": "data-team-lead@example.com"}
      ]
    }
  ]'

When team leads receive their own budget alerts, cost becomes their problem to solve — not just the FinOps team's. That shift in ownership is often worth more than any technical optimization.

3. Savings Plan Lifecycle Management: Committing Intelligently

Savings Plans are one of the highest-leverage levers in AWS cost governance, but they require discipline to use well. I have seen clients save 35-45% on their compute bill with well-managed Savings Plans — and I have also seen clients locked into the wrong commitment after buying too aggressively.

The cardinal rule: right-size before you commit. A Savings Plan locks in an hourly spend commitment for one or three years. If you size that commitment around 20 EC2 instances and then right-size the fleet down to 12, you are paying for commitment that no longer matches your actual usage.

The right sequence is:

Eliminate waste and right-size (4-6 weeks)
Observe stable baseline for 2-4 weeks
Buy Savings Plans to cover 60-70% of baseline (conservative start)
Review and top up quarterly
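
The sizing step in that sequence can be sketched as simple arithmetic. A Savings Plan commitment is paid at the discounted rate, so covering a share of an on-demand baseline takes less commitment than the on-demand figure suggests. The helper name `sp_commitment`, the 10.00/hour baseline, and the 28% discount are assumptions for illustration; real discounts vary by instance family, region, term, and payment option, so treat AWS's own purchase recommendation as the reference.

```shell
# Hypothetical helper: hourly Savings Plan commitment needed to cover a share
# of an on-demand baseline. Commitment is paid at the discounted rate, so
# commitment = baseline * coverage * (1 - discount).
# Args: baseline_on_demand_per_hour target_coverage discount (fractions, e.g. 0.65)
sp_commitment() {
  awk -v base="$1" -v cov="$2" -v disc="$3" 'BEGIN {
    printf "%.2f\n", base * cov * (1 - disc)
  }'
}

# Cover 65% of a 10.00/hour on-demand baseline at an assumed 28% discount
sp_commitment 10.00 0.65 0.28
```

Under these assumptions, a commitment of about 4.68/hour covers 6.50/hour of on-demand-equivalent usage, leaving the remaining 35% of the baseline uncommitted as headroom.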

Check your current Savings Plan coverage and utilization:

# Savings Plan coverage report for last 30 days
aws ce get-savings-plans-coverage \
  --time-period Start=2026-01-15,End=2026-02-15 \
  --granularity MONTHLY \
  --metrics "SpendCoveredBySavingsPlans" \
  --query 'SavingsPlansCoverages[*].Coverage'

# Current utilization (are you using what you bought?)
aws ce get-savings-plans-utilization \
  --time-period Start=2026-01-15,End=2026-02-15 \
  --query 'Total.Utilization'

If your utilization is below 80%, you have over-committed: you are paying for Savings Plan commitment you are not using. If your coverage is below 50%, there is significant uncommitted spend that would benefit from a purchase.
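
As a rough illustration of those two rules of thumb, the sketch below flags low utilization and low coverage and estimates what unused commitment costs per month. The helper name `sp_health`, the sample inputs, and the 730 hours-per-month figure are assumptions; in practice the percentages come from the two reports above.

```shell
# Hypothetical helper: apply the 80% utilization and 50% coverage rules of thumb.
# Args: utilization_pct coverage_pct hourly_commitment
sp_health() {
  awk -v u="$1" -v c="$2" -v commit="$3" 'BEGIN {
    util = (u < 80) ? "LOW" : "OK"
    cov  = (c < 50) ? "LOW" : "OK"
    unused = (100 - u) / 100 * commit * 730   # assumes ~730 hours per month
    printf "utilization=%s coverage=%s unused_per_month=%.0f\n", util, cov, unused
  }'
}

# 72% utilization, 65% coverage, on a 5.00/hour commitment
sp_health 72 65 5
```

With these sample numbers, 28% of a 5.00/hour commitment goes unused, which comes to roughly 1,022 per month paid for nothing.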

Get AWS's purchase recommendation:

# Compute Savings Plan recommendation for 1-year term, no upfront
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days THIRTY_DAYS \
  --query 'SavingsPlansPurchaseRecommendation.SavingsPlansPurchaseRecommendationDetails[0:3]'

Compute SP vs EC2 Instance SP — when to use which:

Savings Plan Type	Applies to	Flexibility	Discount
Compute SP	EC2, Fargate, Lambda	Any region, instance family, OS	17-36%
EC2 Instance SP	EC2 only	Fixed instance family + region	36-55%

Start with Compute Savings Plans. They give you flexibility to change instance types, switch regions, and move workloads to Fargate or Lambda without losing your commitment. Only add EC2 Instance Savings Plans for workloads you are highly confident will not change instance families in the next year.

Schedule the quarterly review:

Create a reminder in your team's calendar for the first Monday of each quarter. The agenda is simple: review coverage, check utilization, check for new purchase recommendations, and decide whether to add coverage. This 30-minute ritual is worth thousands of dollars a year.

4. Compute Optimizer Integration: Continuous Rightsizing

Right-sizing is not a one-time event — it is a continuous process. Workloads change. Traffic patterns shift. New services launch. The team that right-sized their fleet six months ago is already running with suboptimal instances today.

AWS Compute Optimizer analyses your CloudWatch metrics and recommends the most cost-effective instance type, Lambda memory setting, or ECS task definition for each resource. It is free to enable, and in my experience it consistently identifies 15-25% additional savings beyond what teams find manually.

Enable Compute Optimizer for your account:

# Enroll in Compute Optimizer (one-time setup)
aws compute-optimizer update-enrollment-status --status Active

# For an Organizations account, enroll all member accounts
aws compute-optimizer update-enrollment-status \
  --status Active \
  --include-member-accounts

Compute Optimizer bases its recommendations on recent CloudWatch metrics (a 14-day lookback by default) and needs roughly 30 hours of metric data for a resource before it reports on it. For the most accurate recommendations, wait 30 days after enrolling.

Pull EC2 recommendations in bulk:

# Get all EC2 recommendations, largest estimated monthly saving first.
# Text output keeps the columns sortable; the saving is the fifth column.
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[*].[
    instanceArn,
    currentInstanceType,
    recommendationOptions[0].instanceType,
    finding,
    recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.value
  ]' \
  --output text | sort -k5,5 -rn

Lambda memory optimization:

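Lambda is frequently overlooked in rightsizing exercises because the amounts per function are small, but at scale they add up significantly. A Lambda function allocated 1024 MB that only uses 180 MB is paying for roughly 5.7x the memory it needs. Because Lambda allocates CPU in proportion to memory, a lower setting can lengthen execution time, so validate duration before treating the full difference as savings.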

# Get Lambda recommendations
aws compute-optimizer get-lambda-function-recommendations \
  --query 'lambdaFunctionRecommendations[*].{
    Function:functionArn,
    CurrentMemory:currentMemorySize,
    RecommendedMemory:memorySizeRecommendationOptions[0].memorySize,
    Finding:finding
  }' \
  --output table
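
The arithmetic behind the 1024 MB versus 180 MB example can be sketched as follows. Lambda's compute charge scales with GB-seconds (memory times duration times invocations), so the sketch compares GB-seconds rather than prices. The helper names, the 200 ms duration, and the one million invocations are assumptions for illustration, and the comparison holds duration constant, which may not be true after lowering memory.

```shell
# Hypothetical helpers for reasoning about Lambda memory settings.

# Ratio of allocated to peak-used memory. Args: allocated_mb used_mb
lambda_overallocation() {
  awk -v a="$1" -v u="$2" 'BEGIN { printf "%.1f\n", a / u }'
}

# Billed GB-seconds per month. Args: memory_mb avg_duration_ms invocations
lambda_gb_seconds() {
  awk -v mem="$1" -v ms="$2" -v n="$3" 'BEGIN {
    printf "%.0f\n", mem / 1024 * ms / 1000 * n
  }'
}

lambda_overallocation 1024 180        # the example from the text
lambda_gb_seconds 1024 200 1000000    # current setting
lambda_gb_seconds 256 200 1000000     # candidate setting, same duration assumed
```

Moving from 1024 MB to 256 MB cuts billed GB-seconds to a quarter only if the function runs just as fast at the lower setting, which is why the recommendation should be validated against measured duration.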

Build rightsizing into your sprint cycle:

The teams that capture the most value from Compute Optimizer treat it like a product backlog item. Every two weeks, someone pulls the recommendations, prioritizes them by potential savings, and creates tickets for the largest opportunities. A typical mid-sized account with 50-100 EC2 instances will have 5-10 actionable recommendations per cycle, each taking 30-60 minutes to implement and validate.

Over a quarter, this cadence typically delivers an additional 10-20% reduction in compute costs on top of the initial rightsizing exercise.

5. FinOps Culture: Making Cost Everyone's Responsibility

The four practices above are technical. This one is organizational — and in my experience, it is the hardest and most important.

In teams without a FinOps culture, cost is somebody else's problem. Engineers provision what they need without thinking about the bill. Product managers prioritize features without considering operational cost. Finance looks at the total bill at the end of the month and asks vague questions. The gap between who creates cost and who is accountable for it is what allows waste to accumulate.

The core shift: from central ownership to distributed accountability

Instead of one FinOps team that is responsible for everyone's costs, the goal is a model where every team owns its own AWS spend. The FinOps function provides tooling, governance, and coaching — not a cost police force.

Monthly cost review ritual:

The most effective tool I have found is a simple monthly cost review meeting. Each team lead comes prepared with three numbers: last month's spend by team tag, variance from budget, and the top three cost drivers. The meeting runs for 30 minutes. Anomalies are discussed openly.

This sounds obvious. I have never worked with a client that was doing it consistently before we started the engagement.
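
For the three numbers each team lead brings, a small sketch like the one below is often enough. It assumes the spend data has already been exported as `name,cost` lines (for example from a Cost Explorer CSV); the helper names and sample figures are hypothetical.

```shell
# Hypothetical helpers for the three numbers each team lead brings.

# Variance from budget, signed. Args: spend budget
budget_variance() {
  awk -v s="$1" -v b="$2" 'BEGIN {
    printf "%+.0f (%+.1f%%)\n", s - b, (s - b) / b * 100
  }'
}

# Largest cost drivers. Reads "name,cost" lines on stdin; arg: how many (default 3)
top_drivers() {
  sort -t, -k2,2 -rn | head -n "${1:-3}"
}

budget_variance 8800 8000
printf '%s\n' 'EC2,4200' 'RDS,2100' 'S3,600' 'NAT Gateway,1500' | top_drivers 3
```

A team that spent 8,800 against an 8,000 budget is 10% over, and the top three drivers show where to look first.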

Publish a cost dashboard:

Make costs visible at all times. A Cost Category that maps the `team` tag to named groups gives everyone the same grouping dimension in Cost Explorer and in budgets:

# Create a Cost Category that groups spend by the team tag
aws ce create-cost-category-definition \
  --name "TeamAllocation" \
  --rules '[
    {
      "Value": "Platform",
      "Rule": {
        "Tags": {
          "Key": "team",
          "Values": ["platform"]
        }
      }
    },
    {
      "Value": "Growth",
      "Rule": {
        "Tags": {
          "Key": "team",
          "Values": ["growth"]
        }
      }
    },
    {
      "Value": "DataPlatform",
      "Rule": {
        "Tags": {
          "Key": "team",
          "Values": ["data-platform"]
        }
      }
    }
  ]' \
  --rule-version "CostCategoryExpression.v1"

Non-production environment controls:

One of the highest-ROI governance policies is automatically shutting down non-production environments after business hours. A dev environment running 24/7 in eu-central-1 on an m5.xlarge costs ~€130/month. The same environment running Monday through Friday 08:00-20:00 costs ~€48/month, a 63% saving with zero productivity impact.

# Tag all non-production instances for scheduled shutdown
aws ec2 create-tags \
  --resources i-0abc123def456 i-0def789abc012 \
  --tags Key=AutoShutdown,Value=enabled Key=environment,Value=dev

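# Use Systems Manager Automation to stop tagged instances on a schedule.
# Schedule expressions are evaluated in UTC, so shift the hours for your local time zone.
# Automation runbooks need --automation-target-parameter-name so the tag targets map to InstanceId;
# depending on your IAM setup you may also need to pass an AutomationAssumeRole parameter.
aws ssm create-association \
  --name "AWS-StopEC2Instance" \
  --targets '[{"Key":"tag:AutoShutdown","Values":["enabled"]}]' \
  --automation-target-parameter-name InstanceId \
  --schedule-expression "cron(0 20 ? * MON-FRI *)" \
  --association-name "DevEnvironmentShutdown"

# Restart them in the morning (08:00, matching the 08:00-20:00 window above)
aws ssm create-association \
  --name "AWS-StartEC2Instance" \
  --targets '[{"Key":"tag:AutoShutdown","Values":["enabled"]}]' \
  --automation-target-parameter-name InstanceId \
  --schedule-expression "cron(0 8 ? * MON-FRI *)" \
  --association-name "DevEnvironmentStartup"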
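
The saving from a schedule like this follows directly from the share of the week the instance runs. A minimal sketch, with a hypothetical helper name, that covers compute hours only (EBS volumes attached to stopped instances are still billed):

```shell
# Hypothetical helper: share of the week an instance runs under a schedule.
# Args: hours_per_day days_per_week
schedule_saving() {
  awk -v h="$1" -v d="$2" 'BEGIN {
    hours = h * d
    running = hours / 168 * 100             # 168 hours in a week
    printf "hours_per_week=%d running_pct=%.1f saving_pct=%.1f\n", hours, running, 100 - running
  }'
}

schedule_saving 12 5    # Monday to Friday, 08:00-20:00
```

A 12-hour weekday window means the instance runs 60 of 168 hours, which is close to the saving quoted above for the m5.xlarge example.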

Building the Governance Calendar

The five practices above form a system, not a checklist. Here is how they fit together into an ongoing rhythm:

Weekly (15 minutes):

Check budget alerts — are you tracking within expected range?
Review AWS Cost Anomaly Detection notifications

# Set up an anomaly detection subscription (requires an existing anomaly monitor;
# replace the account ID and MONITOR_ID with your own)
aws ce create-anomaly-subscription \
  --anomaly-subscription '{
    "SubscriptionName": "WeeklyCostAnomaly",
    "MonitorArnList": ["arn:aws:ce::123456789012:anomalymonitor/MONITOR_ID"],
    "Subscribers": [
      {"Address": "finops-team@example.com", "Type": "EMAIL"}
    ],
    "Threshold": 100,
    "Frequency": "WEEKLY"
  }'

Monthly (60 minutes):

Pull Cost Explorer report by team and environment
Review Compute Optimizer recommendations and create tickets for top 5
Check Savings Plan coverage and utilization
Identify and decommission unused resources

Quarterly (90 minutes):

Review Savings Plan purchase recommendations
Reassess team budgets based on actual spend trends
Update tag policies if new teams or projects have been created
Review and update non-production shutdown schedules

What This Looks Like in Practice

That Frankfurt client I mentioned at the start? Six months after implementing these five practices, their bill is €19K/month — down from the €31K peak, and growing at a fraction of the rate it was before. More importantly, cost surprises have stopped. Every increase is visible in advance, attributed to a team, and discussed in the monthly review.

The savings breakdown was roughly:

Tagging + attribution → identified €5,200/month in idle dev environments
Budget alerts → caught a data transfer anomaly in week two of the following month, saving €1,800
Savings Plans → 28% discount on €12K of committed compute, saving €3,360/month
Compute Optimizer → 22% rightsizing reduction on EC2 fleet, saving €2,100/month
Automated shutdowns → €1,400/month on non-production environments

That is roughly €12,060/month in recurring savings from five governance practices (the €1,800 data transfer catch was a one-off on top), which matches the drop from €31K to €19K. None of them required a re-architecture. All of them required consistent execution.
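
As a quick check on the breakdown above, the sketch below sums the listed figures. It assumes the budget-alert catch was a one-off rather than a monthly saving; the helper name `sum_values` is hypothetical.

```shell
# Sum whole-euro figures passed as arguments.
sum_values() {
  total=0
  for v in "$@"; do
    total=$((total + v))
  done
  echo "$total"
}

# Recurring monthly items: idle dev environments, Savings Plans,
# Compute Optimizer rightsizing, automated shutdowns
sum_values 5200 3360 2100 1400
# Adding the 1800 data transfer catch (assumed here to be a one-off)
sum_values 5200 3360 2100 1400 1800
```

The four recurring items come to 12,060 per month, and all five lines together come to 13,860.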

Where to Start

If you are building a cost governance programme from scratch, start with tagging — everything else depends on it. You cannot attribute costs without tags, and you cannot hold teams accountable without attribution. The second priority is the three-tier budget setup, because it gives you early warning while you work on the rest.

Book a free 30-minute consultation and I will walk through your current Cost Explorer setup with you, identify where governance is missing, and give you a prioritised roadmap for implementing these practices in your account.