
That AWS Bill That Made My Manager Cry

Forgotten EC2 instances, NAT Gateway nightmares, and the importance of setting up billing alerts BEFORE you need them.

The email subject line was simple: "AWS Bill - Please Explain." The attachment showed a number that made me physically uncomfortable: $47,832.16. Our usual bill was around $3,000.

📧 The Email

My manager forwarded the finance team's email with just one word added: "???"

I remember staring at Cost Explorer, trying to understand how we went from $3K to nearly $50K in a single month. The graph looked like a hockey stick - flat, flat, flat, then VERTICAL.

🔍 The Investigation

Cost Explorer became my best friend for the next few hours. Here's what I found:

Cost Breakdown
EC2-Instances:   $18,420 (normally $1,200)
NAT Gateway:     $12,340 (normally $200)
Data Transfer:   $8,500 (normally $150)
RDS:             $5,200 (normally $800)
Other:           $3,372 (various)

🎯 The Culprits

1. The Forgotten GPU Instances

Someone on the ML team had spun up 4x p3.8xlarge instances "for testing" three weeks ago. At $12.24/hour each, that's about $49 an hour, or close to $25K for three full weeks of uptime, just sitting there. The instances were in a different region, so they didn't show up in our usual dashboards.
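Here is a minimal sketch of the kind of cross-region sweep that would have surfaced those instances. It assumes boto3 is installed and the credentials allow `ec2:DescribeRegions` and `ec2:DescribeInstances`. The function names and the 7-day cutoff are my own illustrative choices, not something we had in place at the time.

```python
from datetime import datetime, timedelta, timezone


def long_running(reservations, now, max_age=timedelta(days=7)):
    """Return (instance_id, instance_type, age) for running instances older than max_age.

    `reservations` has the shape of the "Reservations" list returned by
    the EC2 describe_instances call.
    """
    stale = []
    for reservation in reservations:
        for inst in reservation.get("Instances", []):
            if inst["State"]["Name"] != "running":
                continue
            age = now - inst["LaunchTime"]
            if age > max_age:
                stale.append((inst["InstanceId"], inst["InstanceType"], age))
    return stale


def scan_all_regions(max_age=timedelta(days=7)):
    """Check every enabled region, not just the one the dashboards watch."""
    import boto3  # imported lazily so the filter above works without AWS access

    now = datetime.now(timezone.utc)
    regions = boto3.client("ec2", region_name="us-east-1").describe_regions()["Regions"]
    report = {}
    for region in regions:
        name = region["RegionName"]
        ec2 = boto3.client("ec2", region_name=name)
        for page in ec2.get_paginator("describe_instances").paginate():
            hits = long_running(page["Reservations"], now, max_age)
            if hits:
                report.setdefault(name, []).extend(hits)
    return report
```

Calling `scan_all_regions()` returns a dict keyed by region name, so anything running in a region nobody looks at stands out immediately.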

2. NAT Gateway Data Transfer

A misconfigured service was downloading the same 2GB file from S3 every 5 minutes. But here's the kicker - it was going through the NAT Gateway instead of using a VPC endpoint. That's $0.045/GB both ways, 24/7.

💸
NAT Gateway Math:
2GB × 12/hour × 24 hours × 30 days × $0.045 × 2 = $1,555.20
Yes, really.
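If you want to rerun that multiplication with your own numbers, here is a small sketch. Every constant comes from the figures quoted above (including the "both ways" factor of 2). The function names are made up for illustration, and the rate is the one stated in this post, so check current pricing for your region.

```python
FILE_GB = 2                 # size of the file being re-downloaded
DOWNLOADS_PER_HOUR = 12     # one download every 5 minutes
HOURS_PER_MONTH = 24 * 30   # the 30-day month used in the math above
NAT_RATE_PER_GB = 0.045     # NAT Gateway rate quoted in this post (USD/GB)
DIRECTION_FACTOR = 2        # the "both ways" multiplier from the math above


def monthly_gb(file_gb=FILE_GB, per_hour=DOWNLOADS_PER_HOUR, hours=HOURS_PER_MONTH):
    """Total GB pushed through the NAT Gateway in one month."""
    return file_gb * per_hour * hours


def nat_processing_cost(gb, rate=NAT_RATE_PER_GB, factor=DIRECTION_FACTOR):
    """Monthly NAT Gateway data charge in USD, rounded to cents."""
    return round(gb * rate * factor, 2)


gb = monthly_gb()
print(f"{gb:,} GB/month through the NAT Gateway")
print(f"${nat_processing_cost(gb):,.2f} in NAT Gateway data charges")
```

The same S3 traffic routed through a gateway VPC endpoint never touches the NAT Gateway, so this whole line item goes away.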

3. RDS Snapshots Gone Wild

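We were taking manual snapshots alongside automated backups. Someone had been creating daily manual snapshots "just in case" but never cleaned them up. Unlike automated backups, manual snapshots are kept until somebody deletes them, so they piled up: 90 snapshots × 500GB each = 45TB of snapshot storage.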
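A cleanup pass for this can be sketched roughly as below. It assumes boto3 and the field names returned by the RDS `describe_db_snapshots` call. The 14-day retention window and the function names are hypothetical choices for illustration, and the sketch only selects candidates; it deliberately does not delete anything.

```python
from datetime import datetime, timedelta, timezone


def stale_manual_snapshots(snapshots, now, keep_days=14):
    """Pick manual snapshots older than keep_days (an arbitrary example window)."""
    cutoff = now - timedelta(days=keep_days)
    return [
        s for s in snapshots
        if s.get("SnapshotType") == "manual"
        and s.get("SnapshotCreateTime") is not None  # absent while still creating
        and s["SnapshotCreateTime"] < cutoff
    ]


def total_allocated_gb(snapshots):
    """Sum of AllocatedStorage (GiB of the source DB), as in the 90 x 500GB estimate.

    This is an upper-bound style estimate, not the exact billed snapshot size.
    """
    return sum(s["AllocatedStorage"] for s in snapshots)


def list_manual_snapshots():
    """Fetch all manual DB snapshots in the current region."""
    import boto3  # imported lazily so the helpers above work without AWS access

    rds = boto3.client("rds")
    found = []
    paginator = rds.get_paginator("describe_db_snapshots")
    for page in paginator.paginate(SnapshotType="manual"):
        found.extend(page["DBSnapshots"])
    return found
```

Running `stale_manual_snapshots(list_manual_snapshots(), datetime.now(timezone.utc))` gives a review list that a human can sign off on before anything is removed.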

🛡️ Prevention Strategies

After that incident, we implemented several safeguards:

🚨

Billing Alerts

Set alerts at 50%, 80%, 100%, and 150% of expected spend

🏷️

Mandatory Tagging

Every resource needs Owner, Project, and Environment tags

🔌

VPC Endpoints

S3 and DynamoDB endpoints to avoid NAT costs

🤖

Auto-Shutdown

Lambda function to stop dev instances at night
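As one example of these safeguards, the billing alerts can be expressed as an AWS Budgets budget with four notifications. This is a rough sketch, assuming boto3 and permission to call `budgets:CreateBudget`. The budget name, the helper names, and the use of email subscribers are illustrative assumptions rather than our exact setup.

```python
def budget_notifications(email, thresholds=(50, 80, 100, 150)):
    """Build the NotificationsWithSubscribers payload for AWS Budgets.

    One notification per threshold, each a percentage of the budget limit,
    triggered on actual (not forecasted) spend.
    """
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(pct),
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
        }
        for pct in thresholds
    ]


def create_monthly_budget(account_id, expected_usd, email):
    """Create a monthly cost budget with alerts at each threshold."""
    import boto3  # imported lazily so the payload builder works without AWS access

    budgets = boto3.client("budgets", region_name="us-east-1")
    budgets.create_budget(
        AccountId=account_id,
        Budget={
            "BudgetName": "monthly-expected-spend",  # hypothetical name
            "BudgetLimit": {"Amount": str(expected_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=budget_notifications(email),
    )
```

With an expected spend of $3,000, these thresholds would send emails at $1,500, $2,400, $3,000, and $4,500, long before a bill could reach five figures unnoticed.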

💡
Pro tip: Use AWS Organizations with Service Control Policies (SCPs) to prevent expensive instance types from being launched in dev accounts.
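For a sense of what such an SCP might look like, here is a sketch that builds the policy document as a Python dict and prints it as JSON. The instance-type patterns and the statement ID are example values I picked for illustration; adjust them to whatever counts as "expensive" in your accounts.

```python
import json


def deny_instance_types_scp(patterns=("p3.*", "p4d.*")):
    """Build an SCP that denies launching instances whose type matches any pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyExpensiveInstanceTypes",  # example statement ID
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringLike": {"ec2:InstanceType": list(patterns)}
                },
            }
        ],
    }


# Print the JSON to paste into the AWS Organizations console or pass to the API.
print(json.dumps(deny_instance_types_scp(), indent=2))
```

Attached to the dev organizational unit, a policy along these lines blocks the launch itself, so nobody has to remember to shut anything down.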

Our bill is back to normal now ($3,200 last month). But I still check Cost Explorer every morning with my coffee. Once bitten, twice paranoid.
