Cold starts were killing us. Not in the "slightly annoying" way that makes you file a low-priority ticket. In the "our enterprise clients are threatening to leave" way that makes you reconsider every architectural decision you've made in the last three years.

Our platform served API requests from 14 AWS Lambda functions behind an API Gateway. On paper, it was textbook serverless. In practice, our p99 latency had crept up to 2.3 seconds. For a developer tools company. The irony was not lost on anyone.

The Problem Nobody Warned Us About

When we first adopted Lambda in 2023, the cold start problem was well-documented. We did everything the blog posts told us: provisioned concurrency, kept functions warm with scheduled pings, minimized bundle sizes. And it worked -- for a while.

What nobody told us was that as your function count grows and your traffic becomes more spiky (as developer tools traffic tends to be), the warm pool becomes increasingly expensive to maintain. We were spending $4,200/month just on provisioned concurrency for functions that were idle 60% of the time.
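For reference, the keep-warm piece looked roughly like this. It's a sketch rather than our exact handler, but the shape is the same: the scheduled EventBridge ping arrives with source "aws.events" and gets short-circuited before any business logic runs.

// Sketch of the scheduled-ping keep-warm pattern. Assumes an EventBridge
// rule invokes the function every few minutes; the real handlers obviously
// do more than echo a status on non-ping events.
import type { Handler } from 'aws-lambda';

export const handler: Handler = async (event) => {
  if (event?.source === 'aws.events') {
    // Warm-up ping: return immediately so the invocation stays cheap.
    return { statusCode: 200, body: 'warm' };
  }

  // ...normal API Gateway request handling would go here...
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};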

Note

The numbers in this post are real but rounded. I've anonymized client-specific data and normalized costs to a single-region deployment for clarity.

Measuring What Actually Matters

Before ripping anything out, we spent two weeks instrumenting everything. I mean everything. Here's the script that got us started:

instrumentation/latency-tracker.ts TypeScript
import { Histogram, Counter } from 'prom-client';

const requestLatency = new Histogram({
  name: 'api_request_duration_ms',
  help: 'API request duration in milliseconds',
  labelNames: ['route', 'method', 'cold_start'],
  buckets: [10, 50, 100, 250, 500, 1000, 2500, 5000],
});

const coldStartCount = new Counter({
  name: 'api_cold_starts_total',
  help: 'Total number of cold starts',
  labelNames: ['function_name'],
});

// Module scope survives across invocations in a warm container,
// so this flag is only true on the first invocation after a cold start.
let isColdStart = true;

export function trackRequest(
  route: string,
  method: string,
  durationMs: number
) {
  requestLatency
    .labels(route, method, String(isColdStart))
    .observe(durationMs);

  if (isColdStart) {
    // Label by function name (set by the Lambda runtime) to match
    // the counter's declared labelNames.
    coldStartCount
      .labels(process.env.AWS_LAMBDA_FUNCTION_NAME ?? 'unknown')
      .inc();
    isColdStart = false;
  }
}

After two weeks of data collection, the picture was grim. Cold starts accounted for 23% of all requests during peak hours (9-11am EST, when developer teams start their day). The median cold start added 1,800ms. That's not a latency spike -- that's a fundamentally broken user experience.

Fig 1. Bimodal latency distribution -- the valley between warm and cold requests tells the whole story.

The Migration Path

We evaluated three options:

  1. Stay on Lambda, optimize harder -- Provisioned concurrency across all functions, SnapStart for Java-based services. Estimated cost: $8,400/month.
  2. Move to containers (ECS/Fargate) -- Predictable latency, but we'd lose the auto-scaling simplicity. Estimated cost: $3,200/month baseline.
  3. Edge functions (Cloudflare Workers) -- Near-zero cold starts, global deployment, V8 isolate model. Estimated cost: $420/month.

The cost difference alone was compelling, but the latency numbers sealed the deal. Workers spin up in under 5ms. Not 5 seconds. Five milliseconds.

The Worker Architecture

Here's the core of our routing layer after the migration:

src/router.ts TypeScript
import { Router } from 'itty-router';
import { withAuth } from './middleware/auth';
import { withCache } from './middleware/cache';
import { handleAnalytics } from './handlers/analytics';
import { handleIngest } from './handlers/ingest';

const router = Router();

// Auth middleware runs at the edge --
// JWT verification in <1ms using Web Crypto API
router.all('/api/*', withAuth);

// Cache layer using Cloudflare KV for
// frequently accessed read paths
router.get('/api/analytics/*', withCache, handleAnalytics);
router.post('/api/ingest', handleIngest);

export default {
  async fetch(request, env, ctx): Promise<Response> {
    return router.handle(request, env, ctx);
  },
};

Simple. Almost suspiciously simple. But that's the point -- the complexity shifted from "managing infrastructure" to "writing good code." Which is exactly where I want it.
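To give a sense of why the auth check is so cheap, here's a minimal sketch of HS256 JWT verification with the Web Crypto API. The JWT_SECRET binding name and the token handling are simplified assumptions; the real middleware also checks expiry and audience.

// Minimal sketch of edge JWT verification with Web Crypto, assuming
// HS256-signed tokens and a hypothetical JWT_SECRET binding.
interface AuthEnv {
  JWT_SECRET: string; // assumed secret binding, not our real config
}

function base64UrlDecode(input: string): Uint8Array {
  const b64 = input.replace(/-/g, '+').replace(/_/g, '/');
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

export async function withAuth(request: Request, env: AuthEnv) {
  const token = request.headers.get('Authorization')?.replace(/^Bearer /, '');
  const parts = token?.split('.');
  if (!parts || parts.length !== 3) {
    return new Response('Unauthorized', { status: 401 });
  }

  const key = await crypto.subtle.importKey(
    'raw',
    new TextEncoder().encode(env.JWT_SECRET),
    { name: 'HMAC', hash: 'SHA-256' },
    false,
    ['verify'],
  );

  const valid = await crypto.subtle.verify(
    'HMAC',
    key,
    base64UrlDecode(parts[2]),
    new TextEncoder().encode(`${parts[0]}.${parts[1]}`),
  );

  if (!valid) {
    return new Response('Unauthorized', { status: 401 });
  }
  // Returning nothing lets itty-router fall through to the matched handler.
}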

The best infrastructure is the kind you forget exists. It should be invisible -- a platform your code runs on, not a problem your team debugs.

Kelsey Hightower, at KubeCon 2024

What Broke Along the Way

It wasn't all smooth sailing. Here are the three biggest problems we hit:

1. The 128MB Memory Ceiling

Workers have a hard memory limit. Our analytics aggregation function was buffering entire result sets in memory before streaming them to the client. We had to rewrite it to use TransformStream for chunked processing. Took two days, but the result was actually better than the original.
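The fix, in rough shape, looked like this. It's a sketch: fetchRowBatch is a hypothetical stand-in for our paginated data access, not a real function in the codebase.

// Sketch of chunked aggregation with TransformStream: rows are written
// out one batch at a time instead of buffering the full result set.
declare function fetchRowBatch(page: number): Promise<Record<string, unknown>[]>;

export function streamAggregation(ctx: ExecutionContext): Response {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const encoder = new TextEncoder();

  // Write batches in the background; ctx.waitUntil keeps the work alive
  // while the client consumes the streamed response body, and memory
  // stays bounded to a single batch.
  ctx.waitUntil((async () => {
    try {
      for (let page = 0; ; page++) {
        const rows = await fetchRowBatch(page);
        if (rows.length === 0) break;
        const chunk = rows.map((r) => JSON.stringify(r)).join('\n') + '\n';
        await writer.write(encoder.encode(chunk));
      }
    } finally {
      await writer.close();
    }
  })());

  return new Response(readable, {
    headers: { 'content-type': 'application/x-ndjson' },
  });
}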

2. No Native Database Drivers

Workers run on V8, not Node.js. That means no pg, no mysql2, no native TCP sockets. We moved to Cloudflare's Hyperdrive for Postgres connections, which pools connections at the edge and proxies them to our origin database. Latency impact: negligible.
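For the curious, the query path looks roughly like this with Hyperdrive plus the postgres.js driver. The HYPERDRIVE binding name and the projects table are illustrative, not our actual schema.

// Sketch: querying Postgres from a Worker through a Hyperdrive binding.
import postgres from 'postgres';

interface Env {
  HYPERDRIVE: Hyperdrive; // binding configured in wrangler.toml (assumed name)
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Hyperdrive hands the driver a connection string that points at
    // its edge-side connection pool rather than the origin database.
    const sql = postgres(env.HYPERDRIVE.connectionString, { max: 5 });

    const rows = await sql`SELECT id, name FROM projects LIMIT 10`;

    // Release the driver's connections after the response is sent.
    ctx.waitUntil(sql.end());

    return Response.json(rows);
  },
};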

Warning

If you're using connection-heavy ORMs like Prisma, test thoroughly before migrating. The connection model at the edge is fundamentally different, and you may hit connection limits faster than expected.

3. Debugging Is Different

CloudWatch logs don't exist in this world. We moved to wrangler tail for real-time log streaming during development and Baselime for production observability. The tooling gap is real but narrowing fast.

Fig 2. Post-migration dashboard. The p99 latency line finally looks like something we can be proud of.

The Numbers, Four Months Later

Here's where we landed.

The 91% cost reduction was the headline number our CFO cared about. But the one that matters to me is the 97% reduction in p99 latency. Our users feel that every single time they interact with the platform.

Switching zones / A personal note

The Human Cost of Technical Debt

I want to be honest about something the metrics don't show. This migration took three months. During that time, I missed my daughter's school play. I worked through two weekends that I'd promised to my family. The Slack messages at 11pm became routine.

The platform is faster now. The numbers are beautiful. But I've been thinking a lot about what we sacrifice when we treat every technical problem as urgent. This was the right call architecturally. I'm less sure it was the right call for the humans involved -- including me.

"We are what we repeatedly do. Excellence, then, is not an act, but a habit."

Will Durant, paraphrasing Aristotle

Should You Do This?

Maybe. Edge functions are not a universal solution. They're excellent for:

  - Stateless, I/O-bound API endpoints
  - Spiky traffic, where you'd otherwise pay to keep a warm pool idle
  - Latency-sensitive read paths that benefit from running close to users

They're not great for:

  - Memory-heavy workloads that bump into the 128MB isolate limit
  - Anything that depends on native TCP sockets or Node-only database drivers
  - Long-running, CPU-bound jobs

For us, it was the right move. Our workload was almost entirely I/O-bound API requests -- exactly what edge functions are designed for. Your mileage will vary. Measure first, migrate second.

wrangler.toml TOML
name = "platform-api"
main = "src/router.ts"
compatibility_date = "2026-01-15"

# This single file deploys to 300+ locations.
# Try doing that with CloudFormation.

[vars]
ENVIRONMENT = "production"

If you're exploring this path, start with one function. Pick your simplest, most stateless endpoint and migrate it. Watch the numbers. If they tell a similar story to ours, you'll know what to do next.

Ship fast. Measure everything. And remember: the best architecture is the one that lets you go home on time.