Webhook Idempotency Is Not Optional

Stripe will retry webhooks.

That's not a bug. That's the contract.

If your endpoint times out, returns a 500, or your server drops the connection, Stripe assumes the event might not have been processed. So it sends it again. And again, with exponential backoff, for up to three days in live mode, until it gets a 2xx response.

If your system treats each delivery as a brand-new event, you don't have a billing system. You have a race condition.

What actually happens in production

In tutorials, webhook handlers look like this:

export async function POST(req: Request) {
  const event = await stripe.webhooks.constructEvent(...)

  if (event.type === "checkout.session.completed") {
    await db.user.update({
      where: { id: userId },
      data: { plan: "pro" },
    });
  }

  return new Response("ok");
}

Looks harmless. Now imagine:

1. Stripe delivers the event
2. Your database write succeeds
3. Your server crashes before returning 200
4. Stripe retries
5. Your handler runs again

Maybe that's fine for a simple field update. But what if you also create an invoice record, increment a seat count, append to an audit log, enqueue a background job, or send a transactional email?

Now you've duplicated side effects. That's when the refunds start.
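To make the failure concrete, here is a minimal sketch of a naive handler receiving the same event twice. The Account shape, the naiveHandler function, and the event ID are hypothetical in-memory stand-ins, not part of any real schema.

```typescript
// Hypothetical in-memory stand-ins for database rows; all names are illustrative.
interface Account {
  plan: "free" | "pro";
  seats: number;
  auditLog: string[];
}

interface FakeEvent {
  id: string;
  type: string;
}

// Naive handler: every delivery is treated as a brand-new event.
function naiveHandler(account: Account, event: FakeEvent): void {
  if (event.type !== "checkout.session.completed") return;
  account.plan = "pro";            // idempotent: setting the same value twice is harmless
  account.seats += 1;              // not idempotent: each delivery adds a seat
  account.auditLog.push(event.id); // not idempotent: each delivery appends an entry
}

const account: Account = { plan: "free", seats: 0, auditLog: [] };
const event: FakeEvent = { id: "evt_example_1", type: "checkout.session.completed" };

naiveHandler(account, event); // first delivery
naiveHandler(account, event); // Stripe's retry after the lost 200

console.log(account.seats, account.auditLog.length); // 2 2
```

The plan field survives the retry because assignment is naturally idempotent. The seat count and the audit log do not.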

The real problem is state mutation

Webhooks aren't just notifications. They are requests from a remote system asking your app to mutate state.

If that mutation is not idempotent, your system is fragile by definition. And the worst part: you often won't notice until weeks later. Duplicate entitlements. Double-processed upgrades. Strange accounting mismatches. All caused by assuming "it only fires once."

What idempotency actually means

Idempotency doesn't mean "probably fine."

It means: the same event can be processed 1 time or 100 times and produce the same final state.
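As a rough illustration of that definition, the sketch below delivers one event 1 time and 100 times and compares the final state. The Store shape and the handleOnce and deliver helpers are hypothetical, and the in-memory Set stands in for a table of processed event IDs.

```typescript
// Hypothetical in-memory state; seenEventIds stands in for a table of event IDs.
interface Store {
  seats: number;
  seenEventIds: Set<string>;
}

// Apply the side effect only the first time a given event ID is seen.
function handleOnce(store: Store, eventId: string): void {
  if (store.seenEventIds.has(eventId)) return; // duplicate delivery: no side effects
  store.seenEventIds.add(eventId);
  store.seats += 1;
}

// Deliver the same event `times` times and return the resulting seat count.
function deliver(times: number): number {
  const store: Store = { seats: 0, seenEventIds: new Set<string>() };
  for (let i = 0; i < times; i++) handleOnce(store, "evt_example_1");
  return store.seats;
}

console.log(deliver(1), deliver(100)); // 1 1
```

With a real database, recording the event ID and applying the mutation would need to commit together, or a crash between the two steps could leave the event recorded but never applied.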

That requires three things:

1. Signature verification. You must verify the event came from Stripe. Always.
2. Event deduplication. Store the event.id before performing mutations. If it already exists, return 200 and exit.
3. Atomic state transitions. Track processing status so concurrent deliveries can't race past each other.
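For the first requirement, it can help to see roughly what the check involves. The sketch below follows Stripe's documented scheme: the Stripe-Signature header carries a timestamp (t) and one or more v1 signatures, each an HMAC-SHA256 of "timestamp.rawBody" keyed with the endpoint secret. The verifyStripeSignature function, its parameters, and the placeholder secret are hypothetical. In a real handler you would normally call stripe.webhooks.constructEvent with the raw, unparsed request body instead of hand-rolling this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative re-implementation of the signature check; not the Stripe SDK.
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSec = 300,
  nowSec = Math.floor(Date.now() / 1000)
): boolean {
  let timestamp = "";
  const candidates: string[] = [];

  // Header looks like: t=1700000000,v1=abc123...,v1=def456...
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") candidates.push(value);
  }

  const ts = Number(timestamp);
  if (timestamp === "" || !Number.isFinite(ts)) return false;
  // Reject stale timestamps to limit replay of captured requests.
  if (Math.abs(nowSec - ts) > toleranceSec) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  // Constant-time comparison; timingSafeEqual throws on unequal lengths, so guard first.
  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}

// Usage with a placeholder secret and a locally computed signature.
const secret = "whsec_example";
const body = '{"id":"evt_example_1"}';
const t = 1700000000;
const sig = createHmac("sha256", secret).update(`${t}.${body}`, "utf8").digest("hex");
const header = `t=${t},v1=${sig}`;

console.log(verifyStripeSignature(body, header, secret, 300, t + 10));       // true
console.log(verifyStripeSignature(body + " ", header, secret, 300, t + 10)); // false
```

The signature covers the exact bytes Stripe sent, so the body has to be read as raw text before any JSON parsing.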

How SaaSCoreX handles it

SaaSCoreX uses a status machine to turn Stripe's at-least-once delivery into effectively-once processing:

export async function processWebhookEvent(
  event: Stripe.Event
): Promise<void> {
  // Deduplication: check if already handled
  const existing = await db.webhookEvent.findUnique({
    where: { stripeEventId: event.id },
  });
  if (existing?.status === "PROCESSED") return;
  if (existing?.status === "PROCESSING") return;

  // Claim the event: RECEIVED → PROCESSING
  const webhookEvent = await db.webhookEvent.upsert({
    where: { stripeEventId: event.id },
    create: {
      stripeEventId: event.id,
      type: event.type,
      status: "PROCESSING",
      attempts: 1,
    },
    update: {
      status: "PROCESSING",
      attempts: { increment: 1 },
    },
  });

  try {
    await handleEvent(event);

    // PROCESSING → PROCESSED
    await db.webhookEvent.update({
      where: { id: webhookEvent.id },
      data: { status: "PROCESSED", processedAt: new Date() },
    });
  } catch (error) {
    // PROCESSING → FAILED (retryable)
    await db.webhookEvent.update({
      where: { id: webhookEvent.id },
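      data: {
        status: "FAILED",
        // `error` is `unknown` under strict TypeScript, so narrow before reading .message
        error: error instanceof Error ? error.message : String(error),
      },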
    });
    throw error; // Return 500 so Stripe retries
  }
}

The state machine has four states: RECEIVED → PROCESSING → PROCESSED or FAILED.

If the event is already PROCESSED, return immediately. No side effects.
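If another instance is PROCESSING the same event, bail out. This is only race-free when the claim is a single atomic write backed by a unique constraint on stripeEventId.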
If processing fails, the event moves to FAILED with an error message. A background job retries failed events every 15 minutes.
Entitlements are derived from durable subscription state — not webhook timing.

If Stripe retries an event 10 times, the final state is identical to 1.
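One detail worth spelling out: a status check followed by a separate write leaves a small window in which two concurrent deliveries can both see "not claimed" and both proceed. A common way to close it is to make the claim one conditional write and let only the winner continue. The sketch below models that with an in-memory Map. The claim and finish functions are hypothetical and are not SaaSCoreX's implementation. In a SQL database the same idea is typically an INSERT guarded by a unique constraint, or an UPDATE with a WHERE clause on the current status followed by a check of the affected row count.

```typescript
type Status = "RECEIVED" | "PROCESSING" | "PROCESSED" | "FAILED";

interface EventRow {
  status: Status;
  attempts: number;
}

// Hypothetical in-memory table keyed by Stripe event ID.
const rows = new Map<string, EventRow>();

// Try to claim an event. Returns true only for the caller that wins the claim.
function claim(eventId: string): boolean {
  const row = rows.get(eventId);
  if (row === undefined) {
    // First delivery: insert directly as PROCESSING.
    rows.set(eventId, { status: "PROCESSING", attempts: 1 });
    return true;
  }
  if (row.status === "RECEIVED" || row.status === "FAILED") {
    // Claimable states: move to PROCESSING and count the attempt.
    row.status = "PROCESSING";
    row.attempts += 1;
    return true;
  }
  return false; // PROCESSING or PROCESSED: someone else owns or finished it
}

// Record the outcome. Only a row that is currently PROCESSING may transition.
function finish(eventId: string, ok: boolean): void {
  const row = rows.get(eventId);
  if (row === undefined || row.status !== "PROCESSING") {
    throw new Error(`event ${eventId} is not being processed`);
  }
  row.status = ok ? "PROCESSED" : "FAILED";
}

const id = "evt_example_1";
const first = claim(id);  // row inserted as PROCESSING
const second = claim(id); // concurrent delivery loses the claim
finish(id, false);        // handler threw: PROCESSING -> FAILED
const retry = claim(id);  // FAILED can be re-claimed
finish(id, true);         // PROCESSING -> PROCESSED
const late = claim(id);   // already PROCESSED

console.log(first, second, retry, late, rows.get(id)?.attempts); // true false true false 2
```

In this single-threaded model the claim is trivially atomic. The database version relies on the unique constraint or the conditional UPDATE to give the same guarantee across processes.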

Why this matters more than you think

Most starter kits focus on getting you to "subscription active." Very few focus on keeping your system correct when networks fail, providers retry, background jobs crash, or deployments interrupt requests.

But that's real production. And billing errors don't feel like bugs. They feel like trust violations.

The principle

If your billing system is not idempotent, you are relying on luck.

SaaSCoreX doesn't rely on luck. It assumes failure. It designs for retries. It treats external systems as unreliable by default — because in production, they are.

The webhook handler shown here is production code. See the full implementation and eight other server-enforced subsystems on the architecture page.