Stripe will retry webhooks.
That's not a bug. That's the contract.
If your endpoint times out, returns a 500, or your server drops the connection, Stripe assumes the event might not have been processed. So it sends it again. And again. It keeps going, with exponential backoff, until it gets a 2xx response or the retry window (up to three days in live mode) closes.
If your system treats each delivery as a brand-new event, you don't have a billing system. You have a race condition.
What actually happens in production
In tutorials, webhook handlers look like this:
export async function POST(req: Request) {
  const event = await stripe.webhooks.constructEvent(...)
  if (event.type === "checkout.session.completed") {
    await db.user.update({
      where: { id: userId },
      data: { plan: "pro" },
    });
  }
  return new Response("ok");
}
Looks harmless. Now imagine:
- Stripe delivers the event
- Your database write succeeds
- Your server crashes before returning 200
- Stripe retries
- Your handler runs again
Maybe that's fine for a simple field update. But what if you also create an invoice record, increment a seat count, append to an audit log, enqueue a background job, or send a transactional email?
Now you've duplicated side effects. That's when the refunds start.
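To make that concrete, here is a minimal simulation of the sequence above. Everything in it is a hypothetical stand-in (`account`, `emailsSent`, `naiveHandler`); nothing here is a Stripe or SaaSCoreX API. The second call plays the role of Stripe's retry after the lost 200.

```typescript
// Hypothetical in-memory state standing in for a database and an email service.
type BillingEvent = { id: string; type: string; userId: string };

const account = { plan: "free", seats: 0 };
const emailsSent: string[] = [];

// A naive handler: every delivery is treated as a brand-new event.
function naiveHandler(incoming: BillingEvent): void {
  if (incoming.type !== "checkout.session.completed") return;
  account.plan = "pro";             // idempotent: writing the same value twice is harmless
  account.seats += 1;               // not idempotent: every delivery adds a seat
  emailsSent.push(incoming.userId); // not idempotent: every delivery sends an email
}

const checkoutEvent: BillingEvent = {
  id: "evt_123",
  type: "checkout.session.completed",
  userId: "user_1",
};

naiveHandler(checkoutEvent); // first delivery: the write succeeds, the response is lost
naiveHandler(checkoutEvent); // retry: same event, same handler

console.log(account, emailsSent.length);
```

After two deliveries of one event the plan is still `pro`, but the account holds two seats and two emails have gone out. The field update survived the retry; the side effects did not.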
The real problem is state mutation
Webhooks aren't notifications. They are remote systems asking your app to mutate state.
If that mutation is not idempotent, your system is fragile by definition. And the worst part: you often won't notice until weeks later. Duplicate entitlements. Double-processed upgrades. Strange accounting mismatches. All caused by assuming "it only fires once."
What idempotency actually means
Idempotency doesn't mean "probably fine."
It means: the same event can be processed 1 time or 100 times and produce the same final state.
That requires three things:
- Signature verification. You must verify the event came from Stripe. Always.
- Event deduplication. Store the event.id before performing mutations. If it already exists, return 200 and exit.
- Atomic state transitions. Track processing status so concurrent deliveries can't race past each other.
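The deduplication requirement can be sketched in a few lines. This is an illustration under stated assumptions, not a production handler: `processedIds`, `ledger`, and `handleOnce` are hypothetical names, and the in-memory `Set` stands in for a table with a unique constraint on the Stripe event ID.

```typescript
// Hypothetical stand-ins: in a real system `processedIds` would be a database
// table keyed by the Stripe event ID, not an in-memory Set.
type IncomingEvent = { id: string; type: string };

const processedIds = new Set<string>();
const ledger = { seats: 0 };

// Returns true if the event was applied, false if it was a duplicate delivery.
function handleOnce(incoming: IncomingEvent): boolean {
  if (processedIds.has(incoming.id)) return false; // already seen: acknowledge and exit
  processedIds.add(incoming.id);                   // record the ID before mutating
  if (incoming.type === "checkout.session.completed") {
    ledger.seats += 1;
  }
  return true;
}

const delivery: IncomingEvent = { id: "evt_abc", type: "checkout.session.completed" };

// Deliver the same event 100 times and count how often it was actually applied.
let applied = 0;
for (let i = 0; i < 100; i++) {
  if (handleOnce(delivery)) applied++;
}

console.log(ledger.seats, applied);
```

One hundred deliveries leave exactly one seat on the ledger. Note what this sketch does not cover: if the mutation throws after the ID is recorded, the event is marked as seen but never applied. That gap is why the third requirement, tracked processing status, exists.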
How SaaSCoreX handles it
SaaSCoreX uses a status machine so that each event's side effects are applied once, no matter how many times Stripe delivers it:
export async function processWebhookEvent(
  event: Stripe.Event
): Promise<void> {
  // Deduplication: check if already handled
  const existing = await db.webhookEvent.findUnique({
    where: { stripeEventId: event.id },
  });
  if (existing?.status === "PROCESSED") return;
  if (existing?.status === "PROCESSING") return;

  // Claim the event: RECEIVED → PROCESSING
  // (a FAILED event is re-claimed here and its attempt count incremented)
  const webhookEvent = await db.webhookEvent.upsert({
    where: { stripeEventId: event.id },
    create: {
      stripeEventId: event.id,
      type: event.type,
      status: "PROCESSING",
      attempts: 1,
    },
    update: {
      status: "PROCESSING",
      attempts: { increment: 1 },
    },
  });

  try {
    await handleEvent(event);
    // PROCESSING → PROCESSED
    await db.webhookEvent.update({
      where: { id: webhookEvent.id },
      data: { status: "PROCESSED", processedAt: new Date() },
    });
  } catch (error) {
    // A caught value is `unknown` in strict TypeScript, so narrow before reading .message
    const message = error instanceof Error ? error.message : String(error);
    // PROCESSING → FAILED (retryable)
    await db.webhookEvent.update({
      where: { id: webhookEvent.id },
      data: { status: "FAILED", error: message },
    });
    throw error; // Return 500 so Stripe retries
  }
}
The state machine has four positions: RECEIVED → PROCESSING → PROCESSED or FAILED.
- If the event is already PROCESSED, return immediately. No side effects.
- If another instance is PROCESSING the same event, bail out. No race.
- If processing fails, the event moves to FAILED with an error message. A background job retries failed events every 15 minutes.
- Entitlements are derived from durable subscription state, not webhook timing.
If Stripe retries an event 10 times, the final state is identical to what a single delivery would produce.
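The transitions are easier to reason about when the claim is a single check-and-set. The handler above does its lookup and its upsert as two separate queries, so under truly concurrent deliveries the claim is safest as one atomic step. Here is a minimal sketch of that idea, assuming hypothetical `claim` and `processOnce` helpers and an in-memory `Map` in place of the `webhookEvent` table. In a real database the equivalent would typically be a unique constraint plus a conditional update; that mapping is an assumption, not a description of SaaSCoreX's implementation.

```typescript
type EventStatus = "PROCESSING" | "PROCESSED" | "FAILED";
type EventRecord = { status: EventStatus; attempts: number };

// Hypothetical in-memory stand-in for the webhookEvent table.
const eventStore = new Map<string, EventRecord>();

// Try to take ownership of an event. Only a new or FAILED event can be claimed.
function claim(eventId: string): boolean {
  const record = eventStore.get(eventId);
  if (record && record.status !== "FAILED") return false; // PROCESSING or PROCESSED
  eventStore.set(eventId, {
    status: "PROCESSING",
    attempts: (record?.attempts ?? 0) + 1,
  });
  return true;
}

// Run the handler only if the claim succeeds, then record the outcome.
function processOnce(eventId: string, handler: () => void): "skipped" | "processed" {
  if (!claim(eventId)) return "skipped";
  const record = eventStore.get(eventId)!;
  try {
    handler();
    record.status = "PROCESSED";
    return "processed";
  } catch (err) {
    record.status = "FAILED"; // leaves the event retryable
    throw err;
  }
}

// A handler that fails on its first call and succeeds afterwards.
let sideEffects = 0;
let shouldFail = true;
const flaky = () => {
  if (shouldFail) {
    shouldFail = false;
    throw new Error("db unavailable");
  }
  sideEffects += 1;
};

try {
  processOnce("evt_1", flaky); // first delivery fails, status becomes FAILED
} catch {
  // the caller would answer with a 500 here so the provider retries
}
const second = processOnce("evt_1", flaky); // retry claims the FAILED event and succeeds
const third = processOnce("evt_1", flaky);  // duplicate of a PROCESSED event is skipped

console.log(second, third, sideEffects, eventStore.get("evt_1"));
```

Three deliveries, one failure, one applied side effect. The sketch deliberately leaves out one case: an event stuck in PROCESSING because its worker crashed mid-flight. A real implementation would need some timeout or sweep to move those back to a retryable state.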
Why this matters more than you think
Most starter kits focus on getting you to "subscription active." Very few focus on keeping your system correct when networks fail, providers retry, background jobs crash, or deployments interrupt requests.
But that's real production. And billing errors don't feel like bugs. They feel like trust violations.
The principle
If your billing system is not idempotent, you are relying on luck.
SaaSCoreX doesn't rely on luck. It assumes failure. It designs for retries. It treats external systems as unreliable by default — because in production, they are.
The webhook handler shown here is production code. See the full implementation and eight other server-enforced subsystems on the architecture page.