Idempotency and Retries: Why Systems Must Survive “Try Again”
How backend systems stay correct when the same request happens more than once

In the last few articles, we’ve talked about difficult things: concurrency, race conditions, and caching. But we made a dangerous assumption in all of them.
We assumed the network actually works.
Here is the uncomfortable truth about backend engineering: Networks are unreliable. Servers randomly crash. Databases time out.
You send a request to a payment gateway, and the connection drops. You don't get a "Success" message, but you don't get a "Failure" message either. You are stuck in limbo.
What do you do?
The Engineer’s Dilemma
Imagine your backend is processing a $50 payment for a user.
Your Server: "Hey Payment Gateway, charge User A $50."
Network: [Silence... connection times out]
Your Server: ...
Now you have a massive problem.
Did the payment go through? Maybe the gateway got the request, charged the card, but the response got lost on the way back.
Did it fail? Maybe the request never even reached the gateway.
If you assume it failed and try again (Retry), you risk charging the user twice. If you assume it succeeded and do nothing, you might give the user the product for free.
In a distributed system, you cannot simply "hope" things worked. You need a mechanism to handle uncertainty.
Why Retries Are Inevitable
Networks are unreliable. Processes crash. Timeouts lie.
From the outside, a failure looks like silence.
Did the request reach the server?
Did it partially execute?
Did it succeed but fail to respond?
There’s no reliable answer. So systems retry.
Not as an optimization. As a survival mechanism.
The Magic Word: Idempotency
To solve the dilemma, we need a concept borrowed from mathematics. It’s a big word for a simple idea: Idempotency (pronounced eye-dem-po-ten-see).
An idempotent operation is one you can perform multiple times without changing the result beyond the first application.
In simpler terms: It's safe to retry.
The "Set" vs. "Add" Example
To understand it, look at these two ways of updating a bank balance:
Not Idempotent (Unsafe): "Deduct $10 from this account."
Run it once: Balance goes from $100 to $90.
Run it again (Retry): Balance goes from $90 to $80.
Result: Disaster.
Idempotent (Safe): "Set the balance to $90."
Run it once: Balance becomes $90.
Run it again (Retry): Balance remains $90.
Result: Safe.
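Here is the same contrast as a minimal Python sketch. The function names deduct and set_balance are made up for illustration; the point is only what happens when each one runs twice.

```python
def deduct(balance: int, amount: int) -> int:
    """Not idempotent: every call changes the result again."""
    return balance - amount


def set_balance(balance: int, new_balance: int) -> int:
    """Idempotent: repeating the call leaves the result unchanged."""
    return new_balance


balance = 100

once = deduct(balance, 10)    # 90
twice = deduct(once, 10)      # 80: the retry deducted a second time

once_set = set_balance(balance, 90)    # 90
twice_set = set_balance(once_set, 90)  # 90: the retry changed nothing
```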
In HTTP terms, GET requests are meant to be idempotent (reading data doesn't change it), and so are PUT and DELETE. POST requests (creating/charging) are NOT idempotent by default.
How to Make Actions Idempotent: The Key
We can't always use "Set" operations. Sometimes we have to charge a card. How do we make that safe?
We use an Idempotency Key.
This is a unique ID (usually a UUID) that the client generates before sending the request. It acts like a unique receipt number for that specific intended action.
The flow changes to this:
Client: Generates UUID 123-abc. Sends: "Charge $50 with key 123-abc".
Server: Receives request. Checks its database: "Have I already processed key 123-abc?"
If NO: The server processes the charge, saves the key 123-abc in the DB, and returns "Success."
(Network fails, client retries)
Client: Retries: "Charge $50 with key 123-abc".
Server: Checks DB. "Yes, I already processed 123-abc."
Server: Does not charge again. It simply returns the saved "Success" message from the first attempt.
The client gets the confirmation it needs, and the user is only charged once.
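The flow above can be sketched in a few lines of Python. Everything here is an assumption for illustration: PaymentServer is a hypothetical in-memory stand-in, the dictionary plays the role of the database, and appending to a list stands in for the real card charge.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PaymentServer:
    """Hypothetical in-memory server; a real one would persist keys in a database."""

    responses: dict[str, str] = field(default_factory=dict)  # key -> saved response
    charges: list[int] = field(default_factory=list)         # every charge actually made

    def charge(self, idempotency_key: str, amount: int) -> str:
        # Seen this key before? Return the saved response without charging again.
        if idempotency_key in self.responses:
            return self.responses[idempotency_key]
        self.charges.append(amount)  # stands in for the real card charge
        response = f"Success: charged ${amount}"
        self.responses[idempotency_key] = response
        return response


server = PaymentServer()
key = str(uuid.uuid4())  # generated once by the client, reused for every retry

first = server.charge(key, 50)
retry = server.charge(key, 50)  # e.g. the first response was lost in transit
```

After both calls, server.charges holds a single entry and retry is the same message as first. Note that this sketch is single-threaded; with concurrent requests, the "check the key" and "save the key" steps would need to happen atomically, for example through a unique constraint on the key.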
Why Databases Alone Don’t Solve This
You might think:
“I’ll just rely on transactions.”
Transactions protect atomicity. They do not protect against repetition.
A transaction does not know:
whether the request is a retry
whether the intent is duplicated
whether the client already gave up
Idempotency lives above the database.
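To see the gap, here is a small sketch using Python's built-in sqlite3 module. The table names, the account user_a, and the two helper functions are all assumptions made up for this example. The first helper is perfectly atomic and still deducts twice on a retry; the second only becomes retry-safe because the application hands it a key that identifies the intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE processed_keys (key TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES ('user_a', 100)")
conn.commit()


def deduct(amount: int) -> None:
    """Atomic, but blind to retries: each call commits a new deduction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = 'user_a'",
            (amount,),
        )


def deduct_once(key: str, amount: int) -> bool:
    """Records the key in the same transaction; a repeated key changes nothing."""
    try:
        with conn:
            conn.execute("INSERT INTO processed_keys VALUES (?)", (key,))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = 'user_a'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # key already processed; the transaction was rolled back


def balance() -> int:
    return conn.execute("SELECT balance FROM accounts").fetchone()[0]


deduct(10)
deduct(10)               # a retry: the database happily deducts again
after_plain = balance()  # 80

applied = deduct_once("123-abc", 10)
repeated = deduct_once("123-abc", 10)  # same key: rejected by the primary key
after_keyed = balance()  # 70
```

The database enforces the rule, but it cannot invent the key. Knowing that two requests express the same intent is something only the layer above it can supply.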
The Art of the Retry: Exponential Backoff
Once your endpoint is idempotent, it is safe to retry. But how should you retry?
If your database is overwhelmed and timing out, retrying immediately just adds more fuel to the fire. You become the annoying kid in the backseat asking "Are we there yet?" every second.
The standard approach is Exponential Backoff.
Attempt 1 fails.
Wait 1 second. Retry.
Wait 2 seconds. Retry.
Wait 4 seconds. Retry.
Wait 8 seconds. Retry. Give up.
This gives the struggling downstream system breathing room to recover.
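A minimal sketch of that schedule in Python follows. The helper retry_with_backoff and its parameters are hypothetical, not taken from any library; the sleep function is passed in so the example can run without actually waiting. It retries on any exception for brevity, whereas a real client would likely retry only on transient errors such as timeouts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation; after each failure wait base_delay * 2**n, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: give up and surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, 8s by default
    raise AssertionError("unreachable")


# Demo: an operation that times out twice, then succeeds.
waits: list[float] = []
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "Success"


# Record the waits instead of sleeping so the demo finishes instantly.
result = retry_with_backoff(flaky, sleep=waits.append)
```

In the demo, the operation succeeds on its third call, after recorded waits of 1 and 2 seconds. Pair this with an idempotency key generated once, outside the retry loop, so every attempt carries the same key.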
Summary
The Reality: Networks fail, leaving you unsure if an action happened.
The Risk: Retrying blindly leads to duplicate actions (double charges).
The Solution: Make your critical operations Idempotent using unique keys.
The Strategy: Use Exponential Backoff when retrying to avoid overwhelming systems.
Thinking in Backend means shifting from "happy path" programming (assuming success) to defensive programming (assuming failure). Idempotency is the shield that lets your system survive the chaos of the real world.
What Comes Next
We missed one topic, so in the next article we will talk about Serialization and Deserialization.
This article is part of the Thinking in Backend series, where we learn backend engineering by understanding how systems behave under pressure, not just how code looks in isolation.




