Idempotent payment intents at scale: design and failure modes

How to design a payment intent API that survives client retries, network failures and race conditions — idempotency keys, state machines, dedup storage and the subtle traps.

By Cipher · Founding engineer, BchainPay · 9 min read

The POST /v1/payment-intents endpoint is the hottest path in the BchainPay API. It is also the one most likely to be called twice for the same logical payment. Mobile networks drop. Load balancers time out. Client SDKs retry. The question isn't whether duplicate requests arrive — they will — the question is whether your server treats the second call as a no-op or creates a second intent that charges the customer twice.

This post covers how we designed idempotency into BchainPay's payment-intent layer: the key contract, the state machine, deduplication storage, the tricky window between "key seen" and "response committed", and the failure modes we hit before we got it right.

What idempotency actually means here

An HTTP endpoint is idempotent if calling it N times with the same input produces the same observable outcome as calling it once. GET and DELETE are naturally idempotent. POST /v1/payment-intents is not — without explicit handling, each call creates a new intent.

The standard fix is an idempotency key: a client-generated, globally unique string that the server uses to de-duplicate requests. If the server has already completed a request for a given key, it returns the stored response verbatim. If not, it processes normally and stores the result.

This sounds simple. The nuances that bite you in production:

  1. The response might not be stored yet when the second request arrives. There's a window between "we saw the key for the first time" and "we finished processing and stored the response". A naive implementation will let two requests race through that window and create two intents.
  2. The first request might have failed mid-flight. Did we commit to the database before the process crashed? Did the on-chain transaction land before we updated our state? The answer determines whether a retry is safe.
  3. Idempotency keys expire. If a key is valid for 24 hours and a client retries after 25, you have to decide: create a new intent or reject?
  4. On-chain crypto payments are async. Unlike card intents, a crypto payment intent doesn't settle in the same HTTP call. The intent stays open while the blockchain runs. The idempotency key is therefore a key into a long-lived object, not just a dedup cache for one network round-trip.

The key contract

Every mutating request to the BchainPay API accepts an optional Idempotency-Key header:

POST /v1/payment-intents
Authorization: Bearer sk_live_…
Idempotency-Key: 01HW2QKFP4X5Y3Z8A1B2C3D4E5
Content-Type: application/json
 
{
  "amount":     { "value": 49.99, "currency": "USD" },
  "accept":     ["USDC.ethereum", "USDT.tron"],
  "metadata":   { "order_id": "ord_88712" },
  "expires_in": 1800
}

The key rules:

  • Format: any URL-safe string up to 64 bytes. We recommend a ULID or UUID v4. Do not use order_id directly — it leaks business data and collides across retries with different parameters.
  • Scope: keys are scoped to a (merchant_id, endpoint) pair, not globally. The same key value used for POST /v1/payment-intents and POST /v1/refunds refers to two independent operations.
  • Lifetime: 24 hours from first use. After expiry, the key is eligible for reuse, but the server will process the new request fresh and return a new resource.
  • Parameter binding: if a key is already known and the request body differs from the original, the server returns 422 Unprocessable Entity with "code": "idempotency_key_reused". Retries must send identical parameters.
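The format and parameter-binding rules can be sketched as a small check. This is a minimal illustration, not BchainPay's implementation: `canonicalize`, `checkKey`, the `invalid_key` result and the exact "URL-safe" character set are all assumptions made for the example. It compares a canonical serialization of the request body against the one stored with the key; a real service would more likely store a hash of that string than the string itself.

```typescript
// Hypothetical helper: serialize a JSON value with object keys sorted, so that
// two bodies differing only in key order compare equal. Array order is kept.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Assumption: "URL-safe" means the RFC 3986 unreserved characters, 1 to 64 of them.
const KEY_PATTERN = /^[A-Za-z0-9._~-]{1,64}$/;

type KeyCheck = "new" | "replay" | "invalid_key" | "idempotency_key_reused";

// storedBody is the canonical body saved on first use of the key, or undefined
// if the key has not been seen (or has expired).
function checkKey(
  key: string,
  body: unknown,
  storedBody: string | undefined
): KeyCheck {
  if (!KEY_PATTERN.test(key)) return "invalid_key";
  if (storedBody === undefined) return "new";
  // Same key, same parameters: safe replay. Same key, different parameters: 422.
  return storedBody === canonicalize(body) ? "replay" : "idempotency_key_reused";
}
```

Sorting keys before comparing means a client that re-serializes the same payload with a different field order is still treated as a retry rather than rejected.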

The state machine

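A payment intent moves through the states below. The diagram names eight: five on the happy path from created to settled, the underpaid branch, and the terminal states expired and voided: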

created
  │
  ▼
awaiting_payment   ← deposit address derived, waiting for on-chain funds
  │
  ├──(partial receive)──► underpaid   (funds arrived, below required amount)
  │
  ▼
pending_confirmation  ← correct amount detected on-chain, waiting for finality
  │
  ▼
confirmed              ← finality SLA met, webhook fired, safe to fulfil
  │
  ▼
settled                ← funds swept to merchant treasury
  │
  ▼
expired / voided       ← terminal states, no funds expected / funds returned

Each transition is a single Postgres row-update guarded by an optimistic-lock column (version). The query looks like:

UPDATE payment_intents
SET    status = $next_status,
       version = version + 1,
       updated_at = now()
WHERE  id = $id
  AND  status = $expected_status
  AND  version = $expected_version
RETURNING *;

If the RETURNING clause comes back empty, a concurrent process won the race and the caller retries with fresh state. This is the compare-and-swap primitive that makes the state machine safe under parallel workers, without serializable isolation on every read.
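To make the compare-and-swap behaviour concrete, here is an in-memory model of the guarded update. It is a sketch only: the `Map` stands in for the payment_intents table, and `transition` is a hypothetical function that mirrors the WHERE clause above, returning null where the SQL would return no rows.

```typescript
type Status =
  | "created" | "awaiting_payment" | "underpaid" | "pending_confirmation"
  | "confirmed" | "settled" | "expired" | "voided";

interface Intent {
  id: string;
  status: Status;
  version: number;
}

// Applies the transition only if both the status and the version still match
// what the caller read earlier. Returns the updated row, or null when a
// concurrent writer got there first (the "empty RETURNING" case).
function transition(
  table: Map<string, Intent>,
  id: string,
  expectedStatus: Status,
  expectedVersion: number,
  nextStatus: Status
): Intent | null {
  const row = table.get(id);
  if (!row || row.status !== expectedStatus || row.version !== expectedVersion) {
    return null;
  }
  const updated: Intent = { id: row.id, status: nextStatus, version: row.version + 1 };
  table.set(id, updated);
  return updated;
}
```

Two workers that both read version 1 and both attempt the same transition will see exactly one non-null result; the other gets null and has to re-read the row before deciding what to do.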

Deduplication storage

We use two stores:

Redis: in-flight lock

When a request with a new idempotency key arrives, we atomically insert a lock before any database write:

// SET with NX and EX is a single command, so it is atomic on the Redis primary
const lock = await redis.set(
  `idem:${merchantId}:${endpoint}:${idempotencyKey}`,
  'in_flight',
  'EX', 30,   // 30-second max processing window
  'NX'        // only set if not exists
);
if (!lock) {
  // Key already seen: either in-flight or completed
  const stored = await getStoredResponse(merchantId, endpoint, idempotencyKey);
  if (stored?.status === 'complete') return { status: 200, body: stored.response };
  // In-flight, or nothing stored yet: never fall through and process a second time
  return { status: 429, headers: { 'Retry-After': '1' }, body: { code: 'request_in_flight' } };
}

If NX fails (key already exists), we check the stored response. If the original is still in-flight, we return 429 with a Retry-After header. The client waits 1-2 seconds and retries. If it's complete, we return the cached response.
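On the client side, the 429 handling might look like the sketch below. The names (`ApiResponse`, `nextStep`), the attempt cap of five and the one-second fallback are assumptions for illustration, not part of any BchainPay SDK. It assumes header names have been lower-cased and only handles the delta-seconds form of Retry-After, not the HTTP-date form.

```typescript
interface ApiResponse {
  status: number;
  headers: Record<string, string | undefined>; // assumed lower-cased names
}

type Next =
  | { action: "done" }
  | { action: "retry"; delayMs: number }
  | { action: "give_up" };

// Decides what the caller should do after one attempt. The same
// Idempotency-Key and body must be sent again on every retry.
function nextStep(res: ApiResponse, attempt: number, maxAttempts = 5): Next {
  if (res.status !== 429) return { action: "done" };
  if (attempt >= maxAttempts) return { action: "give_up" };

  const header = res.headers["retry-after"];
  const seconds = Number(header);
  const usable =
    header !== undefined &&
    header.trim() !== "" &&
    Number.isFinite(seconds) &&
    seconds >= 0;

  // Fall back to one second when the header is missing or unparseable.
  return { action: "retry", delayMs: usable ? seconds * 1000 : 1000 };
}
```

Keeping the decision in a pure function makes it easy to unit-test separately from whatever HTTP client performs the actual request and sleep.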

Postgres: durable response cache

After successfully creating the intent, we write the full serialized response to the idempotency_cache table and flip the Redis key from in_flight to complete:

INSERT INTO idempotency_cache (
  merchant_id, endpoint, key, response_status, response_body,
  first_seen_at, expires_at
) VALUES (
  $merchant_id, $endpoint, $key, 201, $response_json,
  now(), now() + interval '24 hours'
)
ON CONFLICT (merchant_id, endpoint, key) DO NOTHING;

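ON CONFLICT DO NOTHING means that if two requests race past the Redis lock (e.g., after a Redis failover), only one row lands in Postgres. The request whose insert affects zero rows then reads the existing row and returns the cached response. This prevents a duplicate intent, rather than just a duplicate cache row, only if the intent insert and the cache insert share one transaction and the losing request rolls its transaction back.

The order matters: write to Postgres first, update Redis second. If the process crashes between the two, the Redis key stays in_flight until its TTL expires. A retry after that wins the NX again, but then hits the ON CONFLICT DO NOTHING and reads the Postgres row. The duplicate intent never gets created.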
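The write ordering and the crash case can be modelled in a single process. Everything here is a stand-in: `Stores` replaces Redis and Postgres with in-memory maps, `commit` assumes the intent row and the cache row are written in one transaction, and `crashBeforeRedisFlip` is a hypothetical switch for simulating a crash between the two writes. Real code would be asynchronous and talk to the actual stores.

```typescript
type CachedResponse = { status: number; body: { id: string } };

class Stores {
  redis = new Map<string, "in_flight" | "complete">(); // stands in for the lock
  cache = new Map<string, CachedResponse>();           // idempotency_cache rows
  intents: string[] = [];                              // payment_intents rows
  private nextId = 1;

  // Models SET ... NX: true only for the first caller while the key exists.
  acquire(key: string): boolean {
    if (this.redis.has(key)) return false;
    this.redis.set(key, "in_flight");
    return true;
  }

  // Models one transaction covering both inserts. If the cache row already
  // exists, nothing is written and the existing response is returned.
  commit(key: string): CachedResponse {
    const existing = this.cache.get(key);
    if (existing) return existing;
    const id = "pi_" + this.nextId++;
    this.intents.push(id);
    const response: CachedResponse = { status: 201, body: { id } };
    this.cache.set(key, response);
    return response;
  }
}

function createIntent(
  stores: Stores,
  key: string,
  crashBeforeRedisFlip = false
): CachedResponse | "in_flight" {
  if (!stores.acquire(key)) {
    // Lock is held: replay the durable response if there is one, else wait.
    return stores.cache.get(key) ?? "in_flight";
  }
  const response = stores.commit(key);       // 1. durable write first
  if (crashBeforeRedisFlip) throw new Error("simulated crash");
  stores.redis.set(key, "complete");         // 2. Redis flip second
  return response;
}
```

Simulating a crash, deleting the Redis entry to mimic TTL expiry, and calling `createIntent` again with the same key leaves exactly one entry in `intents`, which is the property the ordering is meant to guarantee.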

The dangerous window

The period between "Redis NX succeeds" and "Postgres insert completes" is where duplicates are born. During this window, a retry will see in_flight in Redis and wait — unless the first request crashed and Redis expired the lock before Postgres committed.

A 30-second Redis TTL means: if processing takes longer than 30 seconds (it shouldn't), a retry could sneak through. The defence is layered:

  1. Processing time limit: payment intent creation has a 5-second timeout. If it exceeds that, we cancel internally and the client's retry is the first real attempt.
  2. Postgres unique constraint: UNIQUE(merchant_id, endpoint, key) is the hard stop that catches anything the Redis lock misses.
  3. Audit log diffing: our background job scans for payment_intents created within 60 seconds of each other with the same order_id in metadata, and flags them for human review. Belt and suspenders.
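The audit scan in the third layer could be sketched as follows. The row shape, the function name and the choice to group by merchant as well as order_id are assumptions made for the example; the 60-second window comes from the text above.

```typescript
interface IntentRow {
  id: string;
  merchantId: string;
  orderId: string;    // taken from metadata.order_id
  createdAt: number;  // epoch milliseconds
}

// Returns pairs of intent ids that share a merchant and order_id and were
// created within windowMs of each other. Flagging only; nothing is voided.
function findSuspectedDuplicates(
  rows: IntentRow[],
  windowMs = 60000
): Array<[string, string]> {
  const groups = new Map<string, IntentRow[]>();
  for (const row of rows) {
    const groupKey = row.merchantId + ":" + row.orderId;
    const group = groups.get(groupKey) ?? [];
    group.push(row);
    groups.set(groupKey, group);
  }

  const pairs: Array<[string, string]> = [];
  groups.forEach((group) => {
    // Sort by creation time, then compare each row with its predecessor.
    group.sort((a, b) => a.createdAt - b.createdAt);
    let prev: IntentRow | undefined;
    for (const cur of group) {
      if (prev && cur.createdAt - prev.createdAt <= windowMs) {
        pairs.push([prev.id, cur.id]);
      }
      prev = cur;
    }
  });
  return pairs;
}
```

Comparing only adjacent rows after sorting keeps the scan linear per group while still catching every run of closely spaced intents.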

How async on-chain state interacts with idempotency

A card payment intent settles in milliseconds. A crypto payment intent stays open until the blockchain delivers. This extends idempotency semantics beyond the initial POST.

When the client calls:

GET /v1/payment-intents/pi_01HW2…

they always get the latest state. There's no idempotency key needed — GET is safe to retry. The idempotency key only applies to the creation (POST) and to state-changing actions like:

POST /v1/payment-intents/pi_01HW2…/expire
Idempotency-Key: 01HW2QKFP4X5Y3Z8A1B2C3D4E5-expire

For on-chain event processing, idempotency is enforced differently: the blockchain indexer is designed to be idempotent at the transaction hash level. If it processes the same Transfer event twice (e.g., after a crash-and-replay), the second pass hits the state machine's compare-and-swap and finds the status is already pending_confirmation — the transition to pending_confirmation requires status = 'awaiting_payment', so it's a no-op.

Testing it in a dev environment

The easiest way to verify your idempotency implementation:

KEY="test-$(uuidgen)"
 
# First call — should return 201 Created
curl -s -X POST https://api.bchainpay.com/v1/payment-intents \
  -H "Authorization: Bearer $SK" \
  -H "Idempotency-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{"amount":{"value":10,"currency":"USD"},"accept":["USDC.polygon"]}' \
  | jq '.id'
 
# Second call — same key, same body; must return 200 with same id
curl -s -X POST https://api.bchainpay.com/v1/payment-intents \
  -H "Authorization: Bearer $SK" \
  -H "Idempotency-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{"amount":{"value":10,"currency":"USD"},"accept":["USDC.polygon"]}' \
  | jq '.id'
 
# Third call — same key, different body; must return 422
curl -s -X POST https://api.bchainpay.com/v1/payment-intents \
  -H "Authorization: Bearer $SK" \
  -H "Idempotency-Key: $KEY" \
  -H "Content-Type: application/json" \
  -d '{"amount":{"value":99,"currency":"USD"},"accept":["USDC.polygon"]}' \
  | jq '.error.code'
# → "idempotency_key_reused"

All three assertions should pass before you ship. We run this as a smoke test in CI against a staging environment after every deploy.

Error responses are cached too

A mistake we made early on: we only cached successful responses. If the first POST failed after the intent row had already been written, it returned a 500, the client retried, and the retry was processed fresh, creating a second intent. The client never learned about the first one.

The fix: cache the response regardless of status code, after the database write completes. If the first request results in a 500 after the intent row is written, the retry gets the same 500 back. The client can then call GET /v1/payment-intents?metadata[order_id]=ord_X to discover the intent and decide whether to void it or wait.

The only exception: transient infrastructure errors (timeouts before any write, Redis unavailable) are not cached. If no write happened, there's nothing to protect.

Key takeaways

  • Generate the idempotency key on the client, not the server. Server-generated keys are useless: the client can't send them on a retry because it never received the response.
  • Redis NX is the concurrency lock; Postgres unique constraint is the durability guarantee. Both are necessary; neither is sufficient alone.
  • The dangerous window is between NX and Postgres commit. Size your Redis TTL to cover worst-case processing time, and never rely on the TTL expiring to "recover" — use Postgres as the final arbiter.
  • Cache error responses too, but only after the write. A 500 that wrote data is a different animal from a 500 that wrote nothing.
  • On-chain async doesn't change idempotency for the creation step. The blockchain's event stream needs its own idempotency contract at the transaction-hash level, which is separate from the API's key contract.
  • Scope keys to (merchant, endpoint). It prevents accidental collisions across endpoints and makes the audit trail obvious.
