EVM Nonce Management: Preventing Queue Jams in Crypto Payment Gateways

How a single dropped nonce stalls every subsequent transaction in a payment gateway, and the atomic reservation, watchdog, and RBF patterns that prevent it.

By Cipher · Founding engineer, BchainPay · 9 min read

Nonces are simple in isolation: an EVM address's transaction counter, incremented by one with each confirmed transaction. In a single-worker, low-volume system you call eth_getTransactionCount, get 47, use 47, done. In a payment gateway running five concurrent workers across four chains — sending treasury sweeps, gasless relays, and gas-funding transactions at dozens per minute — nonces are the most common source of production incidents.

The mechanics are unforgiving: if nonce 100 is never confirmed, every transaction with nonce 101, 102, and beyond sits in the mempool, unmineable, until the gap is filled. One dropped transaction can stall an entire queue.
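
The stall can be illustrated with a small model. This is a sketch only: `executableNonces` is a hypothetical helper, not a node API, and it assumes a node includes transactions strictly in nonce order starting from the account's next expected nonce.

```typescript
// Returns the nonces a node could include right now, given the account's
// next expected nonce and the nonces currently waiting in the mempool.
function executableNonces(nextNonce: number, pending: number[]): number[] {
  const inPool = new Set(pending);
  const ready: number[] = [];
  // Walk forward from the next expected nonce and stop at the first hole.
  for (let n = nextNonce; inPool.has(n); n++) {
    ready.push(n);
  }
  return ready;
}

// Nonce 100 was dropped: 101-103 are queued, but nothing can be mined.
const stalled = executableNonces(100, [101, 102, 103]);

// Once 100 is resubmitted, the whole run becomes executable again.
const unblocked = executableNonces(100, [100, 101, 102, 103]);
```

Here `stalled` is empty and `unblocked` contains all four nonces, which is why filling a single hole releases the entire queue.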

This post covers the atomic reservation model, in-flight tracking, the stuck-tx watchdog, and the RBF recovery logic that BchainPay runs in production.

Why nonces jam in concurrent systems

The naive implementation breaks under concurrency. Two workers call eth_getTransactionCount(address, 'pending') at the same moment and both receive 100. Both submit a transaction with nonce 100. One of three things happens:

  1. The second submission replaces the first (if it carries a sufficiently higher fee).
  2. Each submission is accepted by the node that received it, but peers drop the later arrival as a duplicate nonce, so only one survives propagation.
  3. One is accepted; the other is silently evicted from the mempool at some point after propagation.

In every case, the "losing" worker believes its transaction is in flight, but nonce 100 was consumed by the winner. The loser's payload (say, a treasury sweep) will never be confirmed. The record reads status: pending; in reality the transaction is gone.

The second failure mode is subtler: a worker reserves nonce 100 at the application layer, crashes before submitting, and nonces 101, 102, 103 are already queued behind it. The gap is invisible until the watchdog fires.

Atomic nonce reservation

The fix is to stop using the chain as your nonce oracle for new submissions. Use an atomic application-layer counter seeded from the chain:

// Redis INCR is atomic regardless of concurrent callers
async function reserveNonce(chain: string, address: string): Promise<number> {
  const key  = `nonce:${chain}:${address.toLowerCase()}`;
  const next = await redis.incr(key); // returns value after increment
  return next - 1;                    // nonces are 0-indexed
}

Worker A calls reserveNonce and receives 100. Worker B calls it a millisecond later and receives 101. No collision, no gap, no race.
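
To make the contrast concrete, here is a minimal in-memory sketch. `NonceCounter` and `naiveAssign` are hypothetical stand-ins: the class imitates the INCR-then-subtract behaviour in a single process, and it does not replace Redis for workers running in separate processes.

```typescript
// In-memory stand-in for the Redis counter (illustration only).
class NonceCounter {
  private value: number;

  constructor(seed: number) {
    this.value = seed;
  }

  // Mirrors INCR followed by "- 1": hand out the current value, then advance.
  reserve(): number {
    const next = ++this.value;
    return next - 1;
  }
}

// Naive approach: every worker reads the same pending count before any of
// them has submitted, so they all pick the same nonce.
function naiveAssign(pendingCount: number, workers: number): number[] {
  return Array.from({ length: workers }, () => pendingCount);
}

const naive = naiveAssign(100, 3);

const counter = new NonceCounter(100);
const reserved = [counter.reserve(), counter.reserve(), counter.reserve()];
```

`naive` holds the same nonce three times, while `reserved` holds 100, 101, and 102: unique and contiguous, with no coordination between callers.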

On startup — and after any crash — sync from the chain before accepting new work:

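// Lua runs atomically inside Redis: write only if the new value is greater.
const SET_IF_GREATER = `
  local cur = redis.call('GET', KEYS[1])
  if not cur or tonumber(ARGV[1]) > tonumber(cur) then
    redis.call('SET', KEYS[1], ARGV[1])
    return 1
  end
  return 0`;

async function syncNonce(chain: string, address: string): Promise<void> {
  const onChain = await provider.getTransactionCount(address, 'pending');
  const key     = `nonce:${chain}:${address.toLowerCase()}`;

  // Never move the counter backward. If Redis is already at 103 and the
  // chain shows 98 pending, the application counter wins: those nonces are reserved.
  // (Call shape assumes node-redis v4; ioredis takes positional arguments.)
  await redis.eval(SET_IF_GREATER, { keys: [key], arguments: [String(onChain)] });
  logger.info(`nonce:synced chain=${chain} addr=${address} value=${onChain}`);
}

The greater-than guard is critical. Redis SET has no such flag (GT exists on EXPIRE and ZADD, not SET), so the comparison runs in a server-side Lua script, which Redis executes atomically. If the application counter is already ahead of the chain count, because a worker crashed after reserving but before submitting, the script leaves the counter untouched so reserved nonces are never handed out twice. Any that were never broadcast surface later in gap detection.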
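
The never-move-backward rule can be checked without a Redis instance. The function below is a pure model of the sync step under that assumption; `syncedCounter` is a hypothetical name, and `null` stands for a key that does not exist yet.

```typescript
// Pure model of the sync rule: the stored counter only ever moves forward.
function syncedCounter(stored: number | null, onChainPending: number): number {
  if (stored === null) return onChainPending; // first boot: seed from the chain
  return Math.max(stored, onChainPending);    // otherwise keep the larger value
}

const aheadOfChain = syncedCounter(103, 98);  // counter already ahead: unchanged
const behindChain  = syncedCounter(95, 98);   // counter stale: catch up to chain
const firstBoot    = syncedCounter(null, 98); // no key yet: seed from chain
```

The three cases come out as 103, 98, and 98 respectively. Only the first case differs from a plain overwrite, and it is the one that matters after a crash.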

Tracking in-flight transactions

You cannot recover what you did not record. Every submitted transaction needs a row in persistent storage before the call returns:

interface PendingTx {
  nonce:            number;
  hash:             string;
  chain:            string;
  wallet:           string;
  submittedAt:      Date;
  maxFeePerGas:     bigint;
  to:               string;
  value:            bigint;
  data:             string;
  paymentIntentId?: string; // for state reconciliation on stuck events
}

Write the record immediately after sendTransaction resolves a hash — not after confirmation. If the process crashes between submission and the DB write, the next syncNonce call will set the counter no lower than the chain's pending count, so the orphaned nonce will surface in gap detection (below).

The stuck-tx watchdog

Run a background task on a 20–30 second interval per chain. For each pending transaction older than the chain's stuck threshold, check for a receipt and either mark it confirmed or trigger RBF:

const STUCK_SEC: Record<string, number> = {
  ethereum: 120,
  polygon:   30,
  bnb:       20,
};
 
async function runStuckWatchdog(chain: string): Promise<void> {
  const threshold = STUCK_SEC[chain] ?? 120;
  const cutoff    = new Date(Date.now() - threshold * 1_000);
  const stuck     = await db.getPendingTxsOlderThan(chain, cutoff);
 
  for (const tx of stuck) {
    const receipt = await provider.getTransactionReceipt(tx.hash);
    if (receipt) {
      await db.markConfirmed(tx.hash, receipt.blockNumber);
      continue;
    }
    await replaceByFee(chain, tx);
  }
}

The receipt check is cheap. It catches the common case where the transaction was included but the confirmation worker has not yet processed it — saving a broadcast that would immediately fail as "already included".
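
The age test behind the watchdog can be isolated as a pure predicate. This is a sketch: `Tracked`, `STUCK_SECONDS`, and `isStuck` are hypothetical names, the thresholds are copied from the table above, and `now` is passed in so the check is deterministic.

```typescript
interface Tracked {
  hash: string;
  chain: string;
  submittedAt: Date;
}

// Per-chain thresholds in seconds, mirroring the values used by the watchdog.
const STUCK_SECONDS: Record<string, number> = {
  ethereum: 120,
  polygon: 30,
  bnb: 20,
};

// A transaction counts as stuck once its age exceeds its chain's threshold.
// Unknown chains fall back to the most conservative value (120 s).
function isStuck(tx: Tracked, now: Date): boolean {
  const threshold = STUCK_SECONDS[tx.chain] ?? 120;
  const ageSec = (now.getTime() - tx.submittedAt.getTime()) / 1000;
  return ageSec > threshold;
}

const now = new Date("2024-01-01T00:01:00Z");
const submitted = new Date("2024-01-01T00:00:15Z"); // 45 s before `now`

const polygonStuck  = isStuck({ hash: "0xaaa", chain: "polygon",  submittedAt: submitted }, now);
const ethereumStuck = isStuck({ hash: "0xbbb", chain: "ethereum", submittedAt: submitted }, now);
```

The same 45-second-old transaction is stuck on Polygon but not on Ethereum, which is the reason for keeping thresholds per chain.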

Replace-by-fee

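Replacing a pending transaction is governed by mempool policy, not consensus: geth's default price bump requires the replacement to raise both maxFeePerGas and maxPriorityFeePerGas by at least 10 % over the original. In practice, the base fee can climb for several blocks between your first submission and your replacement, so anchor to 130 % of the current pending-block base fee (30 % headroom) and take the larger of that and the 10 % bump:

async function replaceByFee(chain: string, stuck: PendingTx): Promise<void> {
  const { baseFee } = await getBaseFee(chain);
 
  // 130% of current baseFee vs 110% of the original: take whichever is higher
  const fromBase    = (baseFee * 13n) / 10n;
  const fromOld     = (stuck.maxFeePerGas * 11n) / 10n;
  const maxFeePerGas         = fromBase > fromOld ? fromBase : fromOld;
  // The tip must rise by 10% too, or the node rejects the replacement as underpriced.
  // This bumps the chain default once; a second replacement needs the previous tip,
  // which PendingTx as defined here does not record.
  const maxPriorityFeePerGas = (CHAIN_PRIORITY_FEE[chain] * 11n) / 10n;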
 
  const replacement = await wallet.sendTransaction({
    nonce:   stuck.nonce,      // same nonce — this is what replaces the original
    to:      stuck.to,
    value:   stuck.value,
    data:    stuck.data,
    maxFeePerGas,
    maxPriorityFeePerGas,
    chainId: CHAIN_IDS[chain],
  });
 
  await db.replaceHash(stuck.hash, replacement.hash, maxFeePerGas);
  logger.info(
    `tx:rbf chain=${chain} nonce=${stuck.nonce} ` +
    `old=${stuck.hash.slice(0, 12)} new=${replacement.hash.slice(0, 12)}`,
  );
 
  if (stuck.paymentIntentId) {
    await webhooks.send(stuck.paymentIntentId, 'payment_intent.processing_delayed');
  }
}

The paymentIntentId check fires an advisory webhook so merchant fulfillment logic does not time out waiting for a sweep that is still working its way through the replacement cycle. No funds are at risk; it is a signal to hold, not to cancel.
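
The fee arithmetic for a replacement can be worked through in isolation. The sketch below assumes the previous tip was stored alongside the previous fee cap; `FeePair`, `bump10`, and `replacementFees` are hypothetical names, and the 10 % figure is the common geth default rather than a guarantee for every node.

```typescript
const GWEI = 1_000_000_000n;

interface FeePair {
  maxFeePerGas: bigint;
  maxPriorityFeePerGas: bigint;
}

// 10% bump using integer math (rounds down, matching integer-based thresholds).
function bump10(x: bigint): bigint {
  return (x * 11n) / 10n;
}

// Fee cap: the larger of 130% of the current base fee and 110% of the old cap.
// Tip: 110% of the old tip, so both fields clear the replacement threshold.
function replacementFees(baseFee: bigint, old: FeePair): FeePair {
  const fromBase = (baseFee * 13n) / 10n;
  const fromOld = bump10(old.maxFeePerGas);
  return {
    maxFeePerGas: fromBase > fromOld ? fromBase : fromOld,
    maxPriorityFeePerGas: bump10(old.maxPriorityFeePerGas),
  };
}

const original: FeePair = { maxFeePerGas: 40n * GWEI, maxPriorityFeePerGas: 2n * GWEI };

// Base fee has spiked to 50 gwei: the base-fee anchor (65 gwei) wins.
const spiked = replacementFees(50n * GWEI, original);

// Base fee has fallen to 20 gwei: the 10% bump over the old cap (44 gwei) wins.
const calm = replacementFees(20n * GWEI, original);
```

In the spiked case a plain 10 % bump would yield 44 gwei, still below the 50 gwei base fee, so the replacement would be accepted by the mempool and remain unmineable.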

Gap detection: nonce holes after a crash

A gap at nonce N occurs when N was reserved by the application counter but never submitted — or was submitted and dropped without the hash being recorded. Because the stuck watchdog scans by submittedAt, it will never see a gap: there is no row to scan.

Run gap detection on startup and on a slower hourly schedule:

async function detectAndFillGaps(chain: string, address: string): Promise<void> {
  const confirmedNonce = await provider.getTransactionCount(address, 'latest');
  const inFlight       = new Set(await db.getInFlightNonces(chain, address));
 
  if (inFlight.size === 0) return;
 
  const maxNonce = Math.max(...inFlight);
 
  for (let n = confirmedNonce; n <= maxNonce; n++) {
    if (!inFlight.has(n)) {
      logger.warn(`nonce:gap chain=${chain} nonce=${n}`);
      await fillGap(chain, address, n);
    }
  }
}
 
async function fillGap(chain: string, address: string, nonce: number): Promise<void> {
  const { baseFee } = await getBaseFee(chain);
  await wallet.sendTransaction({
    nonce,
    to:                  address,        // self-transfer, no state change
    value:               0n,
    data:                '0x',
    maxFeePerGas:        baseFee * 2n,
    maxPriorityFeePerGas: CHAIN_PRIORITY_FEE[chain],
    chainId:             CHAIN_IDS[chain],
  });
}

A 0-value self-transfer uses 21,000 gas: roughly $0.02 on Polygon, and 0.00042 ETH on Ethereum mainnet at 20 gwei (about $1 with ETH near $2,400). It is orders of magnitude cheaper than a stalled queue and a merchant escalation.
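
The scan itself reduces to a pure function that is easy to unit-test apart from the provider and database. `findGaps` is a hypothetical helper that mirrors the loop in detectAndFillGaps: it takes the confirmed ('latest') transaction count and the list of recorded in-flight nonces.

```typescript
// Returns every nonce between the confirmed count and the highest in-flight
// nonce that has no recorded transaction.
function findGaps(confirmedNonce: number, inFlight: number[]): number[] {
  if (inFlight.length === 0) return []; // nothing queued, so nothing can be blocked
  const seen = new Set(inFlight);
  const maxNonce = Math.max(...inFlight);
  const gaps: number[] = [];
  for (let n = confirmedNonce; n <= maxNonce; n++) {
    if (!seen.has(n)) gaps.push(n);
  }
  return gaps;
}

// Chain has confirmed nonces 0-99; rows exist for 101, 103, and 104.
const holes = findGaps(100, [101, 103, 104]);

// A contiguous run starting at the confirmed count has no gaps.
const clean = findGaps(100, [100, 101]);
```

`holes` contains 100 and 102, the two nonces that need a filler transaction; `clean` is empty.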

EIP-1559 fee estimation

Before any submission, read the base fee from the pending block:

async function getBaseFee(chain: string): Promise<{ baseFee: bigint }> {
  const pending = await provider.getBlock('pending');
  // Some providers don't expose a pending block; fall back to latest
  const block   = pending ?? await provider.getBlock('latest');
  const baseFee = block?.baseFeePerGas ?? parseUnits('10', 'gwei');
  return { baseFee };
}
 
const CHAIN_PRIORITY_FEE: Record<string, bigint> = {
  ethereum: parseUnits('2',  'gwei'),
  polygon:  parseUnits('30', 'gwei'),  // Polygon priority fees are higher
  bnb:      parseUnits('1',  'gwei'),
  arbitrum: parseUnits('1',  'gwei'),
};

Setting maxFeePerGas = 2 × baseFee gives headroom for about five consecutive maximum base-fee increases. The base fee can rise by at most 12.5 % per block, so it takes six consecutive full blocks to double (1.125^6 ≈ 2.03, about 72 s on Ethereum mainnet). A transaction submitted into a rising-fee environment stays includable for roughly a minute instead of falling below the base fee and stalling every nonce behind it.
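
The headroom arithmetic can be checked directly. This sketch assumes the Ethereum mainnet maximum step of 12.5 % per block and ignores the priority fee, which also has to fit under the cap; `blocksOfHeadroom` is a hypothetical helper.

```typescript
// Counts how many consecutive maximum base-fee increases a fee cap of
// `multiplier` x the current base fee can absorb before the base fee
// exceeds the cap.
function blocksOfHeadroom(multiplier: number, maxStep: number = 1.125): number {
  let factor = 1;
  let blocks = 0;
  while (factor * maxStep <= multiplier) {
    factor *= maxStep;
    blocks++;
  }
  return blocks;
}

const atOnePointFive = blocksOfHeadroom(1.5); // 1.125^3 ≈ 1.42, 1.125^4 ≈ 1.60
const atDouble       = blocksOfHeadroom(2);   // 1.125^5 ≈ 1.80, 1.125^6 ≈ 2.03
const atTriple       = blocksOfHeadroom(3);   // 1.125^9 ≈ 2.89, 1.125^10 ≈ 3.25
```

Under these assumptions a 1.5 × cap survives three worst-case blocks, a 2 × cap survives five, and a 3 × cap survives nine.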

For a normal submission — not a replacement — always compute the fee fresh from the pending block. Never reuse a fee estimate from a minute ago; the base fee can have moved significantly.

How BchainPay wires this together

Three transaction types run through the same nonce manager, keyed by (chain, wallet_address):

  1. Gasless relay — BchainPay submits transferWithAuthorization (EIP-3009) or permit + transfer (EIP-2612) on behalf of the payer. These are time-sensitive; the watchdog threshold is 30 s on L2s.

  2. Treasury auto-sweeps — deposit address balances are consolidated into the hot treasury. Sweeps tolerate more latency; the 120 s threshold is acceptable.

  3. Gas funding — some EVM chains require a small native-token balance at the deposit address to forward ERC-20 tokens. BchainPay pre-funds those addresses from the gas wallet.

All three share the hot-wallet nonce space on each chain. A single nonce manager — rather than per-use-case counters — eliminates cross-purpose collisions. When a sweep and a relay both fire simultaneously, they each get a unique, sequential nonce without coordination code at the call site.

// All three callers look identical at the call site
const nonce = await reserveNonce('polygon', HOT_WALLET_ADDRESS);
const tx    = await wallet.sendTransaction({ nonce, ...txParams });
await db.savePendingTx({ nonce, hash: tx.hash, submittedAt: new Date(), ...meta });

Merchants using BchainPay do not interact with this layer. The payment intent API abstracts it: create the intent, receive the deposit address, get a webhook on confirmation. The nonce complexity stays inside the gateway.

If you are building your own relay or sweep pipeline, these patterns cover the failure modes that burn hours in production.

Key takeaways

  • Never call eth_getTransactionCount for nonce assignment in concurrent workers. Use an atomic application counter (Redis INCR), seeded from the chain on every startup with an atomic set-if-greater update so the counter never moves backward.

  • Record every submitted transaction immediately after the hash is returned. The watchdog can only detect stuck transactions that exist in the store; a gap in records becomes a gap in the queue.

  • Stuck thresholds are chain-specific. Polygon at 30 s, Ethereum at 120 s. One flat threshold either fires too early on mainnet or sits idle for minutes on a fast L2.

  • Anchor RBF to the current base fee, not the original. A 110 % bump on a stale base fee may still sit below the network's current floor, guaranteeing another stuck cycle.

  • Gaps are silent. The watchdog catches transactions that exist but are stuck; gap detection catches nonces that were reserved but never submitted. Both probes are required.

  • Fill gaps with a no-op self-transfer. 21,000 gas per gap is a rounding error. Leaving the gap in place is not.

  • Set maxFeePerGas = 2 × baseFee for fresh submissions. This covers about five consecutive maximum base-fee increases before the transaction falls below the base fee and can no longer be included. Anything above 3 × is waste; anything below 1.5 × risks getting stuck on a busy block.

