// Replaced an ad-hoc balance system with a strict double-entry ledger. Reconciliation drift went to zero and stayed there.
Problem
The platform had grown from a payment-flows prototype into a Series A business processing real customer money, on a balance system that had been built when "real customer money" was hypothetical. Balances lived in a single mutable row per account. Adjustments were applied in-place. There was no append-only history. The reconciliation job ran nightly, found small discrepancies between the platform's reported balances and the bank's, and routinely closed them by trusting the bank's number.
This had worked for two years. It was about to stop working. The next product surface — instant settlement for merchants — required a balance number the platform could defend on demand, not a number it could approximate after a nightly job. And the auditors had started asking the kind of questions that have a wrong answer.
Approach
We rebuilt the system around a strict double-entry ledger. Every value movement became two entries, debit and credit, against immutable accounts. Balances were no longer stored — they were derived, by summing entries. The implementation lived in Postgres because Postgres has been a perfectly good ledger for forty years and the operational story for anything else was untenable for a five-person team.
The hard work was not the ledger primitive. The hard work was the migration. The platform had two years of operational history that needed to be replayed into the new model without changing any customer-visible balance. We built a shadow-write window of six weeks: the new ledger ran alongside the old system, both wrote, neither read, and a diff job compared their outputs row-for-row. At the end of the window, the diff was zero across every account. Then we flipped the read path.
Decisions & trade-offs
- Postgres over a "real" ledger system. Two on-prem ledger products would have been technically appropriate. Both would have added an operational surface the team didn't have headcount to own. Postgres with a small set of strict constraints and triggers does the job at this scale and removes a vendor relationship from the critical path.
- Append-only, period. No updates, no deletes, no "fix a typo" shortcuts. Corrections are themselves entries. This added friction during the build and removed an entire class of failure forever after.
- Six-week shadow-write window before cutover. The temptation was to compress to two weeks. We didn't. The cutover went through on a Tuesday afternoon with no observed delta, which is the right way for a ledger cutover to go through.
- Wrote the runbook before the code. Reconciliation failure modes, double-spend race conditions, partial-failure recovery — all documented before the implementation began. Several of those documents prevented bugs that would otherwise have shipped.
Outcome
Reconciliation drift has been zero in the ninety days following cutover — the previous baseline was small but consistent. Write throughput on the balance path is up roughly eight-fold, mostly because the old in-place-update pattern was lock-contended and the new append-only path isn't. The instant-settlement product surface that prompted the rewrite shipped two months after cutover, on the new ledger, with the auditors satisfied.