Invent Labs
// work · engineering
b2b · collab · NDA

Real-time collaboration rebuild for an enterprise B2B platform.

Replaced a leaky polling system with CRDT-backed sync. 30× the concurrent users, no data-loss incidents.

concurrent connections: 30×
P95 sync latency: 78 ms
data-loss incidents: 0

Problem

The product had been built with a "good enough" real-time story: clients polled an API every two seconds, last-write-wins resolved conflicts, and an audit log papered over the resulting weirdness. This had been fine for years one and two, when the median document had two collaborators. It was no longer fine. Enterprise customers were running sessions with dozens of concurrent editors. Documents were silently losing updates. Customer success was firefighting it. The retention number on the largest accounts was beginning to bend.

The leadership decision was to fix it before it became a public number. Our brief was the rebuild — not as a research project, but as a five-month replacement that would ship to the same customers, on the same product surface, without any visible disruption to the people who used the product daily.

Approach

We replaced the polling layer with a CRDT-backed real-time channel. Y.js handled the conflict-resolution primitive; Cloudflare Durable Objects handled the per-document coordination point; WebSocket connections handled the transport. The conflict-resolution semantics were no longer "last writer wins" — they were "all writers compose, deterministically." A user's edit could not lose to another user's edit. The audit log became a derived artifact rather than a defense mechanism.
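
The difference between "last writer wins" and "all writers compose, deterministically" can be illustrated with a toy model. The sketch below is not Y.js and is far simpler than what Y.js does internally: it assumes each edit carries a unique (clock, client) identifier, treats a replica as a set of such edits, and merges by set union. Every name in it (Snapshot, Edit, Replica, merge, render) is hypothetical and chosen for illustration only.

```typescript
// Toy comparison of whole-document last-write-wins and a mergeable edit set.
// Illustration only: this is NOT how Y.js represents or merges documents.

type Snapshot = { savedAt: number; lines: string[] };

// Last-write-wins: the later snapshot replaces the earlier one wholesale.
function lastWriteWins(a: Snapshot, b: Snapshot): Snapshot {
  return a.savedAt >= b.savedAt ? a : b;
}

type Edit = { client: string; clock: number; text: string };
type Replica = Map<string, Edit>; // keyed by a unique edit id

const editId = (e: Edit): string => `${e.clock}:${e.client}`;

function applyEdit(replica: Replica, e: Edit): void {
  replica.set(editId(e), e);
}

// Merge is a union keyed by edit id, so the order of merging does not matter
// and merging the same replica twice changes nothing.
function merge(a: Replica, b: Replica): Replica {
  const out: Replica = new Map(a);
  b.forEach((edit, id) => out.set(id, edit));
  return out;
}

// Deterministic rendering: sort by clock, break ties by client id.
function render(replica: Replica): string[] {
  return Array.from(replica.values())
    .sort(
      (x, y) =>
        x.clock - y.clock ||
        (x.client < y.client ? -1 : x.client > y.client ? 1 : 0)
    )
    .map((e) => e.text);
}

// Two clients edit concurrently (made-up data).
const alice: Replica = new Map();
const bob: Replica = new Map();
applyEdit(alice, { client: "alice", clock: 1, text: "Add pricing table" });
applyEdit(bob, { client: "bob", clock: 1, text: "Fix typo in intro" });

const viaAlice = render(merge(alice, bob));
const viaBob = render(merge(bob, alice));

// Under last-write-wins, only the later snapshot survives.
const lww = lastWriteWins(
  { savedAt: 100, lines: ["Add pricing table"] },
  { savedAt: 101, lines: ["Fix typo in intro"] }
);

console.log(viaAlice, viaBob, lww.lines);
```

In this toy, both merge orders render the same two lines, while the last-write-wins snapshot keeps only one of them. Real text CRDTs have to solve the much harder problem of ordering concurrent inserts inside a sequence, which is the part a library like Y.js takes off the table.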

The migration path was the engineering work, not the protocol choice. We ran the new system in shadow mode against the old one for eight weeks: both wrote, the old one served reads, and a comparison job logged every divergence. The divergences fell into three buckets — true bugs in the new system (fixed), true bugs in the old system that the comparison job had exposed for the first time (documented and fixed in the new one), and customer behaviors that the old system had quietly broken for years (resolved in customers' favor in the new model). At the end of the shadow window, we flipped the read path. Customers did not notice.
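
The shape of a shadow-mode write path can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the production code: both stores are in-memory stand-ins, documents are flat field maps, and the names Store, MemoryStore, shadowWrite, serveRead, and compare are all hypothetical.

```typescript
// Sketch of shadow mode: dual writes, reads from the old system,
// and a comparison job that records divergences. Hypothetical names throughout.

interface Store {
  write(docId: string, field: string, value: string): void;
  read(docId: string): Record<string, string>;
}

class MemoryStore implements Store {
  private docs = new Map<string, Record<string, string>>();

  write(docId: string, field: string, value: string): void {
    const doc = this.docs.get(docId) ?? {};
    doc[field] = value;
    this.docs.set(docId, doc);
  }

  read(docId: string): Record<string, string> {
    // Return a copy so callers cannot mutate stored state.
    return { ...(this.docs.get(docId) ?? {}) };
  }
}

type Divergence = {
  docId: string;
  field: string;
  legacy: string | undefined;
  shadow: string | undefined;
};

// Both systems receive every write. A failure in the shadow system
// is swallowed so it can never affect the user-facing path.
function shadowWrite(
  legacy: Store,
  shadow: Store,
  docId: string,
  field: string,
  value: string
): void {
  legacy.write(docId, field, value);
  try {
    shadow.write(docId, field, value);
  } catch {
    // In a real system this would be logged and counted.
  }
}

// During the shadow window, reads are served by the old system only.
function serveRead(legacy: Store, docId: string): Record<string, string> {
  return legacy.read(docId);
}

// Comparison job: report every field where the two systems disagree.
function compare(legacy: Store, shadow: Store, docId: string): Divergence[] {
  const a = legacy.read(docId);
  const b = shadow.read(docId);
  const fields = Array.from(new Set([...Object.keys(a), ...Object.keys(b)]));
  const out: Divergence[] = [];
  for (const field of fields) {
    if (a[field] !== b[field]) {
      out.push({ docId, field, legacy: a[field], shadow: b[field] });
    }
  }
  return out;
}

// Usage with made-up data.
const legacyStore = new MemoryStore();
const shadowStore = new MemoryStore();
shadowWrite(legacyStore, shadowStore, "doc-1", "title", "Q3 plan");
// Simulated gap: this write reaches only the old system.
legacyStore.write("doc-1", "owner", "dana");

const divergences = compare(legacyStore, shadowStore, "doc-1");
console.log(serveRead(legacyStore, "doc-1"), divergences);
```

The comparison job only reports that the two systems disagree. Deciding which bucket a divergence belongs to (new-system bug, old-system bug, or long-standing broken behavior) remains a human triage step.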

Decisions & trade-offs

  • Y.js over a hand-rolled CRDT. A custom implementation would have given us a marketing line. It would also have given us another runtime to maintain. Y.js is well-understood, well-tested, and someone else's problem to keep on the latest version of the spec.
  • Durable Objects as the coordination point. The alternative was a Redis-backed coordination layer running on the existing K8s footprint. Durable Objects pinned each document to a single coordinator, which made the consistency story dramatically simpler and removed a category of bug we did not want to debug. The cost was a vendor dependency we hadn't had before; we accepted it.
  • Eight-week shadow window, not four. Enterprise customers do not enjoy being the test surface. The longer window caught two failure modes that would have shipped on a faster timeline.
  • Did not migrate the audit log. We let the old audit log retire on its own retention schedule. The new system's source of truth is the CRDT history; the audit log is now derived from it.
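
Two of the decisions above, pinning each document to a single coordinator and deriving the audit log from history, can be sketched together. This is an illustration under assumptions, not the Durable Objects API and not the real data model: a plain Map stands in for the platform's per-document routing, a sequenced list of operations stands in for the CRDT history, and DocumentCoordinator, coordinatorFor, and deriveAuditLog are hypothetical names.

```typescript
// Sketch: one coordinator per document, with the audit log computed
// from the coordinator's history rather than written separately.

type Op = {
  seq: number;
  user: string;
  kind: "insert" | "delete";
  at: string; // ISO timestamp supplied by the caller
};

class DocumentCoordinator {
  readonly docId: string;
  private history: Op[] = [];

  constructor(docId: string) {
    this.docId = docId;
  }

  // Every edit to this document passes through this one instance,
  // so there is exactly one sequence of operations per document.
  apply(user: string, kind: Op["kind"], at: string): Op {
    const op: Op = { seq: this.history.length + 1, user, kind, at };
    this.history.push(op);
    return op;
  }

  getHistory(): readonly Op[] {
    return this.history;
  }
}

// A Map stands in for the platform routing a document id to its coordinator.
const coordinators = new Map<string, DocumentCoordinator>();

function coordinatorFor(docId: string): DocumentCoordinator {
  let coordinator = coordinators.get(docId);
  if (!coordinator) {
    coordinator = new DocumentCoordinator(docId);
    coordinators.set(docId, coordinator);
  }
  return coordinator;
}

// The audit log is a pure function of history: it can be rebuilt at any
// time and cannot drift from the source of truth.
function deriveAuditLog(docId: string, history: readonly Op[]): string[] {
  return history.map(
    (op) => `${op.at} ${docId} #${op.seq} ${op.user} ${op.kind}`
  );
}

// Usage with made-up data.
const doc1 = coordinatorFor("doc-1");
doc1.apply("alice", "insert", "2024-01-01T10:00:00Z");
coordinatorFor("doc-1").apply("bob", "delete", "2024-01-01T10:00:01Z");

const auditLog = deriveAuditLog("doc-1", doc1.getHistory());
console.log(auditLog);
```

Because both calls resolve to the same coordinator, the second edit is sequenced after the first without any cross-node locking, which is the simplification the Durable Objects bullet describes.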

Outcome

The platform now handles roughly thirty times the previous peak concurrent connections on a single document, with a P95 sync latency of 78 ms. No data-loss incidents have been reported since cutover. The enterprise renewals that had been at risk because of the collaboration story are no longer at risk for that reason. That is not the same as saying the renewals were entirely about this work, but the customer-success team has not had to file the same Jira ticket twice.
