Designing a Real-time Collaborative Editor like Google Docs
Learn the system design patterns, concurrency control algorithms, and architectural choices for building a real-time collaborative text editor.
Building a real-time collaborative text editor is a classic and notoriously difficult system design problem. It touches on complex challenges involving real-time bi-directional communication, conflict resolution, and distributed state management. In this article, we will unpack the architectural choices required to build a system like Google Docs or Notion.
Core System Requirements
Before designing the architecture, we must define the core requirements:
1. Real-time Collaboration: Multiple users must be able to view and edit the exact same document simultaneously.
High-Level Architecture
The system involves a web client application, a real-time connection layer, a document management API, and a robust persistence layer.
Real-time Communication (WebSockets)
Standard HTTP is stateless and relies on a request-response model, making it highly inefficient for real-time applications where the server needs to push frequent updates to the client.
Instead, we use WebSockets. When a user opens a document, their client establishes a persistent, bi-directional WebSocket connection to the collaboration server. This allows the server to instantly push delta updates (changes) from other users directly to the client without the client needing to poll the server.
For horizontal scalability, WebSocket servers are placed behind a load balancer. Because each WebSocket connection is long-lived and pinned to a single server instance, two users editing the same document may be connected to different instances. A Pub/Sub system (such as Redis Pub/Sub) is commonly used to relay messages between instances so that User A's edits still reach User B.
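The cross-instance routing can be sketched with an in-process stand-in for the Pub/Sub layer. Everything here is an illustrative assumption: `InMemoryBroker`, `CollabServer`, the `doc:<id>` channel naming, and the inbox lists that stand in for real WebSocket connections are hypothetical names, not part of any real library.

```python
from collections import defaultdict


class InMemoryBroker:
    """Hypothetical in-process stand-in for a Pub/Sub service."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self._subscribers[channel]):
            callback(message)


class CollabServer:
    """One WebSocket server instance; an inbox list stands in for each socket."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.clients = {}          # client_id -> (doc_id, inbox)
        self._subscribed = set()   # doc ids this instance already listens to

    def connect(self, client_id, doc_id):
        inbox = []
        self.clients[client_id] = (doc_id, inbox)
        if doc_id not in self._subscribed:
            # Subscribe once per document, however many local clients have it open.
            self._subscribed.add(doc_id)
            self.broker.subscribe(f"doc:{doc_id}", self._on_broker_message)
        return inbox

    def receive_from_client(self, client_id, delta):
        # Publish instead of delivering directly: collaborators may be on other instances.
        doc_id, _ = self.clients[client_id]
        self.broker.publish(
            f"doc:{doc_id}", {"doc": doc_id, "from": client_id, "delta": delta}
        )

    def _on_broker_message(self, message):
        # Forward to every local client on this document except the sender.
        for client_id, (doc_id, inbox) in self.clients.items():
            if doc_id == message["doc"] and client_id != message["from"]:
                inbox.append(message["delta"])


broker = InMemoryBroker()
server_1 = CollabServer("ws-1", broker)
server_2 = CollabServer("ws-2", broker)
inbox_a = server_1.connect("user-a", "doc-42")
inbox_b = server_2.connect("user-b", "doc-42")
server_1.receive_from_client("user-a", {"op": "insert", "index": 0, "text": "H"})
```

After the last line, `inbox_b` holds User A's delta even though the two users are attached to different server instances, and `inbox_a` stays empty because the sender is skipped.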
The Concurrency Control Problem
The most mathematically complex part of a collaborative editor is handling concurrent edits. Imagine a document that currently contains the string CAT.
- User A highlights C and presses delete (intending to leave AT).
- At the exact same moment, User B types H at the beginning of the document (intending to make it HCAT).
- Document Database (MongoDB / CouchDB): The document structure, complete operation history (event sourcing), and metadata are persisted here.
- Object Storage (Amazon S3): For performance, we periodically take full snapshots of the document state and store them in S3. When a user loads a document, we fetch the latest snapshot from S3 and apply only the recent operations from the database, rather than replaying the entire history from day one.
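If these operations happen simultaneously and each client blindly applies the other user's operation exactly as received, User A ends up seeing HAT while User B sees CAT: on User B's copy, Delete at index 0 removes the newly inserted H instead of C. The document states have diverged, and the system is broken.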
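The CAT example can be traced with a few lines of code. This is a minimal sketch that assumes operations are simple `(kind, index, char)` tuples (a representation chosen here for illustration) and that each client applies its own edit first, then the other user's edit unchanged.

```python
def apply_op(doc, op):
    """Apply a single-character insert or delete at a plain integer index."""
    kind, index, char = op
    if kind == "insert":
        return doc[:index] + char + doc[index:]
    if kind == "delete":
        return doc[:index] + doc[index + 1:]
    raise ValueError(f"unknown operation kind: {kind}")


delete_c = ("delete", 0, None)   # User A removes the first character
insert_h = ("insert", 0, "H")    # User B types H at the start

# Each client applies its own edit first, then the remote edit as received.
client_a = apply_op(apply_op("CAT", delete_c), insert_h)  # "AT"   -> "HAT"
client_b = apply_op(apply_op("CAT", insert_h), delete_c)  # "HCAT" -> "CAT"

print(client_a, client_b)  # the two replicas no longer agree
```

The same two operations produce different documents depending on the order in which they are applied, which is exactly what a concurrency control mechanism has to prevent.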
Conflict Resolution Algorithms
To solve this divergence, we need a robust concurrency control mechanism. There are two primary algorithms used in the industry:
1. Operational Transformation (OT)
Operational Transformation (OT) is the algorithm historically used by Google Docs and early collaborative tools. It works by mathematically transforming incoming operations based on the operations that have occurred concurrently.
When User A's Delete at index 0 operation reaches the server, and User B's Insert 'H' at index 0 operation also arrives, the central server transforms them. It realizes that User A wanted to delete the first character, but User B just inserted a new first character. Therefore, the server transforms User A's operation to Delete at index 1.
In the design described here, OT relies on a central server to act as the single source of truth that sequences and transforms operations. It is difficult to implement correctly, since every pairing of insertion, deletion, and formatting operations needs its own transformation rule, but it is highly effective.
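The transformation step can be illustrated for single-character inserts and deletes. This is a simplified sketch, not the algorithm of any particular product: the `Op` type, the `site` tie-breaker, and the `transform` signature are assumptions made for the example, and formatting operations are left out entirely.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str        # "insert" or "delete"
    index: int
    char: str = ""
    site: int = 0    # tie-breaker for concurrent inserts at the same index


def apply(doc: str, op: Optional[Op]) -> str:
    if op is None:   # the operation was transformed away
        return doc
    if op.kind == "insert":
        return doc[:op.index] + op.char + doc[op.index:]
    return doc[:op.index] + doc[op.index + 1:]


def transform(op: Op, applied: Op) -> Optional[Op]:
    """Rewrite `op` so it keeps its intent once `applied` has already run."""
    if applied.kind == "insert":
        if op.kind == "insert":
            # Same index: the lower site id keeps the earlier position.
            shifts = applied.index < op.index or (
                applied.index == op.index and applied.site < op.site
            )
        else:
            shifts = applied.index <= op.index
        return replace(op, index=op.index + 1) if shifts else op
    # `applied` is a delete.
    if applied.index < op.index:
        return replace(op, index=op.index - 1)
    if applied.index == op.index and op.kind == "delete":
        return None  # both users deleted the same character
    return op


delete_c = Op("delete", 0, site=1)        # User A
insert_h = Op("insert", 0, "H", site=2)   # User B

# The server sequences User B's insert first, then User A's transformed delete.
server = apply(apply("CAT", insert_h), transform(delete_c, insert_h))
# Each client applies its own edit, then the transformed remote edit.
client_a = apply(apply("CAT", delete_c), transform(insert_h, delete_c))
client_b = apply(apply("CAT", insert_h), transform(delete_c, insert_h))
```

With the transform in place, User A's delete is shifted to index 1 and the server and both clients all end at HAT.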
2. Conflict-free Replicated Data Types (CRDTs)
CRDTs are a newer, increasingly popular mathematical approach used by modern tools. They are specialized data structures designed from the ground up to be replicated across multiple machines and updated independently without central coordination.
The key property of CRDTs is strong eventual consistency: any two replicas (clients) that have received the same set of operations are in the same state, regardless of the order in which those operations were applied.
Many sequence CRDTs achieve this by assigning a unique, ordered identifier (for example, a fractional position combined with a replica ID) to every character or element in the document, along with logical clocks to track causal history. Instead of saying "Delete character at index 0", a CRDT says "Delete character with unique ID 0.5".
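The identifier-based approach can be sketched as a toy sequence CRDT. The assumptions are stated up front: identifiers are `(fraction, site)` pairs built with Python's `fractions.Fraction`, deletes are recorded as tombstones, and the `Replica` class and its method names are hypothetical. Logical clocks are omitted to keep the example short.

```python
from fractions import Fraction


class Replica:
    """Toy sequence CRDT: every character has a unique (position, site) id."""

    def __init__(self, site):
        self.site = site
        self.chars = {}          # element id -> character
        self.tombstones = set()  # ids of deleted elements

    def _visible_ids(self):
        return [i for i in sorted(self.chars) if i not in self.tombstones]

    def text(self):
        return "".join(self.chars[i] for i in self._visible_ids())

    def local_insert(self, index, char):
        visible = self._visible_ids()
        left = visible[index - 1][0] if index > 0 else Fraction(0)
        # Nearest known position strictly right of `left`, deleted or not,
        # so a fresh id never reuses the position of a tombstoned element.
        right = min((i[0] for i in self.chars if i[0] > left), default=Fraction(1))
        op = ("insert", ((left + right) / 2, self.site), char)
        self.apply(op)
        return op

    def local_delete(self, index):
        op = ("delete", self._visible_ids()[index], None)
        self.apply(op)
        return op

    def apply(self, op):
        # Both branches are idempotent and commute with each other.
        kind, element_id, char = op
        if kind == "insert":
            self.chars[element_id] = char
        else:
            self.tombstones.add(element_id)


# Shared starting state "CAT": C, A, T get positions 1/2, 3/4, 7/8.
seed = Replica(site=0)
seed_ops = [seed.local_insert(i, c) for i, c in enumerate("CAT")]
a, b = Replica(site=1), Replica(site=2)
for op in seed_ops:
    a.apply(op)
    b.apply(op)

op_a = a.local_delete(0)        # "delete the element with id (1/2, 0)"
op_b = b.local_insert(0, "H")   # "insert H with id (1/4, 2)"
a.apply(op_b)
b.apply(op_a)
```

Both replicas end at HAT without any transformation step, because the delete names the character C by its identifier rather than by index. The sketch ignores cases that production sequence CRDTs handle, such as interleaving when two sites choose the same fraction for adjacent inserts, and unbounded identifier growth.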
> Architecture Insight: While OT relies heavily on a central server to sequence operations, CRDTs are peer-to-peer friendly and natively support complex offline editing scenarios, as changes can simply be merged mathematically upon reconnection.
Storage and Persistence
For storing the document state and history, a traditional relational database is often too rigid. A typical storage architecture includes:
- In-Memory Store (Redis): The active document state and recent operations are kept in memory for extremely fast read/write access and conflict resolution.
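The snapshot-plus-replay loading strategy described for the object storage tier can be sketched as follows. The record shapes are assumptions made for illustration: a snapshot is a dict with `version` and `text`, and each logged operation carries the document version it produced. Real storage calls are replaced by in-memory values.

```python
def apply_op(text, op):
    """Apply an insert or single-character delete at a plain integer index."""
    kind, index, value = op
    if kind == "insert":
        return text[:index] + value + text[index:]
    return text[:index] + text[index + 1:]


def load_document(snapshot, op_log):
    """Rebuild the current text from the latest snapshot plus newer operations."""
    text, version = snapshot["text"], snapshot["version"]
    for entry in sorted(op_log, key=lambda e: e["version"]):
        if entry["version"] > version:   # older ops are already in the snapshot
            text = apply_op(text, entry["op"])
            version = entry["version"]
    return text, version


# Hypothetical data: the snapshot was taken after version 2.
snapshot = {"version": 2, "text": "CAT"}
op_log = [
    {"version": 1, "op": ("insert", 0, "CA")},
    {"version": 2, "op": ("insert", 2, "T")},
    {"version": 3, "op": ("insert", 0, "H")},
    {"version": 4, "op": ("delete", 1, None)},
]

text, version = load_document(snapshot, op_log)
```

Only versions 3 and 4 are replayed, so load time depends on how many operations have arrived since the last snapshot rather than on the full history of the document.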
Conclusion
Designing a collaborative editor is a masterclass in handling distributed state. Whether choosing the battle-tested Operational Transformation for a centralized approach or the modern CRDT for a decentralized, offline-first architecture, mastering these concepts provides profound insights into modern interactive software design.