Comparing architectural aspects of distributed systems- Part 1
In my quest to explore the internals (the architectural aspects) of three of the famous distributed systems there are in today’s world, I spent a solid 2 days just to compare and contrast the architectural decisions. Why not share these amazing learning with you folks??
Disclaimer — I am not going to get into the depth of the below listed concepts. I have shared relevant links that can help if you are new to these topics.
3 distributed systems that form an integral part of a majority of the cloud native architectures today
A distributed Message broker (Kafka), a distributed database (Cassandra) and a distributed cache (Redis)
Partitioning strategy
Kafka — Individual partitions represent brokers that handle a portion of the entire traffic. Can be visualized as virtual queues within topics
Cassandra — Partitions are Shards of the data. The Shards store portions of the entire lot.
Redis- Similar to the database
Partition Arrangement strategy
Kafka — Consistent Hashing. The client application can choose a key in the message attribute that would be hashed and would help add the message to the ring.
Cassandra — Cassandra is said to use its own hashing algorithm (murmur hashing). The hashed value would determine the placement of the data in a server node.
Cache — Consistent Hashing as a design (not sure if Redis uses this). The key in a KEY-VALUE pair can be subject to the hash algorithm and the value would determine the server node to which this KV would be saved
Data replication strategy (replication factor)
Kafka — The partitions (along with their data) are replicated to other brokers. These are called the partition replicas. Replication factor is equivalent to the total number of partitions/brokers configured.
Cassandra — Data is replicated to the peer nodes. The consistency requirement determines how many copies of the data should be created. Cassandra provides a consistency factor that can be tuned (separately for reads and writes). If reads require a quorum of 3 nodes then the minimum replication of data would be to 3 nodes.
Cache (Redis) — Data is replicated to the nodes identified as replicas. The consistency requirement determines how many copies of the data should be created.
Node coordination/orchestration strategy/protocol
Kafka — Zoo-keeper pattern. A coordinator would be performing the tasks of maintaining the whereabouts of the brokers including their statuses ,ability to keep up with the ingest streams, mapping between the leader and followers and much more.
Cassandra — Gossip protocol ( information exchange between the nodes that are in the ring). This happens every 1 second in Cassandra. The nodes communicate in an effective way to perform the coordination of operations and replication.
Redis Cache — Redis Sentinel does the job of a zoo keeper
Read Strategy
Kafka — Read always happens from the leader partition ( told to have changed in the latest versions of kafka where the consumer can also read from any of the replicas. I am not sure yet if these are done to mimic the read-only replica pattern used in the relational DBs)
Cassandra — Read can happen from any of the designated partitions for read
Note:
If the read consistency level is set to N then the same queried value is read from N nodes before the data is returned to the client.
Redis Cache — Read can happen either from the leader node or from any of the read replicas (that contain the replicated data)
Write Strategy
Kafka —Write happens at the leader partition.
Note (Edit):
In an design that expects quick write operations, the data would be asynchronously replicated to the followers. In a design where the durability of data is important, synchronous replication is used. The leader commits the write to the client only after the acknowledgement from all the In-Sync Replicas (ISR).
Cassandra- Write can be done to any node ( no master -slave concept)
Note:
If the write consistency level (CL) is set to N then the same data would be written to N nodes before a success is returned to the client. This is a key point in a trade-off between consistency and availability in CAP theorem.
Redis cache- Write happens to the leader node/server that the hash resolves to.
Comparison of the systems to be continued in Part-2.
Please note: I have just consolidated a lot of information from various sources (as a part of exploration) and do not intend to claim any of this as my own findings :)
References
Cassandra Architecture
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/architecture/archTOC.html
Distributed Systems in One Lesson
https://www.youtube.com/watch?v=Y6Ev8GIlbxc&list=PLQL1JGGe-t0s0D5Vl6VRfc4HqeRgMiTav&index=7&t=1100s
Kafka: A Modern Distributed System
https://www.youtube.com/watch?v=Ea3aoACnbEk&list=PLQL1JGGe-t0s0D5Vl6VRfc4HqeRgMiTav&index=12&t=2556s
System Design
Distributed Cache
https://www.youtube.com/watch?v=iuqZvajTOyA&t=1045s
Distributed Message Queue