This story continues the previous one in this series. Let us continue comparing a few other architectural aspects of these distributed systems.
Scalability (within the cluster)
Kafka - A Kafka topic is scaled by adding partitions to the topic and brokers to the cluster. Each broker is a separate virtual machine instance that handles the message broker work. This helps in increasing the throughput and handling spikes in traffic.
Note: ZooKeeper keeps track of all active brokers and partitions after scaling. This helps in distributing the traffic once the scale-out is complete.
Cassandra - Scaling a Cassandra cluster means adding more nodes (DB servers/instances) to the ring. The newly added nodes then store some of the primary shards of data (in addition to any replicas).
Note: Scaling out a Cassandra cluster creates a need to rehash roughly k/n keys, where k is the total number of keys and n is the number of nodes. Consistent hashing is what keeps the remapping down to this minimum.
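To make the k/n claim a bit more concrete, here is a minimal sketch of a consistent-hash ring. It is a toy, not Cassandra's actual partitioner: the `HashRing` class, the MD5-based `_hash` function, the node names and the 100 virtual nodes per server are all assumptions made for illustration. It compares how many keys change owner when a fourth node joins, against naive `hash(key) % n` placement.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node is placed on the ring at `vnodes` positions.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def owner(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


def moved_fraction(keys, before, after):
    """Fraction of keys whose owner differs between two placements."""
    moved = sum(1 for k in keys if before[k] != after[k])
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]

# Consistent hashing: 3 nodes, then a 4th joins the ring.
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

# Naive modulo placement for comparison: owner = hash(key) % n.
mod_before = {k: _hash(k) % 3 for k in keys}
mod_after = {k: _hash(k) % 4 for k in keys}

ring_moved = moved_fraction(keys, before, after)
mod_moved = moved_fraction(keys, mod_before, mod_after)
print(f"consistent hashing moved: {ring_moved:.2%}")
print(f"modulo hashing moved:     {mod_moved:.2%}")
```

With the ring, the only keys that move are the ones the new node takes over, so the moved share should land somewhere around a quarter of the keys for four nodes. The modulo scheme reshuffles most of them.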
To achieve High Availability (HA) for a Cassandra cluster within a region, the nodes can be located in different availability zones. This ensures that a zonal failure does not bring the entire cluster down. The picture shows a Cassandra cluster setup Netflix has in a region.
Cache - A distributed cache with a setup similar to Cassandra is scaled out in the same manner. The idea is to add a cache node/shard whenever the need to scale out arises. A minimal redistribution of the cache data (about k/n keys) is also needed.
Scalability (Multi-Cluster Setup)
The concept of scaling does not change from the previous topic. The difference is that instead of adding a new node to the same cluster, we introduce a new cluster, and hence new nodes in that second cluster. The reason for this can be one of the following:
a) Handling an equal amount of traffic in a different region. Going by best design principles, the data stores/message brokers should be kept in the same region as the applications accessing them. This helps big time in avoiding access latency.
b) High Availability (HA) at a region level. The previous section talks about HA at the zone level. When one of the configured regions goes down, the cluster in the second region (active-active setup, or active after fail-over) can be used to serve requests from all clients.
Kafka - We can create multiple clusters and have the client/producer send data to N clusters. This pattern is adopted at Netflix. For a message broker as scalable as Kafka (with growing traffic), we would not consider any inter-cluster replication of data, even if it is asynchronous. If we have a use case like the one shown on the right side of the pic, we might need to handle the replication of messages too. Netflix does this on a case-by-case basis and there is NO global rule.
Cassandra - We can create additional clusters and have the data replicated between them (if necessary). Read and write requests can be served from any of the clusters because Cassandra, unlike a typical single-primary SQL database, is a multi-master database. The following picture shows a 2-cluster setup of Cassandra deployed in 2 regions with replication enabled between them.
Cluster Chaining (Consumer Fan-Out Scenario)
Clusters can be chained to handle a consumer fan-out scenario. This concept has been explained very well in Netflix’s presentation. When the number of consumers reading from your topic increases (which means that each broker/partition has to handle connections from multiple consumer groups), the performance of the topic can suffer. To mitigate this specific scenario, Netflix tried out chaining Kafka clusters.
This use case of chaining is specific to Kafka; there is no need to chain clusters for scalability in Cassandra and Redis.
The specifics of the CAP theorem are a topic of their own. More about CAP Theorem.
Kafka - Consistency and availability depend on the type of replication that you choose. Kafka supports both synchronous and asynchronous replication. The former provides durability at the cost of increased write latency.
Cassandra - Favors availability over consistency.
Cassandra provides tunable consistency. The write and read consistency levels (CL) determine the level of consistency that the client application gets.
Data replication is synchronous to the extent required by the write CL: the write is acknowledged only after that many replicas confirm it.
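The usual way to reason about Cassandra's tunable consistency is the overlap rule: if the number of replicas that acknowledge a write (W) plus the number consulted on a read (R) exceeds the replication factor (RF), every read touches at least one replica that has the latest write. The sketch below only does that arithmetic. The function names and the replication factor of 3 are assumptions for illustration, and nothing here talks to a real cluster.

```python
def quorum(replication_factor: int) -> int:
    """QUORUM is a majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """Reads are guaranteed to overlap the latest write when R + W > RF."""
    return write_replicas + read_replicas > replication_factor


RF = 3  # assumed replication factor
levels = {"ONE": 1, "QUORUM": quorum(RF), "ALL": RF}

# Print the verdict for every write CL / read CL combination.
for write_cl, w in levels.items():
    for read_cl, r in levels.items():
        verdict = "strong" if is_strongly_consistent(w, r, RF) else "eventual"
        print(f"write={write_cl:<6} read={read_cl:<6} -> {verdict}")
```

Writing and reading at QUORUM is the common middle ground: it satisfies the overlap rule while still tolerating the loss of one replica out of three.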
Redis — favors availability over consistency. Data replication is asynchronous.
Security is a topic that should never go unaddressed. You can follow the best practices in designing and architecting a system, but if the data or your broker is not protected, everything goes for a toss. This section describes AuthN and AuthZ in the systems’ architecture.
Kafka - AuthN: Mutual TLS between the producer and broker, and between the broker and consumer.
AuthZ: Can be an OAuth service principal or a generated and signed conditional access policy (for example, a Shared Access Signature in Azure that restricts the sender to only writing data to topics and the consumer to only reading data from the topic). These AuthZ measures are specific to Azure and reflect the best practices used in Event Hubs (a streaming message broker that now has compatibility with Kafka clients).
Cassandra - TLS/SSL encryption of the communication between the client and the cluster, and between cluster nodes.
AuthN is implemented using password credentials and AuthZ using RBAC (role-based access control).
Redis - Network security and optional AuthN.
Resiliency - Actions performed when a node-level failure occurs
What would happen when a node in the cluster goes down? What would happen to the partitions/Shards and the replicas?
Kafka - The partitions on the failed broker (leaders, and replicas of other leaders) are redistributed to other brokers. This pic illustrates how the leader partitions, and the replicas of other leaders, move from a failed node to the other broker nodes in a Kafka cluster.
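Here is a small sketch of the leader part of that process, under the assumption that a new leader is picked from the replicas that are still in sync with the old one. The `Partition` class, the `handle_broker_failure` function, the topic name and the broker ids are all made up for illustration. This is a simplified model of the idea, not Kafka's controller code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    name: str
    replicas: list   # broker ids in preference order
    in_sync: set     # broker ids currently caught up with the leader
    leader: Optional[int] = None

    def __post_init__(self):
        # Start with the preferred (first) replica as the leader.
        if self.leader is None:
            self.leader = self.replicas[0]


def handle_broker_failure(partitions, failed_broker):
    """Drop the failed broker from every in-sync set and re-elect leaders where needed."""
    for p in partitions:
        p.in_sync.discard(failed_broker)
        if p.leader == failed_broker:
            candidates = [b for b in p.replicas if b in p.in_sync]
            # None models an offline partition: no in-sync replica is left.
            p.leader = candidates[0] if candidates else None


partitions = [
    Partition("orders-0", replicas=[1, 2, 3], in_sync={1, 2, 3}),
    Partition("orders-1", replicas=[2, 3, 1], in_sync={1, 2, 3}),
    Partition("orders-2", replicas=[3, 1, 2], in_sync={1, 3}),
]

handle_broker_failure(partitions, failed_broker=1)
for p in partitions:
    print(p.name, "leader:", p.leader, "in-sync:", sorted(p.in_sync))
```

Only `orders-0` changes leader here, because broker 1 was leading it. The other two partitions keep their leaders and simply lose one in-sync replica.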
Cassandra - The data is redistributed among the other active nodes in the current cluster. This involves rehashing a selected set of keys (about k/n, where k is the total number of keys and n is the number of servers).
Redis - The data is redistributed among the other active nodes in the current cluster. This involves rehashing a selected set of keys (about k/n, as above). This happens when the Gossip protocol is used.
In a leader-follower pattern, one of the other nodes that holds a replica of the failed leader node is promoted as the new leader (ZooKeeper updates this information so that client requests are served appropriately).
Cross Region Replication (between clusters)
Replication across clusters in different regions is a special use case and has to be handled with caution, to avoid unnecessary replication logic and the resources it consumes.
Kafka - Not performed unless absolutely necessary. Netflix does cross-region replication only for selected topics.
Cassandra - Performed so that the same client’s requests can be handled from any region. Netflix does cross-region replication. If you are interested in this topic, you should probably listen to Josh Evans’s talk about Netflix’s global architecture. What would you expect to happen in your Netflix account when you travel for a week from the States to London? If the local cluster(s) in Europe do not have your data, does that mean you are hung out to dry? Nah! :)
Disaster Recovery and Fault Tolerance
This is something that users should design. The distributed systems have redundancy features (at an instance level) and also support replication of the stored data. How we design the DR strategy is left to us.
Kafka - A general DR strategy can be adopted: Active-Passive or Active-Active. When a region fails, all the customer workloads are shifted to the fail-over region.
a) Manual fail-over in Active-Passive
b) The fail-over region should have the same Kafka cluster setup as the active region
c) Refer to the Walmart case study (Women Who Code on YouTube)
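Points (a) and (b) can be sketched as a small routing helper that clients ask for the current cluster. Everything in it is hypothetical: the `Cluster` and `FailoverRouter` names, the host names and the partition counts are placeholders, and comparing a single number is only a stand-in for "same cluster setup". A real setup would also have to deal with consumer offsets and the data that never reached the standby.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    region: str
    bootstrap_servers: str
    partitions_per_topic: int


class FailoverRouter:
    """Tracks which region currently serves traffic in an active-passive pair."""

    def __init__(self, active: Cluster, standby: Cluster):
        # Point (b): the standby should mirror the active cluster's setup.
        if active.partitions_per_topic != standby.partitions_per_topic:
            raise ValueError("standby cluster does not match the active cluster")
        self.active = active
        self.standby = standby

    def current(self) -> Cluster:
        return self.active

    def fail_over(self) -> Cluster:
        """Point (a): a manual switch, triggered by an operator."""
        self.active, self.standby = self.standby, self.active
        return self.active


# Placeholder endpoints, not real hosts.
east = Cluster("us-east", "kafka.us-east.example.internal:9092", 12)
west = Cluster("us-west", "kafka.us-west.example.internal:9092", 12)

router = FailoverRouter(active=east, standby=west)
print("serving from", router.current().region)
router.fail_over()  # operator decision after us-east goes down
print("serving from", router.current().region)
```

Keeping the switch explicit, rather than automatic, matches the manual fail-over in point (a): somebody decides that the region is really down before the traffic moves.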
Cassandra - We have talked about quorum and replication strategies to handle the failure of nodes. Backup and restore is another measure that needs to be designed when a DR strategy is planned. Refer to this article for a more elaborate explanation of backup and restore.
Cache - Depends on the design. Multi-region deployment can be done (as Netflix does for EVCache).
Backup and restore can be an option when trying to fix the cluster/region that has gone down.
Throughput & Performance
I am just going to point you to links that provide the benchmarking data. That would be better than any textual content that I can innovatively come up with.
Kafka - Throughput and performance increase with the number of partitions. This essentially calls for a scale-out architecture. Scaling up individual nodes to achieve the needed level of network performance, with appropriately sized storage, is also key in the design. Read this to understand how a memory-optimized Kafka instance can help in improving the performance. (Jump to the Performance tuning section if you are impatient.)
Cassandra - DataStax’s benchmarking
Redis Cache - Azure Cache for Redis
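On the Kafka point about partitions, a commonly cited rule of thumb is to measure what a single partition can sustain on the producer side and on the consumer side, then provision enough partitions that neither side becomes the bottleneck for the target throughput. The sketch below is just that arithmetic. The function name and the numbers fed into it are made-up inputs, not benchmark results.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Rule of thumb: enough partitions that neither side is the bottleneck."""
    by_producer = target_mb_s / producer_mb_s_per_partition
    by_consumer = target_mb_s / consumer_mb_s_per_partition
    return math.ceil(max(by_producer, by_consumer))


# Hypothetical measurements: 200 MB/s target, 10 MB/s per partition when
# producing, 20 MB/s per partition when consuming.
needed = estimate_partitions(
    target_mb_s=200,
    producer_mb_s_per_partition=10,
    consumer_mb_s_per_partition=20,
)
print(needed)  # prints 20
```

Treat the result as a starting point only. The per-partition figures depend heavily on message size, replication settings and hardware, which is why the benchmark links are worth reading.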
That is all, folks. This story concludes the comparison of 2–3 of the most popular and widely used distributed systems from an architecture perspective.