Cassandra – Architecture



Cassandra is designed to handle big data workloads across multiple nodes while providing high availability with no single point of failure.
Cassandra distributes data across nodes in a peer-to-peer architecture, in which every node plays the same role.
The nodes exchange state information with one another using the Gossip protocol.
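
The idea behind gossip can be illustrated with a small sketch. This is a simplified model, not Cassandra's actual implementation: the class `GossipNode`, its `view` dictionary, and the single integer heartbeat counter are all hypothetical names chosen for illustration. Each node keeps the highest heartbeat version it has seen for every node it knows about, and two nodes that gossip end up with the same merged view.

```python
class GossipNode:
    """Hypothetical, simplified model of a node that gossips cluster state."""

    def __init__(self, name):
        self.name = name
        # Node name -> highest heartbeat version this node has seen so far.
        self.view = {name: 0}

    def heartbeat(self):
        # A node only ever increments its own heartbeat version.
        self.view[self.name] += 1

    def gossip_with(self, peer):
        # Exchange views: for every known node keep the newest version.
        names = self.view.keys() | peer.view.keys()
        merged = {
            n: max(self.view.get(n, -1), peer.view.get(n, -1)) for n in names
        }
        self.view = dict(merged)
        peer.view = dict(merged)
```

With three nodes a, b and c, calling `a.gossip_with(b)` and then `b.gossip_with(c)` lets c learn about a without the two ever talking directly, which is how information spreads through the cluster without a central coordinator.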


Components of Cassandra


The key components of Cassandra are:
Node - A node is where data is stored. It is the basic component of Cassandra.
Data Center - (Replication group) A collection of related nodes is called a data center.
Cluster - A cluster is a collection of one or more data centers.
Commit Log - Every write operation is written to the commit log, which provides a crash-recovery mechanism. (It is comparable to the redo logs in Oracle.)
Mem-table - After the commit log, data is written to the Mem-table, an in-memory structure. Data in the Mem-table is temporary until it is flushed to disk.
SSTable - When the Mem-table reaches a certain threshold, its data is flushed to an SSTable file on disk.
Bloom filter - A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can return false positives but never false negatives, so Cassandra can use it to skip SSTables that definitely do not contain the requested data.
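
A minimal Bloom filter sketch makes the "probabilistic" part concrete. This is an illustration only and not Cassandra's implementation: the class name `BloomFilter`, the default sizes, and the use of SHA-256 to derive bit positions are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hashing scheme are assumptions."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the example simple (not space-optimal).
        self.bits = bytearray(size_bits)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))
```

After `bf.add("user:42")`, the call `bf.might_contain("user:42")` always returns True. For a key that was never added the answer is usually False, but it can occasionally be True, which is the false-positive case.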

Cassandra - Write Operation

Before looking at the write operation, it helps to be familiar with the consistency level.
Consistency level - The consistency level determines how many replica nodes must acknowledge an operation before it is reported as successful.
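
As a rough sketch of what this means in practice, the number of acknowledgments required can be derived from the consistency level and the replication factor. The function names `required_acks` and `write_succeeds` are hypothetical, and only three of the available levels (ONE, QUORUM, ALL) are modelled here; a quorum is a majority of the replicas.

```python
def required_acks(level, replication_factor):
    """Return how many replica acknowledgments a consistency level needs."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Majority of replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level in this sketch: {level}")


def write_succeeds(level, replication_factor, acks_received):
    """True when enough replicas have acknowledged the write."""
    return acks_received >= required_acks(level, replication_factor)
```

With a replication factor of 3, QUORUM needs two acknowledgments, so the write is still reported as successful when one replica is down.
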
Cassandra performs a write operation as follows:
  • The coordinator sends the write request to the replicas.
  • Each replica node first writes the data to its commit log.
  • The replica then writes the data to the memtable.
  • Cassandra flushes memtables to disk, creating SSTables, when the commit log space threshold or the memtable cleanup threshold is exceeded.
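
The steps a replica takes can be sketched as follows. This is a toy model under stated assumptions, not Cassandra's storage engine: the class `Replica` and the `memtable_limit` parameter are hypothetical, the flush is triggered by entry count rather than by the size-based thresholds mentioned above, and the whole commit log is cleared on flush for simplicity.

```python
class Replica:
    """Toy model of the write path on a single replica (illustrative only)."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only log used for crash recovery
        self.memtable = {}     # in-memory structure: key -> value
        self.sstables = []     # flushed, immutable tables, oldest first
        self.memtable_limit = memtable_limit  # assumed entry-count threshold

    def write(self, key, value):
        # 1. Record the write in the commit log first, for durability.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the (assumed) threshold is reached.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable table.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed data no longer needs to be replayed from the log.
        self.commit_log.clear()
```

With `Replica(memtable_limit=2)`, the first write stays in the commit log and memtable; the second write triggers a flush, leaving one SSTable and an empty memtable.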

Cassandra - Read Operation

To satisfy a read, Cassandra must combine results from the active memtable and potentially multiple SSTables.
Cassandra processes the request in several stages on the read path to discover where the data is stored, starting with the memtable and then checking the SSTables that may contain it.
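
Combining results from the memtable and several SSTables can be sketched as below. This is a simplified illustration that assumes each stored cell is a `(timestamp, value)` pair and that the most recent timestamp wins; the function name `read` and the dictionary-based tables are hypothetical and leave out the other stages of the real read path.

```python
def read(key, memtable, sstables):
    """Return the newest value for key across the memtable and SSTables.

    Each table maps key -> (timestamp, value). Returns None if the key
    is not found anywhere.
    """
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in sstables:
        if key in table:
            candidates.append(table[key])
    if not candidates:
        return None
    # The cell with the highest timestamp is the most recent write.
    newest = max(candidates, key=lambda cell: cell[0])
    return newest[1]
```

If the memtable holds `("k", (5, "new"))` and an older SSTable holds `("k", (1, "old"))`, the read returns "new", because the copy in the memtable carries the later timestamp.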
