Skip to main content

Understanding Indepth of Cassandra Gossip Protocol

<<Back to Cassandra Main Page

What is Gossip Protocol.

A peer-to-peer communication protocol to discover and share location and state information about the other nodes in a Cassandra cluster.

Understanding Indepth of Cassandra Gossip

The gossiper is responsible for making sure every node in the system eventually knows important information about every other node's state, including those that are unreachable or not yet in the cluster when any given state change occurs. Gossip timer task runs every second. This means that a node initiates gossip exchange with one to three nodes every second (round) or ( zero if it is alone in the cluster). Information to gossip is wrapped in an ApplicationState object in a key/value pair. The information exchanged in a GossipMessage looks something like below.

HeartBeatState
Each node in Cassandra has one HeartBeatState associated with it which Consists of generation and version number.
Generation grows at every time the node is started.
Version number is shared with application states and guarantees ordering.
ApplicationState
Consists of state and version number (Its the same version number present in HeartBeatState) and represents a state of single "component" or "element" within Cassandra.
For instance application state for "load information" could be (5.2, 45), which means that node load is 5.2 at version 45.
EndPointState
Includes all ApplicationStates and HeartBeatState for certain endpoint (node). EndPointState can include only one of each type of ApplicationState, so if EndPointState already includes, say, load information, new load information will overwrite the old one. ApplicationState version number guarantees that old value will not overwrite the new one.
EndPointStateMap
Internal structure in Gossiper that has EndPointState for all nodes (including itself) that it has heard about.

The Gossip messaging takes place in a three-way handshake format.


GossipDigestSynMessage:
HostA sends out GossipeDigestSynMessage every second. It contains the largest version of local states.
HostB receives this message, it sort and compare the versions with local ones. Then it knows which local states are out of date, which ones are newer than remote status (HostA). Then it build following message, tells HostA these information:
GossipDigestAckMessage:
Basically digest_list contains status that HostB need from HostA. state_map contains status information that NodeA need. In this example, HostB doesn't have the latest info from HostA, HostA doesn't have the latest info from HostB, HostC status are up to date for both hosts.
When HostA gets the message, it apply the state_map to local states. And then build following message with the list from digest_list.
GossipDigestAck2Message
HostB receives the message, and simply update local state with the information from state_map.

so far so good

well So how does a new node get the idea of whom to start gossiping with? Well, Cassandra has many seed provider implementations that provide a list of seed addresses to the new node and starts gossiping with one of them right away. After its first round of gossip, it will now possess cluster membership information about all the other nodes in the cluster and can then gossip with the rest of them.

Cassandra Failure detection


The gossip process tracks state from other nodes both directly (nodes gossiping directly to it) and indirectly (nodes communicated about secondhand, third-hand, and so on). Rather than have a fixed threshold for marking failing nodes, Cassandra uses an accrual detection mechanism to calculate a per-node threshold that takes into account network performance, workload, and historical conditions. During gossip exchanges, every node maintains a sliding window of inter-arrival times of gossip messages from other nodes in the cluster. Configuring the phi_convict_threshold property adjusts the sensitivity of the failure detector. Lower values increase the likelihood that an unresponsive node will be marked as down.

As shown in Figure above HostA did not respond with in time interval of  mean() * ln(10) * phi_convict_threshold the host will be marked as dead.
Default phi_convict_threshold is 8, ln(10) x 8 = 2.3 x 8 = 18.4. Which means even in best case, the failure detection delay is 18seconds, typically it's about 20 seconds 18.4 x 1.1 = 20.3.



Comments

Popular posts from this blog

ORA-28374: typed master key not found in wallet

<<Back to Oracle DB Security Main Page ORA-46665: master keys not activated for all PDBs during REKEY SQL> ADMINISTER KEY MANAGEMENT SET KEY FORCE KEYSTORE IDENTIFIED BY xxxx WITH BACKUP CONTAINER = ALL ; ADMINISTER KEY MANAGEMENT SET KEY FORCE KEYSTORE IDENTIFIED BY xxxx WITH BACKUP CONTAINER = ALL * ERROR at line 1: ORA-46665: master keys not activated for all PDBs during REKEY I found following in the trace file REKEY: Create Key in PDB 3 resulted in error 46658 *** 2019-02-06T15:27:04.667485+01:00 (CDB$ROOT(1)) REKEY: Activation of Key AdnU5OzNP08Qv1mIyXhP/64AAAAAAAAAAAAAAAAAAAAAAAAAAAAA in PDB 3 resulted in error 28374 REKEY: Keystore needs to be restored from the REKEY backup.Aborting REKEY! Cause: All this hassle started because I accidently deleted the wallet and all wallet backup files too and also forgot the keystore password. There was no way to restore the wallet back. Fortunately in my case the PDB which had encrypted data was supposed to be deco...

How to Find VIP of an Oracle RAC Cluster

<<Back to Oracle RAC Main Page How to Find Out VIP of an Oracle RAC Cluster Login clusterware owner (oracle) and execute the below command to find out the VIP hostname used in Oracle RAC $ olsnodes -i node1     node1-vip node2     node2-vip OR $ srvctl config nodeapps -viponly Network 1 exists Subnet IPv4: 10.0.0.0/255.255.0.0/bondeth0, static Subnet IPv6: Ping Targets: Network is enabled Network is individually enabled on nodes: Network is individually disabled on nodes: VIP exists: network number 1, hosting node node1 VIP Name: node1-vip VIP IPv4 Address: 10.0.0.1 VIP IPv6 Address: VIP is enabled. VIP is individually enabled on nodes: VIP is individually disabled on nodes: VIP exists: network number 1, hosting node node2 VIP Name: node2-vip VIP IPv4 Address: 10.0.0.2 VIP IPv6 Address: VIP is enabled. VIP is individually enabled on nodes: VIP is individually disabled on nodes:

ORA-16905: The member was not enabled yet

<<Back to Oracle DataGuard Main Page ORA-16905 Physical Standby Database is disabled DGMGRL> show configuration; Configuration - DG_ORCL1P   Protection Mode: MaxPerformance   Members:   ORCL1PP - Primary database     ORCL1PS - Physical standby database (disabled)       ORA-16905: The member was not enabled yet. Fast-Start Failover:  Disabled Configuration Status: SUCCESS   (status updated 58 seconds ago) DGMGRL> DGMGRL> enable database 'ORCL1PS'; Enabled. DGMGRL>  show configuration; Configuration - DG_ORCL1P   Protection Mode: MaxPerformance   Members:   ORCL1PP - Primary database     ORCL1PS - Physical standby database Fast-Start Failover:  Disabled Configuration Status: SUCCESS   (status updated 38 seconds ago)

ORA-46630: keystore cannot be created at the specified location

<<Back to DB Administration Main Page ORA-46630: keystore cannot be created at the specified location CDB011> ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '+DATAC4/CDB01/wallet/' IDENTIFIED BY "xxxxxxx"; ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '+DATAC4/CDB01/wallet/' IDENTIFIED BY "EncTest123" * ERROR at line 1: ORA-46630: keystore cannot be created at the specified location Cause  Creating a keystore at a location where there is already a keystore exists Solution To solve the problem, use a different location to create a keystore (use ENCRYPTION_WALLET_LOCATION in sqlnet.ora file to specify the keystore location), or move this ewallet.p12 file to some other location. Note: Oracle does not recommend deleting keystore file (ewallet.p12) that belongs to a database. If you have multiple keystores, you can choose to merge them rather than deleting either of them.

ORA-65104: operation not allowed on an inactive pluggable database alter pluggable database open

<<Back to DB Administration Main Page ORA-65104: operation not allowed on an inactive pluggable database SQL> alter pluggable database TEST_CLON open; alter pluggable database TEST_CLON open * ERROR at line 1: ORA-65104: operation not allowed on an inactive pluggable database Cause The pluggable database status was UNUSABLE. It was still being created or there was an error during the create operation. A PDB can only be opened if it is successfully created and its status is marked as NEW in cdb_pdbs.status column SQL> select PDB_NAME,STATUS from cdb_pdbs; PDB_NAME             STATUS -------------------- --------------------------- PDB$SEED             NORMAL TEST_CLON            UNUSABLE Solution:  Drop the PDB and create it again. Related Posts How to Clone Oracle PDB (Pluggable Database) with in the Same Container

How to Switch Log File from All Instances in RAC

<<Back to Oracle RAC Main Page Switch The Log File of All Instances in Oracle RAC. In many cases you need to switch the logfile of the database. You can switch logfile using alter system switch logfile command but if you want to switch the logfile from all the instances you need to execute the command on all the instances individually and therefore you must login on all the instances. You can avoid this and switch logfile of all instances by just running the below command from any of the instance in RAC database SQL> ALTER SYSTEM SWITCH ALL LOGFILE;   System altered.

Starting RMAN and connecting to Database

  <<Back to Oracle Backup & Recovery Main Page Starting RMAN and connecting to Database Starting RMAN and connecting to Database To start RMAN you need to set the environment and type rman and press enter. You can connect to database either using connect command or using command line option. using command line option localhost:$ export ORACLE_HOME=/ora_app/product/18c/dbd2 localhost:$ export PATH=$ORACLE_HOME/bin:$PATH localhost:$ export ORACLE_SID=ORCL1P localhost:$ rman target / Recovery Manager: Release 18.0.0.0.0 - Production on Sun Apr 4 08:11:01 2021 Version 18.11.0.0.0 Copyright (c) 1982, 2018, Oracle and/or its affiliates.  All rights reserved. connected to target database: ORCL1P (DBID=4215484517) RMAN> using connect option localhost:$ rman RMAN> connect target sys@ORCL1P  target database Password:******** connected to target database: ORCL1P (DBID=4215484517) NOTE: To use connect command you need to ensure that  you have proper TNS sentry...

ORA-46655: no valid keys in the file from which keys are to be imported

<<Back to DB Administration Main Page SQL> administer key management import encryption keys with secret "xxxx" from '/tmp/pdb02_tde_key.exp' force keystore identified by "xxxx" with backup; administer key management import encryption keys with secret "xxxxxx" from '/tmp/pdb02_tde_key.exp' force keystore identified by "xxxxxx" with backup * ERROR at line 1: ORA-46655: no valid keys in the file from which keys are to be imported Cause: Either the keys to be imported already present in the target database or correct container (PDB) is not set. Solution: In my case I got the error because I attempted to import the keys for newly plugged database PDB02 from CDB$ROOT container. To Solve the issue just switched to the correct container and re run the import. SQL> show con_name CON_NAME ------------------------------ CDB$ROOT <===Wrong Container selected  SQL> alter session set container=PDB02; Session alt...

How to Modify Database Startup Mode (Startoption) in Clusterware Registry

<<Back to Oracle RAC Main Page How to Modify Database Startup Option Using srvctl In this post I will show you, how you can modify the database startup option / Database startup mode in clusterware registry using srvctl. There are some time requirement to change the startup option (from default  OPEN ) to  MOUNT, or "READ ONLY" eg if in case of Physical Standby Configuration Let us Check the Current Configuration $ srvctl config database -d ORCL Database unique name: ORCL Database name: Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_2 Oracle user: oracle Spfile: +DATA/ORCL/PARAMETERFILE/spfileORCL.ora Password file: +DATA/ORCL/PASSWORD/pwORCL Domain: Start options: open Stop options: immediate Database role: PHYSICAL_STANDBY Management policy: AUTOMATIC Server pools: Disk Groups: DATA,RECO Mount point paths: Services: Type: RAC Start concurrency: Stop concurrency: OSDBA group: oinstall OSOPER group: oinstall Database instances: ORCL1,...

How to Attach to a Datapump Job and Check Status of Export or Import

<<Back to Oracle DATAPUMP Main Page How to check the progress of  export or import Jobs You can attach to the export/import  job using ATTACH parameter of oracle datapump utility. Once you are attached to the job you check its status by typing STATUS command. Let us see how Step1>  Find the Export/Import Job Name You can find the datapump job information from  DBA_DATAPUMP_JOBS or  USER_DATAPUMP_JOBS view. SQL> SELECT OWNER_NAME,JOB_NAME,OPERATION,JOB_MODE,STATE from DBA_DATAPUMP_JOBS; OWNER_NAME JOB_NAME                       OPERATION            JOB_MODE   STATE ---------- ------------------------------ -------------------- ---------- ---------- SYSTEM     SYS_EXPORT_FULL_02          ...