Skip to main content

Understanding Indepth of Cassandra Gossip Protocol

<<Back to Cassandra Main Page

What is Gossip Protocol.

A peer-to-peer communication protocol to discover and share location and state information about the other nodes in a Cassandra cluster.

Understanding Indepth of Cassandra Gossip

The gossiper is responsible for making sure every node in the system eventually knows important information about every other node's state, including those that are unreachable or not yet in the cluster when any given state change occurs. Gossip timer task runs every second. This means that a node initiates gossip exchange with one to three nodes every second (round) or ( zero if it is alone in the cluster). Information to gossip is wrapped in an ApplicationState object in a key/value pair. The information exchanged in a GossipMessage looks something like below.

HeartBeatState
Each node in Cassandra has one HeartBeatState associated with it which Consists of generation and version number.
Generation grows at every time the node is started.
Version number is shared with application states and guarantees ordering.
ApplicationState
Consists of state and version number (Its the same version number present in HeartBeatState) and represents a state of single "component" or "element" within Cassandra.
For instance application state for "load information" could be (5.2, 45), which means that node load is 5.2 at version 45.
EndPointState
Includes all ApplicationStates and HeartBeatState for certain endpoint (node). EndPointState can include only one of each type of ApplicationState, so if EndPointState already includes, say, load information, new load information will overwrite the old one. ApplicationState version number guarantees that old value will not overwrite the new one.
EndPointStateMap
Internal structure in Gossiper that has EndPointState for all nodes (including itself) that it has heard about.

The Gossip messaging takes place in a three-way handshake format.


GossipDigestSynMessage:
HostA sends out GossipeDigestSynMessage every second. It contains the largest version of local states.
HostB receives this message, it sort and compare the versions with local ones. Then it knows which local states are out of date, which ones are newer than remote status (HostA). Then it build following message, tells HostA these information:
GossipDigestAckMessage:
Basically digest_list contains status that HostB need from HostA. state_map contains status information that NodeA need. In this example, HostB doesn't have the latest info from HostA, HostA doesn't have the latest info from HostB, HostC status are up to date for both hosts.
When HostA gets the message, it apply the state_map to local states. And then build following message with the list from digest_list.
GossipDigestAck2Message
HostB receives the message, and simply update local state with the information from state_map.

so far so good

well So how does a new node get the idea of whom to start gossiping with? Well, Cassandra has many seed provider implementations that provide a list of seed addresses to the new node and starts gossiping with one of them right away. After its first round of gossip, it will now possess cluster membership information about all the other nodes in the cluster and can then gossip with the rest of them.

Cassandra Failure detection


The gossip process tracks state from other nodes both directly (nodes gossiping directly to it) and indirectly (nodes communicated about secondhand, third-hand, and so on). Rather than have a fixed threshold for marking failing nodes, Cassandra uses an accrual detection mechanism to calculate a per-node threshold that takes into account network performance, workload, and historical conditions. During gossip exchanges, every node maintains a sliding window of inter-arrival times of gossip messages from other nodes in the cluster. Configuring the phi_convict_threshold property adjusts the sensitivity of the failure detector. Lower values increase the likelihood that an unresponsive node will be marked as down.

As shown in Figure above HostA did not respond with in time interval of  mean() * ln(10) * phi_convict_threshold the host will be marked as dead.
Default phi_convict_threshold is 8, ln(10) x 8 = 2.3 x 8 = 18.4. Which means even in best case, the failure detection delay is 18seconds, typically it's about 20 seconds 18.4 x 1.1 = 20.3.



Comments

Popular posts from this blog

DataPump Import Of Object Types Fails With Errors ORA-39083 ORA-2304 Or ORA-39117 ORA-39779

<<Back to Oracle DATAPUMP Main Page ORA-39083: Object type TYPE:"TEST_QA01"."LOG_RECORD" failed to create with error: ORA-02304: invalid object identifier literal Import: Release 12.1.0.2.0 - Production on Tue May 29 07:59:12 2018 Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved. Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, Oracle Label Security, OLAP, Advanced Analytics and Real Application Testing options Master table "SYSTEM"."SYS_IMPORT_FULL_01" successfully loaded/unloaded Starting "SYSTEM"."SYS_IMPORT_FULL_01":  system/********@TEST_QA parfile=import_TEST.par Processing object type SCHEMA_EXPORT/USER Processing object type SCHEMA_EXPORT/SYSTEM_GRANT Processing object type SCHEMA_EXPORT/ROLE_GRANT Processing object type SCHEMA_EXPORT/DEFAULT_ROLE Pr

ORA-28374: typed master key not found in wallet

<<Back to Oracle DB Security Main Page ORA-46665: master keys not activated for all PDBs during REKEY SQL> ADMINISTER KEY MANAGEMENT SET KEY FORCE KEYSTORE IDENTIFIED BY xxxx WITH BACKUP CONTAINER = ALL ; ADMINISTER KEY MANAGEMENT SET KEY FORCE KEYSTORE IDENTIFIED BY xxxx WITH BACKUP CONTAINER = ALL * ERROR at line 1: ORA-46665: master keys not activated for all PDBs during REKEY I found following in the trace file REKEY: Create Key in PDB 3 resulted in error 46658 *** 2019-02-06T15:27:04.667485+01:00 (CDB$ROOT(1)) REKEY: Activation of Key AdnU5OzNP08Qv1mIyXhP/64AAAAAAAAAAAAAAAAAAAAAAAAAAAAA in PDB 3 resulted in error 28374 REKEY: Keystore needs to be restored from the REKEY backup.Aborting REKEY! Cause: All this hassle started because I accidently deleted the wallet and all wallet backup files too and also forgot the keystore password. There was no way to restore the wallet back. Fortunately in my case the PDB which had encrypted data was supposed to be deco

How to Find VIP of an Oracle RAC Cluster

<<Back to Oracle RAC Main Page How to Find Out VIP of an Oracle RAC Cluster Login clusterware owner (oracle) and execute the below command to find out the VIP hostname used in Oracle RAC $ olsnodes -i node1     node1-vip node2     node2-vip OR $ srvctl config nodeapps -viponly Network 1 exists Subnet IPv4: 10.0.0.0/255.255.0.0/bondeth0, static Subnet IPv6: Ping Targets: Network is enabled Network is individually enabled on nodes: Network is individually disabled on nodes: VIP exists: network number 1, hosting node node1 VIP Name: node1-vip VIP IPv4 Address: 10.0.0.1 VIP IPv6 Address: VIP is enabled. VIP is individually enabled on nodes: VIP is individually disabled on nodes: VIP exists: network number 1, hosting node node2 VIP Name: node2-vip VIP IPv4 Address: 10.0.0.2 VIP IPv6 Address: VIP is enabled. VIP is individually enabled on nodes: VIP is individually disabled on nodes:

How to Power On/off Oracle Exadata Machine

<<Back to Exadata Main Page How to Power On/off Oracle Exadata Machine Oracle Exadata machines can be powered on/off either by pressing the power button on front of the server or by logging in to the ILOM interface. Powering on servers using  button on front of the server The power on sequence is as follows. 1. Start Rack, including switches  Note:- Ensure the switches have had power applied for a few minutes to complete power on  configuration before starting Exadata Storage Servers 2.Start Exadata Storage Servers  Note:- Ensure all Exadata Storage Servers complete the boot process before starting the   database servers 3. Start Database Servers Powering On Servers Remotely using ILOM The ILOM can be accessed using the Web console, the command-line interface (CLI), IPMI, or SNMP. For example, to apply power to server dm01cel01 using IPMI, where dm01cel01-ilom is the host name of the ILOM for the server to be powered on, run the

How to Attach to a Datapump Job and Check Status of Export or Import

<<Back to Oracle DATAPUMP Main Page How to check the progress of  export or import Jobs You can attach to the export/import  job using ATTACH parameter of oracle datapump utility. Once you are attached to the job you check its status by typing STATUS command. Let us see how Step1>  Find the Export/Import Job Name You can find the datapump job information from  DBA_DATAPUMP_JOBS or  USER_DATAPUMP_JOBS view. SQL> SELECT OWNER_NAME,JOB_NAME,OPERATION,JOB_MODE,STATE from DBA_DATAPUMP_JOBS; OWNER_NAME JOB_NAME                       OPERATION            JOB_MODE   STATE ---------- ------------------------------ -------------------- ---------- ---------- SYSTEM     SYS_EXPORT_FULL_02             EXPORT               FULL       EXECUTING OR You can also find the job name for export/import in logfile in beginning itself. Step2>Attach to the Job and check status One you get the Export/Import Job Name attach the job and check its status. You can attach or det

How to Create Pfile from Spfile and Vice Versa

<<Back to DB Administration Main Page There are instances when a DBA need to start the database using pfile, for example to trouble an instance startup error or to validate init file post parameter changes etc. In such situations you can create a pfile from spfile and once you are done with your changes you can create spfile from updated/modified pfile to start the database. How to Create Pfile from Spfile As sysdba execute following command  SQL> create pfile='/tmp/initOrcl.ora' from spfile; How to Create SPfile from Pfile As sysdba execute following command  SQL> create spfile from  pfile='/tmp/initOrcl.ora'; You can also create the pfile directly from memory How to Create Pfile from Memory As sysdba execute following command  SQL> create  pfile='/tmp/initOrcl.ora' from memory;

How to export only data or only metadata using expdp

<<Back to Oracle DATAPUMP Main Page CONTENT parameter of expdp let you select whether you want to export only data or only metadata or both Default : ALL Syntax and Description CONTENT=[ALL | DATA_ONLY | METADATA_ONLY] DATA_ONLY exports only table row data; no database object definitions are exported. METADATA_ONLY exports only database object definitions; no table row data is exported. Exporting metadata only  $ cat exp_full_pdb01.par directory=dump dumpfile=EXP_PDB01_FULL%U.dmp logfile=EXP_PDB01_FULL.log full=y CONTENT=METADATA_ONLY $ expdp system@PDB01 parfile=exp_full_pdb01.par Exporting data only directory=dump dumpfile=EXP_PDB01_FULL%U.dmp logfile=EXP_PDB01_FULL.log full=y CONTENT=DATA_ONLY $ expdp system@PDB01 parfile=exp_full_pdb01.par

Step by Step How to Configure Software Keystore/ Oracle Wallet

<<Back to Oracle DB Security Main Page How to Configure a Software Keystore A software keystore is a container that stores the Transparent Data Encryption master encryption key. To configure a software Keystore follow the steps below. Step 1: Set the Keystore Location in the sqlnet.ora File You can store the software keystore (also known as wallet) in file system or in ASM Diskgroup. Does not matter where you want to store the keystore you have modify the sqlnet.ora and make an entry accordingly Make an entry as shown below in $ORACLE_HOME/network/admin/sqlnet.ora file Example1: If Storing the Wallet in ASM ENCRYPTION_WALLET_LOCATION=  (SOURCE=(METHOD=FILE)    (METHOD_DATA=     (DIRECTORY= +DG_TST_DATA/$ORACLE_SID/wallet )    )  )   Example2: If Storing the Wallet in File System ENCRYPTION_WALLET_LOCATION=  (SOURCE=(METHOD=FILE)    (METHOD_DATA=     (DIRECTORY= /u01/dbatst1/admin/wallet/$ORACLE_SID)    )  ) NOTE: Ensure that the path you entered in  DIREC

ORA-15040: diskgroup is incomplete

<<Back to Oracle ASM Main Page ORA-15040: diskgroup is incomplete SQL> startup ORA-00099: warning: no parameter file specified for ASM instance ASM instance started Total System Global Area 1140850688 bytes Fixed Size                  8629704 bytes Variable Size            1107055160 bytes ASM Cache                  25165824 bytes ORA-15110: no diskgroups mounted Reason: The reason of this error is simply the ASM is not able to find the some or all the disks. Solution: Investigate and make all the disks available to ASM to mount the disk group. Make sure the disks has proper permissions. If you are using AFD check following services are online oracleacfs oracleadvm oracleoks  oracleafd   Source of Problem : Issue started after restart of the server After restarting the server when I tried to start the ASM instance its started throwing error.  ORA-15110: no diskgroups mounted Investigation in my Case Step1> ASM Logfile Scanning  Looked i

ORA-28365: wallet is not open while starting the database

<<Back to DB Administration Main Page ORA-28365: wallet is not open Encountered while Starting the Database $ srvctl start instance -d CDB001 -i CDB0011 PRCR-1013 : Failed to start resource ora.cdb001.db PRCR-1064 : Failed to start resource ora.cdb001.db on node node1.oracle.com CRS-5017: The resource action "ora.cdb001.db start" encountered the following error: ORA-28365: wallet is not open . For details refer to "(:CLSN00107:)" in "/u01/app/oracle/diag/crs/node1.oracle.com/crs/trace/crsd_oraagent_oracle.trc". CRS-2674: Start of 'ora.cdb001.db' on 'node1.oracle.com' failed Solution : Start the instance in mount mode SQL> startup mount; ORACLE instance started. Total System Global Area 2147483648 bytes Fixed Size                  2926472 bytes Variable Size            1392511096 bytes Database Buffers          738197504 bytes Redo Buffers               13848576 bytes Database mounted. Check Wallet status set linesiz