
Understanding the Oracle Database Technical Architecture in Detail continued...


<<Back to DB Administration Main Page

Database Redo Logs and Redo Log Buffer Pool In Detail

This post is a continuation of the Oracle Database Architecture series. In this post I will explore redo logs: What are redo logs? Why are they so important? What purpose do they fulfil? How does Oracle manage them?
If you have already gone through my previous posts on Oracle database architecture you are good to proceed. It is worth reviewing these posts in case you missed them. The links are here:
Oracle Database Architecture Overview Explains the high-level architecture
Database Buffer Cache in Detail Explains how the buffer cache works and how it is implemented
Database Shared Pool in Detail Explains all you want to know about the shared pool

So let's get started without wasting any time.


What are Redo Logs?

Oracle Database records all changes to the database in the form of redo. Redo entries first reside in a memory area in the SGA called the redo log buffer, the size of which is controlled by the initialization parameter LOG_BUFFER. LGWR is the background process which writes the buffers from the redo log buffer to the online redo log files (files on disk) to preserve the changes in case of an instance crash.
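You can check how much memory is actually allocated to the redo log buffer (Oracle may round LOG_BUFFER up to a granule boundary, so the allocated size can be larger than the parameter value) with a quick query against v$sgainfo:

SQL> select name, bytes from v$sgainfo where name = 'Redo Buffers';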

What are the uses of Redo Logs?

Oracle uses redo logs for many things, such as:
Recovery (instance and media)
Log Miner
Oracle Streams
Exciting features like Data Guard would not have been possible without redo logs.
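As a quick illustration of the Log Miner case, the sketch below mines a logfile and lists the SQL that its redo describes. The logfile path and the SEG_OWNER filter are placeholders only; replace them with a real file from your v$logfile and a schema of your own:

SQL> execute dbms_logmnr.add_logfile(logfilename => '/u01/oradata/redo01.log', options => dbms_logmnr.new);
SQL> execute dbms_logmnr.start_logmnr(options => dbms_logmnr.dict_from_online_catalog);
SQL> select scn, operation, sql_redo from v$logmnr_contents where seg_owner = 'SCOTT';
SQL> execute dbms_logmnr.end_logmnr;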

Redo Log Buffer

The redo log buffer is a circular buffer in the SGA that holds the change vectors (redo entries) describing changes made to the database. When a request to change a piece of data in the database is made, Oracle performs the following actions (most likely in this sequence):
  • The server process creates the change vector for the undo record in the PGA.
  • The server process creates the change vector for the data block in the PGA.
  • The server process then combines the change vectors to prepare the redo record to write into the redo log buffer.
  • The process then searches for free space in the redo log buffer using the redo allocation latch.
  • Once the session allocates space in the log buffer it acquires the redo copy latch to write the redo entries into this free space.
  • The undo entries are then written to the undo block.
  • And finally the change to the actual data block happens in the database buffer cache.
  • The LGWR process then writes the content of the log buffer to the online redo log files, using the redo writing latch, to preserve the changes.
NOTE: To keep things simple and understandable, the whole process above assumes that every task completes without any interference. Consider what happens if there is no free space available in the redo log buffer, or if the LGWR process is busy writing the redo buffer to the online logfile while the session is posting a write signal to LGWR. We will come to that in a while.
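You can watch the three latches used in the steps above at work on your own system. The query below is just a sketch; gets/misses/sleeps show how often sessions competed for each latch (the redo copy latch is mostly taken in immediate mode, hence the immediate_gets column):

SQL> select name, gets, misses, sleeps, immediate_gets from v$latch where name in ('redo allocation','redo copy','redo writing');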

When Does the LGWR Process Write the Changes From the Redo Buffer to the Online Log File?

The LGWR process sequentially writes the changes from the redo log buffer to the online logfile when any of the following conditions is met:
  • A user commits or rolls back a transaction
  • An online redo log switch occurs
  • Every 3 seconds
  • The redo log buffer is one-third full or contains 1 MB of buffered data
  • DBWR must write modified buffers to disk
  • An ALTER SYSTEM CHECKPOINT command is issued
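How often these conditions fire on your system can be inferred from v$sysstat. The sketch below compares total LGWR activity with commit-driven activity; 'redo synch writes' counts the write requests triggered by commits and rollbacks:

SQL> select name, value from v$sysstat where name in ('redo entries','redo size','redo writes','redo synch writes');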
I assume by now you have a fair idea about redo logs, the log buffer and LGWR, and you are ready to go a level deeper.
As you just read, "The redo log buffer is a circular buffer in the SGA". The picture below visualizes how Oracle manages pointers to mark the start of free space, the end of free space, the used space, and the part of the log buffer whose content LGWR is currently writing to the online logfile.
As you can see there are three important markers: one identifying the start of free space, another identifying the end of free space, and a third that is a flag showing whether or not LGWR is busy writing. Once LGWR completes its write it moves the end-of-free-space pointer to expose more free space in the buffer, and then continues writing the more recently filled buffers. Meanwhile other sessions keep generating redo, filling up the buffer and moving the start-of-free-space pointer forward until it reaches the end of the buffer and cycles back to the beginning.

Now let us examine it case by case.

How does LGWR know there is more than 1MB of redo while it is sleeping?
As we now know, LGWR writes every 3 seconds: it flushes the buffer to the online logfile, sleeps for 3 seconds, wakes up, performs the redo write, sleeps again, and so on. The question is: how does LGWR know that there is more than 1MB of redo while it is sleeping? The answer is simple: each session checks the total allocated space in the redo log buffer every time it allocates space. If the total space used (the space between the start of free space and the end of free space) exceeds 1MB, the session posts a message to LGWR, which interrupts LGWR's sleep, and LGWR starts writing.

What if LGWR was not sleeping but writing when the session requested a write?

The next question arises: what if LGWR was not sleeping but writing at the moment the session identified that there is more than 1MB of redo? Does the session still post a message to LGWR? The answer is no. Remember the write flag which LGWR sets when it starts writing: any session which wants to send a signal to LGWR checks this flag to see if LGWR is already writing. LGWR clears the flag once it finishes writing. LGWR takes the redo writing latch to set and clear the flag (which means two gets for every write), and each session has to get the latch to read the flag so that it does not read the flag while it is in flux. In both cases the gets on the latch are in willing-to-wait mode.

What happens to a session which requested LGWR to write while LGWR was busy?

When a session posts a message to LGWR to write the log, it goes into a "log file sync" wait and sets a sleep timer. When it wakes up, it checks whether its write request has been satisfied before continuing. Alternatively, if LGWR finishes writing while the session is still sleeping, it checks the sessions waiting on "log file sync" and notifies them to continue.
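The time sessions spend in this handshake shows up under the "log file sync" wait event, while LGWR's own I/O time shows up under "log file parallel write". A quick sketch to compare the two system-wide:

SQL> select event, total_waits, time_waited from v$system_event where event in ('log file sync','log file parallel write');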



How does a session acquire space in the Redo Log buffer?

When a session wants to write a redo record into the log buffer:
  • The session gets the redo copy latch (through the redo copy latch it identifies the buffer in the log buffer).
  • It then gets the redo allocation latch. After successfully acquiring the redo allocation latch it moves the start-of-free-space pointer and drops the redo allocation latch.
  • The session then copies the redo record into the log buffer (if sufficient space is available) while holding the redo copy latch.
  • If the allocation has taken the used space over one-third of the log buffer or 1MB, or if the redo record was a commit record, the session posts a signal to LGWR to write (but gets the redo writing latch first, to check whether LGWR is already writing).
  • If the redo record was a commit record, the session increments redo synch writes and waits on the log file sync wait event.

What if sufficient space is not available in the Redo log buffer?

After acquiring the redo copy and redo allocation latches, if the session finds that there is not sufficient free space available in the redo log buffer, it drops the redo allocation latch and acquires the redo writing latch to check whether LGWR is writing. If LGWR is not writing, the session posts it a signal to write; otherwise it drops the redo writing latch and waits on the "log buffer space" wait event. When LGWR finishes writing, it checks the sessions waiting on "log buffer space" and posts them a signal to proceed.
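Both situations leave a trace in v$sysstat: 'redo buffer allocation retries' counts the times a session had to retry its allocation because the buffer was full, and 'redo log space requests' counts waits for log file space. A sketch to check them:

SQL> select name, value from v$sysstat where name in ('redo buffer allocation retries','redo log space requests');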

NOTE: As we can see, after finishing its write LGWR checks both the sessions waiting on the "log buffer space" wait event and the sessions waiting on "log file sync", and notifies them to continue.

How does LGWR write redo from the Log buffer to disk?

LGWR first acquires the redo writing latch and sets the write flag, so that each incoming session knows that LGWR is busy writing and does not post further messages to LGWR. It then drops the redo writing latch and acquires the redo allocation latch to identify the highest point allocated in the log buffer at that moment. It then moves the pointer up to the end of the block, drops the latch, and starts copying the log buffer to disk.
As LGWR completes its write, it clears the write flag (getting and releasing the redo writing latch again), moves the end-of-free-space pointer up to the point it has just finished writing (getting and releasing the redo allocation latch), and runs through the list of sessions waiting on the "log file sync" and "log buffer space" wait events, signaling them to continue.
If LGWR notices that some sessions waiting on log file sync were not cleared by its write, it goes through the cycle again.

Multiple Public Redo Log Buffers and Private strands

Now that you know how the redo copy, redo allocation and redo writing latches are used, imagine a very busy system where multiple sessions are concurrently generating lots of redo and competing for the redo allocation/redo writing latches; this contention will kill system performance. To overcome this situation Oracle implemented multiple log buffers, starting from 10g (I never got a chance to try this on 9i).
A hidden parameter _log_parallelism_dynamic (default value TRUE) controls whether multiple redo log buffers are configured, and another hidden parameter _log_parallelism_max controls the maximum number of log buffer strands. The value of _log_parallelism_max is calculated automatically by Oracle based on the number of CPUs. In my observation I found the value 2 on servers with 4, 8 and 16 CPUs, and the value 4 on servers with 64 and 72 CPUs. I am not sure, but I guess Oracle derives the value of this parameter using the formula strands = CPUs/16.
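You can read both hidden parameters on your own instance (connected as SYS) with the usual x$ksppi/x$ksppcv join; this is a sketch of the standard hidden-parameter query, with the underscores escaped so LIKE does not treat them as wildcards:

SQL> select a.ksppinm name, b.ksppstvl value from x$ksppi a, x$ksppcv b where a.indx = b.indx and a.ksppinm like '\_log\_parallelism%' escape '\';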
You can verify the concept of multiple log buffers using the query below:
SQL> select strand_size_kcrfa from x$kcrfstrand where ptr_kcrf_pvt_strand = hextoraw(0) and pnext_buf_kcrfa_cln != hextoraw(0);
STRAND_SIZE_KCRFA
-----------------
          6660096
          6660096
If you sum these up it comes to 13008K, which is the size of log_buffer in my case:
SQL> show parameter log_buffer
log_buffer                           big integer     13008K
Another way to verify the mechanism of multiple log buffers is to perform an SGA dump:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dumpvar sga kcrf_max_strands
uword kcrf_max_strands_ [0600283C8, 0600283CC) = 00000002
SQL> oradebug dumpvar sga kcrf_actv_strands
uword kcrf_actv_strands_ [0600283E0, 0600283E4) = 00000001

From the dump above you can see that the maximum number of public redo strands is 2 and only 1 is active. This is because my database is not very busy. After a little investigation I came across Metalink Note 372557.1, which states:
"The initial allocation for the number of strands depends on the number of CPU's and is started with 2 strands with one strand for active redo generation. For large scale enterprise systems the amount of redo generation is large and hence these strands are *made active* as and when the foregrounds encounter this redo contention (allocated latch related contention) when this concept of dynamic strands comes into play."
SQL> select count(*) from v$latch_children where name = 'redo allocation';
   COUNT(*)
----------
         2
1 row selected.


Private strands or Private Redo


Recall the redo generation behavior discussed at the beginning of this post, where redo is prepared in the PGA and then copied into the log buffer using the redo copy, redo allocation and redo writing latches, as we have studied so far. If Oracle could somehow reduce this overhead and copy the redo directly from a private area to the logfile, it would optimize the usage of the redo log buffer and enhance performance. To achieve this, Oracle introduced another enhancement from 10g, known as private redo or private strands, for which Oracle allocates space for redo generation directly in the SGA (shared pool).


The private strand buffers in the SGA are allocated at startup time. The first point to note is that the number of strands in use, both private and public, is dynamic, and Oracle tries to keep the number to a minimum.
The second point is that the basic handling of the public strands does not change: when a session wants to copy something into a public strand, the method is just the same as always.

The maximum number of private redo strands is defined as transactions/10 and is controlled by the hidden parameter _log_private_parallelism_mul. A session tries to find a free private strand, and if no private strand is available, the session uses one of the public strands. Each private strand has its own redo allocation latch.
Using the query below you can find the size allocated to private strands:
SQL> select pool, name, bytes from v$sgastat where name like 'private strands';
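Similarly to the public-strand query earlier, the private strands can also be seen in x$kcrfstrand: for private strands the ptr_kcrf_pvt_strand pointer is non-zero. This is only a sketch; x$ structures are undocumented and version-dependent, so the columns may differ on your release:

SQL> select indx, strand_size_kcrfa from x$kcrfstrand where ptr_kcrf_pvt_strand != hextoraw(0);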

Now, with private strands allocated in the shared pool, processes prepare the redo in the shared pool itself instead of the PGA, and the LGWR process writes it directly to the online logfile, skipping the log buffer.
With the implementation of this feature the first image provided in this blog changes slightly and looks somewhat like this:


 
Misc Information Related to Redo Log Buffer and LGWR Process

Related Events 

log file switch completion
log file sync
log file sequential read
log file sync: SCN ordering
target log write size
log file parallel write
log switch/archive
log file single write
switch logfile command
LGWR wait for redo copy
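All of the events above can be listed, together with their wait class, from v$event_name; a quick sketch:

SQL> select name, wait_class from v$event_name where name like 'log file%' order by name;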

Related Latches

latch: redo writing
latch: redo copy
latch: redo allocation

Related Hidden Parameters

Name         Value  Description
_log_checkpoint_recovery_check           0           # redo blocks to verify after checkpoint
_log_event_queues                        0           number of the log writer event queues
_log_switch_timeout                      0           Maximum number of seconds redos in the current log could span
_log_buffers_debug                       FALSE       debug redo buffers (slows things down)
_log_buffers_corrupt                     FALSE       corrupt redo buffers before write
_log_simultaneous_copies                 144         number of simultaneous copies into redo buffer(# of copy latches)
_log_parallelism_max                     4           Maximum number of log buffer strands
_log_parallelism_dynamic                 TRUE        Enable dynamic strands
_log_private_parallelism_mul             10          Active sessions multiplier to deduce number of private strands
_log_private_mul                         5           Private strand multiplier for log space preallocation
_log_read_buffer_size                    8           buffer size for reading log files
_log_buffer_coalesce                     FALSE       Coalescing log buffers for log writes
_log_file_sync_timeout                   10          Log file sync timeout (centiseconds)
_log_write_info_size                     4096        Size of log write info array                                                
_log_max_optimize_threads                128         maximum number of threads to which log scan optimization is applied
_log_read_buffers                        8           Number of log read buffers for media recovery
_log_committime_block_cleanout           TRUE        Log commit-time block cleanout
_log_space_errors                        TRUE        should we report space errors to alert log
_log_segment_dump_parameter              TRUE        Dump KSP on Log Segmentation
_log_segment_dump_patch                  TRUE        Dump Patchinfo on Log Segmentation
_redo_compatibility_check                FALSE       general and redo/undo compatibility sanity check
_redo_log_record_life                    168         Life time in hours for redo log table records
_redo_log_debug_config                   0           Various configuration flags for debugging redo logs
_redo_log_check_backup                   10          time interval in minutes between wakeups to check backup of redo logs
_redo_read_from_memory                   TRUE        Enable reading redo from in-memory log buffer


