We got an email alert from one dom0 machine in an X5-2 Eighth Rack HC 8 TB:
DiskControllerFirmwareRevision Unknown.
It looked like the LSI MegaRAID controller information could not be fetched. The domU machines running under this dom0 were still working properly.
# dbmcli -e list physicaldisk
252:0 B7G5GA normal
252:1 B7HPXA failed
252:2 B7J55A failed
252:3 B6RT7A failed
# /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0 | grep -i package
(no output returned; MegaCli could not find the controller at all)
Status = Failure
Description = Controller 0 not found
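As an extra cross-check (my own addition, not part of the original troubleshooting), lspci can show whether the RAID controller is visible on the PCI bus at all; if nothing comes back, the OS itself is not detecting the card:
# lspci | grep -i raid    # no output here points to a missing/dead controller, not just a MegaCli problem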
Solution: An Oracle engineer replaced the suspected faulty RAID controller.
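After such a replacement it is worth re-running the earlier check (same MegaCli command as above); the firmware package line should now be returned instead of a blank:
# /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0 | grep -i package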
Further issue: Even after replacing the RAID controller, the amber LED in the ILOM of the Exadata compute node was not cleared.
The solution is documented in Doc ID 2111921.1: How to clear AMBER LED on Exadata Compute nodes when Compute Node disks report offline but no fault detected by Raid controller.
On the compute node the disks were still showing as failed:
# dbmcli -e list physicaldisk
252:0 B7G5GA normal
252:1 B7HPXA failed
252:2 B7J55A failed
252:3 B6RT7A failed
In order to clear the failed status in the above dbmcli output, I edited the cellinit.ora file as per Doc ID 2111921.1.
# imageinfo -ver
12.1.2.1.2.160617.1
# cd /opt/oracle/dbserver_12.1.2.1.2.160617.1/dbms/deploy/config
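Before editing the file, it is prudent to keep a backup copy (a suggested precaution, not part of the MOS note):
# cp cellinit.ora cellinit.ora.bak_$(date +%Y%m%d)    # timestamped backup to fall back on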
Add "_cell_allow_reenable_predfail=true" to cellinit.ora
# echo "_cell_allow_reenable_predfail=true" >> cellinit.ora
# cat cellinit.ora
_cell_allow_reenable_predfail=true
Restart the RS and MS services:
# dbmcli -e alter dbserver restart services all
Stopping the RS and MS services...
The SHUTDOWN of services was successful.
Starting the RS and MS services...
Getting the state of RS services... running
Starting MS services...
The STARTUP of MS services was successful.
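To double-check that MS really came back, the dbserver status attributes can be queried (a sketch; attribute names as I recall them from DBMCLI, verify on your release):
# dbmcli -e list dbserver attributes rsStatus, msStatus    # both should report running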
Re-enable the physical disks that are marked as failed:
# dbmcli -e alter physicaldisk <pdid> reenable force
# dbmcli -e alter physicaldisk 252:1 reenable force
Physical disk 252:1 was reenabled.
# dbmcli -e alter physicaldisk 252:2 reenable force
Physical disk 252:2 was reenabled.
# dbmcli -e alter physicaldisk 252:3 reenable force
Physical disk 252:3 was reenabled.
Note: To identify the <pdid> of the failed disks, use # dbmcli -e list physicaldisk.
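If several disks are failed, they can also be re-enabled in one pass. A minimal sketch, assuming the three-column name/serial/status output shown above:
# for pd in $(dbmcli -e list physicaldisk | awk '$3 == "failed" {print $1}'); do dbmcli -e alter physicaldisk $pd reenable force; done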
Once all failed disks are re-enabled, check the dbmcli -e list physicaldisk output again to confirm that all disks are normal:
# dbmcli -e list physicaldisk
252:0 B7G5GA normal
252:1 B7HPXA normal
252:2 B7J55A normal
252:3 B6RT7A normal
# dbmcli -e list alerthistory
...
2_3 2018-01-12T12:46:38+01:00 clear "Hard disk status changed to normal. Status : NORMAL Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1549B7J55A Firmware : A720 Slot Number : 2"
3_1 2017-12-12T17:04:38+01:00 critical "Hard disk failed. Status : FAILED Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1549B6RT7A Firmware : A720 Slot Number : 3"
3_2 2017-12-14T01:00:38+01:00 warning "Hard disk was removed. Status : NOT PRESENT Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1549B6RT7A Firmware : A720 Slot Number : 3"
3_3 2018-01-12T12:46:48+01:00 clear "Hard disk status changed to normal. Status : NORMAL Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1549B6RT7A Firmware : A720 Slot Number : 3"
=> This is partial output; the listing has been trimmed for readability.
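Instead of scrolling through the whole history, the LIST command accepts a WHERE filter to narrow the output (a sketch; filter syntax as in standard DBMCLI/CellCLI LIST commands):
# dbmcli -e "list alerthistory where severity = 'critical'"    # show only the critical alerts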
Finally, once all disks report normal, remove the "_cell_allow_reenable_predfail=true" line from /opt/oracle/dbserver_12.1.2.1.2.160617.1/dbms/deploy/config/cellinit.ora.
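One way to take the parameter back out and bounce the services afterwards (a sketch, not spelled out in the note; adjust the path to your image version):
# sed -i '/_cell_allow_reenable_predfail/d' /opt/oracle/dbserver_12.1.2.1.2.160617.1/dbms/deploy/config/cellinit.ora    # delete the workaround line
# dbmcli -e alter dbserver restart services all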