In a previous post we looked at some of the new Data Guard Broker Observer features in 19c, such as the ability to run the Observer in OBSERVE ONLY mode, as well as other features such as multiple Observers and multiple failover target standby databases.
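The multiple failover target feature is exercised in the test below: `show fast_start failover` reports Potential Targets "oradb1_sb,clouddb", the Active Target moves to clouddb when oradb1_sb is shut down, and it is back on oradb1_sb after clouddb is shut down. A minimal Python model of that behaviour follows. It is a simplification, not Oracle's implementation: it assumes the broker keeps the current target while it is reachable and otherwise takes the first reachable candidate in list order. The function name is hypothetical.

```python
from typing import Optional, Sequence, Set


def choose_active_target(candidates: Sequence[str],
                         current: Optional[str],
                         reachable: Set[str]) -> Optional[str]:
    """Simplified model of picking the active fast-start failover target.

    candidates: ordered list, e.g. the Potential Targets "oradb1_sb,clouddb"
    current:    the active target before the change (None if there is none)
    reachable:  standby databases that can currently be contacted
    """
    # Assumption: a current target that is still reachable is kept.
    if current is not None and current in reachable:
        return current
    # Otherwise take the first reachable candidate in list order.
    for name in candidates:
        if name in reachable:
            return name
    # No reachable candidate: there is nothing to fail over to.
    return None


targets = "oradb1_sb,clouddb".split(",")
print(choose_active_target(targets, "oradb1_sb", {"clouddb"}))  # clouddb
print(choose_active_target(targets, "clouddb", {"oradb1_sb"}))  # oradb1_sb
```

The transcript does not show oradb1_sb being restarted before clouddb is shut down, so the second call assumes it was reachable again by then.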
In this note we look at how Master Observer failover happens, and at an important finding that may be a cause for concern.
If we host the Master Observer in the STANDBY data center and lose both the Master Observer and the Data Guard standby database, we have seen a case where FSFO causes the Primary database to shut down as well!
Not very good. Kindly test the same scenario in your environment and let me know whether you see the same behaviour.
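To reason about where the Master Observer should live, it helps to write the rule down. The alert log at the end of this test says the primary shuts down when it "has heard from neither observer nor target standby within FastStartFailoverThreshold seconds", and `show fast_start failover` reports `Shutdown Primary: TRUE`. The Python sketch below models only that rule plus the basic requirement that fast-start failover needs both the master observer and the target to be alive. It ignores the lag limit, health conditions and timing details, and it assumes, consistent with this test, that pings from a backup observer do not keep the primary up. The helper names and the site label "site3" are hypothetical.

```python
from typing import Dict


def primary_shuts_down(secs_since_observer: float, secs_since_target: float,
                       threshold: float, shutdown_primary: bool = True) -> bool:
    """Simplified reading of the ORA-16830 message: the primary shuts itself
    down only if Shutdown Primary is TRUE and it has heard from neither the
    master observer nor the active target for longer than the threshold."""
    return (shutdown_primary
            and secs_since_observer > threshold
            and secs_since_target > threshold)


def outcome_after_site_loss(placement: Dict[str, str], lost_site: str,
                            threshold: float = 30.0) -> str:
    """Hypothetical helper. placement maps "primary", "target" and
    "master_observer" to a site; every role on lost_site is down, and the
    state is evaluated once more than `threshold` seconds have passed."""
    down = {role: site == lost_site for role, site in placement.items()}
    if down["primary"]:
        # Fast-start failover needs the master observer and the target alive.
        if not down["master_observer"] and not down["target"]:
            return "fast-start failover to target"
        return "no automatic failover"
    # A partner on the lost site has been silent for longer than the threshold;
    # a surviving partner is assumed to have pinged just now.
    silent = threshold + 1.0
    observer_gap = silent if down["master_observer"] else 0.0
    target_gap = silent if down["target"] else 0.0
    if primary_shuts_down(observer_gap, target_gap, threshold):
        return "primary shuts itself down"
    return "primary keeps running"


# Placement at the end of this test: primary oradb1_sb on rac02, active
# target oradb1 and the master observer both on rac01.
end_of_test = {"primary": "rac02", "target": "rac01", "master_observer": "rac01"}
print(outcome_after_site_loss(end_of_test, "rac01"))  # primary shuts itself down

# Same databases, master observer at a hypothetical third site "site3".
third_site = {"primary": "rac02", "target": "rac01", "master_observer": "site3"}
print(outcome_after_site_loss(third_site, "rac01"))   # primary keeps running
print(outcome_after_site_loss(third_site, "rac02"))   # fast-start failover to target
```

In this model, losing rac01 removes both FSFO partners of the primary on rac02, which matches the alert log outcome, while a master observer at a third location avoids both the self-shutdown and the no-failover case.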
#######################################################################################################
LISTENER STOPPED AND MASTER OBSERVER LOSES CONTACT WITH PRIMARY
#######################################################################################################

[root@rac01 dbs]# su - grid
Last login: Mon Jun 15 14:19:20 AWST 2020
[grid@rac01 ~]$ lsnrctl stop

DGMGRL> connect /@oradb1
Unable to connect to database using oradb1
ORA-12541: TNS:no listener
Failed.
Warning: You are no longer connected to ORACLE.

DGMGRL> connect / as sysdg
Connected to "oradb1"
Connected as SYSDG.

#######################################################################################################
OBSERVER STILL SHOWS NO ISSUE
#######################################################################################################

DGMGRL> show configuration

Configuration - oradb1_dg

  Protection Mode: MaxPerformance
  Members:
  oradb1    - Primary database
    oradb1_sb - (*) Physical standby database
    clouddb   - Physical standby database

Fast-Start Failover: Enabled in Potential Data Loss Mode

Configuration Status:
SUCCESS   (status updated 49 seconds ago)

DGMGRL> show observer

Configuration - oradb1_dg

  Primary:            oradb1
  Active Target:      oradb1_sb

Observer "rac01" - Master

  Host Name:                    rac01.localdomain
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          3 seconds ago

Observer "rac02" - Backup

  Host Name:                    rac02.localdomain
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          2 seconds ago

#######################################################################################################
NOW OBSERVER CANNOT CONNECT AFTER OBSERVERRECONNECT VALUE IS CHANGED
#######################################################################################################

DGMGRL> edit configuration set property observerreconnect=30;
Property "observerreconnect" updated

DGMGRL> show observer

Configuration - oradb1_dg

  Primary:            oradb1
  Active Target:      oradb1_sb

Observer "rac01" - Master

  Host Name:                    rac01.localdomain
  Last Ping to Primary:         6 seconds ago
  Last Ping to Target:          3 seconds ago

Observer "rac02" - Backup

  Host Name:                    rac02.localdomain
  Last Ping to Primary:         5 seconds ago
  Last Ping to Target:          2 seconds ago

#######################################################################################################
Try again after 30 seconds have expired ... MASTER OBSERVER HAS CHANGED
#######################################################################################################

DGMGRL> /

Configuration - oradb1_dg

  Primary:            oradb1
  Active Target:      oradb1_sb

Observer "rac02" - Master

  Host Name:                    rac02.localdomain
  Last Ping to Primary:         (unknown)
  Last Ping to Target:          (unknown)

Observer "rac01" - Backup

  Host Name:                    rac01.localdomain
  Last Ping to Primary:         40 seconds ago
  Last Ping to Target:          1 second ago

#######################################################################################################
START THE LISTENER
#######################################################################################################

[grid@rac01 ~]$ lsnrctl start

#######################################################################################################
OBSERVER CAN NOW PING THE PRIMARY AND MASTER OBSERVER IS BACK ON RAC01
#######################################################################################################

DGMGRL> /

Configuration - oradb1_dg

  Primary:            oradb1
  Active Target:      oradb1_sb

Observer "rac01" - Master

  Host Name:                    rac01.localdomain
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          1 second ago

Observer "rac02" - Backup

  Host Name:                    rac02.localdomain
  Last Ping to Primary:         2 seconds ago
  Last Ping to Target:          0 seconds ago

#######################################################################################################
SHUT DOWN STANDBY ORADB1_SB
#######################################################################################################

[oracle@rac02 ~]$ sqlplus /@oradb1_sb as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Jun 15 14:38:57 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle. All rights reserved.

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL> shutdown abort
ORACLE instance shut down.
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Warning: You are no longer connected to ORACLE.

#######################################################################################################
ACTIVE TARGET STANDBY CHANGED TO CLOUDDB
#######################################################################################################

DGMGRL> show observer

Configuration - oradb1_dg

  Primary:            oradb1
  Active Target:      clouddb

Observer "rac01" - Master

  Host Name:                    rac01.localdomain
  Last Ping to Primary:         3 seconds ago
  Last Ping to Target:          3 seconds ago

Observer "rac02" - Backup

  Host Name:                    rac02.localdomain
  Last Ping to Primary:         3 seconds ago
  Last Ping to Target:          2 seconds ago

DGMGRL> show configuration

Configuration - oradb1_dg

  Protection Mode: MaxPerformance
  Members:
  oradb1    - Primary database
    clouddb   - (*) Physical standby database
    oradb1_sb - Physical standby database

Fast-Start Failover: Enabled in Potential Data Loss Mode

Configuration Status:
SUCCESS   (status updated 60 seconds ago)

#######################################################################################################
SHUT DOWN ACTIVE TARGET STANDBY CLOUDDB
#######################################################################################################

[oracle@rac02 ~]$ . oraenv
ORACLE_SID = [oradb1sb] ? clouddb
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@rac02 ~]$ sqlplus sys as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Jun 15 14:50:54 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL> shutdown abort;
ORACLE instance shut down.
SQL>

#######################################################################################################
ACTIVE TARGET STANDBY DATABASE IS CHANGED TO ORADB1_SB
#######################################################################################################

DGMGRL> show observer

Configuration - oradb1_dg

  Primary:            oradb1
  Active Target:      oradb1_sb

Observer "rac01" - Master

  Host Name:                    rac01.localdomain
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          4 seconds ago

Observer "rac02" - Backup

  Host Name:                    rac02.localdomain
  Last Ping to Primary:         0 seconds ago
  Last Ping to Target:          4 seconds ago

#######################################################################################################
STOP LISTENER ON RAC01 AND RAC02
#######################################################################################################

[grid@rac01 ~]$ lsnrctl stop
[grid@rac02 ~]$ lsnrctl stop

DGMGRL> show configuration

Configuration - oradb1_dg

  Protection Mode: MaxPerformance
  Members:
  oradb1    - Primary database
    Error: ORA-16778: redo transport error for one or more members

    oradb1_sb - (*) Physical standby database
    clouddb   - Physical standby database
      Error: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor

Fast-Start Failover: Enabled in Potential Data Loss Mode

Configuration Status:
ERROR   (status updated 52 seconds ago)

DGMGRL> show fast_start failover

Fast-Start Failover: Enabled in Potential Data Loss Mode

  Protection Mode:    MaxPerformance
  Lag Limit:          30 seconds

  Threshold:          30 seconds
  Active Target:      oradb1_sb
  Potential Targets:  "oradb1_sb,clouddb"
    oradb1_sb valid
    clouddb   valid
  Observers:      (*) rac02
                      rac01
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 30 seconds
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Write Errors          YES

  Oracle Error Conditions:
    (none)

DGMGRL> show observer

Configuration - oradb1_dg

  Primary:            oradb1
  Active Target:      oradb1_sb

Observer "rac02" - Master

  Host Name:                    rac02.localdomain
  Last Ping to Primary:         (unknown)
  Last Ping to Target:          (unknown)

Observer "rac01" - Backup

  Host Name:                    rac01.localdomain
  Last Ping to Primary:         119 seconds ago
  Last Ping to Target:          3 seconds ago

#######################################################################################################
SHUTDOWN ABORT PRIMARY
#######################################################################################################

DGMGRL> quit
[oracle@rac01 trace]$ sqlplus sys as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Jun 15 15:02:55 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle. All rights reserved.

Enter password:
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL> shutdown abort
ORACLE instance shut down.

#######################################################################################################
FSFO HAS HAPPENED PRIMARY DATABASE IS NOW ORADB1_SB
#######################################################################################################

[oracle@rac02 ~]$ . oraenv
ORACLE_SID = [clouddb] ? oradb1sb
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@rac02 ~]$ dgmgrl
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Mon Jun 15 15:04:26 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect / as sysdg
Connected to "ORADB1_SB"
Connected as SYSDG.
DGMGRL> show configuration

Configuration - oradb1_dg

  Protection Mode: MaxPerformance
  Members:
  oradb1_sb - Primary database
    oradb1    - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated
    clouddb   - Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated

Fast-Start Failover: Enabled in Potential Data Loss Mode

Configuration Status:
SUCCESS   (status updated 120 seconds ago)

DGMGRL>

#######################################################################################################
TRY TO START ORIGINAL PRIMARY ORADB1
#######################################################################################################

SQL> startup
ORACLE instance started.

Total System Global Area 1476391088 bytes
Fixed Size                  8896688 bytes
Variable Size            1375731712 bytes
Database Buffers           83886080 bytes
Redo Buffers                7876608 bytes
Database mounted.
ORA-16649: possible failover to another database prevents this database from being opened

#######################################################################################################
NEW PRIMARY IS ORADB1_SB
#######################################################################################################

[oracle@rac02 ~]$ . oraenv
ORACLE_SID = [racdb2] ? oradb1sb
The Oracle base has been set to /u01/app/oracle
[oracle@rac02 ~]$ sqlplus sys as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Mon Jun 15 15:10:49 2020
Version 19.6.0.0.0

Copyright (c) 1982, 2019, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.6.0.0.0

SQL> select database_role,open_mode from v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PRIMARY          READ WRITE

#######################################################################################################
ONCE THE LISTENERS COME UP ON RAC01 AND RAC02, THE OLD PRIMARY AND SECOND STANDBY ARE REINSTATED
#######################################################################################################

ORADB1

SQL> select open_mode,database_role from v$database;

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PRIMARY

SQL> /

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PRIMARY

SQL> /

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
READ ONLY WITH APPLY PHYSICAL STANDBY

DGMGRL> connect /@oradb1_sb
Connected to "ORADB1_SB"
Connected as SYSDBA.
DGMGRL> show configuration

Configuration - oradb1_dg

  Protection Mode: MaxPerformance
  Members:
  oradb1_sb - Primary database
    oradb1    - (*) Physical standby database
    clouddb   - Physical standby database

Fast-Start Failover: Enabled in Potential Data Loss Mode

Configuration Status:
SUCCESS   (status updated 59 seconds ago)

#######################################################################################################
CHANGE MASTER OBSERVER SITE
#######################################################################################################

DGMGRL> SET MASTEROBSERVER TO rac01;
Succeeded.
DGMGRL> show observer

Configuration - oradb1_dg

  Primary:            oradb1_sb
  Active Target:      oradb1

Observer "rac01" - Master

  Host Name:                    rac01.localdomain
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          2 seconds ago

Observer "rac02" - Backup

  Host Name:                    rac02.localdomain
  Last Ping to Primary:         1 second ago
  Last Ping to Target:          1 second ago

#######################################################################################################
CHANGE TO MAXAVAILABILITY
#######################################################################################################

DGMGRL> edit configuration set protection mode maxavailability;
Error: ORA-16654: fast-start failover is enabled

Failed.
DGMGRL> disable fast_start failover;
Disabled.
DGMGRL> edit configuration set protection mode maxavailability;
Error: ORA-16627: operation disallowed since no member would remain to support protection mode

Failed.
DGMGRL> edit database oradb1 set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database oradb1_sb set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database clouddb set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit configuration set protection mode maxavailability;
Succeeded.
DGMGRL>
DGMGRL> enable fast_start failover
Enabled in Zero Data Loss Mode.
DGMGRL> show configuration

Configuration - oradb1_dg

  Protection Mode: MaxAvailability
  Members:
  oradb1_sb - Primary database
    oradb1    - (*) Physical standby database
    clouddb   - Physical standby database

Fast-Start Failover: Enabled in Zero Data Loss Mode

Configuration Status:
SUCCESS   (status updated 47 seconds ago)

#######################################################################################################
STOP RAC01 - MASTER OBSERVER AND ACTIVE TARGET ORADB1 ARE RUNNING HERE
#######################################################################################################

#######################################################################################################
NEW PRIMARY ON RAC02 ALSO SHUT DOWN!
#######################################################################################################

2020-06-15T15:29:43.910963+08:00
DMON: FSFP network call timeout. Killing process FSFP.
2020-06-15T15:29:43.913016+08:00
Process termination requested for pid 29389 [source = rdbms], [info = 2] [request issued by pid: 17941, uid: 54321]
2020-06-15T15:29:58.915933+08:00
Starting background process FSFP
2020-06-15T15:29:59.157480+08:00
FSFP started with pid=88, OS id=30541
2020-06-15T15:30:00.919770+08:00
Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds. It is likely an automatic failover has already occurred. Primary is shutting down.
2020-06-15T15:30:00.952957+08:00
Errors in file /u01/app/oracle/diag/rdbms/oradb1_sb/oradb1sb/trace/oradb1sb_lgwr_17908.trc:
ORA-16830: primary isolated from fast-start failover partners longer than FastStartFailoverThreshold seconds: shutting down
2020-06-15T15:30:01.674507+08:00
System state dump requested by (instance=1, osid=17908 (LGWR)), summary=[abnormal instance termination]. error - 'Instance is terminating.'
System State dumped to trace file /u01/app/oracle/diag/rdbms/oradb1_sb/oradb1sb/trace/oradb1sb_diag_17863.trc
LGWR (ospid: 17908): terminating the instance due to ORA error 16830
2020-06-15T15:30:08.357552+08:00
Instance terminated by LGWR, pid = 17908
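To catch this condition after the fact, one option is to scan the alert log for the messages shown above. The following is a minimal Python sketch, assuming the 19c text alert log layout seen here (a timestamp line followed by message lines; the file is normally alert_<SID>.log in the instance's trace directory). The marker strings are taken from this test and are examples, not a complete list; the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# In the 19c text alert log each entry starts with an ISO-8601 timestamp line.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}[+-]\d{2}:\d{2}$")

# Marker strings copied from the alert log in this test; not exhaustive.
MARKERS = ("FSFP network call timeout", "Primary is shutting down", "ORA-16830")


def fsfo_isolation_events(lines: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Return (timestamp, message) pairs for lines containing any marker."""
    events: List[Tuple[datetime, str]] = []
    current: Optional[datetime] = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            # Remember the timestamp of the entry we are inside.
            current = datetime.fromisoformat(line)
        elif current is not None and any(m in line for m in MARKERS):
            events.append((current, line))
    return events


# Excerpt of the alert log from this test.
SAMPLE = """\
2020-06-15T15:29:43.910963+08:00
DMON: FSFP network call timeout. Killing process FSFP.
2020-06-15T15:30:00.919770+08:00
Primary has heard from neither observer nor target standby within FastStartFailoverThreshold seconds. It is likely an automatic failover has already occurred. Primary is shutting down.
2020-06-15T15:30:00.952957+08:00
Errors in file /u01/app/oracle/diag/rdbms/oradb1_sb/oradb1sb/trace/oradb1sb_lgwr_17908.trc:
ORA-16830: primary isolated from fast-start failover partners longer than FastStartFailoverThreshold seconds: shutting down
"""

for ts, msg in fsfo_isolation_events(SAMPLE.splitlines()):
    print(ts.isoformat(), msg[:50])
```

Against a live instance you would pass an open file handle for the alert log instead of `SAMPLE.splitlines()`; in this test that file sits under /u01/app/oracle/diag/rdbms/oradb1_sb/oradb1sb/trace.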