How to
recover Physical Standby database if it goes out of sync J
(Missing log Archive log flies)
When you are using Data guard, there are several scenarios when physical
standby can go out of sync with the primary database.
Before doing anything to
correct the problem, we need to verify that why standby is not in sync with the
primary. In this particular article, we are covering the scenario where a log
is missing from the standby but apart from the missing log, all logs are
available.
Verify from v$archived_log
that there is a gap in the sequence number. All the logs up to that gap should
have APPLIED=YES and all the sequence# after the missing log sequence# are
APPLIED=NO. This means that due to the missing log, MRP is not applying the
logs on standby but the logs are still being transmitted to the standby and are
available.
SQL>
SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;
So for example, if the
missing log sequence# is 400, then the above query should show that up to
sequence#399, all have APPLIED=YES and starting from 401, all are APPLIED=NO.
There are few steps to
be performed when the standby is not in sync with the primary because there is
a gap of logs on standby.
These steps are:
STEP #1: Take an incremental
backup of primary from the SCN where standby is lagging behind and apply on the
standby server
STEP #2: If step#1 is not able to
sync up, then re-create the controlfile of standby database from the primary….
STEP #3: If after step#2, you still find that logs are not being applied on
the standby, check the alert log and you may need to re-register the logs with
the standby .
STEP#1
1. On STANDBY
database query the v$database view and record the current SCN of the
standby database:
SQL>
SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
1.3945E+10
SQL>
SELECT TO_CHAR (CURRENT_SCN) FROM V$DATABASE;
----------------------------------------
13945141914
2. Stop Redo
Apply on the standby database:
SQL> ALTER DATABASE RECOVER MANAGED
STANDBY DATABASE CANCEL;
ALTER DATABASE RECOVER
MANAGED STANDBY DATABASE CANCEL
*
ERROR at line 1:
ORA-16136: Managed
Standby Recovery not active
If you
see the above error, it means Managed Recovery is already off
You can
also confirm from the view v$managed_standby to see if the MRP is
running or not
SQL>
SELECT PROCESS, STATUS FROM V$MANAGED_STANDBY;
3. Connect to
the primary database as the RMAN target and create an incremental backup from
the current SCN of the standby database that was recorded in step 1:
For example,
BACKUP
INCREMENTAL FROM SCN 13945141914 DATABASE FORMAT '/tmp/ForStandby_%U' tag 'FOR
STANDBY'
You can choose a
location other than /tmp also.
4. Do a recovery
of the standby database using the incremental backup of primary taken above:
On the Standby server, without
connecting to recovery catalog, catalog the backupset of the incremental backup
taken above. Before this, of course you need to copy the backup piece of the
incremental backup taken above to a location accessible to standby serve
$ rman
nocatalog target /
RMAN> CATALOG BACKUPPIECE '/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1';
Now in the same session,
start the recovery
RMAN>
RECOVER DATABASE NOREDO;
You should see something
like:
Starting
recover at 2008-09-17 04:59:57
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=309 devtype=DISK
channel ORA_DISK_1: starting incremental datafile backupset
restore
channel ORA_DISK_1: specifying datafile(s) to restore from
backup set
....
..
..
.
channel ORA_DISK_1: reading from backup piece
/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1
channel ORA_DISK_1: restored backup piece 1
piece handle=/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1
tag=FOR STANDBY
channel ORA_DISK_1: restore complete, elapsed time: 01:53:08
Finished recover at 2008-07-25 05:20:3
Delete the
backup set from standby:
RMAN>
DELETE BACKUP TAG 'FOR STANDBY';
using channel ORA_DISK_1
List of Backup Pieces
BP Key BS Key Pc# Cp#
Status Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
17713 17713 1
1 AVAILABLE DISK
/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1
Do
you really want to delete the above objects (enter YES or NO)? YE
deleted
backup piece
backup
piece handle=/dump/proddb/inc_bkup/ForStandby_1qjm8jn2_1_1 recid=17713
stamp=660972421
Deleted
1 objects
5.
Try to start the managed recovery.
ALTER
DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM
SESSION;
If you get an error
here, you need to go to STEP#2 for bringing standby in sync.
If no error, then using
the view v$managed_standby, verify that MRP process is started and has
the status APPLYING_LOGS.
6. After this,
check whether the logs are being applied on the standby or not:
SQL>
SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;
After doing a recovery
using the incremental backup, you will not see the sequence#'s which were
visible earlier with APPLIED=NO because they have been absorbed as part of the
incremental backup and applied on standby during recovery.
The APPLIED column
starts showing YES for the logs which are being transmitted now, this means
logs are being applied.
Check the status of MRP
process in the view v$managed_standby. The status should be
APPLYING_LOGS for the duration that available logs are being applied and once
all available logs have been applied, the status should be WAITING_FOR_LOGS
7. Another check
to verify that primary and standby are in sync. Run the following query on both
standby and primary:
SQL>
select max (sequence#) from v$log_history.
Output should be same on both databases.
STEP #2: Steps to recreate the standby controlfile and start the
managed
recover on standby
Since Managed recovery
failed after applying the incremental backup, we need to recreate the
controlfile of standby. The reason for recreating the controlfile is that the
state of the database was same because the database_scn was not updated in the
control file after applying the incremental backup while the scn for datafiles
were updated. Consequently, the standby database was still looking for the old
file to apply.
A good MOSC note for
re-creating the controlfile in such a scenario is 734862.1.
1. Take the
backup of controlfile from primary
rman
target sys/oracle@proddb catalog rman/cat@emrep
rman> backup current controlfile for standby;
2. Copy the
controlfile backup to the standby system (or if it is on the common NFS mount,
no need to transfer or copy) and restore the controlfile onto the standby
database
Shutdown all instances
(If standby is RAC) of the standby.
sqlplus
/ as sysdba
shutdown immediate
exit
Startup nomount, one
instance.
sqlplus / as sysdba
startup nomount
exit
Restore the standby
control file.
rman nocatalog target /
restore standby
controlfile from '/tmp/o1_mf_TAG20070220T151030_.bkp';
exit
3. Startup the
standby with the new control file.
sqlplus / as sysdba
shutdown immediate
startup mount
exit
4. Restart
managed recovery in one instance (if standby is RAC) of the standby database:
sqlplus / as sysdba
SQL > ALTER DATABASE RECOVER MANAGED
STANDBY DATABASE DISCONNECT;
The above statement may
succeed without errors but the MRP process will still not start. The reason is
that since the controlfile has been restored from the primary, it is looking
for datafiles at the same location as are in primary instead of standby. For example,
if the primary datafiles are located at '+DATA/proddb_1/DATAFILE' and standby
datafiles are at '+DATA/proddb_2/DATAFILE', the new controlfile will show the
datafile’s location as '+DATA/proddb_1/DATAFILE'. This can be verified from the
query "select name from v$datafile" on the standby
instance. We need to rename all the datafiles to reflect the correct location.
There are two ways to
rename the datafiles:
1. Without using
RMAN
Change the parameter standby_file_management=manual
in standby’s parameter file.
ALTER
DATABASE RENAME FILE '+DATA/proddb_1/datafile/users.310.620229743' TO
'+DATA/proddb_2/datafile/USERS.1216.648429765';
2. Using RMAN
rman
nocatalog target /
Catalog the files, the
string specified should refer to the diskgroup/filesystem destination of the
standby data files.
RMAN> catalog start with
'+diskgroup//datafile/';
e.g.:
RMAN> catalog start with
'+DATA/proddb_2/datafile/';
This will give the user
a list of files and ask if they should all be cataloged. The user should review
and say YES if all the datafiles are properly listed.
Once that is done, then
commit the changes to the controlfile
RMAN> switch database to copy;
Now start the managed
recovery as:
SQL > ALTER DATABASE RECOVER MANAGED
STANDBY DATABASE DISCONNECT
and check for processes
in the view v$managed_standby. MRP process should be there. It
will also start applying all the archived logs that were missing since last
applied log. This process might take hours.
5. Another check
to verify that primary and standby are in sync:
Run the following query
on both standby and primary after all logs in v$archived_log show
APPLIED=YES:
SQL> select
max(sequence#) from v$log_history.
Output should be same on both databases.
STEP #3 Register all the
redo logs to standby database.
After recreating the
controlfile, you still find that logs are being transmitted but not being
applied on the standby. Check the alert log of standby. For example, see if you
find something similar to below snippet:
Fetching
gap sequence in thread 1, gap sequence 74069-74095
Wed Sep
17 06:45:47 2008
RFS[1]:
Archived Log:
'+DATA/ipwp_sac1/archivelog/2008_09_17/thread_1_seq_74093.259.665649929'
Wed Sep
17 06:45:55 2008
Fetching
gap sequence in thread 1, gap sequence 74069-74092
Wed Sep
17 06:45:57 2008
RFS[1]:
Archived Log:
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74094.258.665649947'
Wed Sep
17 06:46:16 2008
RFS[1]:
Archived Log:
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74095.256.665649957'
Wed Sep
17 06:46:26 2008
FAL[client]:
Failed to request gap sequence
GAP
- thread 1 sequence 74069-74092
The contents of alert
log shows that logs sequence# from 74069 to 74092 may have been transmitted but
not applied. The view v$archived_log shows the sequence# starting from
74093 and APPLIED=NO.
So this situation means
that logs up to 74068 were applied as part of the incremental backup and from
74069 to 74093 have been transferred to standby server but they must have
failed to register with standby database. Try the following steps:
- Locate the log sequence# shown in alert log (for example 74069 to
74092). For example,
+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74069.995.665630861
- Register all these archived logs with the standby database.
alter database register logfile '+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74069.995.665630861';
alter database register logfile
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74070.998.665631405';
alter database register logfile
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74071.792.665633755';
alter database register logfile
'+DATA/proddb_2/archivelog/2008_09_17/thread_1_seq_74072.263.665633713';
……..
….and
so on till the last one.
- Now check the view v$archived_log and finally should see
the logs being applied. The status of MRP should change from
ARCHIVE_LOG_GAP to APPLYING_LOGS and eventually WAITING_FOR_LOGS.
One of the major problems
being faced by DBAs managing Physical Standby Databases is to keep the standby
database in sync with Primary Database.
At times, standby database will go out of sync and there will be hundreds of
archive logs to be applied.
It takes huge amount of data transfer and also large amount of archive log
apply activity in the standby database.
Also, there is a potential chance of archive log non-availability, which may
lead to rebuild of standby database.
The above situation can be
tacked, without rebuild in Oracle 10g environment.
Steps to bring the standby database in sync with Primary Database are listed
below.
1. Find
the current SCN of standby database.
select
current_scn from v$database;
CURRENT_SCN
———–
4793543
2. On the
primary database – create the needed incremental backup from the above SCN
rman target /
RMAN>
BACKUP DEVICE TYPE DISK INCREMENTAL FROM SCN 4793543 DATABASE FORMAT
‘C:\temp\bkup_%U’;
3. create
a new standby controlfile from production
sqlplus>
alter database create standby crontrolfile ‘blah’;
4. Cancel
managed recovery on standby
sqlplus> recover managed standby database cancel;
5. Move
your incremental backup from (2) to the standby server (empty folder) and catalog
it
rman target /
rman> catalog start with ‘c:\temp\backup\’;
6. recover
your standby from the incremental backup
rman> recover
database noredo;
7.
shutdown the standby and replace the controlfile with the one you backup in (3)
8. startup
the standby and put it back into managed recovery mode
sqlplus>
startup mount
sqlplus> recover managed standby
database disconnect;
In
this document we will see the steps to failover to a physical standby database.
1. Stop Redo Apply.
Issue the following SQL statement
on the target standby database:
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
2. Finish applying all
received redo data.
By this
we are giving indication to the standby database that primary database is no
more.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
Once the FINISH
command has completed, the protection mode of the primary database
is lowered
to Maximum Performance, regardless of its original protection mode.
This is done
because the new primary can be activated without an standby
SQL > select protection mode, protection
level from v$database
Protection Mode Protection Level
——————– ——————–
MAXIMUM PERFORMANCE UNPROTECTED
3.
Verify that the target
standby database is ready to become a
Primary database.
Query the SWITCHOVER_STATUS column of the
V$DATABASE view on the target
Standby database.
SQL
> select SWITCHOVER_STATUS from v$database
4. Switch
the physical standby database to the primary role.
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION
SHUTDOWN;
5. Open the new primary database
SQL> ALTER DATABASE OPEN;
At this stage the
protection level is changed to Max Performance from ‘Unprotected’
6. Backup the new primary database.
J Manual Switchover to Standby and Primary J
On the
PRIMARY Database: (Open stage)
1. Query
V$DATABASE to check the role of Primary Database.
SQL> Select Database_role from v$Database;
It will
return “PRIMARY”
2. Now check the Switchover Status of the Primary Database.
SQL>
Select switchover_status from v$Database;
It will return
“SESSIONS ACTIVE”.
3. Now you are ready to
perform a manual switchover. Execute the following command using
WITH SESSION SHUTDOWN” option.
SQL> Alter Database Commit to Switchover to
Physical Standby with session Shutdown;
Database
Altered.
4 Now your PRIMARY
Database has become Physical Standby. To verify this change,
again query the Database_role column of
V$DATABASE. Now it will return
“PHYSICAL STANDBY”.
SQL > Select database_role from
v$database;
5. Shutdown the database
and again Start it to mount stage.
SQL >Shutdown immediate
SQL > Startup mount;
On the
PHYSICAL STANDBY Database: (Mount stage)
6. Query V$DATABASE to check the role of Standby
Database.
SQL> Select
Database_role from v$Database;
It
will return “PHYSICAL STANDBY”,
8. Now check the
Switchover Status of the Standby Database.
. SQL> Select switchover_status from v$Database;
It will
return “SESSIONS ACTIVE”.
9. Now cancel the MRP
which is running in the background of the Standby Database.
Execute the following command:
SQL> Alter
database Recover Managed Standby Database Cancel;
Database
Altered.
10. Now you are ready to
perform a manual switchover from Physical Standby to Primary.
Execute the following command using
“WITH SESSION SHUTDOWN” option.
SQL> Alter Database Commit to
Switchover to PRIMARY with session Shutdown;
Database
Altered.
Now your
PHYSICAL STANDBY Database has become PRIMARY.
Verify
this change, again query the Database role column of V$DATABASE.
Now it will return “PRIMARY”.
11. Shutdown the database
and again Start it to Open stage.
SQL > Shutdown immediate;
SQL > Startup Mount;
SQL > Alter database open;
J Manual
Switchover without Data guard in Oracle J
Switchover can be done without dataguard. Using switchover
operation we can convert physical standby database to primary and primary
database to physical standby database in Oracle. Switchover operation can
perform without dataguard and with dataguard. Here we can see how to perform
switchover operation without dataguard configuration. In following example we
assume that our primary database and physical standby database are working
properly. Standby database is up with managed recovery mode and there is no
archive log gap found in physical standby database. In short we can say that
our standby database is synchronize with primary database. Find out step by
step switchover operation. This switchover calls as manual switchover.
Primary Database: First we should need
to check our job queue processes and aq processes parameters. Check the value
of both parameter and save in your log book. These value should need to restore
while standby switchover finished. After get value of both parameters disable
those parameter with value 0. Execute log switch forcefully for all recent
changes also reflect to standby database. After finishing both tasks, execute
switchover command in primary database. Once successfully execution of
switchover command bring database down. Detail commands have been given below
with exact steps by steps.
SQL> show parameter aq_tm_processes
SQL> show parameter job_queue_processes
SQL> alter system set aq_tm_processes=0;
SQL> alter system set
job_queue_processes=0;
SQL>alter system switch logfile;
SQL>select switchover_status from
v$database;
SQL>alter database commit to switchover
to physical standby with session shutdown;
SQL>shutdown immediate;
Physical Standby Database:
First check archive sequence applied in standby database.
Because Physical Standby database needs to synchronize with primary database.
Disable log archive destination which contains service of Primary database.
Execute switchover command and wait for finishing. After successfully
completion of switchover command, shutdown database.
SQL>startup nomount
SQL>alter database mount standby
database;
SQL>alter system set log_archive_dest_state_2=defer
scope=both;
SQL>select switchover_status from
v$database;
SQL>alter database commit to switchover
to primary;
SQL>shutdown immediate
Now switchover operation is almost finished. We should need
to start both Primary (switched from physical standby) database and Physical
standby (switched from primary) database carefully.
Now start NEW PRIMARY database
(it was previous physical standby database).
SQL>startup
SQL> select switchover_status from v$database;
Now start NEW PHYSICAL STANDBY
database (It was previous primary database).
SQL>startup nomount
SQL>alter database mount standby
database;
SQL> select switchover_status from
v$database;
SQL>recover managed standby database
disconnect from session;
In NEW Primary Database:
Enable archive log destination which contains service of
standby database. Add tempfile in new primary database because before it was
physical standby. Restore all processes parameters which we got in first step.
Switch log forcefully to monitor archive shipping to physical standby database.
SQL>alter system set
log_archive_dest_state_2=enable scope=both;
SQL> alter tablespace TEMP add tempfile
'/u01/orcl/oradata/temp/temp01.dbf' size 1000M;
SQL> alter system set
job_queue_Processes=10;
SQL> alter system set aq_tm_processes=5;
SQL>alter system switch logfile;
Check connectivity from new physical standby to new primary
database. Also monitor archive log shipping from primary database to physical
standby database. Post check for switchover_status of v$database from both new
databases. Manual switchover without dataguard is finished.