Tuesday, May 30, 2023

MRP Stuck and No RFS Process on Standby

Primary Node No Longer Transmits Archive Log Files to the Physical Standby Database; MRP Stuck and No RFS Process on Standby


Issue:


Recently I faced a strange issue: one of the standby nodes was not showing the RFS process when I checked V$MANAGED_STANDBY.

V$MANAGED_STANDBY showed only the ARCH and MRP processes.


Archive logs were not shipping from the primary to the standby node, and no errors appeared in the primary or standby alert logs.


I tried copying the archive logs manually from the primary to the standby, but the MRP recovery process did not apply the subsequent archive logs on the standby.



Cause:


This was caused by an outage on the standby server during a network activity.



Observations:


On Primary:


The ARCH process is stuck on the primary instance:

SELECT PROCESS, PID, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY;


 




DB Mode Showing UNKNOWN

select dest_id id, database_mode db_mode, recovery_mode,
       protection_mode, standby_logfile_count "SRLs",
       standby_logfile_active ACTIVE,
       archived_seq#
from v$archive_dest_status
where dest_id = 2;


 


On Standby:

No RFS process shows up when running the query below:

SELECT PROCESS, PID, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY;
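The RFS check above can also be scripted. The following is only a minimal sketch: count_standby_process is a hypothetical helper name (not an Oracle tool), and it assumes the query output is piped in with headings suppressed so that the PROCESS value is the first column of each row.

```shell
# Hypothetical helper: count rows of V$MANAGED_STANDBY output whose first
# column (PROCESS) equals the given name, e.g. RFS or MRP0.
# Reads the query output on stdin, one row per line.
count_standby_process() {
  awk -v p="$1" '$1 == p { n++ } END { print n + 0 }'
}

# Example (assumes sqlplus is on PATH and OS authentication works):
#   sqlplus -s "/ as sysdba" <<'EOF' | count_standby_process RFS
#   set heading off feedback off pagesize 0
#   SELECT process, pid, status FROM v$managed_standby;
#   EOF
```

A count of 0 for RFS on the standby matches the symptom described in this post.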


Solution:


1. On Primary:


- Set log transport state to DEFER status:

SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=DEFER;



2. On standby server:


- Shut down the database listener


- Cancel Managed Recovery

SQL> alter database recover managed standby database cancel;


- Shut down the standby database

SQL> shutdown immediate




3. On the Primary


Kill the ARCn processes. The database respawns them automatically and immediately, and killing them does not harm the instance.


ps -ef | grep -i arc


kill -9 <ospid of ARC process> <another ospid of ARC process> ...
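Because "grep -i arc" can also match unrelated processes (and the grep itself), it may help to narrow the list to one instance before killing anything. A minimal sketch, assuming the usual ora_arcN_<SID> naming of Oracle archiver processes on Unix; arc_pids is a hypothetical helper name and ORCL is only an example SID.

```shell
# Hypothetical helper: print the OS pids of archiver processes for one instance.
# Reads `ps -ef` output on stdin and matches the last field against
# ora_arcN_<SID>, where <SID> is the first argument.
arc_pids() {
  awk -v sid="$1" '$NF ~ ("^ora_arc[0-9a-z]_" sid "$") { print $2 }'
}

# Example: review the list first, then pass the pids to kill -9.
#   ps -ef | arc_pids ORCL
```

Printing the pids instead of killing them directly leaves a manual review step before the kill -9.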




4. On standby server


- Start up the standby database and resume managed recovery

SQL> startup mount;

SQL> alter database recover managed standby database disconnect from session;



- Start Database Listener



5. Set log transport state to ENABLE status:

SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2=ENABLE;



6. Monitor the alert logs at each site and ensure log shipping and apply are occurring again.


Also, verify the DB mode on the primary:

select dest_id id, database_mode db_mode, recovery_mode,
       protection_mode, standby_logfile_count "SRLs",
       standby_logfile_active ACTIVE,
       archived_seq#
from v$archive_dest_status
where dest_id = 2;



For information: if the ARCH processes do not restart automatically after you terminate them on the primary node, you can do the following:


On Primary:

SQL> alter system set log_archive_max_processes=4;

(increase log_archive_max_processes accordingly)


Weblogic authentication denied

Problem description

The Weblogic admin server and/or managed server(s) are unable to start properly and throw an "authentication denied" error message.


Weblogic errors observed

Error #1

 <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed.

Reason: weblogic.security.SecurityInitializationException: Authentication denied: Boot identity not valid;
The user name and/or password from the boot identity file (boot.properties) is not valid.
The boot identity may have been changed since the boot identity file was created. Please edit and update
the boot identity file with the proper values of username and password. The first time the updated boot identity file
is used to start the server, these new values are encrypted.

weblogic.security.SecurityInitializationException: Authentication denied: Boot identity not valid;

The user name and/or password from the boot identity file (boot.properties) is not valid.
The boot identity may have been changed since the boot identity file was created. Please edit and update the boot identity file
with the proper values of username and password. The first time the updated boot identity file is used to start the server,
these new values are encrypted.

at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.doBootAuthorization(Unknown Source)
at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.initialize(Unknown Source)
at weblogic.security.service.SecurityServiceManager.initialize(Unknown Source)
at weblogic.security.SecurityService.start(SecurityService.java:141)
at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
Truncated. see log file for complete stacktrace

Error #2

<Jul 30, 2011 5:11:55 AM PST> <Critical> <Security> <BEA-090403> <Authentication for user <user> denied>
<Jul 30, 2011 5:11:55 AM PST> <Critical> <WebLogicServer> <BEA-000386> <Server subsystem failed.
Reason: weblogic.security.SecurityInitializationException: Authentication for user <user> denied
weblogic.security.SecurityInitializationException: Authentication for user <user> denied

at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.doBootAuthorization(Unknown Source)
at weblogic.security.service.CommonSecurityServiceManagerDelegateImpl.initialize(Unknown Source)
at weblogic.security.service.SecurityServiceManager.initialize(Unknown Source)
at weblogic.security.SecurityService.start(SecurityService.java:141)
at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
Truncated. see log file for complete stacktrace

Possible root causes and solutions


Root cause #1

The Weblogic boot.properties file is corrupted or contains invalid principal and credentials

Solution >> boot.properties reset

- Backup and clear the cache and data directories under <WL Domain>/servers/<Admin & Managed server>
- Recreate boot.properties (put back your plain text username and password) under the <WL Domain>/servers/<Admin & Managed server>/security directory and restart the affected server(s)
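Recreating boot.properties can be sketched as a small shell helper. This is only an illustration: write_boot_properties is a hypothetical name, and the directory, user name and password are passed in by you. The file stays in clear text until the server is started with it, at which point Weblogic encrypts the values, as the error message above notes.

```shell
# Hypothetical helper: write a plain-text boot.properties into a server's
# security directory.
#   $1 = security directory, $2 = username, $3 = password
write_boot_properties() {
  security_dir="$1"
  mkdir -p "$security_dir" || return 1
  # Restrict permissions: the file holds a clear-text password until restart.
  ( umask 077 && printf 'username=%s\npassword=%s\n' "$2" "$3" > "$security_dir/boot.properties" )
}

# Example (path and credentials are placeholders):
#   write_boot_properties "<WL Domain>/servers/<AdminServer>/security" <user> <password>
```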

Root cause #2

The Weblogic boot.properties file is valid but the security realm is corrupted or in an invalid state

Solution >> Weblogic Admin username and password reset

- Backup your Weblogic server domain
- Rename or delete <WL Domain>/security/DefaultAuthenticatorInit.ldift
- Run the following Java command:
        java weblogic.security.utils.AdminAccount <new-admin-user-name> <new-admin-user-pwd> <WL Domain>/security
- Delete the contents of the boot.properties file under <WL Domain>/servers/<AdminServer>/security
- Add the following contents to boot.properties:
        username=<new-admin-user-name>
        password=<new-admin-user-pwd>
- Backup and delete the folder: <WL Domain>/servers/<AdminServer>/data/ldap
- Restart your Weblogic server
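Several of the steps above say "rename or delete" and "backup and delete". One cautious way to do both at once is to move the item aside under a timestamped name. A minimal sketch; backup_aside is a hypothetical helper, and DOMAIN_HOME is an assumed variable standing in for <WL Domain>.

```shell
# Hypothetical helper: move a file or directory aside under a timestamped
# name instead of deleting it, and print the backup path.
backup_aside() {
  target="$1"
  [ -e "$target" ] || { echo "not found: $target" >&2; return 1; }
  backup="${target}.bak.$(date +%Y%m%d%H%M%S)"
  mv "$target" "$backup" && printf '%s\n' "$backup"
}

# Example, with DOMAIN_HOME assumed to point at <WL Domain> and
# AdminServer standing in for your admin server's name:
#   backup_aside "$DOMAIN_HOME/security/DefaultAuthenticatorInit.ldift"
#   backup_aside "$DOMAIN_HOME/servers/AdminServer/data/ldap"
```

Keeping the old copies means the previous state can be restored if the reset does not resolve the startup error.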