Oracle Made Easy

Oracle Made Easy

11g RAC Database hanged or appears to hanged

Posted by virags on April 24, 2008

11g RAC Database hanged or appears to hanged

Around lunch time 1:00 PM database hanged. When we tried to login as sysdba
our sqlplus session also hanged. we not able to login on either of node of RAC.
Thought to kill all the remote connection , so we will able to login

ps -aef |grep LOCAL=NO |awk ‘{ print $2}’ |xargs kill -9

When your database hanged and you not able to login, we have used following procedure
to create hang analysis dump

sqlplus /nolog

SQL> set _prelim on
SQL> connect / as sysdba
SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug -g all hanganalyze 3
SQL> oradebug -g all hanganalyze 3
Cycle 1: (1/1965)–(1/1839)
Hang Analysis in /u04/oraout/abcap/11gdiag/diag/rdbms/abcap/abc2ap/trace/abc2ap_diag_9269.trc

We also generate ASH report for that(=hang) duration.
Here is what , hang analysis show

#
# Session id is in RED colour for further analysis
#

Chains most likely to have caused the hang:
[a] Chain 1 Signature: ‘row cache lock’<=’row cache lock’ (cycle) Chain 1 Signature Hash: 0×75bdd0c

===============================================================================
Cycles:

——————————————————————————-
Chain 1:
——————————————————————————-
Oracle session identified by:
{
instance: 1 (abcap.abc1ap)
os id: 27244
process id: 343, oracle@abc232
session id: 1839
session serial #: 12
}
is waiting for ‘row cache lock‘ with wait info:
{
p1: ‘cache id’=0×7
p2: ‘mode’=0×0
p3: ‘request’=0×5
time in wait: 0.526566 secs
heur. time in wait: 18.534318 secs
timeout after: 2.473434 secs
wait id: 49
blocking: 2 sessions
current sql:
short stack: <-ksedsts()+285<-ksdxfstk()+22<-ksdxcb()+1599<-sspuser()+102<-semtimedop()+36<-sskgpwwait()+211<-skgpwwait()+104<-ksliwat()+1256<-kslwaitctx()+135<-kqrigt()+1592<-kqrLockAndPinPo()+420<-kqrpre1()+915<-kqrpre()+57<-kziasfc()+235<-kpolnb()+5279<-kpoauth()+582<-opiodr()+991<-ttcpip()+1066<-opitsk()+1275<-opiino()+882<-opiodr()+991<-opidrv()+462<-sou2o()+91<-opimai_real()+109<-ssthrdmain()+142<-main()+116<-__libc_start_main()+211<-_start()+33 wait history: 1. event: ‘row cache lock
wait id: 48 p1: ‘cache id’=0×7
time waited: 3.001908 secs p2: ‘mode’=0×0
p3: ‘request’=0×5
2. event: ‘row cache lock
wait id: 47 p1: ‘cache id’=0×7
time waited: 3.001615 secs p2: ‘mode’=0×0
wait id: 47 p1: ‘cache id’=0×7
time waited: 3.001615 secs p2: ‘mode’=0×0
p3: ‘request’=0×5
3. event: ‘row cache lock’
wait id: 46 p1: ‘cache id’=0×7
time waited: 3.000437 secs p2: ‘mode’=0×0
p3: ‘request’=0×5
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (abcap.abc1ap)
os id: 27941
process id: 125, oracle@abc232
session id: 1965
session serial #: 476
}
which is waiting for ‘row cache lock’ with wait info:
{
p1: ‘cache id’=0×7
p2: ‘mode’=0×0
p3: ‘request’=0×3
time in wait: 0.462402 secs
heur. time in wait: 18.536906 secs
timeout after: 2.537598 secs
wait id: 30
blocking: 2 sessions
current sql:
short stack: <-ksedsts()+285<-ksdxfstk()+22<-ksdxcb()+1599<-sspuser()+102<-semtimedop()+36<-sskgpwwait()+211<-skgpwwait()+104<-ksliwat()+1256<-kslwaitctx()+135<-kqrigt()+1592<-kqrLockAndPinPo()+420<-kqrpre1()+915<-kqrpre()+57<-kkdlgui()+186<-kziavdb.()+2023<-kziaia.()+220<-kpolnb()+580<-kpoauth()+582<-opiodr()+991<-ttcpip()+1066<-opitsk()+1275<-opiino()+882<-opiodr()+991<-opidrv()+462<-sou2o()+91<-opimai_real()+109<-ssthrdmain()+142<-main()+116<-__libc_start_main()+211<-_start()+33 wait history: 1. event: ‘row cache lock’ wait id: 29 p1: ‘cache id’=0×7 time waited: 3.001776 secs p2: ‘mode’=0×0 p3: ‘request’=0×3 2. event: ‘row cache lock’ wait id: 28 p1: ‘cache id’=0×7 time waited: 3.001578 secs p2: ‘mode’=0×0 p3: ‘request’=0×3 3. event: ‘row cache lock’ wait id: 27 p1: ‘cache id’=0×7

When checked above session ( as marked in RED in above windows ) in ASH table.
We found both sessions are belong to “sys” user.

Lets have a look on ASH report, one which we created, when database was hanged

Top User Events

Event

Event Class

% Event

Avg Active Sessions

resmgr:cpu quantum

Scheduler

80.66

55.53

row cache lock

Concurrency

7.37

5.08

db file sequential read

User I/O

3.55

2.45

null event

Other

2.66

1.83

CPU + Wait for CPU

CPU

1.90

1.31

Top Event P1/P2/P3 Values

Event

% Event

P1 Value, P2 Value, P3 Value

% Activity

Parameter 1

Parameter 2

Parameter 3

resmgr:cpu quantum

80.75

“1″,”0″,”0″

60.85

location

“2″,”0″,”0″

17.81

“3″,”0″,”0″

2.09

row cache lock

7.38

“10″,”0″,”3″

7.31

cache id

mode

request

db file sequential read

3.58

“59″,”23006″,”1″

0.20

file#

block#

blocks

<

So ASH report is showing different story , that mean database was not hanged , it appeared to hanged due to event “resmgr:cpu quantum”

As per

Oracle® Database Reference
11g Release 1 (11.1)

Part Number B28320-01

resmgr: cpu quantum

The session is waiting to be allocated a quantum of cpu. This event occurs when the resource manager is enabled and is throttling CPU consumption. To reduce the occurrence of this wait event, increase the CPU allocation for the sessions’s current consumer group.

Wait Time: The time the session waited to acquire a CPU quantum

We have not enabled any resource plan , how it come into picture,In Oracle
11g all predefine maintenance windows use DEFAULT_MAINTENANCE_PLAN resource plan and Automatedmaintenance tasks run under its subplan ORA$AUTOTASK_SUB_PLAN supportEmptyParas]–>which has 25% resource allocation.

Possible Work around

Disable resource plan

Posted in oracle 11g, resource plan | Leave a Comment »