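1. On both nodes, queries against v$asm_disk hung while waiting on 'enq: DD - contention'. The root blocker was the RBAL process, which was itself neither blocked nor in any abnormal wait event.
2. The root blocker, RBAL, was running on CPU. It was not performing a disk rebalance.
3. In the diag (hang analysis) logs, every chain points to the RBAL process as the root blocker, and RBAL is not in a wait.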
Chains most likely to have caused the hang:
[a] Chain 1 Signature: <not in a wait><='rdbms ipc reply'<='enq: DD - contention'
    Chain 1 Signature Hash: 0x7bd12357
[b] Chain 2 Signature: <not in a wait><='rdbms ipc reply'<='enq: DD - contention'
    Chain 2 Signature Hash: 0x7bd12357
[c] Chain 3 Signature: <not in a wait><='rdbms ipc reply'<='enq: DD - contention'
    Chain 3 Signature Hash: 0x7bd12357
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
  instance: 2 (+asm.+asm2)
  os id: 89375
  process id: 42, oracle@dg91 (TNS V1-V3)
  session id: 2605
  session serial #: 781
}
is waiting for 'enq: DD - contention' with wait info:
{
  p1: 'name|mode'=0x44440006
  p2: 'disk group'=0x0
  p3: 'type'=0x1
  time in wait: 21 min 38 sec
  timeout after: never
  wait id: 4
  blocking: 0 sessions
  current sql: select grpnum_kfdsk, number_kfdsk, compound_kfdsk, incarn_kfdsk, mntsts_kfdsk, hdrsts_kfdsk, compat_kfdsk, mode_kfdsk, state_kfdsk, redun_kfdsk, libnam_kfdsk, totmb_kfdsk, usedmb_kfdsk, asmname_kfdsk, failname_kfdsk, label_kfdsk, path_kfdsk, udid_kfdsk, kfkid_kfdsk, crdate_kfdsk, mtdate_kfdsk, timer_kfdsk , dbcompat_k
  short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-ksqcmi()+2848<-ksqgtlctx()+3501<-ksqgelctx()+557<-kfgUseDmt()+655<-kfgTableCb()+1718<-kfdDskTableCbInternal()+233<-kfdDskTableCb()+56<-qerfxFetch()+3164<-opifch2()+2766<-kpoal8()+2833<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
  wait history:
    * time between current wait and wait #1: 0.000124 sec
    1. event: 'SQL*Net message to client'
       time waited: 0.000001 sec
       wait id: 3 p1: 'driver id'=0x62657100
       p2: '#bytes'=0x1
    * time between wait #1 and #2: 0.003768 sec
    2. event: 'SQL*Net message from client'
       time waited: 0.000418 sec
       wait id: 2 p1: 'driver id'=0x62657100
       p2: '#bytes'=0x1
    * time between wait #2 and #3: 0.000015 sec
    3. event: 'SQL*Net message to client'
       time waited: 0.000002 sec
       wait id: 1 p1: 'driver id'=0x62657100
       p2: '#bytes'=0x1
}
and is blocked by
=> Oracle session identified by:
{
  instance: 2 (+asm.+asm2)
  os id: 420752
  process id: 27, oracle@dg91 (TNS V1-V3)
  session id: 1675
  session serial #: 29811
}
which is waiting for 'rdbms ipc reply' with wait info:
{
  p1: 'from_process'=0x12
  p2: 'timeout'=0x7fec666b
  time in wait: 2.078555 sec
  timeout after: 0.000000 sec
  wait id: 642263
  blocking: 11 sessions
  current sql: select name_kfgrp, number_kfgrp, incarn_kfgrp, compat_kfgrp, dbcompat_kfgrp, state_kfgrp, flags32_kfgrp, type_kfgrp, refcnt_kfgrp, sector_kfgrp, blksize_kfgrp, ausize_kfgrp , totmb_kfgrp, freemb_kfgrp, coldmb_kfgrp, hotmb_kfgrp, minspc_kfgrp, usable_kfgrp, offline_kfgrp, lflags_kfgrp from x$kfgrp
  short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-kslwait()+141<-ksarcr()+219<-ksbwcoex()+35<-kfgbSendWithPin()+442<-kfgbSendShallow()+137<-kfgDiscoverShallow()+268<-kfgGlobalOpen()+264<-kfgDiscoverDeep()+302<-kfgDiscoverGroup()+869<-kfgTableCb()+2339<-kfgGrpTableCbInternal()+4169<-kfgGrpTableCb()+56<-qerfxFetch()+3164<-opifch2()+2766<-kpoal8()+2833<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-
  wait history:
    * time between current wait and wait #1: 0.000065 sec
    1. event: 'rdbms ipc reply'
       time waited: 1.999940 sec
       wait id: 642262 p1: 'from_process'=0x12
       p2: 'timeout'=0x7fec666d
    * time between wait #1 and #2: 0.000064 sec
    2. event: 'rdbms ipc reply'
       time waited: 1.999885 sec
       wait id: 642261 p1: 'from_process'=0x12
       p2: 'timeout'=0x7fec666f
    * time between wait #2 and #3: 0.000067 sec
    3. event: 'rdbms ipc reply'
       time waited: 1.999927 sec
       wait id: 642260 p1: 'from_process'=0x12
       p2: 'timeout'=0x7fec6671
}
and is blocked by
=> Oracle session identified by:
{
  instance: 2 (+asm.+asm2)
  os id: 70866
  process id: 18, oracle@dg91 (RBAL)
  session id: 1117
  session serial #: 1
}
which is not in a wait:
{
  last wait: 21410 min 11 sec ago
  blocking: 12 sessions
  current sql: <none>
  short stack: <none: error encountered - ORA-32515: cannot issue ORADEBUG command 'SHORT_STACK' to process 'Unix process pid: 70866, image: oracle@dg91 (RBAL)'; prior command execution time exceeds 30000 ms>
  wait history:
    1. event: 'CSS operation: action'
       time waited: 0.000003 sec
       wait id: 67025744 p1: 'function_id'=0x43
    * time between wait #1 and #2: 0.000002 sec
    2. event: 'GPnP Termination'
       time waited: 0.006598 sec
       wait id: 67025743
    * time between wait #2 and #3: 0.000002 sec
    3. event: 'GPnP Get Item'
       time waited: 0.006473 sec
       wait id: 67025742
}
Chain 1 Signature: <not in a wait><='rdbms ipc reply'<='enq: DD - contention'
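Two values in the chain above are easier to read once decoded. The sketch below is an illustration, not part of the original diagnosis. It assumes the usual enqueue encoding, where the upper two bytes of the 'name|mode' P1 value are the ASCII lock name and the lower 16 bits are the requested mode. It also reads the chain signature from right (final waiter) to left (root blocker). The function names are hypothetical helpers.

```python
def decode_enqueue_p1(p1):
    """Split a 'name|mode' P1 value into (enqueue name, requested mode)."""
    # Upper two bytes hold the two ASCII characters of the enqueue name.
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    # Lower 16 bits hold the requested lock mode (6 is exclusive).
    mode = p1 & 0xFFFF
    return name, mode


def chain_from_signature(signature):
    """Return the wait chain ordered from final waiter to root blocker."""
    # The signature lists the root blocker first, with '<=' between links.
    parts = [part.strip("'") for part in signature.split("<=")]
    return list(reversed(parts))


if __name__ == "__main__":
    # P1 value taken from the 'enq: DD - contention' waiter in Chain 1.
    print(decode_enqueue_p1(0x44440006))
    sig = "<not in a wait><='rdbms ipc reply'<='enq: DD - contention'"
    print(" -> ".join(chain_from_signature(sig)))
```

For the P1 value in this dump, the decode yields the DD enqueue requested in mode 6. The chain reads: the v$asm_disk query waits on the DD enqueue, its holder waits on 'rdbms ipc reply', and that reply is expected from RBAL, which is not in a wait.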
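4. The gpnp log was continuously printing the following entries.
Attempts:
1. Killing the gpnp process had no effect.
2. Restarting the cluster failed: the cluster would not come up because cssd could not start.
/var/log/messages reported link errors. Some ASM disks had been disconnected on the storage side and presented to another server, but the disk paths were never cleaned up on this server, so the paths still existed while the underlying links were gone. This left cssd in an abnormal state when it scanned the disks.
A reboot of the operating system finally resolved the problem. Once the failing disk paths were released, CSS started normally, so the stale paths are suspected to have prevented CSS from starting.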
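One way to spot such stale paths before starting the cluster stack is to attempt a small read on each candidate device under a timeout. The following is a minimal sketch under stated assumptions, not the procedure used in this incident. `probe_device` is a hypothetical helper, the device paths in the usage example are placeholders, and the read runs in a child process so that a hung I/O does not block the checking script itself.

```python
import subprocess
import sys

# Child-side snippet: open the path read-only and read one 512-byte sector.
_READ_SNIPPET = (
    "import os, sys\n"
    "fd = os.open(sys.argv[1], os.O_RDONLY)\n"
    "os.read(fd, 512)\n"
    "os.close(fd)\n"
)


def probe_device(path, timeout_s=5.0):
    """Return 'ok', 'error', or 'timeout' for a one-sector read of path."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _READ_SNIPPET, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in uninterruptible I/O may linger.
        proc.kill()
        return "timeout"
    return "ok" if rc == 0 else "error"


if __name__ == "__main__":
    # Placeholder paths; substitute the devices in the ASM discovery string.
    for dev in ["/dev/mapper/asm_disk01", "/dev/mapper/asm_disk02"]:
        print(dev, probe_device(dev))
```

A path that reports 'error' or 'timeout' here is a candidate for cleanup at the OS level before CSS is restarted. What the right cleanup is depends on the multipath and storage configuration, which this note does not describe.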