Saturday, May 9, 2026

Oracle EBS: Hunting Rogue Sessions by ECID — The DBA's Precision Kill Guide

 

🔍 Oracle EBS: Hunting Rogue Sessions by ECID | appsdbastuff
🔍 Oracle EBS: Hunting Rogue Sessions by ECID — The DBA's Precision Kill Guide

You're in the middle of a production incident — a user's request is hung, the helpdesk ticket is escalating, and you need to surgically kill exactly that session without touching anything else. You could scroll through GV$SESSION manually. You could guess by username or machine. Or — you could use the Execution Context ID (ECID) to locate the exact session with zero ambiguity and generate the kill command in the same query.

This post is a deep-dive into a Phase 3.1 diagnostic script that does precisely that: ECID-based exact-match session hunting, filtered to oacore (the Oracle EBS managed server), with every forensic column you need to understand why the session is hung — not just where it is.

🧬 1. What is ECID and Why Does It Matter in EBS?

👶 Fresher

ECID — Execution Context Identifier — is a unique tag that Oracle middleware (WebLogic / OC4J) stamps onto every database session it opens on behalf of an application user request. Think of it as a transaction passport: it travels from the browser request through WebLogic, into the database connection, and gets stored in GV$SESSION.ECID.

🎓 Experienced

In Oracle EBS 12.2 with WebLogic 12c, the ECID is propagated via Oracle Diagnostic Logging (ODL) and is visible in both the WLS server logs (oacore managed server) and in GV$SESSION.ECID. When a user reports a hung page — and you have the WLS request log or an FND log reference — you extract the ECID from there and cross-match it to the DB session. This is far more reliable than matching on username/machine/IP, because in a connection-pooled environment like EBS, multiple sessions can belong to the same OS user from the same app tier host.

📌 Where to find the ECID for a specific user complaint:
1. FND: FND_LOG_MESSAGES — column EXECUTION_CONTEXT_ID
2. WLS Access Log: $DOMAIN_HOME/servers/oacore/logs/access.log — look for ecid=
3. WLS Diagnostic Log: oacore-diagnostic.log — search by username + timestamp
4. From the user: raise a support ticket and ask them to provide the URL request timestamp — you can backtrack from there.

📜 2. The Full Script — Phase 3.1

-- ============================================================
-- PHASE 3.1: FIND SESSION BY ECID (Exact Match)
-- ============================================================
SET LINES 200 PAGES 100
COL kill_cmd           FORMAT A60
COL program            FORMAT A25
COL module             FORMAT A30
COL action             FORMAT A30
COL ecid               FORMAT A55
COL event              FORMAT A40
COL sql_text           FORMAT A80

SELECT
    'ALTER SYSTEM KILL SESSION ''' || s.sid || ',' || s.serial# || ',@' || s.inst_id || ''' IMMEDIATE;'
        AS kill_cmd,
    s.inst_id,
    s.sid,
    s.serial#,
    s.status,
    s.state,
    TRUNC(s.last_call_et/60) || 'm ' || MOD(s.last_call_et, 60) || 's'  AS last_call_et,
    s.program,
    s.module,
    s.action,
    s.ecid,
    s.username,
    s.machine,
    s.process                                                            AS os_pid,
    s.sql_id,
    s.sql_child_number,
    s.prev_sql_id,
    s.event,
    s.wait_class,
    s.seconds_in_wait,
    s.blocking_session,
    s.blocking_instance,
    s.blocking_session_status
FROM  gv$session s
WHERE s.ecid    LIKE '%&ECID_EXTRACT%'
  AND   s.program LIKE '%oacore%'
ORDER BY s.last_call_et DESC;
⚠️ RAC Note: This script runs against GV$SESSION (Global View). In a RAC environment (e.g., Qualcomm's QCTEBSP04 / P05), the kill command already includes ,@inst_id so you can run it from any node and it will target the correct instance. No need to log in to the specific node.

🔬 3. Column-by-Column Breakdown

Column / Alias Source in GV$SESSION What It Tells You
kill_cmd Derived (SID + SERIAL# + INST_ID) Ready-to-execute kill statement. Copy → Paste → Done.
inst_id GV$SESSION.INST_ID RAC node number (1, 2, 3…). Critical for 2-node or 3-node RAC clusters.
sid / serial# SID, SERIAL# Together with inst_id, uniquely identify the session. SERIAL# prevents killing a recycled SID.
status STATUS ACTIVE = executing SQL; INACTIVE = idle (open cursor/lock held); KILLED = kill issued, OS cleanup pending.
state STATE WAITING = blocked on an event; WAITED SHORT TIME/WAITED KNOWN TIME = recently waited. Helps distinguish stuck from just slow.
last_call_et LAST_CALL_ET (seconds) How long since the last client call. Formatted as NNm NNs for readability. >5 minutes on ACTIVE = red flag.
program PROGRAM Connecting client binary. JDBC Thin Client = WebLogic pool; oracle@hostname = background/batch.
module MODULE Set by the app via DBMS_APPLICATION_INFO. In EBS: OAF page name, Form name, or Concurrent Program short name.
action ACTION Fine-grained action within the module (e.g., Query, Update, Commit). Extremely useful for PL/SQL debugging.
ecid ECID The execution context ID propagated from WebLogic. This is your match key for the user's request.
username USERNAME Oracle DB schema user. In EBS: typically APPS or the responsibility-owning schema. Not the FND user.
machine MACHINE Hostname of the connecting machine — usually the WebLogic app tier server (e.g., qcterpp05app1).
os_pid PROCESS OS process ID on the client side (WLS JVM thread). Useful if you need to correlate to a WLS thread dump.
sql_id / sql_child_number SQL_ID, SQL_CHILD_NUMBER Currently executing SQL. Feed into V$SQL for the plan, or AWR for historical behavior.
prev_sql_id PREV_SQL_ID Last SQL executed before the current one. On INACTIVE sessions this is often the real culprit SQL.
event EVENT The specific wait event (e.g., enq: TX - row lock contention, db file sequential read). The most important diagnostic column.
wait_class WAIT_CLASS High-level category: Concurrency, User I/O, Application, Network, Idle.
seconds_in_wait SECONDS_IN_WAIT How long it has been in the current wait event. Distinct from LAST_CALL_ET.
blocking_session / blocking_instance / blocking_session_status 3 columns If the session is blocked by another session, these identify the blocker. blocking_session_status = VALID means the blocker is alive and holding the lock.

⚡ 4. Understanding the Kill Command Construction

The script auto-builds the kill statement so you never construct it manually under pressure:

'ALTER SYSTEM KILL SESSION ''' || s.sid || ',' || s.serial# || ',@' || s.inst_id || ''' IMMEDIATE;'

This produces output like:

ALTER SYSTEM KILL SESSION '247,8931,@2' IMMEDIATE;
ComponentMeaning
247Session SID on instance 2
8931SERIAL# — distinguishes this session from any recycled SID 247
@2RAC Instance number — routes the kill to node 2 even if you're connected to node 1
IMMEDIATESkips the graceful wait — session is killed without waiting for it to respond. Use in stuck-session scenarios.
🚨 IMMEDIATE vs without IMMEDIATE:
Without IMMEDIATE, Oracle waits for the session to acknowledge the kill, which can take minutes if the session is stuck on a network/I-O wait. In production incidents, always use IMMEDIATE — but understand it still requires Oracle to rollback uncommitted work in the background. The session in GV$SESSION may remain with STATUS=KILLED for minutes while the rollback completes.

What Happens After the Kill?

-- Verify the session is gone (or in KILLED state rolling back)
SELECT sid, serial#, inst_id, status, last_call_et, username
FROM   gv$session
WHERE  sid = &SID
  AND  inst_id = &INST;
-- If STATUS=KILLED persists for >10 mins, the OS process may be stuck.
-- Escalate to OS-level kill: kill -9 <spid> (get SPID from V$PROCESS)

🎯 5. The WHERE Clause — ECID + oacore Filter

WHERE s.ecid    LIKE '%&ECID_EXTRACT%'
  AND s.program LIKE '%oacore%'

Why LIKE with % on both sides?

The ECID value in GV$SESSION is typically a composite string like:
56f8a7e1-2d14-4b78-9abc-...:1:1:oacore~0f3a
When you extract an ECID fragment from a WLS log or FND table, you often get a portion of the full string. The LIKE '%fragment%' pattern ensures you still match even with a partial extract. If you have the full ECID, switch to = '&ECID_EXACT' for better performance (avoids full table scan on large instances).

Why the oacore Filter?

In Oracle EBS 12.2, all OAF-based pages and self-service flows go through the oacore WebLogic managed server. Its JDBC pool connections appear in GV$SESSION.PROGRAM as something like JDBC Thin Client with MODULE populated from OAF, and critically the PROGRAM column carries oacore as a substring in some configurations.

💡 Pro Tip: If you're hunting sessions from the Forms tier (OAF's frmweb/forms services), change the filter to s.program LIKE '%frmweb%'. For Concurrent Manager sessions, filter on s.program LIKE '%FNDLIBR%' or join to FND_CONCURRENT_REQUESTS on ORACLE_SESSION_ID = s.audsid.

🕐 6. Decoding Wait Class & Event — What Is the Session Actually Doing?

WAIT_CLASSCommon EVENTLikely Root Cause in EBS
Application enq: TX - row lock contention Another session holds a lock. Check BLOCKING_SESSION.
Application enq: TM - contention Table-level lock — often a DDL or bulk DML running concurrently.
Concurrency library cache lock Shared pool contention — often seen during ADOP patching or mass recompile.
User I/O db file sequential read Single-block I/O — index lookup on a slow disk or high I/O load. Check SQL plan.
User I/O db file scattered read Multi-block I/O — full table scan. Check for missing index or bad plan.
Network SQL*Net message from client Idle — waiting for next request from WebLogic. Normal for pooled connections.
Idle jobq slave wait Scheduler slave idle. Ignore in session analysis.
Other wait for unread message on broadcast channel RAC global cache background — normal RAC operation, not a problem.
📎 Once you have EVENT + SQL_ID, your next step:
SELECT * FROM gv$sql WHERE sql_id = '&SQL_ID';
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('&SQL_ID', &CHILD, 'ALLSTATS LAST'));
These reveal the full SQL text and execution plan — confirming whether the hang is a lock wait, a runaway full-scan, or something deeper.

🔗 7. Blocking Session Chain Analysis

The three blocking columns — blocking_session, blocking_instance, blocking_session_status — tell you whether your target session is itself being blocked. If blocking_session_status = 'VALID', you have a blocker–waiter chain. In that case, killing the waiter alone won't fix the problem — you need to kill the blocker first (or trace why it's holding the lock).

-- Follow the blocking chain from your ECID session outward
SELECT
    LPAD(' ', LEVEL*2) || s.sid                AS session_tree,
    s.inst_id,
    s.serial#,
    s.status,
    s.event,
    s.sql_id,
    s.blocking_session,
    s.blocking_instance,
    s.username,
    s.module
FROM   gv$session s
START WITH  s.ecid LIKE '%&ECID_EXTRACT%'
CONNECT BY  PRIOR s.blocking_session = s.sid
        AND PRIOR s.blocking_instance = s.inst_id
ORDER SIBLINGS BY s.last_call_et DESC;
⚠️ Cross-instance blocking in RAC: A session on node 1 can be blocked by a session on node 2. The blocking_instance column captures this. The hierarchical query above handles cross-instance chains correctly because it joins on both SID and INST_ID.

🏗️ 8. RAC Awareness — Why GV$SESSION, Not V$SESSION

👶 Fresher

In a single-instance database, V$SESSION shows all sessions. In a RAC cluster, each node has its own V$SESSION — it only shows sessions on that node. GV$SESSION is the Global View that spans all nodes. The G prefix on any V$ view gives you the cluster-wide picture.

🎓 Experienced

On a 2-node or 3-node RAC, EBS WebLogic connection pools distribute sessions across nodes via load balancing. The ECID you extract from a WLS log does not tell you which node the corresponding DB session landed on. Running the query against GV$SESSION finds it regardless of which node it's on, and the inst_id column in the output — plus the ,@inst_id in the kill command — ensures the kill is routed correctly without you having to log in to a specific node.

-- Quick sanity: how many EBS oacore sessions per node right now?
SELECT   inst_id,
         COUNT(*) AS session_count,
         SUM(CASE WHEN status='ACTIVE' THEN 1 ELSE 0 END) AS active,
         SUM(CASE WHEN status='INACTIVE' THEN 1 ELSE 0 END) AS inactive
FROM     gv$session
WHERE    program LIKE '%oacore%'
GROUP BY inst_id
ORDER BY inst_id;

🔄 9. End-to-End Incident Workflow

┌──────────────────────────────────────────────────────────────────────────┐
│  USER REPORTS: "Page is hung / spinner running for 20 minutes"           │
└────────────────────────┬─────────────────────────────────────────────────┘

              STEP 1: Get the ECID
              ├─ From user: approx time + URL + responsibility
              ├─ From FND: SELECT execution_context_id FROM fnd_log_messages
              │            WHERE timestamp BETWEEN ... AND ...
              │            AND  module LIKE '%<page_name>%';
              └─ From WLS: grep 'ecid=' $DOMAIN/servers/oacore/logs/access.log

              STEP 2: Run Phase 3.1 Script
              └─ Supply ECID_EXTRACT at the prompt
                 → Confirms: SID, SERIAL#, INST_ID, EVENT, BLOCKING?

              STEP 3: Assess
              ├─ Is session ACTIVE or INACTIVE?
              ├─ What is the EVENT and WAIT_CLASS?
              ├─ Is BLOCKING_SESSION_STATUS = VALID?
              └─ How long is LAST_CALL_ET and SECONDS_IN_WAIT?

              STEP 4: Decision
              ├─ Blocked by another session? → Kill blocker first
              ├─ Long-running SQL (bad plan)? → Capture SQL_ID, check plan
              ├─ Network idle (SQL*Net from client)? → WLS thread issue, not DB
              └─ Confirmed stuck? → Execute kill_cmd from query output

              STEP 5: Verify + Document
              ├─ Re-run query → session should be gone or STATUS=KILLED
              ├─ Confirm user can re-access the page
              └─ Log RCA in incident ticket (ECID, SID, EVENT, action taken)
└──────────────────────────────────────────────────────────────────────────┘

✅ 10. Safety Checklist Before You Kill

🚨 Never kill a session without running through this list:
  1. Is the session ACTIVE with a legitimate long-running batch (Quarter-end, month-end, payroll)?
  2. Is the session part of an ADOP patching cutover phase (fs_clone, generate, finalize)?
  3. Does the MODULE/ACTION indicate a Concurrent Request? (FND_CONCURRENT_REQUESTS check)
  4. Is it a background Oracle process (PMON, SMON, LGWR)? Never kill these.
  5. Have you confirmed with the user/business that the session can be terminated?
  6. Are you logged in as SYSDBA or a user with ALTER SYSTEM privilege?
  7. Is this a production instance? If yes — get approval before proceeding.

🛠️ 11. Script Extensions & Variants

Extension A: Include SQL Text Inline

-- Add this to the SELECT to see current SQL without a separate query:
SELECT
    ...,
    (SELECT SUBSTR(sql_text, 1, 200)
       FROM  gv$sql q
      WHERE  q.sql_id    = s.sql_id
        AND  q.inst_id   = s.inst_id
        AND  q.child_number = s.sql_child_number
        AND  rownum = 1)  AS curr_sql_text,
    (SELECT SUBSTR(sql_text, 1, 200)
       FROM  gv$sql q
      WHERE  q.sql_id   = s.prev_sql_id
        AND  q.inst_id  = s.inst_id
        AND  rownum = 1)  AS prev_sql_text
FROM  gv$session s
WHERE s.ecid LIKE '%&ECID_EXTRACT%'
  AND s.program LIKE '%oacore%';

Extension B: FND_LOG_MESSAGES Cross-Reference

-- Map an FND log entry directly to the live DB session
SELECT
    fl.execution_context_id    AS ecid,
    fl.timestamp,
    fl.message_text,
    s.sid,
    s.serial#,
    s.inst_id,
    s.status,
    s.event,
    s.sql_id
FROM   fnd_log_messages  fl
JOIN   gv$session        s  ON s.ecid = fl.execution_context_id
WHERE  fl.execution_context_id LIKE '%&ECID_EXTRACT%'
   AND  s.program LIKE '%oacore%'
ORDER BY fl.timestamp DESC;

Extension C: Batch Kill All oacore Sessions Idle > 30 Minutes

-- USE WITH EXTREME CAUTION — Only in non-production or after business approval
BEGIN
  FOR r IN (
    SELECT inst_id, sid, serial#
    FROM   gv$session
    WHERE  program      LIKE '%oacore%'
      AND  status       = 'INACTIVE'
      AND  last_call_et > 1800  -- 30 minutes
  ) LOOP
    EXECUTE IMMEDIATE
      'ALTER SYSTEM KILL SESSION '''
      || r.sid || ',' || r.serial# || ',@' || r.inst_id
      || ''' IMMEDIATE';
  END LOOP;
END;
/

📝 12. Summary

What the Script DoesWhy It Matters
Matches session by ECID from WLS/FNDUnique identification — no ambiguity in connection-pooled EBS environments
Filters to oacore programLimits scope to OAF/self-service layer; excludes batch, forms, CM
Generates kill_cmd inlineZero manual effort — copy and execute under pressure
Includes inst_id in kill commandRAC-safe: works from any node in a multi-node cluster
Exposes EVENT + WAIT_CLASS + SECONDS_IN_WAITDistinguishes lock wait from I/O from idle — guides the right action
Exposes BLOCKING_SESSION + STATUSReveals if killing the waiter alone is insufficient
Shows MODULE + ACTION + SQL_IDFull forensic trail for RCA and post-incident documentation
Sorts by LAST_CALL_ET DESCWorst offenders (longest hung) surface immediately

ECID-based session targeting is one of those techniques that separates reactive firefighting from precise, professional incident management. Combined with a solid understanding of wait events and blocking chains, this script gives you everything you need to go from a user complaint to a resolved incident — with full auditability — in under five minutes.


No comments:

Post a Comment