Saturday, May 9, 2026

Advanced ECID-ASH Forensics in Oracle EBS 12.2

🧠 Advanced ECID-ASH Forensics in Oracle EBS 12.2 | appsdbastuff
🧠 Advanced ECID–ASH Forensics in Oracle EBS 12.2 — Beyond the Basic Kill
Audience levels: 👶 Fresher 🎓 Experienced DBA 🔥 Lead / COE

The standard ECID-hunt scripts — find the session, build the kill statement, fire it — are table stakes. Any competent DBA can do that in five minutes. But what happens after you kill the session and the user reports the same hang forty minutes later? What happens when the ECID has already left GV$SESSION by the time you respond, and you need to reconstruct exactly what that WebLogic thread was doing at 02:17 AM on a Tuesday from AWR history? What do you do when 14 different ECIDs are all piling onto the same SQL, and killing them one by one is just bailing water out of a sinking boat?

This post takes the standard ECID analysis toolkit and builds it into a full forensic investigation framework — combining ASH, AWR, DBA_HIST, analytic window functions, execution plan deltas, cross-instance RAC correlation, and PL/SQL automation — so you can answer not just "which session?" but "why, since when, from which EBS module, and what do I do to prevent it happening again?"

🏗️ 1. ECID Propagation Architecture — The Full Stack

👶 Fresher

Before writing a single SQL query, understand where the ECID lives in the EBS stack — because the right extraction point depends on where the hang is occurring.

  ┌─────────────────────────────────────────────────────────┐
  │  BROWSER (User's request)                               │
  │  HTTP POST → F5 LTM (SSL VIP :443)                      │
  └────────────────────┬────────────────────────────────────┘
                       │ ECID generated here by WLS
  ┌────────────────────▼────────────────────────────────────┐
  │  WebLogic 12c  — oacore Managed Server                  │
  │  ▸ ECID stamped in ODL context: ecid=1.xxx...;kXjE      │
  │  ▸ Visible in: oacore-diagnostic.log / access.log        │
  │  ▸ Propagated via JDBC thin driver into DB session       │
  └────────────────────┬────────────────────────────────────┘

  ┌────────────────────▼────────────────────────────────────┐
  │  Oracle DB 19c RAC (ASM / Solaris SPARC)                │
  │  ▸ GV$SESSION.ECID         ← live sessions              │
  │  ▸ GV$ACTIVE_SESSION_HISTORY.ECID ← last ~60 mins       │
  │  ▸ DBA_HIST_ACTIVE_SESS_HISTORY.ECID ← AWR history      │
  │  ▸ FND_LOG_MESSAGES.EXECUTION_CONTEXT_ID ← EBS FND      │
  └─────────────────────────────────────────────────────────┘
🎓 Experienced

The ECID format in 19c is typically: 1.<random_hex>;<short_suffix>. The short suffix (;kXjE, ;0f3a) is the part that identifies the WLS request uniquely. When grepping logs, use the middle random hex string as your LIKE pattern — it is the most stable and unique fragment. Avoid matching on the prefix 1. as it is constant across all ECIDs on the same instance.

📌 ASH Retention Rules (Critical to Know):
GV$ACTIVE_SESSION_HISTORY — in-memory, ~1 hour (sampled every 1 second, 1-in-10 written to AWR)
DBA_HIST_ACTIVE_SESS_HISTORY — AWR flushed copy, default 8-day retention, 10-second sample interval
FND_LOG_MESSAGES — EBS-controlled, retention driven by FND: Debug Log Retention profile option
Key implication: If the incident is >1 hour old and AWR retention is set to default, use DBA_HIST. Always confirm AWR retention: SELECT retention FROM dba_hist_wr_control;

⚙️ 2. Why ASH is the Right Tool (and its Limits)

✅ ASH Strengths

1-second granularity for live sessions. ECID persisted in every sample. Includes SQL_ID, wait event, blocking session, plan hash per sample. No performance overhead on the application.

⚠️ ASH Limitations

Only samples active sessions (STATUS=ACTIVE). Idle/INACTIVE sessions (e.g., holding a lock but not running SQL) are NOT captured. Short-lived sessions (<1 second) may be missed entirely. GV$ is in-memory only — flushed on instance restart.

⚠️ The invisible lock holder problem: A session holding a TX lock with STATUS=INACTIVE does NOT appear in ASH. The waiter appears (high enq: TX - row lock contention), but the blocker is invisible. For these cases, always cross-check with GV$SESSION live + GV$LOCK for the blocker SID, then trace that blocker separately.

📊 3. Query 1 — Enriched ASH ECID Scan with Temporal Heat ADVANCED

The basic version groups by sample_time which produces too many rows. This version uses time buckets (5-minute windows) and a heat ratio to immediately show you when the ECID was burning DB time — not just that it was.

-- ============================================================
-- Q1: ECID Temporal Heat Map — 5-Min Buckets, Last 12 Hours
-- Shows WHEN each ECID caused DB pressure, not just IF
-- ============================================================
SELECT
    TO_CHAR(TRUNC(a.sample_time, 'MI') -
        MOD(TO_NUMBER(TO_CHAR(a.sample_time,'MI')),5)/(24*60),
        'YYYY-MM-DD HH24:MI')              AS time_bucket_5min,
    a.ecid,
    COUNT(*)                                AS total_samples,
    SUM(CASE WHEN a.wait_class != 'Idle'
             THEN 1 ELSE 0 END)           AS active_samples,
    ROUND(SUM(CASE WHEN a.wait_class != 'Idle'
             THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS pct_active,
    SUM(CASE WHEN a.wait_class = 'User I/O'
             THEN 1 ELSE 0 END)           AS io_samples,
    SUM(CASE WHEN a.wait_class = 'Application'
             THEN 1 ELSE 0 END)           AS lock_samples,
    SUM(CASE WHEN a.wait_class = 'Concurrency'
             THEN 1 ELSE 0 END)           AS concurrency_samples,
    MAX(a.module)                           AS module,
    MAX(a.sql_id)
        KEEP (DENSE_RANK LAST ORDER BY
              COUNT(*) OVER (PARTITION BY a.ecid, a.sql_id))
                                            AS dominant_sql_id,
    MAX(a.event)
        KEEP (DENSE_RANK LAST ORDER BY
              COUNT(*) OVER (PARTITION BY a.ecid, a.event))
                                            AS dominant_event,
    LPAD(RPAD('█', ROUND(SUM(CASE WHEN a.wait_class != 'Idle'
              THEN 1 ELSE 0 END) * 20.0 / MAX(COUNT(*))
                  OVER (PARTITION BY a.ecid)), '░'), 20)
                                            AS heat_bar
FROM   gv$active_session_history a
WHERE  a.sample_time > SYSDATE - 12/24
  AND  a.ecid        IS NOT NULL
  AND  a.program     LIKE '%oacore%'
GROUP BY
    TO_CHAR(TRUNC(a.sample_time, 'MI') -
        MOD(TO_NUMBER(TO_CHAR(a.sample_time,'MI')),5)/(24*60),
        'YYYY-MM-DD HH24:MI'),
    a.ecid
HAVING COUNT(*) > 3
ORDER BY time_bucket_5min DESC, active_samples DESC;
🔥 Advanced Technique — heat_bar: The LPAD/RPAD('█'...,'░') construct renders an ASCII bar chart inline in SQL*Plus / SQL Developer output. Each row becomes a visual timeline entry — you can immediately see which ECID was burning active samples at which 5-minute window without leaving the DB tool. The KEEP (DENSE_RANK LAST ORDER BY COUNT(*) OVER PARTITION BY...) pattern selects the single most-frequent SQL_ID and EVENT per ECID per bucket — the statistical mode, not MAX by sort order.

🔬 4. Query 2 — ECID Event Sequence Reconstruction ADVANCED

This rebuilds the exact sequence of events for a specific ECID as a timeline — useful for proving to the business exactly what the request did step-by-step and for how long.

-- ============================================================
-- Q2: ECID Event Timeline — Ordered Sequence with Gaps
-- Reconstructs what the WLS thread did, step by step
-- ============================================================
SELECT
    a.sample_time,
    a.ecid,
    a.session_id                                        AS sid,
    a.session_serial#                                   AS serial#,
    a.inst_id,
    a.sql_id,
    a.sql_child_number,
    a.plan_hash_value,
    a.event,
    a.wait_class,
    ROUND(a.time_waited / 1000, 2)                    AS time_waited_ms,
    a.blocking_session,
    a.blocking_inst_id,
    -- Gap analysis: seconds since previous sample for this ECID
    ROUND((CAST(a.sample_time AS TIMESTAMP)
          - CAST(LAG(a.sample_time)
                   OVER (PARTITION BY a.ecid
                         ORDER BY a.sample_time) AS TIMESTAMP))
         * 86400, 1)                                  AS gap_from_prev_secs,
    -- Cumulative DB time spent (non-idle) for this ECID
    SUM(CASE WHEN a.wait_class != 'Idle' THEN 1 ELSE 0 END)
        OVER (PARTITION BY a.ecid ORDER BY a.sample_time
              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
                                                        AS cumul_active_samples,
    -- Event change flag: did the wait event change from prior sample?
    CASE WHEN a.event = LAG(a.event)
                         OVER (PARTITION BY a.ecid
                               ORDER BY a.sample_time)
         THEN '→ same'
         ELSE '★ CHANGED'
    END                                                 AS event_change,
    a.module,
    a.action,
    a.program,
    a.machine
FROM   gv$active_session_history a
WHERE  a.sample_time > SYSDATE - 12/24
  AND  a.ecid LIKE '%&ECID_EXTRACT%'
ORDER BY a.sample_time ASC;   -- ASC = chronological for readability
💡 Reading the output: Look for rows where event_change = '★ CHANGED' — these are state transitions. A session that goes CPU → enq: TX row lock → SQL*Net from client → CPU tells a very different story than one stuck on db file sequential read for 200 consecutive samples. The gap_from_prev_secs column reveals moments where the session was INACTIVE or completed — a gap >10 seconds means the ECID was idle (waiting for next WLS call or finished a round-trip).

📈 5. Query 3 — Top ECIDs by DB Time with Full Wait Profile ADVANCED

-- ============================================================
-- Q3: Top ECIDs — Multi-Dimensional Wait Profile
-- Shows the complete wait breakdown per ECID in one row
-- ============================================================
SELECT
    ecid,
    module,
    MIN(sample_time)                                AS first_seen,
    MAX(sample_time)                                AS last_seen,
    ROUND((MAX(sample_time) - MIN(sample_time)) * 1440, 1)
                                                     AS elapsed_mins,
    COUNT(*)                                        AS total_ash_samples,
    -- DB Time breakdown (non-idle active samples)
    SUM(CASE WHEN wait_class != 'Idle' THEN 1 ELSE 0 END)
                                                     AS db_time_samples,
    SUM(CASE WHEN wait_class = 'CPU'          THEN 1 ELSE 0 END)
                                                     AS cpu_samples,
    SUM(CASE WHEN wait_class = 'User I/O'    THEN 1 ELSE 0 END)
                                                     AS io_samples,
    SUM(CASE WHEN wait_class = 'Application' THEN 1 ELSE 0 END)
                                                     AS lock_samples,
    SUM(CASE WHEN wait_class = 'Concurrency' THEN 1 ELSE 0 END)
                                                     AS concurrency_samples,
    SUM(CASE WHEN wait_class = 'Network'     THEN 1 ELSE 0 END)
                                                     AS network_samples,
    -- Most frequent SQL ID for this ECID (using subquery-free pivot)
    STATS_MODE(sql_id)                              AS dominant_sql_id,
    STATS_MODE(event)                               AS dominant_event,
    STATS_MODE(plan_hash_value)                     AS dominant_plan_hash,
    -- Plan instability flag (multiple plan hashes for same ECID)
    CASE WHEN COUNT(DISTINCT plan_hash_value) > 1
         THEN '⚠ PLAN UNSTABLE ('
              || COUNT(DISTINCT plan_hash_value) || ' plans)'
         ELSE '✔ stable'
    END                                             AS plan_stability,
    COUNT(DISTINCT session_id || '@' || inst_id) AS distinct_sessions
FROM   gv$active_session_history
WHERE  sample_time > SYSDATE - 12/24
  AND  ecid        IS NOT NULL
  AND  program     LIKE '%oacore%'
GROUP BY ecid, module
HAVING  COUNT(*) > 5
ORDER BY db_time_samples DESC
FETCH FIRST 30 ROWS ONLY;
🔥 Advanced Technique — STATS_MODE(): STATS_MODE(sql_id) returns the statistical mode — the most frequently occurring value. This is far superior to MAX(sql_id) (which returns alphabetically last, meaningless) or a correlated subquery. It natively answers: "which SQL did this ECID spend the most time on?" The plan_stability column immediately flags ECIDs with plan instability — a key signal that the real problem is a bad execution plan flip, not a lock or I/O issue.

👤 6. Query 4 — ECID to FND User Mapping ADVANCED

GV$SESSION.USERNAME is always APPS in EBS. You need to join through FND_LOG_MESSAGES or V$SESSION.CLIENT_IDENTIFIER to get the actual FND user. This query does it directly from ASH without requiring a live session.

-- ============================================================
-- Q4: ECID → FND User + Responsibility Resolution
-- Works even after the session has ended (uses FND log history)
-- ============================================================
SELECT DISTINCT
    a.ecid,
    a.client_id                               AS fnd_user_or_client_id,
    fu.user_name                              AS fnd_username,
    fu.description                            AS fnd_user_desc,
    flm.module                                AS fnd_module,
    MIN(a.sample_time) OVER (PARTITION BY a.ecid)
                                              AS ecid_first_seen,
    MAX(a.sample_time) OVER (PARTITION BY a.ecid)
                                              AS ecid_last_seen,
    COUNT(*) OVER (PARTITION BY a.ecid)     AS ecid_total_samples,
    STATS_MODE(a.event) OVER (PARTITION BY a.ecid)
                                              AS dominant_event,
    STATS_MODE(a.sql_id) OVER (PARTITION BY a.ecid)
                                              AS dominant_sql_id
FROM       gv$active_session_history  a
LEFT JOIN  fnd_user                   fu
        ON fu.user_name = UPPER(a.client_id)
LEFT JOIN  (
    SELECT DISTINCT execution_context_id, module
    FROM  fnd_log_messages
    WHERE log_sequence > (SELECT MAX(log_sequence) - 1000000
                           FROM  fnd_log_messages)
) flm
        ON flm.execution_context_id = a.ecid
WHERE  a.sample_time > SYSDATE - 12/24
  AND  a.ecid        IS NOT NULL
  AND  a.program     LIKE '%oacore%'
ORDER BY  ecid_total_samples DESC;
📌 CLIENT_ID in EBS: WebLogic propagates the FND username into GV$SESSION.CLIENT_IDENTIFIER (and therefore ASH.CLIENT_ID) via DBMS_SESSION.SET_IDENTIFIER. The join to FND_USER converts this to the full user record. The FND_LOG_MESSAGES subquery adds the OAF module context — the actual page or responsibility the user was navigating. The log_sequence predicate avoids a full table scan on the often-huge FND_LOG_MESSAGES table.

🌐 7. Query 5 — Cross-Instance RAC ECID Correlation ADVANCED

In a 2-node (or 3-node) RAC, a single WLS request can spawn activity on multiple instances — for example, the SQL runs on node 1 but the blocking session is on node 2, causing global cache waits. This query maps the full multi-node footprint of an ECID.

-- ============================================================
-- Q5: Cross-Instance RAC Footprint for a Specific ECID
-- Identifies global cache pressure, cross-node blocking
-- ============================================================
SELECT
    a.inst_id,
    a.session_id                                        AS sid,
    a.session_serial#                                   AS serial#,
    MIN(a.sample_time)                                  AS first_seen,
    MAX(a.sample_time)                                  AS last_seen,
    COUNT(*)                                            AS samples_on_node,
    STATS_MODE(a.event)                                 AS dominant_event,
    STATS_MODE(a.sql_id)                                AS dominant_sql_id,
    -- GC (global cache) wait detection
    SUM(CASE WHEN a.event LIKE 'gc%' THEN 1 ELSE 0 END)
                                                        AS gc_wait_samples,
    SUM(CASE WHEN a.event LIKE 'gc cr%' THEN 1 ELSE 0 END)
                                                        AS gc_cr_samples,
    SUM(CASE WHEN a.event LIKE 'gc buffer busy%' THEN 1 ELSE 0 END)
                                                        AS gc_buffer_busy,
    -- Cross-node blocking (blocker on different inst_id)
    SUM(CASE WHEN a.blocking_inst_id != a.inst_id
              AND  a.blocking_inst_id IS NOT NULL
         THEN 1 ELSE 0 END)                           AS cross_node_block_samples,
    a.machine                                           AS app_server
FROM   gv$active_session_history a
WHERE  a.sample_time > SYSDATE - 12/24
  AND  a.ecid LIKE '%&ECID_EXTRACT%'
GROUP BY a.inst_id, a.session_id, a.session_serial#, a.machine
ORDER BY a.inst_id, samples_on_node DESC;
⚠️ gc buffer busy = RAC hot block: If gc_buffer_busy is non-zero, the ECID is hitting a hot block shared between RAC nodes — usually caused by an unindexed column with high DML frequency, or sequences without caching. This is a tuning issue, not a session kill issue. Killing the session will not fix it.

📐 8. Query 6 — ECID Plan Stability Check ADVANCED

-- ============================================================
-- Q6: Plan Instability Detection per ECID
-- Identifies ECIDs where the execution plan changed mid-flight
-- ============================================================
SELECT
    a.ecid,
    a.sql_id,
    a.plan_hash_value,
    COUNT(*)                                AS samples_with_this_plan,
    MIN(a.sample_time)                       AS plan_first_seen,
    MAX(a.sample_time)                       AS plan_last_seen,
    AVG(a.time_waited) / 1000              AS avg_wait_ms_this_plan,
    -- Is this plan the current one still in shared pool?
    CASE WHEN EXISTS (
        SELECT 1 FROM gv$sql sq
        WHERE  sq.sql_id        = a.sql_id
          AND  sq.plan_hash_value = a.plan_hash_value
    ) THEN '✔ in pool'
      ELSE '✘ aged out'
    END                                     AS plan_in_shared_pool,
    -- Relative plan performance vs best plan for same SQL
    ROUND(AVG(a.time_waited) /
          MIN(AVG(a.time_waited)) OVER
              (PARTITION BY a.ecid, a.sql_id), 2)
                                            AS relative_cost_vs_best_plan
FROM   gv$active_session_history a
WHERE  a.sample_time   > SYSDATE - 12/24
  AND  a.ecid          LIKE '%&ECID_EXTRACT%'
  AND  a.plan_hash_value IS NOT NULL
GROUP BY a.ecid, a.sql_id, a.plan_hash_value
ORDER BY a.sql_id, samples_with_this_plan DESC;
🔥 Advanced Technique — relative_cost_vs_best_plan: The MIN(AVG(time_waited)) OVER (PARTITION BY ecid, sql_id) finds the lowest average wait among all plan variants for the same SQL. The ratio tells you how much worse the bad plan is: a value of 8.3 means the bad plan is 8.3× slower than the best known plan for that SQL. This is a quantified argument for raising an SR or pinning a plan with SQL Plan Management (SPM).

🗄️ 9. Query 7 — AWR Historical ECID Reconstruction ADVANCED

When ASH memory has flushed (>1 hour ago), pivot to DBA_HIST_ACTIVE_SESS_HISTORY and join with DBA_HIST_SQLSTAT for aggregated SQL performance for the same period.

-- ============================================================
-- Q7: AWR Historical ECID Reconstruction + SQL Stats
-- For incidents older than 1 hour — requires Diagnostics Pack
-- ============================================================
WITH ash_hist AS (
    SELECT
        h.snap_id,
        h.ecid,
        h.session_id                           AS sid,
        h.session_serial#                      AS serial#,
        h.instance_number                      AS inst_id,
        h.sample_time,
        h.sql_id,
        h.plan_hash_value,
        h.event,
        h.wait_class,
        h.time_waited,
        h.blocking_session,
        h.module,
        h.action,
        h.client_id,
        COUNT(*) OVER (PARTITION BY h.ecid)  AS ecid_total_samples,
        STATS_MODE(h.sql_id) OVER
            (PARTITION BY h.ecid)             AS dominant_sql_id
    FROM   dba_hist_active_sess_history h
    WHERE  h.sample_time > SYSDATE - 12/24
      AND  h.ecid         LIKE '%&ECID_EXTRACT%'
),
sql_perf AS (
    -- AWR SQL stats for the same snap range covered by the ASH hits
    SELECT
        ss.snap_id,
        ss.sql_id,
        ss.plan_hash_value,
        ss.executions_delta                    AS execs,
        ROUND(ss.elapsed_time_delta /
              NULLIF(ss.executions_delta, 0) / 1000, 2)
                                               AS avg_elapsed_ms,
        ROUND(ss.cpu_time_delta /
              NULLIF(ss.executions_delta, 0) / 1000, 2)
                                               AS avg_cpu_ms,
        ROUND(ss.buffer_gets_delta /
              NULLIF(ss.executions_delta, 0), 0)
                                               AS avg_buffer_gets,
        ROUND(ss.disk_reads_delta /
              NULLIF(ss.executions_delta, 0), 0)
                                               AS avg_disk_reads,
        ss.rows_processed_delta                AS total_rows
    FROM  dba_hist_sqlstat ss
    WHERE ss.executions_delta > 0
)
SELECT
    a.ecid,
    a.sample_time,
    a.sid,
    a.serial#,
    a.inst_id,
    a.sql_id,
    a.plan_hash_value,
    a.event,
    a.wait_class,
    a.module,
    a.client_id,
    a.ecid_total_samples,
    a.dominant_sql_id,
    sp.execs,
    sp.avg_elapsed_ms,
    sp.avg_cpu_ms,
    sp.avg_buffer_gets,
    sp.avg_disk_reads
FROM      ash_hist  a
LEFT JOIN sql_perf  sp
       ON sp.snap_id       = a.snap_id
      AND sp.sql_id        = a.sql_id
      AND sp.plan_hash_value = a.plan_hash_value
ORDER BY a.sample_time DESC;
📌 DBA_HIST vs GV$ASH — Key Differences:
DBA_HIST samples at 10-second intervals (not 1 second), so short events (<10 sec) may be invisible.
The snap_id join ensures SQL stats are from the same AWR snapshot period as the ASH samples, giving you elapsed time and buffer gets that were actually measured during the incident — not a 24-hour aggregate.

🔗 10. Query 8 — ECID Blocking Chain Tree (Hierarchical + Cross-Node) ADVANCED

-- ============================================================
-- Q8: Full Blocking Chain from ECID Session Outward
-- Handles RAC cross-instance chains; shows lock object name
-- ============================================================
SELECT
    LPAD('  ', LEVEL * 3) || s.inst_id || ':' || s.sid
                                              AS session_tree,
    LEVEL                                   AS chain_depth,
    s.status,
    s.last_call_et                          AS active_secs,
    s.event,
    s.wait_class,
    s.seconds_in_wait,
    s.sql_id,
    s.module,
    s.action,
    s.username,
    s.machine,
    s.blocking_session,
    s.blocking_instance,
    s.blocking_session_status,
    -- Resolve lock object (for TX/TM waits)
    (SELECT o.object_name || ' (' || o.object_type || ')'
       FROM  gv$lock    lk
       JOIN  dba_objects o ON o.object_id = lk.id1
      WHERE  lk.sid     = s.sid
        AND  lk.inst_id = s.inst_id
        AND  lk.type    = 'TM'
        AND  rownum     = 1)               AS locked_object,
    -- OS SPID for escalation to OS kill if needed
    p.spid                                  AS os_spid
FROM       gv$session s
LEFT JOIN  gv$process p
        ON p.addr    = s.paddr
       AND p.inst_id = s.inst_id
START WITH  s.ecid LIKE '%&ECID_EXTRACT%'
        AND s.program LIKE '%oacore%'
CONNECT BY NOCYCLE
    PRIOR s.blocking_session  = s.sid
    AND   PRIOR s.blocking_instance = s.inst_id
ORDER SIBLINGS BY s.last_call_et DESC;
💡 NOCYCLE is mandatory here. In rare cases, Oracle can report circular blocking chains (two sessions each blocking the other — a deadlock resolved by Oracle but not yet cleaned up in GV$SESSION). Without NOCYCLE, the query loops infinitely. The locked_object column resolves the TM lock to the actual table name — critical for knowing which EBS table is being locked (e.g., PO_HEADERS_ALL, OE_ORDER_LINES_ALL) so you can engage the functional team.

🔀 11. Query 9 — ECID to Concurrent Request Bridge ADVANCED

Concurrent Manager requests spawn their own DB sessions. If an ECID originates from a CM worker process that incorrectly shows up via oacore (e.g., a self-service submission), this query bridges the gap.

-- ============================================================
-- Q9: ECID ↔ Concurrent Request Cross-Reference
-- Links ASH ECID data to FND_CONCURRENT_REQUESTS
-- ============================================================
SELECT
    a.ecid,
    fcr.request_id,
    fcr.phase_code,
    fcr.status_code,
    fcr.actual_start_date,
    fcr.actual_completion_date,
    ROUND((fcr.actual_completion_date - fcr.actual_start_date) * 1440, 1)
                                           AS runtime_mins,
    fcp.concurrent_program_name,
    fcp.user_concurrent_program_name,
    fu.user_name                           AS submitted_by,
    fcr.oracle_session_id,
    a.sample_time,
    a.sql_id,
    a.event,
    a.wait_class,
    a.module
FROM       gv$active_session_history  a
JOIN       fnd_concurrent_requests    fcr
        ON fcr.oracle_session_id = a.session_id
       AND fcr.actual_start_date < a.sample_time
       AND NVL(fcr.actual_completion_date, SYSDATE) >= a.sample_time
JOIN       fnd_concurrent_programs    fcp
        ON fcp.concurrent_program_id = fcr.concurrent_program_id
       AND fcp.application_id        = fcr.program_application_id
LEFT JOIN  fnd_user                   fu
        ON fu.user_id = fcr.requested_by
WHERE  a.sample_time > SYSDATE - 12/24
  AND  a.ecid LIKE '%&ECID_EXTRACT%'
ORDER BY  a.sample_time DESC;

🤖 12. Query 10 — PL/SQL ECID Watchdog with Auto-Kill + Alert Log ADVANCED

Rather than reacting to incidents, this procedure runs proactively (via DBMS_SCHEDULER every 5 minutes) and auto-kills oacore sessions that have been active on the same ECID beyond a configurable threshold — with a full audit trail written to a custom log table.

-- ============================================================
-- Setup: Create the watchdog audit table (run once)
-- ============================================================
CREATE TABLE apps.ecid_watchdog_log (
    log_id          NUMBER         GENERATED ALWAYS AS IDENTITY,
    log_time        DATE          DEFAULT SYSDATE,
    ecid            VARCHAR2(64),
    sid             NUMBER,
    serial#         NUMBER,
    inst_id         NUMBER,
    event           VARCHAR2(64),
    module          VARCHAR2(64),
    last_call_et    NUMBER,
    sql_id          VARCHAR2(13),
    action_taken    VARCHAR2(20),
    kill_cmd        VARCHAR2(120),
    notes           VARCHAR2(500),
    CONSTRAINT ecid_wdg_pk PRIMARY KEY (log_id)
);

-- ============================================================
-- The Watchdog Procedure
-- ============================================================
CREATE OR REPLACE PROCEDURE apps.ecid_watchdog_proc (
    p_threshold_secs  IN NUMBER   DEFAULT 600,   -- 10-min default
    p_auto_kill       IN VARCHAR2 DEFAULT 'N',   -- 'Y' to kill, 'N' for log-only
    p_program_filter  IN VARCHAR2 DEFAULT '%oacore%'
)
AS
    l_kill_cmd  VARCHAR2(200);
    l_action    VARCHAR2(20);
BEGIN
    FOR r IN (
        SELECT
            s.ecid,
            s.sid,
            s.serial#,
            s.inst_id,
            s.event,
            s.module,
            s.sql_id,
            s.last_call_et,
            s.status,
            s.blocking_session,
            s.blocking_instance
        FROM   gv$session s
        WHERE  s.ecid         IS NOT NULL
          AND  s.program      LIKE p_program_filter
          AND  s.status       = 'ACTIVE'
          AND  s.last_call_et > p_threshold_secs
          AND  s.username     NOT IN ('SYS', 'SYSTEM')
          AND  s.type         = 'USER'
        ORDER BY s.last_call_et DESC
    )
    LOOP
        l_kill_cmd := 'ALTER SYSTEM KILL SESSION '''
                      || r.sid || ',' || r.serial# || ',@' || r.inst_id
                      || ''' IMMEDIATE';
        l_action := 'LOGGED';

        IF p_auto_kill = 'Y' THEN
            BEGIN
                EXECUTE IMMEDIATE l_kill_cmd;
                l_action := 'KILLED';
            EXCEPTION
                WHEN OTHERS THEN
                    l_action := 'KILL_FAILED:' || SUBSTR(SQLERRM, 1, 30);
            END;
        END IF;

        INSERT INTO apps.ecid_watchdog_log
            (ecid, sid, serial#, inst_id, event, module, last_call_et,
             sql_id, action_taken, kill_cmd, notes)
        VALUES
            (r.ecid, r.sid, r.serial#, r.inst_id, r.event, r.module,
             r.last_call_et, r.sql_id, l_action, l_kill_cmd,
             'blocker=' || r.blocking_session || '@' || r.blocking_instance
             || ' status=' || r.status);

        DBMS_OUTPUT.PUT_LINE(l_action || ': ECID=' || SUBSTR(r.ecid,1,30)
            || ' SID=' || r.sid || ' ET=' || r.last_call_et || 's');
    END LOOP;

    COMMIT;
END ecid_watchdog_proc;
/

-- ============================================================
-- Schedule as DBMS_SCHEDULER job (every 5 minutes)
-- ============================================================
BEGIN
    DBMS_SCHEDULER.CREATE_JOB (
        job_name        => 'APPS.ECID_WATCHDOG_JOB',
        job_type        => 'STORED_PROCEDURE',
        job_action      => 'APPS.ECID_WATCHDOG_PROC',
        number_of_arguments => 3,
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=MINUTELY;INTERVAL=5',
        enabled         => FALSE,
        auto_drop       => FALSE,
        comments        => 'ECID Watchdog - log-only by default'
    );
    -- Set args: 300s threshold, log-only mode, oacore filter
    DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE('APPS.ECID_WATCHDOG_JOB', 1, '300');
    DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE('APPS.ECID_WATCHDOG_JOB', 2, 'N');
    DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE('APPS.ECID_WATCHDOG_JOB', 3, '%oacore%');
    DBMS_SCHEDULER.ENABLE('APPS.ECID_WATCHDOG_JOB');
END;
/

-- Query the watchdog audit log
SELECT * FROM apps.ecid_watchdog_log
ORDER BY log_time DESC
FETCH FIRST 50 ROWS ONLY;
🚨 Production Safeguards for the Watchdog:
  • Always start with p_auto_kill = 'N' (log-only) for at least 1 week. Review the logs to ensure it would not kill legitimate long-running batch jobs.
  • Add exclusion ECIDs: prefix-match known batch ECIDs and skip them in the cursor WHERE clause.
  • Gate on blocking_session IS NULL if you only want to kill self-hung sessions, not sessions blocked by others (which need a different fix).
  • Set a daily count alert: if the watchdog kills >N sessions in 24 hours, it signals a recurring architectural problem — not a random hang.

🐚 13. Bash: Production-Grade ECID Extraction from oacore Logs

#!/bin/bash
# ============================================================
# ecid_extract.sh — Extract ECID from oacore WLS logs
# Usage: ./ecid_extract.sh [minutes_back] [ecid_fragment]
# ============================================================

DOMAIN_HOME="/u01/oracle/domains/EBSdomain"
OACORE_LOG="${DOMAIN_HOME}/servers/oacore/logs"
MINS_BACK=${1:-60}
ECID_FRAG=${2:-}

echo "=== ECID Extractor: Last ${MINS_BACK} minutes ==="

# Find log files modified in the last N minutes
find "${OACORE_LOG}" -name "*.log" -newer /tmp/.ecid_ref_${MINS_BACK} 2>/dev/null | \
xargs grep -h "ecid=" 2>/dev/null | \
grep -oP 'ecid=\K[^;,\] ]+' | \
sort -u | \
awk 'length($0) > 10' > /tmp/ecid_candidates.txt

echo "Found $(wc -l < /tmp/ecid_candidates.txt) unique ECIDs"

# If a fragment was passed, filter to just matching ECIDs
if [ -n "${ECID_FRAG}" ]; then
    grep "${ECID_FRAG}" /tmp/ecid_candidates.txt
else
    head -20 /tmp/ecid_candidates.txt
fi

# Extract stuck thread ECIDs from WLS thread dump
echo
echo "=== ECIDs from stuck thread dumps ==="
grep -B5 -A10 "STUCK" "${OACORE_LOG}/oacore_server1.out" 2>/dev/null | \
grep -oP '(?<=ECID-Context: )[^\]"]+' | \
sort -u

# Cross-reference with current time for correlation
echo
echo "=== Generating SQL WHERE clause ==="
echo "-- Paste this into Phase 3.1 / Q1-Q6 scripts:"
head -5 /tmp/ecid_candidates.txt | while read ecid; do
    echo "  AND s.ecid LIKE '%$(echo ${ecid} | cut -c1-20)%'"
done

🗺️ 14. Decision Framework — Which Query When

Situation Use This Why
Live incident, session still active Phase 3.1 (GV$SESSION) Fastest path to kill command. Zero ASH needed.
Session died before you could look Q2 (ASH Timeline) Reconstruct exactly what happened, in order, from ASH memory.
Incident >1 hour ago, ASH flushed Q7 (DBA_HIST_ASH) AWR holds 8 days. Join with DBA_HIST_SQLSTAT for SQL perf context.
"Which requests are heaviest right now?" Q3 (Top ECID Profile) STATS_MODE + wait breakdown surfaces the real offenders in one row per ECID.
"Which EBS user is it?" Q4 (FND User Mapping) CLIENT_ID → FND_USER join + FND_LOG cross-reference.
RAC: session active across multiple nodes Q5 (Cross-Node RAC) Reveals gc waits, cross-node blocking — invisible to V$SESSION on one node.
Recurring slow query (same SQL, varies in speed) Q6 (Plan Stability) Quantifies how much worse the bad plan is. Drives SPM/hint/profile solution.
Need to identify the locked table Q8 (Blocking Chain Tree) TM lock join to DBA_OBJECTS gives exact table name + chain depth.
ECID came from a Concurrent Request Q9 (CR Bridge) Links ASH ECID directly to REQUEST_ID + program + submitter.
Prevent future repeats, automate response Q10 (Watchdog PL/SQL) Scheduled every 5 minutes, log-only or auto-kill, full audit trail.
"Show me the timeline in a heat chart" Q1 (5-Min Bucket Heat) ASCII bar chart per 5-minute window — no external tool needed.

📝 15. Summary & RCA Template

Every ECID incident that you resolve should generate an RCA entry. Here is a minimal post-incident template built from the outputs of the queries above:

/* ============================================================
   RCA TEMPLATE — ECID Session Incident
   Copy into your incident ticket / Jira / SNOW
   ============================================================ */

-- 1. INCIDENT WINDOW
--    Start:     YYYY-MM-DD HH24:MI (extracted from Q1 heat map first_seen)
--    End:       YYYY-MM-DD HH24:MI
--    Duration:  NNN minutes

-- 2. ECID
--    Full ECID:     1.xxxxx...;kXjE
--    Source:        oacore_server1.out / access.log / FND_LOG_MESSAGES

-- 3. IMPACTED SESSION (from Phase 3.1 / Q2)
--    SID/Serial#/Inst: 247,8931,@2
--    FND User:         JSMITH (from Q4)
--    EBS Module:       OAF Page: PO/PORequisitionCreate (from MODULE column)
--    OS PID:           12847

-- 4. ROOT CAUSE
--    Dominant Event:   enq: TX - row lock contention (from Q3 STATS_MODE)
--    Blocker:          SID 189@1 (session on node 1, from Q8 chain tree)
--    Blocked by Table: PO_REQUISITION_LINES_ALL (TM lock, from Q8 locked_object)
--    SQL ID:           d8txf935msbx5 (dominant SQL from Q3)
--    Plan Hash:        1234567890 (stable / unstable — from Q6)

-- 5. ACTION TAKEN
--    ALTER SYSTEM KILL SESSION '247,8931,@2' IMMEDIATE;
--    Executed by:  [DBA Name], [Time]
--    User impact:  Session terminated; user re-submitted — successful

-- 6. PREVENTION
--    Short-term:   ECID Watchdog job enabled (log-only, 5-min interval)
--    Medium-term:  Analyse SQL d8txf935msbx5 for index opportunities
--    Long-term:    Review table locking strategy for PO_REQUISITION_LINES_ALL
--                  with functional team; consider row-level lock optimization
💡 Final Advanced Tip — Proactive ECID Baseline: Run Q3 (Top ECIDs Summary) daily during business hours for 2 weeks. Store the output. You will quickly build a baseline of which ECID patterns are normal (same module, same SQL, same wait class, consistent samples) vs anomalous (sudden spike in lock_samples, new plan_hash, cross-node blocking). This baseline turns reactive incident response into proactive capacity and tuning governance.

Oracle EBS: Hunting Rogue Sessions by ECID — The DBA's Precision Kill Guide

 

🔍 Oracle EBS: Hunting Rogue Sessions by ECID | appsdbastuff
🔍 Oracle EBS: Hunting Rogue Sessions by ECID — The DBA's Precision Kill Guide

You're in the middle of a production incident — a user's request is hung, the helpdesk ticket is escalating, and you need to surgically kill exactly that session without touching anything else. You could scroll through GV$SESSION manually. You could guess by username or machine. Or — you could use the Execution Context ID (ECID) to locate the exact session with zero ambiguity and generate the kill command in the same query.

This post is a deep-dive into a Phase 3.1 diagnostic script that does precisely that: ECID-based exact-match session hunting, filtered to oacore (the Oracle EBS managed server), with every forensic column you need to understand why the session is hung — not just where it is.

🧬 1. What is ECID and Why Does It Matter in EBS?

👶 Fresher

ECID — Execution Context Identifier — is a unique tag that Oracle middleware (WebLogic / OC4J) stamps onto every database session it opens on behalf of an application user request. Think of it as a transaction passport: it travels from the browser request through WebLogic, into the database connection, and gets stored in GV$SESSION.ECID.

🎓 Experienced

In Oracle EBS 12.2 with WebLogic 12c, the ECID is propagated via Oracle Diagnostic Logging (ODL) and is visible in both the WLS server logs (oacore managed server) and in GV$SESSION.ECID. When a user reports a hung page — and you have the WLS request log or an FND log reference — you extract the ECID from there and cross-match it to the DB session. This is far more reliable than matching on username/machine/IP, because in a connection-pooled environment like EBS, multiple sessions can belong to the same OS user from the same app tier host.

📌 Where to find the ECID for a specific user complaint:
1. FND: FND_LOG_MESSAGES — column EXECUTION_CONTEXT_ID
2. WLS Access Log: $DOMAIN_HOME/servers/oacore/logs/access.log — look for ecid=
3. WLS Diagnostic Log: oacore-diagnostic.log — search by username + timestamp
4. From the user: raise a support ticket and ask them to provide the URL request timestamp — you can backtrack from there.

📜 2. The Full Script — Phase 3.1

-- ============================================================
-- PHASE 3.1: FIND SESSION BY ECID (Exact Match)
-- ============================================================
SET LINES 200 PAGES 100
COL kill_cmd           FORMAT A60
COL program            FORMAT A25
COL module             FORMAT A30
COL action             FORMAT A30
COL ecid               FORMAT A55
COL event              FORMAT A40
COL sql_text           FORMAT A80

SELECT
    'ALTER SYSTEM KILL SESSION ''' || s.sid || ',' || s.serial# || ',@' || s.inst_id || ''' IMMEDIATE;'
        AS kill_cmd,
    s.inst_id,
    s.sid,
    s.serial#,
    s.status,
    s.state,
    TRUNC(s.last_call_et/60) || 'm ' || MOD(s.last_call_et, 60) || 's'  AS last_call_et,
    s.program,
    s.module,
    s.action,
    s.ecid,
    s.username,
    s.machine,
    s.process                                                            AS os_pid,
    s.sql_id,
    s.sql_child_number,
    s.prev_sql_id,
    s.event,
    s.wait_class,
    s.seconds_in_wait,
    s.blocking_session,
    s.blocking_instance,
    s.blocking_session_status
FROM  gv$session s
WHERE s.ecid    LIKE '%&ECID_EXTRACT%'
  AND   s.program LIKE '%oacore%'
ORDER BY s.last_call_et DESC;
⚠️ RAC Note: This script runs against GV$SESSION (Global View). In a RAC environment (e.g., Qualcomm's QCTEBSP04 / P05), the kill command already includes ,@inst_id so you can run it from any node and it will target the correct instance. No need to log in to the specific node.

🔬 3. Column-by-Column Breakdown

Column / Alias Source in GV$SESSION What It Tells You
kill_cmd Derived (SID + SERIAL# + INST_ID) Ready-to-execute kill statement. Copy → Paste → Done.
inst_id GV$SESSION.INST_ID RAC node number (1, 2, 3…). Critical for 2-node or 3-node RAC clusters.
sid / serial# SID, SERIAL# Together with inst_id, uniquely identify the session. SERIAL# prevents killing a recycled SID.
status STATUS ACTIVE = executing SQL; INACTIVE = idle (open cursor/lock held); KILLED = kill issued, OS cleanup pending.
state STATE WAITING = blocked on an event; WAITED SHORT TIME/WAITED KNOWN TIME = recently waited. Helps distinguish stuck from just slow.
last_call_et LAST_CALL_ET (seconds) How long since the last client call. Formatted as NNm NNs for readability. >5 minutes on ACTIVE = red flag.
program PROGRAM Connecting client binary. JDBC Thin Client = WebLogic pool; oracle@hostname = background/batch.
module MODULE Set by the app via DBMS_APPLICATION_INFO. In EBS: OAF page name, Form name, or Concurrent Program short name.
action ACTION Fine-grained action within the module (e.g., Query, Update, Commit). Extremely useful for PL/SQL debugging.
ecid ECID The execution context ID propagated from WebLogic. This is your match key for the user's request.
username USERNAME Oracle DB schema user. In EBS: typically APPS or the responsibility-owning schema. Not the FND user.
machine MACHINE Hostname of the connecting machine — usually the WebLogic app tier server (e.g., qcterpp05app1).
os_pid PROCESS OS process ID on the client side (WLS JVM thread). Useful if you need to correlate to a WLS thread dump.
sql_id / sql_child_number SQL_ID, SQL_CHILD_NUMBER Currently executing SQL. Feed into V$SQL for the plan, or AWR for historical behavior.
prev_sql_id PREV_SQL_ID Last SQL executed before the current one. On INACTIVE sessions this is often the real culprit SQL.
event EVENT The specific wait event (e.g., enq: TX - row lock contention, db file sequential read). The most important diagnostic column.
wait_class WAIT_CLASS High-level category: Concurrency, User I/O, Application, Network, Idle.
seconds_in_wait SECONDS_IN_WAIT How long it has been in the current wait event. Distinct from LAST_CALL_ET.
blocking_session / blocking_instance / blocking_session_status 3 columns If the session is blocked by another session, these identify the blocker. blocking_session_status = VALID means the blocker is alive and holding the lock.

⚡ 4. Understanding the Kill Command Construction

The script auto-builds the kill statement so you never construct it manually under pressure:

'ALTER SYSTEM KILL SESSION ''' || s.sid || ',' || s.serial# || ',@' || s.inst_id || ''' IMMEDIATE;'

This produces output like:

ALTER SYSTEM KILL SESSION '247,8931,@2' IMMEDIATE;
ComponentMeaning
247Session SID on instance 2
8931SERIAL# — distinguishes this session from any recycled SID 247
@2RAC Instance number — routes the kill to node 2 even if you're connected to node 1
IMMEDIATESkips the graceful wait — session is killed without waiting for it to respond. Use in stuck-session scenarios.
🚨 IMMEDIATE vs without IMMEDIATE:
Without IMMEDIATE, Oracle waits for the session to acknowledge the kill, which can take minutes if the session is stuck on a network/I-O wait. In production incidents, always use IMMEDIATE — but understand it still requires Oracle to rollback uncommitted work in the background. The session in GV$SESSION may remain with STATUS=KILLED for minutes while the rollback completes.

What Happens After the Kill?

-- Verify the session is gone (or in KILLED state rolling back)
SELECT sid, serial#, inst_id, status, last_call_et, username
FROM   gv$session
WHERE  sid = &SID
  AND  inst_id = &INST;
-- If STATUS=KILLED persists for >10 mins, the OS process may be stuck.
-- Escalate to OS-level kill: kill -9 <spid> (get SPID from V$PROCESS)

🎯 5. The WHERE Clause — ECID + oacore Filter

WHERE s.ecid    LIKE '%&ECID_EXTRACT%'
  AND s.program LIKE '%oacore%'

Why LIKE with % on both sides?

The ECID value in GV$SESSION is typically a composite string like:
56f8a7e1-2d14-4b78-9abc-...:1:1:oacore~0f3a
When you extract an ECID fragment from a WLS log or FND table, you often get a portion of the full string. The LIKE '%fragment%' pattern ensures you still match even with a partial extract. If you have the full ECID, switch to = '&ECID_EXACT' for better performance (avoids full table scan on large instances).

Why the oacore Filter?

In Oracle EBS 12.2, all OAF-based pages and self-service flows go through the oacore WebLogic managed server. Its JDBC pool connections appear in GV$SESSION.PROGRAM as something like JDBC Thin Client with MODULE populated from OAF, and critically the PROGRAM column carries oacore as a substring in some configurations.

💡 Pro Tip: If you're hunting sessions from the Forms tier (OAF's frmweb/forms services), change the filter to s.program LIKE '%frmweb%'. For Concurrent Manager sessions, filter on s.program LIKE '%FNDLIBR%' or join to FND_CONCURRENT_REQUESTS on ORACLE_SESSION_ID = s.audsid.

🕐 6. Decoding Wait Class & Event — What Is the Session Actually Doing?

WAIT_CLASSCommon EVENTLikely Root Cause in EBS
Application enq: TX - row lock contention Another session holds a lock. Check BLOCKING_SESSION.
Application enq: TM - contention Table-level lock — often a DDL or bulk DML running concurrently.
Concurrency library cache lock Shared pool contention — often seen during ADOP patching or mass recompile.
User I/O db file sequential read Single-block I/O — index lookup on a slow disk or high I/O load. Check SQL plan.
User I/O db file scattered read Multi-block I/O — full table scan. Check for missing index or bad plan.
Network SQL*Net message from client Idle — waiting for next request from WebLogic. Normal for pooled connections.
Idle jobq slave wait Scheduler slave idle. Ignore in session analysis.
Other wait for unread message on broadcast channel RAC global cache background — normal RAC operation, not a problem.
📎 Once you have EVENT + SQL_ID, your next step:
SELECT * FROM gv$sql WHERE sql_id = '&SQL_ID';
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('&SQL_ID', &CHILD, 'ALLSTATS LAST'));
These reveal the full SQL text and execution plan — confirming whether the hang is a lock wait, a runaway full-scan, or something deeper.

🔗 7. Blocking Session Chain Analysis

The three blocking columns — blocking_session, blocking_instance, blocking_session_status — tell you whether your target session is itself being blocked. If blocking_session_status = 'VALID', you have a blocker–waiter chain. In that case, killing the waiter alone won't fix the problem — you need to kill the blocker first (or trace why it's holding the lock).

-- Follow the blocking chain from your ECID session outward
SELECT
    LPAD(' ', LEVEL*2) || s.sid                AS session_tree,
    s.inst_id,
    s.serial#,
    s.status,
    s.event,
    s.sql_id,
    s.blocking_session,
    s.blocking_instance,
    s.username,
    s.module
FROM   gv$session s
START WITH  s.ecid LIKE '%&ECID_EXTRACT%'
CONNECT BY  PRIOR s.blocking_session = s.sid
        AND PRIOR s.blocking_instance = s.inst_id
ORDER SIBLINGS BY s.last_call_et DESC;
⚠️ Cross-instance blocking in RAC: A session on node 1 can be blocked by a session on node 2. The blocking_instance column captures this. The hierarchical query above handles cross-instance chains correctly because it joins on both SID and INST_ID.

🏗️ 8. RAC Awareness — Why GV$SESSION, Not V$SESSION

👶 Fresher

In a single-instance database, V$SESSION shows all sessions. In a RAC cluster, each node has its own V$SESSION — it only shows sessions on that node. GV$SESSION is the Global View that spans all nodes. The G prefix on any V$ view gives you the cluster-wide picture.

🎓 Experienced

On a 2-node or 3-node RAC, EBS WebLogic connection pools distribute sessions across nodes via load balancing. The ECID you extract from a WLS log does not tell you which node the corresponding DB session landed on. Running the query against GV$SESSION finds it regardless of which node it's on, and the inst_id column in the output — plus the ,@inst_id in the kill command — ensures the kill is routed correctly without you having to log in to a specific node.

-- Quick sanity: how many EBS oacore sessions per node right now?
SELECT   inst_id,
         COUNT(*) AS session_count,
         SUM(CASE WHEN status='ACTIVE' THEN 1 ELSE 0 END) AS active,
         SUM(CASE WHEN status='INACTIVE' THEN 1 ELSE 0 END) AS inactive
FROM     gv$session
WHERE    program LIKE '%oacore%'
GROUP BY inst_id
ORDER BY inst_id;

🔄 9. End-to-End Incident Workflow

┌──────────────────────────────────────────────────────────────────────────┐
│  USER REPORTS: "Page is hung / spinner running for 20 minutes"           │
└────────────────────────┬─────────────────────────────────────────────────┘

              STEP 1: Get the ECID
              ├─ From user: approx time + URL + responsibility
              ├─ From FND: SELECT execution_context_id FROM fnd_log_messages
              │            WHERE timestamp BETWEEN ... AND ...
              │            AND  module LIKE '%<page_name>%';
              └─ From WLS: grep 'ecid=' $DOMAIN/servers/oacore/logs/access.log

              STEP 2: Run Phase 3.1 Script
              └─ Supply ECID_EXTRACT at the prompt
                 → Confirms: SID, SERIAL#, INST_ID, EVENT, BLOCKING?

              STEP 3: Assess
              ├─ Is session ACTIVE or INACTIVE?
              ├─ What is the EVENT and WAIT_CLASS?
              ├─ Is BLOCKING_SESSION_STATUS = VALID?
              └─ How long is LAST_CALL_ET and SECONDS_IN_WAIT?

              STEP 4: Decision
              ├─ Blocked by another session? → Kill blocker first
              ├─ Long-running SQL (bad plan)? → Capture SQL_ID, check plan
              ├─ Network idle (SQL*Net from client)? → WLS thread issue, not DB
              └─ Confirmed stuck? → Execute kill_cmd from query output

              STEP 5: Verify + Document
              ├─ Re-run query → session should be gone or STATUS=KILLED
              ├─ Confirm user can re-access the page
              └─ Log RCA in incident ticket (ECID, SID, EVENT, action taken)
└──────────────────────────────────────────────────────────────────────────┘

✅ 10. Safety Checklist Before You Kill

🚨 Never kill a session without running through this list:
  1. Is the session ACTIVE with a legitimate long-running batch (Quarter-end, month-end, payroll)?
  2. Is the session part of an ADOP patching cutover phase (fs_clone, generate, finalize)?
  3. Does the MODULE/ACTION indicate a Concurrent Request? (FND_CONCURRENT_REQUESTS check)
  4. Is it a background Oracle process (PMON, SMON, LGWR)? Never kill these.
  5. Have you confirmed with the user/business that the session can be terminated?
  6. Are you logged in as SYSDBA or a user with ALTER SYSTEM privilege?
  7. Is this a production instance? If yes — get approval before proceeding.

🛠️ 11. Script Extensions & Variants

Extension A: Include SQL Text Inline

-- Add this to the SELECT to see current SQL without a separate query:
SELECT
    ...,
    (SELECT SUBSTR(sql_text, 1, 200)
       FROM  gv$sql q
      WHERE  q.sql_id    = s.sql_id
        AND  q.inst_id   = s.inst_id
        AND  q.child_number = s.sql_child_number
        AND  rownum = 1)  AS curr_sql_text,
    (SELECT SUBSTR(sql_text, 1, 200)
       FROM  gv$sql q
      WHERE  q.sql_id   = s.prev_sql_id
        AND  q.inst_id  = s.inst_id
        AND  rownum = 1)  AS prev_sql_text
FROM  gv$session s
WHERE s.ecid LIKE '%&ECID_EXTRACT%'
  AND s.program LIKE '%oacore%';

Extension B: FND_LOG_MESSAGES Cross-Reference

-- Map an FND log entry directly to the live DB session
SELECT
    fl.execution_context_id    AS ecid,
    fl.timestamp,
    fl.message_text,
    s.sid,
    s.serial#,
    s.inst_id,
    s.status,
    s.event,
    s.sql_id
FROM   fnd_log_messages  fl
JOIN   gv$session        s  ON s.ecid = fl.execution_context_id
WHERE  fl.execution_context_id LIKE '%&ECID_EXTRACT%'
   AND  s.program LIKE '%oacore%'
ORDER BY fl.timestamp DESC;

Extension C: Batch Kill All oacore Sessions Idle > 30 Minutes

-- USE WITH EXTREME CAUTION — Only in non-production or after business approval
BEGIN
  FOR r IN (
    SELECT inst_id, sid, serial#
    FROM   gv$session
    WHERE  program      LIKE '%oacore%'
      AND  status       = 'INACTIVE'
      AND  last_call_et > 1800  -- 30 minutes
  ) LOOP
    EXECUTE IMMEDIATE
      'ALTER SYSTEM KILL SESSION '''
      || r.sid || ',' || r.serial# || ',@' || r.inst_id
      || ''' IMMEDIATE';
  END LOOP;
END;
/

📝 12. Summary

What the Script DoesWhy It Matters
Matches session by ECID from WLS/FNDUnique identification — no ambiguity in connection-pooled EBS environments
Filters to oacore programLimits scope to OAF/self-service layer; excludes batch, forms, CM
Generates kill_cmd inlineZero manual effort — copy and execute under pressure
Includes inst_id in kill commandRAC-safe: works from any node in a multi-node cluster
Exposes EVENT + WAIT_CLASS + SECONDS_IN_WAITDistinguishes lock wait from I/O from idle — guides the right action
Exposes BLOCKING_SESSION + STATUSReveals if killing the waiter alone is insufficient
Shows MODULE + ACTION + SQL_IDFull forensic trail for RCA and post-incident documentation
Sorts by LAST_CALL_ET DESCWorst offenders (longest hung) surface immediately

ECID-based session targeting is one of those techniques that separates reactive firefighting from precise, professional incident management. Combined with a solid understanding of wait events and blocking chains, this script gives you everything you need to go from a user complaint to a resolved incident — with full auditability — in under five minutes.