Oracle EBS Long Running Concurrent Request RCA: CPU Bound SQL, TEMP Spill, and Stale Statistics

1. Issue Summary

A long-running Oracle EBS concurrent request was reported during production processing. The request was active for several hours and required DBA investigation to identify whether the delay was due to storage, blocking, CPU, TEMP usage, stale statistics, or execution plan inefficiency.

2. Initial Concurrent Request Monitoring

Column	Sample Value
Request ID	`<REQUEST_ID>`
Program Name	`<CONCURRENT_PROGRAM_NAME>`
SID	`<SID>`
SQL_ID	`<SQL_ID>`
Wait Event	`db file sequential read`
Status	ACTIVE
Runtime	Several hours

At first glance, the wait event appeared to indicate single-block read activity.

3. Step 1: Check Current Session Wait


SELECT inst_id,
       sid,
       event,
       wait_class,
       p1text,
       p1,
       p2text,
       p2,
       p3text,
       p3,
       seconds_in_wait,
       state
FROM gv$session_wait
WHERE sid = <SID>;

Sample Output

EVENT	WAIT_CLASS	P1TEXT	P1	P2TEXT	P2	STATE
db file sequential read	User I/O	file#	xxx	block#	xxxxx	WAITED SHORT TIME

Initial Interpretation

db file sequential read normally indicates single-block reads through index access or rowid lookup.

However, one current wait event alone does not prove the root cause.

4. Step 2: Check ASH Wait Profile


SELECT event,
       wait_class,
       sql_id,
       COUNT(*) samples
FROM gv$active_session_history
WHERE session_id = <SID>
AND sample_time > SYSDATE - 1/24
GROUP BY event, wait_class, sql_id
ORDER BY samples DESC;

Sample Output

Event	Wait Class	SQL_ID	Samples
null / ON CPU	null	`<SQL_ID>`	3574
direct path write temp	User I/O	`<SQL_ID>`	18
db file sequential read	User I/O	`<SQL_ID>`	3

Finding

The SQL was primarily CPU-bound and not storage-bound.

5. Step 3: Check Parallel Execution


SELECT sql_id,
       px_servers_executions,
       executions,
       elapsed_time/1000000 elapsed_sec
FROM gv$sql
WHERE sql_id = '<SQL_ID>';

Sample Output

SQL_ID	PX_SERVERS_EXECUTIONS	EXECUTIONS	ELAPSED_SEC
`<SQL_ID>`	0	2	36863

Finding

The SQL executed completely in serial mode.

6. Step 4: Execution Plan Analysis


SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
  sql_id => '<SQL_ID>',
  cursor_child_no => NULL,
  format => 'ALLSTATS LAST +MEMSTATS +PARALLEL +PEEKED_BINDS'
));

Sample Explain Plan Output


--------------------------------------------------------------------------------------------
| Id | Operation                            | Name                    | A-Rows | Used-Tmp |
--------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                     |                         |        |           |
|  1 |  SORT UNIQUE                         |                         |      1 |           |
|  2 |   VIEW                               |                         |      1 |           |
|  3 |    UNION ALL (RECURSIVE WITH)        |                         |        |           |
|  4 |     TABLE ACCESS BY INDEX ROWID      | ***_LEVELS_TBL          |      8 |           |
|  5 |      INDEX RANGE SCAN                | ***_LEVELS_N3           |     45 |           |
|  6 |     HASH JOIN                        |                         |      1 | 24G       |
|  7 |      RECURSIVE WITH PUMP             |                         |        |           |
|  8 |      BUFFER SORT (REUSE)             |                         |   5600 |           |
|  9 |       TABLE ACCESS BY INDEX ROWID    | ***_LEVELS_TBL          |    700 |           |
| 10 |        INDEX RANGE SCAN              | ***_LEVELS_N3           |    851 |           |
--------------------------------------------------------------------------------------------

Important Execution Plan Findings

A. Recursive WITH Processing


UNION ALL (RECURSIVE WITH)
RECURSIVE WITH PUMP

Meaning

The SQL is performing recursive hierarchy expansion.

This type of SQL is commonly used in:

BOM explosion
Planning hierarchy
Recursive validation
Cost rollup
Parent-child hierarchy traversal

Recursive queries can exponentially increase intermediate rowsets.

B. HASH JOIN Operation


HASH JOIN
Used-Tmp : 24G

Meaning

Oracle selected a HASH JOIN strategy.

However, the HASH JOIN workarea overflowed and spilled approximately:


24 GB TEMP

This caused:

heavy TEMP usage
CPU-intensive processing
long runtime

C. TABLE ACCESS BY INDEX ROWID


TABLE ACCESS BY INDEX ROWID

Meaning

Oracle used index-based row retrieval rather than full table scan.

This explains the occasional:


db file sequential read

seen in wait events.

However, this was not the root cause of the slowdown.

D. INDEX RANGE SCAN


INDEX RANGE SCAN

Meaning

Oracle used selective index access paths.

The issue was not missing indexes, but inefficient join/workarea processing caused by optimizer misestimation.

E. Serial Execution


PX_SERVERS_EXECUTIONS = 0

Meaning

The SQL executed entirely in serial mode.

Large recursive hierarchy processing in serial mode significantly increased elapsed runtime.

7. Step 5: Check PGA Configuration


SHOW PARAMETER pga;

Sample Output

Parameter	Value
pga_aggregate_target	80G
pga_aggregate_limit	0

Finding

PGA was already configured with a large target.

Therefore the issue was not simply low PGA allocation.

8. Step 6: Validate Table Statistics


SELECT owner,
       table_name,
       num_rows,
       last_analyzed,
       stale_stats
FROM dba_tab_statistics
WHERE owner = '<OWNER>'
AND table_name = '<TABLE_NAME>';

Sample Output

OWNER	TABLE_NAME	NUM_ROWS	LAST_ANALYZED	STALE_STATS
`<OWNER>`	`<TABLE_NAME>`	89	`<DATE>`	YES

9. Step 7: Validate Actual Row Count


SELECT COUNT(*)
FROM <OWNER>.<TABLE_NAME>;

Sample Output

COUNT(*)
1347

Critical Observation

Source	Row Count
Optimizer statistics	89
Actual table rows	1347

Optimizer statistics were stale and underestimated table cardinality.

10. Root Cause Chain


Stale statistics
→ wrong cardinality estimate
→ optimizer underestimated rows
→ inefficient HASH JOIN strategy
→ insufficient workarea estimation
→ HASH JOIN TEMP spill (~24GB)
→ CPU-intensive recursive processing
→ long serial execution runtime

11. Final RCA Statement

The concurrent request experienced prolonged runtime due to inefficient execution plan selection for a recursive hierarchy SQL.

Execution plan analysis identified recursive WITH processing combined with HASH JOIN operations. The HASH JOIN spilled significantly to TEMP (~24GB), causing CPU-intensive execution.

Further validation confirmed stale optimizer statistics on one of the key processing tables. Optimizer statistics estimated only 89 rows while actual table count was 1347. This cardinality misestimation caused the optimizer to underestimate recursive row expansion and allocate insufficient HASH JOIN workarea memory.

ASH analysis confirmed the session was predominantly CPU-bound with minimal storage waits, indicating the issue was related to execution plan inefficiency rather than storage subsystem latency.

12. Immediate Mitigation

Gather Fresh Table Statistics


EXEC DBMS_STATS.GATHER_TABLE_STATS(
  ownname          => '<OWNER>',
  tabname          => '<TABLE_NAME>',
  estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
  cascade          => TRUE,
  degree           => 4,
  method_opt       => 'FOR ALL COLUMNS SIZE AUTO'
);

13. Preventive Mitigation

Recommended Operational Flow


Data load/interface refresh
→ Gather targeted table statistics
→ Execute critical concurrent request
→ Validate execution plan stability

Proactive Stale Stats Validation


SELECT owner,
       table_name,
       stale_stats,
       last_analyzed,
       num_rows
FROM dba_tab_statistics
WHERE owner = '<OWNER>'
AND table_name = '<TABLE_NAME>';

14. Best Practice Recommendation

For volatile Oracle EBS interface/staging/recursive processing tables:

Do NOT rely only on weekly schema statistics.

Implement:

targeted post-load stats gathering
month-end validation checks
execution plan monitoring
stale stats validation before critical requests

15. Customer-Facing Summary

The delay was caused by stale optimizer statistics on a key processing table, leading to incorrect cardinality estimates and inefficient recursive HASH JOIN processing. The SQL spilled heavily to TEMP during hierarchy expansion, resulting in CPU-intensive serial execution and extended runtime. Targeted statistics gathering after data load and before critical concurrent processing is recommended to prevent recurrence.

Wednesday, May 13, 2026