Deep Dive RCA: Completed Long-Running Concurrent Request in Oracle EBS R12.2 / 19c
1. RCA Objective
For a completed concurrent request, your goal is to prove:
- When the request started and completed
- Which database session handled it
- Which SQL_IDs were active during that runtime
- Which SQL consumed most time
-
Why it consumed time
- Plan change?
- Full table scan?
- High I/O?
- CPU?
- Locking?
- TEMP spill?
- Parallelism issue?
- Data volume increase?
- Stale statistics?
- What corrective or preventive action is recommended
2. Get Concurrent Request Basic Details
SELECT
fcr.request_id,
fcpt.user_concurrent_program_name,
fcp.concurrent_program_name short_name,
fcr.request_date,
fcr.requested_start_date,
fcr.actual_start_date,
fcr.actual_completion_date,
ROUND((fcr.actual_completion_date - fcr.actual_start_date) * 24 * 60, 2) duration_minutes,
fcr.phase_code,
fcr.status_code,
fcr.completion_text,
fcr.oracle_process_id spid,
fcr.oracle_session_id audsid,
fcr.controlling_manager,
fcr.argument_text
FROM apps.fnd_concurrent_requests fcr
JOIN apps.fnd_concurrent_programs fcp
ON fcp.concurrent_program_id = fcr.concurrent_program_id
JOIN apps.fnd_concurrent_programs_tl fcpt
ON fcpt.concurrent_program_id = fcr.concurrent_program_id
AND fcpt.language = USERENV('LANG')
WHERE fcr.request_id = &request_id;
What to capture
| Evidence | Why important |
|---|---|
actual_start_date | Start of investigation window |
actual_completion_date | End of investigation window |
duration_minutes | Confirms abnormal delay |
oracle_session_id | Helps map to database ASH session |
oracle_process_id | OS process ID |
argument_text | Different parameters may explain delay |
completion_text | Any warning/error message |
3. Compare Against Normal Successful Runs
Before blaming SQL, compare the same program’s past runs.
SELECT
fcr.request_id,
fcr.actual_start_date,
fcr.actual_completion_date,
ROUND((fcr.actual_completion_date - fcr.actual_start_date) * 24 * 60, 2) duration_minutes,
fcr.phase_code,
fcr.status_code,
fcr.argument_text
FROM apps.fnd_concurrent_requests fcr
WHERE fcr.concurrent_program_id = (
SELECT concurrent_program_id
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id
)
AND fcr.actual_start_date IS NOT NULL
AND fcr.actual_completion_date IS NOT NULL
ORDER BY fcr.actual_start_date DESC
FETCH FIRST 20 ROWS ONLY;
RCA angle
If previous runs finished in 2 minutes and this run took 4 hours, then compare:
| Compare Item | Check |
|---|---|
| Arguments | Same parameters or wider date range? |
| Runtime window | Month-end? Peak batch time? |
| Manager | Same concurrent manager? |
| Instance | Same RAC node? |
| SQL plan | Same or changed plan hash? |
| Data volume | More rows processed? |
4. Find Database Session from Completed Request
For completed requests, the session is no longer in gv$session. So use historical ASH.
SELECT
fcr.request_id,
fcr.oracle_session_id audsid,
fcr.oracle_process_id spid,
fcr.actual_start_date,
fcr.actual_completion_date
FROM apps.fnd_concurrent_requests fcr
WHERE fcr.request_id = &request_id;
Now search ASH using session_serial#, session_id, module/action, and time window.
5. Find SQL_IDs from ASH During Request Runtime
This is one of the most important RCA queries.
SELECT
ash.sql_id,
ash.sql_plan_hash_value,
COUNT(*) ash_samples,
ROUND(COUNT(*) / 60, 2) approx_active_minutes,
MIN(ash.sample_time) first_seen,
MAX(ash.sample_time) last_seen,
ash.session_id,
ash.session_serial#,
ash.instance_number,
ash.module,
ash.action,
ash.program,
ash.wait_class,
ash.event
FROM dba_hist_active_sess_history ash
WHERE ash.sample_time BETWEEN
(SELECT actual_start_date
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id)
AND (SELECT actual_completion_date
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id)
AND ash.session_id IN (
SELECT oracle_session_id
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id
)
GROUP BY
ash.sql_id,
ash.sql_plan_hash_value,
ash.session_id,
ash.session_serial#,
ash.instance_number,
ash.module,
ash.action,
ash.program,
ash.wait_class,
ash.event
ORDER BY ash_samples DESC;
How to read this
| Result | Meaning |
|---|---|
Highest ASH_SAMPLES | SQL where most active time was spent |
High User I/O | I/O bottleneck, full scan, index access issue |
High CPU | Expensive execution plan or high row processing |
High Concurrency | Buffer busy, latch, library cache issue |
High Application | Locking/enqueue issue |
High Cluster | RAC block transfer / GC waits |
High Temp related waits | Sort/hash spill issue |
6. Get SQL Text for Top SQL_IDs
SELECT
sql_id,
DBMS_LOB.SUBSTR(sql_text, 4000, 1) sql_text
FROM dba_hist_sqltext
WHERE sql_id = '&sql_id';
For current cursor cache, if still available:
SELECT
inst_id,
sql_id,
child_number,
plan_hash_value,
parsing_schema_name,
module,
action,
executions,
elapsed_time / 1000000 elapsed_sec,
cpu_time / 1000000 cpu_sec,
buffer_gets,
disk_reads,
rows_processed,
sql_text
FROM gv$sql
WHERE sql_id = '&sql_id';
7. SQL-Level AWR Statistics
Use this to prove whether the SQL was expensive during the issue window.
SELECT
sn.begin_interval_time,
sn.end_interval_time,
st.instance_number,
st.sql_id,
st.plan_hash_value,
st.executions_delta execs,
ROUND(st.elapsed_time_delta / 1000000, 2) elapsed_sec,
ROUND(st.cpu_time_delta / 1000000, 2) cpu_sec,
ROUND(st.iowait_delta / 1000000, 2) io_wait_sec,
ROUND(st.clwait_delta / 1000000, 2) cluster_wait_sec,
ROUND(st.apwait_delta / 1000000, 2) app_wait_sec,
st.buffer_gets_delta,
st.disk_reads_delta,
st.rows_processed_delta,
ROUND(st.elapsed_time_delta / 1000000 / NULLIF(st.executions_delta, 0), 2) avg_elapsed_sec
FROM dba_hist_sqlstat st
JOIN dba_hist_snapshot sn
ON sn.snap_id = st.snap_id
AND sn.dbid = st.dbid
AND sn.instance_number = st.instance_number
WHERE st.sql_id = '&sql_id'
AND sn.begin_interval_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
ORDER BY sn.begin_interval_time;
Interpretation
| Symptom | RCA direction |
|---|---|
| High elapsed + high CPU | Bad plan / high logical reads / inefficient joins |
| High elapsed + high I/O wait | Physical reads / full table scan / storage issue |
| High elapsed + high cluster wait | RAC GC waits / block contention |
| High elapsed + high application wait | Locking or enqueue |
| High buffer gets | Inefficient logical reads |
| High disk reads | I/O heavy SQL |
| High rows processed | Data volume issue |
| Different plan hash | Plan regression |
8. Wait Class Breakdown for the Request
SELECT
ash.wait_class,
ash.event,
COUNT(*) samples,
ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 2) pct
FROM dba_hist_active_sess_history ash
WHERE ash.sample_time BETWEEN
(SELECT actual_start_date
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id)
AND (SELECT actual_completion_date
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id)
AND ash.sql_id = '&sql_id'
GROUP BY ash.wait_class, ash.event
ORDER BY samples DESC;
Example RCA wording
| Evidence | RCA statement |
|---|---|
80% db file scattered read | SQL spent most time doing multiblock reads, indicating full table scan or large segment scan |
| 70% CPU | SQL was CPU-bound due to high logical reads or inefficient execution plan |
60% enq: TX - row lock contention | Request was delayed due to row lock contention from another session |
50% direct path read temp | SQL spilled to TEMP due to large sort/hash operation |
9. Find Blocking Sessions During the Runtime
SELECT
sample_time,
instance_number,
session_id,
session_serial#,
blocking_session,
blocking_session_serial#,
blocking_inst_id,
event,
wait_class,
sql_id,
current_obj#
FROM dba_hist_active_sess_history
WHERE sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
AND session_id = &sid
AND blocking_session IS NOT NULL
ORDER BY sample_time;
If blocking is found:
SELECT
blocking_session,
blocking_session_serial#,
blocking_inst_id,
COUNT(*) blocking_samples,
MIN(sample_time) first_blocked,
MAX(sample_time) last_blocked
FROM dba_hist_active_sess_history
WHERE sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
AND session_id = &sid
AND blocking_session IS NOT NULL
GROUP BY blocking_session, blocking_session_serial#, blocking_inst_id
ORDER BY blocking_samples DESC;
RCA wording
The concurrent request was not delayed due to SQL plan regression. ASH shows the session was waiting on row lock contention for approximately XX minutes, blocked by session SID XXX on instance X.
10. Check Plan History for SQL_ID
SELECT
sql_id,
plan_hash_value,
COUNT(*) awr_snap_count,
MIN(snap_id) first_snap,
MAX(snap_id) last_snap
FROM dba_hist_sqlstat
WHERE sql_id = '&sql_id'
GROUP BY sql_id, plan_hash_value
ORDER BY awr_snap_count DESC;
If multiple plan hash values exist
That is a strong indicator of possible plan instability.
Now compare performance by plan:
SELECT
st.sql_id,
st.plan_hash_value,
COUNT(*) snap_count,
SUM(st.executions_delta) total_execs,
ROUND(SUM(st.elapsed_time_delta) / 1000000, 2) total_elapsed_sec,
ROUND(SUM(st.cpu_time_delta) / 1000000, 2) total_cpu_sec,
SUM(st.buffer_gets_delta) total_buffer_gets,
SUM(st.disk_reads_delta) total_disk_reads,
ROUND(
SUM(st.elapsed_time_delta) / 1000000 / NULLIF(SUM(st.executions_delta), 0),
2
) avg_elapsed_per_exec_sec
FROM dba_hist_sqlstat st
WHERE st.sql_id = '&sql_id'
GROUP BY st.sql_id, st.plan_hash_value
ORDER BY avg_elapsed_per_exec_sec DESC;
RCA wording
The SQL used plan hash value
123456789during the delayed run. Historical AWR shows the normal plan hash value was987654321, with average elapsed time of 2 seconds per execution. The delayed plan averaged 240 seconds per execution, indicating SQL plan regression.
11. Display Execution Plan from AWR
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id', NULL, NULL, 'ALLSTATS LAST +PEEKED_BINDS +OUTLINE +ALIAS'));
If the specific plan hash is known:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id', &plan_hash_value, NULL, 'ALL +OUTLINE +PEEKED_BINDS'));
Check inside the plan
| Plan Area | What to check |
|---|---|
| Full table scan | Was large table scanned? |
| Index access | Was correct index used? |
| Join method | Nested loop vs hash join |
| Cardinality | Estimated rows vs actual rows |
| TEMP | Sort/hash operation |
| Parallelism | PX operators present or not |
| Partition pruning | Did it prune partitions? |
| Predicate section | Was filter pushed correctly? |
12. Check If SQL Was Captured in AWR or Not
Sometimes the SQL_ID is not in AWR. Reasons:
- SQL ran for short bursts but many times
- SQL was not in top SQL during snapshot
- AWR retention/snapshot missed it
- SQL aged out
- Dynamic SQL changed SQL_ID
- Literal values caused multiple SQL_IDs
- ASH has samples but SQLSTAT does not
Check ASH first:
SELECT
sql_id,
COUNT(*) ash_samples,
MIN(sample_time) first_seen,
MAX(sample_time) last_seen
FROM dba_hist_active_sess_history
WHERE sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
AND sql_id = '&sql_id'
GROUP BY sql_id;
Then check AWR SQLSTAT:
SELECT
sql_id,
COUNT(*)
FROM dba_hist_sqlstat
WHERE sql_id = '&sql_id'
GROUP BY sql_id;
If SQL_ID is in ASH but not AWR
Use ASH evidence:
SELECT
sql_id,
sql_plan_hash_value,
wait_class,
event,
COUNT(*) samples,
ROUND(COUNT(*) / 60, 2) approx_minutes
FROM dba_hist_active_sess_history
WHERE sql_id = '&sql_id'
AND sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY sql_id, sql_plan_hash_value, wait_class, event
ORDER BY samples DESC;
RCA wording:
The SQL_ID was not available in DBA_HIST_SQLSTAT, likely because it was not part of the top SQL captured in AWR SQL statistics. However, DBA_HIST_ACTIVE_SESS_HISTORY captured active samples for this SQL_ID during the request runtime, and the ASH evidence shows the SQL spent most active time on
User I/O.
13. Check Object Behind Waits
From ASH, get object number:
SELECT
ash.current_obj#,
obj.owner,
obj.object_name,
obj.object_type,
COUNT(*) samples
FROM dba_hist_active_sess_history ash
LEFT JOIN dba_objects obj
ON obj.object_id = ash.current_obj#
WHERE ash.sql_id = '&sql_id'
AND ash.sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY ash.current_obj#, obj.owner, obj.object_name, obj.object_type
ORDER BY samples DESC;
RCA wording
ASH shows that most I/O wait samples were associated with object
XX.XX_TABLE, indicating the SQL spent significant time reading this table.
14. Check Segment Size
SELECT
owner,
segment_name,
segment_type,
ROUND(bytes / 1024 / 1024 / 1024, 2) size_gb,
tablespace_name
FROM dba_segments
WHERE owner = '&owner'
AND segment_name = '&object_name';
If it is a large table, check partitioning:
SELECT
table_owner,
table_name,
partition_name,
high_value,
num_rows,
blocks,
last_analyzed
FROM dba_tab_partitions
WHERE table_owner = '&owner'
AND table_name = '&table_name'
ORDER BY partition_position;
15. Check Table and Index Statistics
SELECT
owner,
table_name,
num_rows,
blocks,
sample_size,
last_analyzed,
stale_stats
FROM dba_tab_statistics
WHERE owner = '&owner'
AND table_name = '&table_name';
Index stats:
SELECT
owner,
index_name,
table_name,
blevel,
leaf_blocks,
distinct_keys,
clustering_factor,
num_rows,
last_analyzed,
stale_stats
FROM dba_ind_statistics
WHERE owner = '&owner'
AND table_name = '&table_name'
ORDER BY index_name;
RCA wording
Table statistics were last gathered on DD-MON-YYYY, but the data volume increased significantly during month-end. Stale or non-representative statistics may have contributed to incorrect optimizer cardinality and suboptimal plan selection.
16. Check if Full Table Scan Happened
From plan:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id', &plan_hash_value, NULL, 'ALL'));
Look for:
TABLE ACCESS FULL
INDEX FULL SCAN
HASH JOIN
SORT ORDER BY
TEMP TABLE TRANSFORMATION
Also from ASH:
SELECT
event,
wait_class,
COUNT(*) samples
FROM dba_hist_active_sess_history
WHERE sql_id = '&sql_id'
AND event IN (
'db file scattered read',
'direct path read',
'direct path read temp',
'db file sequential read'
)
AND sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY event, wait_class
ORDER BY samples DESC;
Interpretation
| Wait | Meaning |
|---|---|
db file scattered read | Multiblock read, often full scan |
direct path read | Large direct reads, often full scan |
db file sequential read | Index lookup or single-block read |
direct path read temp | TEMP spill |
17. Check TEMP Usage Evidence
For historical completed issue, use ASH event evidence:
SELECT
sql_id,
event,
COUNT(*) samples,
ROUND(COUNT(*) / 60, 2) approx_minutes
FROM dba_hist_active_sess_history
WHERE sql_id = '&sql_id'
AND event LIKE '%temp%'
AND sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY sql_id, event
ORDER BY samples DESC;
For currently running SQL:
SELECT
s.inst_id,
s.sid,
s.serial#,
s.username,
s.sql_id,
u.tablespace,
ROUND(u.blocks * ts.block_size / 1024 / 1024, 2) temp_mb
FROM gv$sort_usage u
JOIN gv$session s
ON s.saddr = u.session_addr
AND s.inst_id = u.inst_id
JOIN dba_tablespaces ts
ON ts.tablespace_name = u.tablespace
ORDER BY temp_mb DESC;
18. Check RAC GC Waits
SELECT
event,
COUNT(*) samples,
ROUND(COUNT(*) / 60, 2) approx_minutes
FROM dba_hist_active_sess_history
WHERE sql_id = '&sql_id'
AND wait_class = 'Cluster'
AND sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY event
ORDER BY samples DESC;
RCA wording
The SQL experienced significant RAC cluster waits, mainly
gc buffer busy acquire, indicating global cache contention between RAC instances.
19. Check Concurrent Manager Load
Sometimes the delay is not SQL. It may be waiting before execution due to manager backlog.
SELECT
request_id,
requested_start_date,
actual_start_date,
ROUND((actual_start_date - requested_start_date) * 24 * 60, 2) pending_minutes,
actual_completion_date,
ROUND((actual_completion_date - actual_start_date) * 24 * 60, 2) running_minutes,
phase_code,
status_code
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id;
Interpret
| Case | Meaning |
|---|---|
| High pending time | Concurrent manager backlog |
| High running time | DB/program execution issue |
| Both high | Manager backlog + SQL/runtime issue |
Check manager:
SELECT
concurrent_queue_name,
target_node,
max_processes,
running_processes,
control_code
FROM apps.fnd_concurrent_queues
WHERE concurrent_queue_id = (
SELECT controlling_manager
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id
);
20. Map Currently Running SID to Concurrent Request
For future live babysitting:
SELECT
fcr.request_id,
fcpt.user_concurrent_program_name,
fcr.phase_code,
fcr.status_code,
fcr.actual_start_date,
s.inst_id,
s.sid,
s.serial#,
s.sql_id,
s.sql_child_number,
s.event,
s.wait_class,
s.seconds_in_wait,
s.module,
s.action,
p.spid
FROM apps.fnd_concurrent_requests fcr
JOIN apps.fnd_concurrent_programs_tl fcpt
ON fcpt.concurrent_program_id = fcr.concurrent_program_id
AND fcpt.language = USERENV('LANG')
JOIN gv$session s
ON s.audsid = fcr.oracle_session_id
JOIN gv$process p
ON p.addr = s.paddr
AND p.inst_id = s.inst_id
WHERE fcr.phase_code = 'R'
AND fcr.request_id = &request_id;
21. Live Monitoring Query for Long-Running Concurrent Requests
SELECT
fcr.request_id,
fcpt.user_concurrent_program_name,
ROUND((SYSDATE - fcr.actual_start_date) * 24 * 60, 2) running_minutes,
s.inst_id,
s.sid,
s.serial#,
s.sql_id,
s.status,
s.wait_class,
s.event,
s.seconds_in_wait,
s.blocking_session,
s.module,
s.action
FROM apps.fnd_concurrent_requests fcr
JOIN apps.fnd_concurrent_programs_tl fcpt
ON fcpt.concurrent_program_id = fcr.concurrent_program_id
AND fcpt.language = USERENV('LANG')
LEFT JOIN gv$session s
ON s.audsid = fcr.oracle_session_id
WHERE fcr.phase_code = 'R'
ORDER BY running_minutes DESC;
22. Check SQL Monitor for Currently Running or Recently Completed SQL
SELECT
inst_id,
sql_id,
sql_exec_id,
sql_exec_start,
status,
username,
module,
action,
elapsed_time / 1000000 elapsed_sec,
cpu_time / 1000000 cpu_sec,
buffer_gets,
disk_reads,
px_servers_requested,
px_servers_allocated
FROM gv$sql_monitor
WHERE sql_id = '&sql_id'
ORDER BY sql_exec_start DESC;
Parallelism evidence
| Column | Meaning |
|---|---|
PX_SERVERS_REQUESTED | Requested parallel slaves |
PX_SERVERS_ALLOCATED | Actually allocated slaves |
| Requested > allocated | Parallel server shortage |
| Allocated = 0 | SQL did not run parallel |
23. Check Parallel Server Shortage
SELECT
name,
value
FROM v$sysstat
WHERE name LIKE 'Parallel operations%'
OR name LIKE 'Servers%';
SELECT
resource_name,
current_utilization,
max_utilization,
initial_allocation,
limit_value
FROM v$resource_limit
WHERE resource_name LIKE 'parallel%';
SELECT
name,
value
FROM v$parameter
WHERE name IN (
'parallel_max_servers',
'parallel_min_servers',
'parallel_degree_policy',
'parallel_servers_target',
'parallel_threads_per_cpu'
);
RCA wording
The SQL requested parallel execution, but SQL Monitor showed that requested PX servers were not fully allocated. This indicates parallel server shortage or system-level PX throttling during the month-end batch window.
24. If the SQL Was Sequential but Should Have Been Parallel
Check the plan:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST +PARALLEL'));
Look for PX operators:
PX COORDINATOR
PX SEND
PX RECEIVE
PX BLOCK ITERATOR
If not present, SQL did not run parallel.
Production-safe options:
- Do not kill directly.
- Inform business/application owner.
- Ask them to terminate and rerun if required.
-
Apply controlled tuning option:
- SQL Profile
- SQL Patch with parallel hint
- SPM baseline
- Program-level parameter change
- Stats correction
- Index correction
25. Check SQL Plan Baseline / SPM
SELECT
sql_handle,
plan_name,
enabled,
accepted,
fixed,
optimizer_cost,
created,
last_executed,
origin
FROM dba_sql_plan_baselines
WHERE signature IN (
SELECT exact_matching_signature
FROM gv$sql
WHERE sql_id = '&sql_id'
);
Check SQL profiles:
SELECT
name,
category,
status,
created,
last_modified,
force_matching,
sql_text
FROM dba_sql_profiles
WHERE sql_text LIKE '%partial text from SQL%';
26. Root Cause Decision Matrix
| Evidence Found | Likely Root Cause | Corrective Action |
|---|---|---|
| New plan hash value | Plan regression | SPM baseline / SQL profile / stats fix |
High db file scattered read | Full table scan / large scan | Check indexes, partition pruning, stats |
High direct path read temp | TEMP spill | PGA, SQL rewrite, hash/sort tuning |
| High CPU and buffer gets | Inefficient plan | Plan tuning, index, stats |
| High row lock waits | Blocking session | Scheduling conflict, commit frequency |
| High pending time | Concurrent manager backlog | Tune manager specialization/processes |
| PX requested but not allocated | Parallel server shortage | Review PX parameters and workload |
| Same SQL, same plan, more rows | Data volume increase | Purge/archive, partitioning, indexing |
| Stale stats | Bad cardinality | Gather targeted stats |
| RAC cluster waits | GC contention | Instance affinity, service placement |
27. Final RCA Format for Management
Subject
RCA for Delay in Concurrent Request <REQUEST_ID> - <PROGRAM_NAME>
Summary
The concurrent request <REQUEST_ID> for program <PROGRAM_NAME> was executed on <DATE> from <START_TIME> to <END_TIME>. The total runtime was <X> minutes, whereas the normal runtime is approximately <Y> minutes.
Key Findings
| Area | Finding |
|---|---|
| Request ID | <REQUEST_ID> |
| Program | <PROGRAM_NAME> |
| Normal Runtime | <Y> minutes |
| Actual Runtime | <X> minutes |
| Main SQL_ID | <SQL_ID> |
| Plan Hash Value | <PLAN_HASH_VALUE> |
| Top Wait Class | <WAIT_CLASS> |
| Top Wait Event | <EVENT> |
| Main Object | <OWNER.TABLE> |
| Impact | Request completed successfully but with delay |
Technical Evidence
ASH analysis shows that SQL_ID <SQL_ID> consumed the highest active session samples during the request runtime. The major wait event was <EVENT>, contributing approximately <XX>% of the active samples.
AWR SQL statistics show high elapsed time and <CPU/I/O/TEMP/LOCK> contribution during the same window. The SQL used plan hash value <PLAN_HASH_VALUE>. Historical comparison shows that previous faster executions used <OLD_PLAN_HASH_VALUE> / same plan with lower data volume.
Root Cause
The delay was primarily caused by <ROOT_CAUSE>.
Example:
The delay was primarily caused by SQL plan regression. During the delayed run, SQL_ID
abcd1234used plan hash value111111111, which performed a full table scan on a large transactional table. Previous faster runs used plan hash value222222222, which used an index access path. ASH and AWR evidence show high User I/O waits and increased disk reads during the request runtime.
Corrective Action
| Action | Owner | Status |
|---|---|---|
| Review SQL execution plan | DBA | Completed |
| Validate object statistics | DBA | In progress |
| Compare good vs bad plan | DBA | Completed |
| Apply SQL baseline/profile if required | DBA/App Team | Pending approval |
| Review scheduling during month-end | App/Business | Recommended |
Preventive Action
- Capture SQL baseline for stable plan if plan regression is confirmed.
- Monitor the program during month-end processing.
- Compare statistics before and after month-end load.
- Review concurrent manager load during peak processing.
- Enable trace for future execution if issue repeats.
- Create proactive alert for requests exceeding threshold runtime.
28. Short RCA Message for WhatsApp / Manager
Dear Team,
We reviewed the delayed concurrent request <REQUEST_ID> for program <PROGRAM_NAME>.
The request started at <START_TIME> and completed at <END_TIME>, with total runtime of <X> minutes against the usual runtime of <Y> minutes.
Initial ASH/AWR analysis shows that the delay was mainly due to SQL_ID <SQL_ID>, which consumed the highest active session samples during the request runtime. The major wait observed was <WAIT_EVENT>/<WAIT_CLASS>, indicating <I/O issue / CPU load / locking / TEMP usage / plan regression>.
The request completed successfully. DBA team is further validating the execution plan, object statistics, and historical plan comparison to confirm whether this was due to plan change, data volume increase, or runtime resource contention.
Regards,
Apps DBA Team
29. Best Production-Safe Approach
For production, follow this order:
- Do not kill immediately
- Identify request, SID, SQL_ID
- Check whether SQL is progressing
- Check wait event
- Check blocking
- Check SQL Monitor if available
- Inform business with evidence
- If rerun is required, ask business/application owner to terminate from front end
- Capture SQL evidence before termination
- Perform RCA using ASH/AWR after completion
30. Practical One-Line Conclusion
For a completed EBS concurrent request, the strongest RCA evidence comes from:
FND_CONCURRENT_REQUESTS
+ DBA_HIST_ACTIVE_SESS_HISTORY
+ DBA_HIST_SQLSTAT
+ DBA_HIST_SQLTEXT
+ DBMS_XPLAN.DISPLAY_AWR
+ SQL Monitor, if available
This combination helps you prove which SQL_ID delayed the request, what it was waiting for, whether the plan changed, and what action is needed.
No comments:
Post a Comment