Tuesday, May 12, 2026

Deep Dive RCA: Completed Long-Running Concurrent Request in Oracle EBS R12.2 / 19c

 

Deep Dive RCA: Completed Long-Running Concurrent Request in Oracle EBS R12.2 / 19c

1. RCA Objective

For a completed concurrent request, your goal is to prove:

  1. When the request started and completed
  2. Which database session handled it
  3. Which SQL_IDs were active during that runtime
  4. Which SQL consumed most time
  5. Why it consumed time
    • Plan change?
    • Full table scan?
    • High I/O?
    • CPU?
    • Locking?
    • TEMP spill?
    • Parallelism issue?
    • Data volume increase?
    • Stale statistics?
  6. What corrective or preventive action is recommended

2. Get Concurrent Request Basic Details

SELECT 
fcr.request_id,
fcpt.user_concurrent_program_name,
fcp.concurrent_program_name short_name,
fcr.request_date,
fcr.requested_start_date,
fcr.actual_start_date,
fcr.actual_completion_date,
ROUND((fcr.actual_completion_date - fcr.actual_start_date) * 24 * 60, 2) duration_minutes,
fcr.phase_code,
fcr.status_code,
fcr.completion_text,
fcr.oracle_process_id spid,
fcr.oracle_session_id audsid,
fcr.controlling_manager,
fcr.argument_text
FROM apps.fnd_concurrent_requests fcr
JOIN apps.fnd_concurrent_programs fcp
ON fcp.concurrent_program_id = fcr.concurrent_program_id
JOIN apps.fnd_concurrent_programs_tl fcpt
ON fcpt.concurrent_program_id = fcr.concurrent_program_id
AND fcpt.language = USERENV('LANG')
WHERE fcr.request_id = &request_id;

What to capture

EvidenceWhy important
actual_start_dateStart of investigation window
actual_completion_dateEnd of investigation window
duration_minutesConfirms abnormal delay
oracle_session_idHelps map to database ASH session
oracle_process_idOS process ID
argument_textDifferent parameters may explain delay
completion_textAny warning/error message

3. Compare Against Normal Successful Runs

Before blaming SQL, compare the same program’s past runs.

SELECT 
fcr.request_id,
fcr.actual_start_date,
fcr.actual_completion_date,
ROUND((fcr.actual_completion_date - fcr.actual_start_date) * 24 * 60, 2) duration_minutes,
fcr.phase_code,
fcr.status_code,
fcr.argument_text
FROM apps.fnd_concurrent_requests fcr
WHERE fcr.concurrent_program_id = (
SELECT concurrent_program_id
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id
)
AND fcr.actual_start_date IS NOT NULL
AND fcr.actual_completion_date IS NOT NULL
ORDER BY fcr.actual_start_date DESC
FETCH FIRST 20 ROWS ONLY;

RCA angle

If previous runs finished in 2 minutes and this run took 4 hours, then compare:

Compare ItemCheck
ArgumentsSame parameters or wider date range?
Runtime windowMonth-end? Peak batch time?
ManagerSame concurrent manager?
InstanceSame RAC node?
SQL planSame or changed plan hash?
Data volumeMore rows processed?

4. Find Database Session from Completed Request

For completed requests, the session is no longer in gv$session. So use historical ASH.

SELECT 
fcr.request_id,
fcr.oracle_session_id audsid,
fcr.oracle_process_id spid,
fcr.actual_start_date,
fcr.actual_completion_date
FROM apps.fnd_concurrent_requests fcr
WHERE fcr.request_id = &request_id;

Now search ASH using session_serial#, session_id, module/action, and time window.


5. Find SQL_IDs from ASH During Request Runtime

This is one of the most important RCA queries.

SELECT 
ash.sql_id,
ash.sql_plan_hash_value,
COUNT(*) ash_samples,
ROUND(COUNT(*) / 60, 2) approx_active_minutes,
MIN(ash.sample_time) first_seen,
MAX(ash.sample_time) last_seen,
ash.session_id,
ash.session_serial#,
ash.instance_number,
ash.module,
ash.action,
ash.program,
ash.wait_class,
ash.event
FROM dba_hist_active_sess_history ash
WHERE ash.sample_time BETWEEN
(SELECT actual_start_date
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id)
AND (SELECT actual_completion_date
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id)
AND ash.session_id IN (
SELECT oracle_session_id
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id
)
GROUP BY
ash.sql_id,
ash.sql_plan_hash_value,
ash.session_id,
ash.session_serial#,
ash.instance_number,
ash.module,
ash.action,
ash.program,
ash.wait_class,
ash.event
ORDER BY ash_samples DESC;

How to read this

ResultMeaning
Highest ASH_SAMPLESSQL where most active time was spent
High User I/OI/O bottleneck, full scan, index access issue
High CPUExpensive execution plan or high row processing
High ConcurrencyBuffer busy, latch, library cache issue
High ApplicationLocking/enqueue issue
High ClusterRAC block transfer / GC waits
High Temp related waitsSort/hash spill issue

6. Get SQL Text for Top SQL_IDs

SELECT 
sql_id,
DBMS_LOB.SUBSTR(sql_text, 4000, 1) sql_text
FROM dba_hist_sqltext
WHERE sql_id = '&sql_id';

For current cursor cache, if still available:

SELECT 
inst_id,
sql_id,
child_number,
plan_hash_value,
parsing_schema_name,
module,
action,
executions,
elapsed_time / 1000000 elapsed_sec,
cpu_time / 1000000 cpu_sec,
buffer_gets,
disk_reads,
rows_processed,
sql_text
FROM gv$sql
WHERE sql_id = '&sql_id';

7. SQL-Level AWR Statistics

Use this to prove whether the SQL was expensive during the issue window.

SELECT 
sn.begin_interval_time,
sn.end_interval_time,
st.instance_number,
st.sql_id,
st.plan_hash_value,
st.executions_delta execs,
ROUND(st.elapsed_time_delta / 1000000, 2) elapsed_sec,
ROUND(st.cpu_time_delta / 1000000, 2) cpu_sec,
ROUND(st.iowait_delta / 1000000, 2) io_wait_sec,
ROUND(st.clwait_delta / 1000000, 2) cluster_wait_sec,
ROUND(st.apwait_delta / 1000000, 2) app_wait_sec,
st.buffer_gets_delta,
st.disk_reads_delta,
st.rows_processed_delta,
ROUND(st.elapsed_time_delta / 1000000 / NULLIF(st.executions_delta, 0), 2) avg_elapsed_sec
FROM dba_hist_sqlstat st
JOIN dba_hist_snapshot sn
ON sn.snap_id = st.snap_id
AND sn.dbid = st.dbid
AND sn.instance_number = st.instance_number
WHERE st.sql_id = '&sql_id'
AND sn.begin_interval_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
ORDER BY sn.begin_interval_time;

Interpretation

SymptomRCA direction
High elapsed + high CPUBad plan / high logical reads / inefficient joins
High elapsed + high I/O waitPhysical reads / full table scan / storage issue
High elapsed + high cluster waitRAC GC waits / block contention
High elapsed + high application waitLocking or enqueue
High buffer getsInefficient logical reads
High disk readsI/O heavy SQL
High rows processedData volume issue
Different plan hashPlan regression

8. Wait Class Breakdown for the Request

SELECT 
ash.wait_class,
ash.event,
COUNT(*) samples,
ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 2) pct
FROM dba_hist_active_sess_history ash
WHERE ash.sample_time BETWEEN
(SELECT actual_start_date
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id)
AND (SELECT actual_completion_date
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id)
AND ash.sql_id = '&sql_id'
GROUP BY ash.wait_class, ash.event
ORDER BY samples DESC;

Example RCA wording

EvidenceRCA statement
80% db file scattered readSQL spent most time doing multiblock reads, indicating full table scan or large segment scan
70% CPUSQL was CPU-bound due to high logical reads or inefficient execution plan
60% enq: TX - row lock contentionRequest was delayed due to row lock contention from another session
50% direct path read tempSQL spilled to TEMP due to large sort/hash operation

9. Find Blocking Sessions During the Runtime

SELECT 
sample_time,
instance_number,
session_id,
session_serial#,
blocking_session,
blocking_session_serial#,
blocking_inst_id,
event,
wait_class,
sql_id,
current_obj#
FROM dba_hist_active_sess_history
WHERE sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
AND session_id = &sid
AND blocking_session IS NOT NULL
ORDER BY sample_time;

If blocking is found:

SELECT 
blocking_session,
blocking_session_serial#,
blocking_inst_id,
COUNT(*) blocking_samples,
MIN(sample_time) first_blocked,
MAX(sample_time) last_blocked
FROM dba_hist_active_sess_history
WHERE sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
AND session_id = &sid
AND blocking_session IS NOT NULL
GROUP BY blocking_session, blocking_session_serial#, blocking_inst_id
ORDER BY blocking_samples DESC;

RCA wording

The concurrent request was not delayed due to SQL plan regression. ASH shows the session was waiting on row lock contention for approximately XX minutes, blocked by session SID XXX on instance X.


10. Check Plan History for SQL_ID

SELECT 
sql_id,
plan_hash_value,
COUNT(*) awr_snap_count,
MIN(snap_id) first_snap,
MAX(snap_id) last_snap
FROM dba_hist_sqlstat
WHERE sql_id = '&sql_id'
GROUP BY sql_id, plan_hash_value
ORDER BY awr_snap_count DESC;

If multiple plan hash values exist

That is a strong indicator of possible plan instability.

Now compare performance by plan:

SELECT 
st.sql_id,
st.plan_hash_value,
COUNT(*) snap_count,
SUM(st.executions_delta) total_execs,
ROUND(SUM(st.elapsed_time_delta) / 1000000, 2) total_elapsed_sec,
ROUND(SUM(st.cpu_time_delta) / 1000000, 2) total_cpu_sec,
SUM(st.buffer_gets_delta) total_buffer_gets,
SUM(st.disk_reads_delta) total_disk_reads,
ROUND(
SUM(st.elapsed_time_delta) / 1000000 / NULLIF(SUM(st.executions_delta), 0),
2
) avg_elapsed_per_exec_sec
FROM dba_hist_sqlstat st
WHERE st.sql_id = '&sql_id'
GROUP BY st.sql_id, st.plan_hash_value
ORDER BY avg_elapsed_per_exec_sec DESC;

RCA wording

The SQL used plan hash value 123456789 during the delayed run. Historical AWR shows the normal plan hash value was 987654321, with average elapsed time of 2 seconds per execution. The delayed plan averaged 240 seconds per execution, indicating SQL plan regression.


11. Display Execution Plan from AWR

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id', NULL, NULL, 'ALLSTATS LAST +PEEKED_BINDS +OUTLINE +ALIAS'));

If the specific plan hash is known:

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id', &plan_hash_value, NULL, 'ALL +OUTLINE +PEEKED_BINDS'));

Check inside the plan

Plan AreaWhat to check
Full table scanWas large table scanned?
Index accessWas correct index used?
Join methodNested loop vs hash join
CardinalityEstimated rows vs actual rows
TEMPSort/hash operation
ParallelismPX operators present or not
Partition pruningDid it prune partitions?
Predicate sectionWas filter pushed correctly?

12. Check If SQL Was Captured in AWR or Not

Sometimes the SQL_ID is not in AWR. Reasons:

  1. SQL ran for short bursts but many times
  2. SQL was not in top SQL during snapshot
  3. AWR retention/snapshot missed it
  4. SQL aged out
  5. Dynamic SQL changed SQL_ID
  6. Literal values caused multiple SQL_IDs
  7. ASH has samples but SQLSTAT does not

Check ASH first:

SELECT 
sql_id,
COUNT(*) ash_samples,
MIN(sample_time) first_seen,
MAX(sample_time) last_seen
FROM dba_hist_active_sess_history
WHERE sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
AND sql_id = '&sql_id'
GROUP BY sql_id;

Then check AWR SQLSTAT:

SELECT 
sql_id,
COUNT(*)
FROM dba_hist_sqlstat
WHERE sql_id = '&sql_id'
GROUP BY sql_id;

If SQL_ID is in ASH but not AWR

Use ASH evidence:

SELECT 
sql_id,
sql_plan_hash_value,
wait_class,
event,
COUNT(*) samples,
ROUND(COUNT(*) / 60, 2) approx_minutes
FROM dba_hist_active_sess_history
WHERE sql_id = '&sql_id'
AND sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY sql_id, sql_plan_hash_value, wait_class, event
ORDER BY samples DESC;

RCA wording:

The SQL_ID was not available in DBA_HIST_SQLSTAT, likely because it was not part of the top SQL captured in AWR SQL statistics. However, DBA_HIST_ACTIVE_SESS_HISTORY captured active samples for this SQL_ID during the request runtime, and the ASH evidence shows the SQL spent most active time on User I/O.


13. Check Object Behind Waits

From ASH, get object number:

SELECT 
ash.current_obj#,
obj.owner,
obj.object_name,
obj.object_type,
COUNT(*) samples
FROM dba_hist_active_sess_history ash
LEFT JOIN dba_objects obj
ON obj.object_id = ash.current_obj#
WHERE ash.sql_id = '&sql_id'
AND ash.sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY ash.current_obj#, obj.owner, obj.object_name, obj.object_type
ORDER BY samples DESC;

RCA wording

ASH shows that most I/O wait samples were associated with object XX.XX_TABLE, indicating the SQL spent significant time reading this table.


14. Check Segment Size

SELECT 
owner,
segment_name,
segment_type,
ROUND(bytes / 1024 / 1024 / 1024, 2) size_gb,
tablespace_name
FROM dba_segments
WHERE owner = '&owner'
AND segment_name = '&object_name';

If it is a large table, check partitioning:

SELECT 
table_owner,
table_name,
partition_name,
high_value,
num_rows,
blocks,
last_analyzed
FROM dba_tab_partitions
WHERE table_owner = '&owner'
AND table_name = '&table_name'
ORDER BY partition_position;

15. Check Table and Index Statistics

SELECT 
owner,
table_name,
num_rows,
blocks,
sample_size,
last_analyzed,
stale_stats
FROM dba_tab_statistics
WHERE owner = '&owner'
AND table_name = '&table_name';

Index stats:

SELECT 
owner,
index_name,
table_name,
blevel,
leaf_blocks,
distinct_keys,
clustering_factor,
num_rows,
last_analyzed,
stale_stats
FROM dba_ind_statistics
WHERE owner = '&owner'
AND table_name = '&table_name'
ORDER BY index_name;

RCA wording

Table statistics were last gathered on DD-MON-YYYY, but the data volume increased significantly during month-end. Stale or non-representative statistics may have contributed to incorrect optimizer cardinality and suboptimal plan selection.


16. Check if Full Table Scan Happened

From plan:

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id', &plan_hash_value, NULL, 'ALL'));

Look for:

TABLE ACCESS FULL
INDEX FULL SCAN
HASH JOIN
SORT ORDER BY
TEMP TABLE TRANSFORMATION

Also from ASH:

SELECT 
event,
wait_class,
COUNT(*) samples
FROM dba_hist_active_sess_history
WHERE sql_id = '&sql_id'
AND event IN (
'db file scattered read',
'direct path read',
'direct path read temp',
'db file sequential read'
)
AND sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY event, wait_class
ORDER BY samples DESC;

Interpretation

WaitMeaning
db file scattered readMultiblock read, often full scan
direct path readLarge direct reads, often full scan
db file sequential readIndex lookup or single-block read
direct path read tempTEMP spill

17. Check TEMP Usage Evidence

For historical completed issue, use ASH event evidence:

SELECT 
sql_id,
event,
COUNT(*) samples,
ROUND(COUNT(*) / 60, 2) approx_minutes
FROM dba_hist_active_sess_history
WHERE sql_id = '&sql_id'
AND event LIKE '%temp%'
AND sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY sql_id, event
ORDER BY samples DESC;

For currently running SQL:

SELECT 
s.inst_id,
s.sid,
s.serial#,
s.username,
s.sql_id,
u.tablespace,
ROUND(u.blocks * ts.block_size / 1024 / 1024, 2) temp_mb
FROM gv$sort_usage u
JOIN gv$session s
ON s.saddr = u.session_addr
AND s.inst_id = u.inst_id
JOIN dba_tablespaces ts
ON ts.tablespace_name = u.tablespace
ORDER BY temp_mb DESC;

18. Check RAC GC Waits

SELECT 
event,
COUNT(*) samples,
ROUND(COUNT(*) / 60, 2) approx_minutes
FROM dba_hist_active_sess_history
WHERE sql_id = '&sql_id'
AND wait_class = 'Cluster'
AND sample_time BETWEEN
TO_DATE('&start_time','DD-MON-YYYY HH24:MI:SS')
AND TO_DATE('&end_time','DD-MON-YYYY HH24:MI:SS')
GROUP BY event
ORDER BY samples DESC;

RCA wording

The SQL experienced significant RAC cluster waits, mainly gc buffer busy acquire, indicating global cache contention between RAC instances.


19. Check Concurrent Manager Load

Sometimes the delay is not SQL. It may be waiting before execution due to manager backlog.

SELECT 
request_id,
requested_start_date,
actual_start_date,
ROUND((actual_start_date - requested_start_date) * 24 * 60, 2) pending_minutes,
actual_completion_date,
ROUND((actual_completion_date - actual_start_date) * 24 * 60, 2) running_minutes,
phase_code,
status_code
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id;

Interpret

CaseMeaning
High pending timeConcurrent manager backlog
High running timeDB/program execution issue
Both highManager backlog + SQL/runtime issue

Check manager:

SELECT 
concurrent_queue_name,
target_node,
max_processes,
running_processes,
control_code
FROM apps.fnd_concurrent_queues
WHERE concurrent_queue_id = (
SELECT controlling_manager
FROM apps.fnd_concurrent_requests
WHERE request_id = &request_id
);

20. Map Currently Running SID to Concurrent Request

For future live babysitting:

SELECT 
fcr.request_id,
fcpt.user_concurrent_program_name,
fcr.phase_code,
fcr.status_code,
fcr.actual_start_date,
s.inst_id,
s.sid,
s.serial#,
s.sql_id,
s.sql_child_number,
s.event,
s.wait_class,
s.seconds_in_wait,
s.module,
s.action,
p.spid
FROM apps.fnd_concurrent_requests fcr
JOIN apps.fnd_concurrent_programs_tl fcpt
ON fcpt.concurrent_program_id = fcr.concurrent_program_id
AND fcpt.language = USERENV('LANG')
JOIN gv$session s
ON s.audsid = fcr.oracle_session_id
JOIN gv$process p
ON p.addr = s.paddr
AND p.inst_id = s.inst_id
WHERE fcr.phase_code = 'R'
AND fcr.request_id = &request_id;

21. Live Monitoring Query for Long-Running Concurrent Requests

SELECT 
fcr.request_id,
fcpt.user_concurrent_program_name,
ROUND((SYSDATE - fcr.actual_start_date) * 24 * 60, 2) running_minutes,
s.inst_id,
s.sid,
s.serial#,
s.sql_id,
s.status,
s.wait_class,
s.event,
s.seconds_in_wait,
s.blocking_session,
s.module,
s.action
FROM apps.fnd_concurrent_requests fcr
JOIN apps.fnd_concurrent_programs_tl fcpt
ON fcpt.concurrent_program_id = fcr.concurrent_program_id
AND fcpt.language = USERENV('LANG')
LEFT JOIN gv$session s
ON s.audsid = fcr.oracle_session_id
WHERE fcr.phase_code = 'R'
ORDER BY running_minutes DESC;

22. Check SQL Monitor for Currently Running or Recently Completed SQL

SELECT 
inst_id,
sql_id,
sql_exec_id,
sql_exec_start,
status,
username,
module,
action,
elapsed_time / 1000000 elapsed_sec,
cpu_time / 1000000 cpu_sec,
buffer_gets,
disk_reads,
px_servers_requested,
px_servers_allocated
FROM gv$sql_monitor
WHERE sql_id = '&sql_id'
ORDER BY sql_exec_start DESC;

Parallelism evidence

ColumnMeaning
PX_SERVERS_REQUESTEDRequested parallel slaves
PX_SERVERS_ALLOCATEDActually allocated slaves
Requested > allocatedParallel server shortage
Allocated = 0SQL did not run parallel

23. Check Parallel Server Shortage

SELECT 
name,
value
FROM v$sysstat
WHERE name LIKE 'Parallel operations%'
OR name LIKE 'Servers%';
SELECT 
resource_name,
current_utilization,
max_utilization,
initial_allocation,
limit_value
FROM v$resource_limit
WHERE resource_name LIKE 'parallel%';
SELECT 
name,
value
FROM v$parameter
WHERE name IN (
'parallel_max_servers',
'parallel_min_servers',
'parallel_degree_policy',
'parallel_servers_target',
'parallel_threads_per_cpu'
);

RCA wording

The SQL requested parallel execution, but SQL Monitor showed that requested PX servers were not fully allocated. This indicates parallel server shortage or system-level PX throttling during the month-end batch window.


24. If the SQL Was Sequential but Should Have Been Parallel

Check the plan:

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST +PARALLEL'));

Look for PX operators:

PX COORDINATOR
PX SEND
PX RECEIVE
PX BLOCK ITERATOR

If not present, SQL did not run parallel.

Production-safe options:

  1. Do not kill directly.
  2. Inform business/application owner.
  3. Ask them to terminate and rerun if required.
  4. Apply controlled tuning option:
    • SQL Profile
    • SQL Patch with parallel hint
    • SPM baseline
    • Program-level parameter change
    • Stats correction
    • Index correction

25. Check SQL Plan Baseline / SPM

SELECT 
sql_handle,
plan_name,
enabled,
accepted,
fixed,
optimizer_cost,
created,
last_executed,
origin
FROM dba_sql_plan_baselines
WHERE signature IN (
SELECT exact_matching_signature
FROM gv$sql
WHERE sql_id = '&sql_id'
);

Check SQL profiles:

SELECT 
name,
category,
status,
created,
last_modified,
force_matching,
sql_text
FROM dba_sql_profiles
WHERE sql_text LIKE '%partial text from SQL%';

26. Root Cause Decision Matrix

Evidence FoundLikely Root CauseCorrective Action
New plan hash valuePlan regressionSPM baseline / SQL profile / stats fix
High db file scattered readFull table scan / large scanCheck indexes, partition pruning, stats
High direct path read tempTEMP spillPGA, SQL rewrite, hash/sort tuning
High CPU and buffer getsInefficient planPlan tuning, index, stats
High row lock waitsBlocking sessionScheduling conflict, commit frequency
High pending timeConcurrent manager backlogTune manager specialization/processes
PX requested but not allocatedParallel server shortageReview PX parameters and workload
Same SQL, same plan, more rowsData volume increasePurge/archive, partitioning, indexing
Stale statsBad cardinalityGather targeted stats
RAC cluster waitsGC contentionInstance affinity, service placement

27. Final RCA Format for Management

Subject

RCA for Delay in Concurrent Request <REQUEST_ID> - <PROGRAM_NAME>

Summary

The concurrent request <REQUEST_ID> for program <PROGRAM_NAME> was executed on <DATE> from <START_TIME> to <END_TIME>. The total runtime was <X> minutes, whereas the normal runtime is approximately <Y> minutes.

Key Findings

AreaFinding
Request ID<REQUEST_ID>
Program<PROGRAM_NAME>
Normal Runtime<Y> minutes
Actual Runtime<X> minutes
Main SQL_ID<SQL_ID>
Plan Hash Value<PLAN_HASH_VALUE>
Top Wait Class<WAIT_CLASS>
Top Wait Event<EVENT>
Main Object<OWNER.TABLE>
ImpactRequest completed successfully but with delay

Technical Evidence

ASH analysis shows that SQL_ID <SQL_ID> consumed the highest active session samples during the request runtime. The major wait event was <EVENT>, contributing approximately <XX>% of the active samples.

AWR SQL statistics show high elapsed time and <CPU/I/O/TEMP/LOCK> contribution during the same window. The SQL used plan hash value <PLAN_HASH_VALUE>. Historical comparison shows that previous faster executions used <OLD_PLAN_HASH_VALUE> / same plan with lower data volume.

Root Cause

The delay was primarily caused by <ROOT_CAUSE>.

Example:

The delay was primarily caused by SQL plan regression. During the delayed run, SQL_ID abcd1234 used plan hash value 111111111, which performed a full table scan on a large transactional table. Previous faster runs used plan hash value 222222222, which used an index access path. ASH and AWR evidence show high User I/O waits and increased disk reads during the request runtime.

Corrective Action

ActionOwnerStatus
Review SQL execution planDBACompleted
Validate object statisticsDBAIn progress
Compare good vs bad planDBACompleted
Apply SQL baseline/profile if requiredDBA/App TeamPending approval
Review scheduling during month-endApp/BusinessRecommended

Preventive Action

  1. Capture SQL baseline for stable plan if plan regression is confirmed.
  2. Monitor the program during month-end processing.
  3. Compare statistics before and after month-end load.
  4. Review concurrent manager load during peak processing.
  5. Enable trace for future execution if issue repeats.
  6. Create proactive alert for requests exceeding threshold runtime.

28. Short RCA Message for WhatsApp / Manager

Dear Team,

We reviewed the delayed concurrent request <REQUEST_ID> for program <PROGRAM_NAME>.

The request started at <START_TIME> and completed at <END_TIME>, with total runtime of <X> minutes against the usual runtime of <Y> minutes.

Initial ASH/AWR analysis shows that the delay was mainly due to SQL_ID <SQL_ID>, which consumed the highest active session samples during the request runtime. The major wait observed was <WAIT_EVENT>/<WAIT_CLASS>, indicating <I/O issue / CPU load / locking / TEMP usage / plan regression>.

The request completed successfully. DBA team is further validating the execution plan, object statistics, and historical plan comparison to confirm whether this was due to plan change, data volume increase, or runtime resource contention.

Regards,
Apps DBA Team

29. Best Production-Safe Approach

For production, follow this order:

  1. Do not kill immediately
  2. Identify request, SID, SQL_ID
  3. Check whether SQL is progressing
  4. Check wait event
  5. Check blocking
  6. Check SQL Monitor if available
  7. Inform business with evidence
  8. If rerun is required, ask business/application owner to terminate from front end
  9. Capture SQL evidence before termination
  10. Perform RCA using ASH/AWR after completion

30. Practical One-Line Conclusion

For a completed EBS concurrent request, the strongest RCA evidence comes from:

FND_CONCURRENT_REQUESTS 
+ DBA_HIST_ACTIVE_SESS_HISTORY
+ DBA_HIST_SQLSTAT
+ DBA_HIST_SQLTEXT
+ DBMS_XPLAN.DISPLAY_AWR
+ SQL Monitor, if available

This combination helps you prove which SQL_ID delayed the request, what it was waiting for, whether the plan changed, and what action is needed.

No comments:

Post a Comment