Tuesday, May 12, 2026

APPS DBA = SQL Plan Regression Runbook


SQL_ID: <SQL_ID> Incident: Month-End Performance Degradation Bad Plan Hash: <BAD_PLAN_HASH> Good Plan Hash (if known): <GOOD_PLAN_HASH>


1. Purpose

This runbook provides a standardized, production-safe procedure to diagnose and resolve SQL plan regression for the identified SQL statement during month-end processing.

2. Scope

Applies when this SQL_ID shows significantly higher elapsed time due to plan change from good to bad plan hash value.

3. Prerequisites

  • Access to the database as a privileged user (with SELECT on AWR views, DBA_* views).
  • AWR snapshots available for the affected period.
  • SQL Tuning Advisor and SQL Plan Management licenses (Enterprise Edition).

4. Step-by-Step Diagnosis

Step 4.1: Confirm Multiple Plans Exist

SQL
SELECT DISTINCT plan_hash_value,
       MIN(begin_interval_time) first_seen,
       MAX(begin_interval_time) last_seen,
       COUNT(*) snaps
FROM dba_hist_sqlstat
WHERE sql_id = '<SQL_ID>'
GROUP BY plan_hash_value
ORDER BY first_seen;

Step 4.2: Compare Performance by Plan

SQL
SELECT 
  plan_hash_value,
  SUM(executions_delta) executions,
  ROUND(SUM(elapsed_time_delta)/1e6 / NULLIF(SUM(executions_delta),0), 2) "AVG_ELA_SEC",
  ROUND(SUM(cpu_time_delta)/1e6 / NULLIF(SUM(executions_delta),0), 2) "AVG_CPU_SEC",
  ROUND(SUM(buffer_gets_delta) / NULLIF(SUM(executions_delta),0), 0) "AVG_BUF_GETS",
  ROUND(SUM(disk_reads_delta) / NULLIF(SUM(executions_delta),0), 0) "AVG_DISK_READS"
FROM dba_hist_sqlstat s
JOIN dba_hist_snapshot sn USING (snap_id, instance_number)
WHERE sql_id = '<SQL_ID>'
  AND executions_delta > 0
GROUP BY plan_hash_value
ORDER BY "AVG_ELA_SEC" DESC;

Decision: If <BAD_PLAN_HASH> has significantly higher AVG_ELA_SEC (typically ≥ 3x), proceed.

Step 4.3: Compare Execution Plans

SQL
-- Current / Bad Plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<SQL_ID>', 0, 'ADVANCED ALLSTATS LAST'));

-- Historical Good Plan (AWR)
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('<SQL_ID>', <GOOD_PLAN_HASH>, NULL, 'ADVANCED'));

Look for differences in: Full Table Scan vs Index, Join order/method, Cardinality estimates.

Step 4.4: Check Stale Statistics

SQL
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

SELECT owner, table_name, partition_name, last_analyzed, stale_stats 
FROM dba_tab_statistics 
WHERE stale_stats = 'YES' OR last_analyzed IS NULL;

Step 4.5: Check Bind Sensitivity & Child Cursors

SQL
SELECT child_number, plan_hash_value, is_bind_sensitive, is_bind_aware, executions, 
       elapsed_time/1e6 avg_ela_sec
FROM v$sql 
WHERE sql_id = '<SQL_ID>';

SELECT * FROM v$sql_shared_cursor WHERE sql_id = '<SQL_ID>';

5. Corrective Actions (Production-Safe Order)

PriorityActionCommand / StepsRisk Level
1 (Recommended)SQL Plan Baseline (fixes plan stability)```sql
2SQL Profile (if no good plan in cache)Run SQL Tuning Advisor → Accept ProfileLow-Medium
3Gather fresh statistics (maintenance window)EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'OWNER', tabname=>'TABLE', degree=>8);Medium

Do NOT use OPTIMIZER_GATHER_STATS hints or drop plans in production without testing.


6. Verification Steps

After applying fix:

  1. Re-run performance comparison query (Step 4.2).
  2. Confirm new executions are using <GOOD_PLAN_HASH>.
  3. Monitor real-time:
    SQL
    SELECT sql_id, plan_hash_value, executions, elapsed_time/1e6 
    FROM v$sql WHERE sql_id = '<SQL_ID>';
  4. Check AWR report for next snapshot.

Success Criteria: Average elapsed time returns to normal/good plan levels.


7. Rollback / Backout Plan

  • Drop Baseline:
    SQL
    EXEC DBMS_SPM.DROP_SQL_PLAN_BASELINE(plan_name => '<PLAN_NAME>');
  • Drop SQL Profile:
    SQL
    EXEC DBMS_SQLTUNE.DROP_SQL_PROFILE(profile_name => '<PROFILE_NAME>');

8. Root Cause Analysis (RCA) Template for Management

Root Cause: The SQL with ID <SQL_ID> experienced a plan regression during month-end from Plan Hash <GOOD_PLAN_HASH> (efficient) to <BAD_PLAN_HASH> (sub-optimal). This was triggered by [stale statistics / bind peeking issue / data volume skew / cardinality misestimation].

Business Impact: Elapsed time increased by ~X times, causing month-end batch delay of XX minutes/hours.

Corrective Action: Implemented SQL Plan Baseline (or Profile) to enforce the good plan. Statistics gathering scheduled.

Prevention Measures:

  • Enable SQL Plan Management for critical SQLs.
  • Review statistics gathering strategy for month-end tables.
  • Monitor plan changes via AWR / custom alerts.

9. Escalation Path

  • Level 1: DBA performing analysis
  • Level 2: Senior DBA / Performance Architect
  • Level 3: Application Owner + Architecture Team

10. References

  • Oracle Documentation: DBMS_SPM, DBMS_SQLTUNE, DBMS_XPLAN
  • AWR Views: DBA_HIST_SQLSTAT, DBA_HIST_SNAPSHOT

Deep Dive Analysis of Problematic Long-Running SQL_IDs During Month-End

 

Oracle EBS Apps DBA Real-Time Production Troubleshooting Guide

Month-end processing is always a critical period for any Oracle E-Business Suite production environment. During this time, multiple high-volume concurrent programs run together, business users are under pressure, and even one bad SQL plan can delay important financial or operational closure activities.

As an Apps DBA, our responsibility is not only to say that “SQL was running long,” but to provide a proper technical deep dive:

Which SQL_ID caused the delay?
Which concurrent request triggered it?
Which user submitted the request?
Was it a bad execution plan, stale statistics, blocking, TEMP usage, or lack of parallelism?
What is the permanent fix?

This blog post explains a practical production approach to analyze all problematic long-running SQL_IDs during month-end.


1. Objective of Month-End SQL Deep Dive

The main objective is to identify SQL statements that caused abnormal runtime during month-end and prepare a clear RCA with corrective and preventive actions.

During the analysis, we should answer the following questions:

QuestionEvidence Required
Which SQL_IDs ran long?AWR, ASH, GV$SQL
Which concurrent request triggered the SQL?FND concurrent request tables
Which program and user were involved?Request ID, program name, user name
What was the main wait event?ASH / GV$SESSION
Did the execution plan change?Current plan vs historical AWR plan
Was SQL running parallel or serial?GV$PX_SESSION
Was there TEMP spill?GV$TEMPSEG_USAGE
Was there any blocking?GV$SESSION blocking columns
What is the root cause?Plan regression, stale stats, blocking, TEMP, I/O, DOP issue
What is the preventive action?Stats, SPM, scheduling, DOP governance, monitoring

2. Define the Month-End Issue Window

Before starting any technical analysis, first define the exact time window.

Example:

-- Month-end issue window
-- Start Time : 30-Apr-2026 18:00
-- End Time : 01-May-2026 06:00

Now identify the AWR snapshot IDs for this period:

SELECT snap_id,
instance_number,
begin_interval_time,
end_interval_time
FROM dba_hist_snapshot
WHERE begin_interval_time BETWEEN TO_DATE('30-APR-2026 18:00','DD-MON-YYYY HH24:MI')
AND TO_DATE('01-MAY-2026 06:00','DD-MON-YYYY HH24:MI')
ORDER BY snap_id, instance_number;

These snapshot IDs will be used for historical SQL analysis.


3. Identify Top Problematic SQL_IDs During Month-End

This query helps identify the top SQL statements by elapsed time, CPU time, I/O wait, buffer gets, disk reads, and executions.

SELECT *
FROM (
SELECT ss.sql_id,
ss.plan_hash_value,
SUM(ss.executions_delta) executions,
ROUND(SUM(ss.elapsed_time_delta)/1000000/60,2) elapsed_mins,
ROUND(SUM(ss.cpu_time_delta)/1000000/60,2) cpu_mins,
ROUND(SUM(ss.iowait_delta)/1000000/60,2) io_wait_mins,
ROUND(SUM(ss.clwait_delta)/1000000/60,2) cluster_wait_mins,
ROUND(SUM(ss.apwait_delta)/1000000/60,2) app_wait_mins,
SUM(ss.buffer_gets_delta) buffer_gets,
SUM(ss.disk_reads_delta) disk_reads,
SUM(ss.rows_processed_delta) rows_processed,
ROUND(
SUM(ss.elapsed_time_delta)/1000000 /
NULLIF(SUM(ss.executions_delta),0),2
) avg_elapsed_sec_per_exec
FROM dba_hist_sqlstat ss
JOIN dba_hist_snapshot sn
ON ss.snap_id = sn.snap_id
AND ss.instance_number = sn.instance_number
WHERE sn.begin_interval_time BETWEEN TO_DATE('30-APR-2026 18:00','DD-MON-YYYY HH24:MI')
AND TO_DATE('01-MAY-2026 06:00','DD-MON-YYYY HH24:MI')
GROUP BY ss.sql_id, ss.plan_hash_value
ORDER BY SUM(ss.elapsed_time_delta) DESC
)
WHERE ROWNUM <= 30;

How to Read the Output

ObservationMeaning
High elapsed time and high CPUSQL is CPU intensive
High elapsed time and high I/O waitPossible full table scan or storage issue
High elapsed time and TEMP waitSort/hash operation spilled to TEMP
High elapsed time but low CPUPossible blocking or concurrency issue
High executions with small average timeSQL is called repeatedly
Low executions with very high average timeOne or few executions were very expensive
Multiple plan hash valuesPossible plan instability or regression

4. Get SQL Text for the SQL_ID

Once the problematic SQL_ID is identified, get the SQL text.

SELECT sql_id,
DBMS_LOB.SUBSTR(sql_text, 4000, 1) sql_text
FROM dba_hist_sqltext
WHERE sql_id = '&SQL_ID';

For current SQL available in memory:

SELECT sql_id,
child_number,
plan_hash_value,
executions,
elapsed_time/1000000 elapsed_sec,
cpu_time/1000000 cpu_sec,
buffer_gets,
disk_reads,
rows_processed,
parsing_schema_name,
module,
action
FROM gv$sql
WHERE sql_id = '&SQL_ID';

5. Map SQL_ID to Current Database Session

If the SQL is currently running, use the below query to identify the session details.

SELECT s.inst_id,
s.sid,
s.serial#,
s.username,
s.module,
s.action,
s.program,
s.status,
s.sql_id,
s.prev_sql_id,
s.event,
s.wait_class,
s.seconds_in_wait,
s.blocking_session,
s.machine,
s.osuser,
p.spid os_pid
FROM gv$session s
JOIN gv$process p
ON s.inst_id = p.inst_id
AND s.paddr = p.addr
WHERE s.sql_id = '&SQL_ID'
ORDER BY s.inst_id, s.sid;

This gives the database SID, serial number, OS process ID, wait event, and blocking session details.


6. Map SQL_ID to EBS Concurrent Request

This is one of the most important steps in Oracle EBS troubleshooting.

As Apps DBA, we should not stop only at SQL_ID. We must map it to the actual concurrent request and program.

SELECT s.inst_id,
s.sid,
s.serial#,
s.sql_id,
s.event,
s.wait_class,
s.seconds_in_wait,
r.request_id,
r.phase_code,
r.status_code,
r.actual_start_date,
r.actual_completion_date,
ROUND((NVL(r.actual_completion_date,SYSDATE)-r.actual_start_date)*24*60,2) runtime_mins,
cp.concurrent_program_name,
cpt.user_concurrent_program_name,
u.user_name,
r.argument_text
FROM gv$session s
JOIN gv$process p
ON s.inst_id = p.inst_id
AND s.paddr = p.addr
JOIN apps.fnd_concurrent_requests r
ON r.oracle_process_id = p.spid
JOIN apps.fnd_concurrent_programs cp
ON r.concurrent_program_id = cp.concurrent_program_id
AND r.program_application_id = cp.application_id
JOIN apps.fnd_concurrent_programs_tl cpt
ON cp.concurrent_program_id = cpt.concurrent_program_id
AND cp.application_id = cpt.application_id
AND cpt.language = 'US'
JOIN apps.fnd_user u
ON r.requested_by = u.user_id
WHERE s.sql_id = '&SQL_ID';

This gives the full production mapping:

SQL_ID 
→ SID / SERIAL#
→ OS Process ID
→ Concurrent Request ID
→ Concurrent Program Name
→ Submitted User
→ Request Parameters

This mapping is very useful for RCA and business communication.


7. Find Long-Running Concurrent Requests During Month-End

Use this query to list all long-running concurrent requests during the month-end window.

SELECT r.request_id,
cpt.user_concurrent_program_name,
cp.concurrent_program_name short_name,
u.user_name requested_by,
r.phase_code,
r.status_code,
r.actual_start_date,
r.actual_completion_date,
ROUND((NVL(r.actual_completion_date,SYSDATE)-r.actual_start_date)*24*60,2) runtime_mins,
r.argument_text
FROM apps.fnd_concurrent_requests r
JOIN apps.fnd_concurrent_programs cp
ON r.concurrent_program_id = cp.concurrent_program_id
AND r.program_application_id = cp.application_id
JOIN apps.fnd_concurrent_programs_tl cpt
ON cp.concurrent_program_id = cpt.concurrent_program_id
AND cp.application_id = cpt.application_id
AND cpt.language = 'US'
JOIN apps.fnd_user u
ON r.requested_by = u.user_id
WHERE r.actual_start_date BETWEEN TO_DATE('30-APR-2026 18:00','DD-MON-YYYY HH24:MI')
AND TO_DATE('01-MAY-2026 06:00','DD-MON-YYYY HH24:MI')
AND ROUND((NVL(r.actual_completion_date,SYSDATE)-r.actual_start_date)*24*60,2) > 30
ORDER BY runtime_mins DESC;

This query helps identify requests that exceeded the expected runtime.


8. Analyze ASH Wait Events for the SQL_ID

ASH analysis tells us where the SQL spent most of its time.

SELECT ash.sql_id,
ash.sql_plan_hash_value,
ash.session_state,
ash.wait_class,
ash.event,
COUNT(*) ash_samples
FROM dba_hist_active_sess_history ash
JOIN dba_hist_snapshot sn
ON ash.snap_id = sn.snap_id
AND ash.instance_number = sn.instance_number
WHERE ash.sql_id = '&SQL_ID'
AND sn.begin_interval_time BETWEEN TO_DATE('30-APR-2026 18:00','DD-MON-YYYY HH24:MI')
AND TO_DATE('01-MAY-2026 06:00','DD-MON-YYYY HH24:MI')
GROUP BY ash.sql_id,
ash.sql_plan_hash_value,
ash.session_state,
ash.wait_class,
ash.event
ORDER BY ash_samples DESC;

Common Wait Events and Meaning

Wait Event / Wait ClassPossible Meaning
CPUSQL is CPU intensive
db file scattered readFull table scan / multiblock read
db file sequential readIndex access / single block read
direct path read tempTEMP read due to sort/hash spill
direct path write tempTEMP write due to sort/hash spill
enq: TX - row lock contentionBlocking due to row lock
library cache lockObject dependency / hard parse issue
PX Deq Credit: send blkdParallel query bottleneck
resmgr:cpu quantumResource Manager throttling
gc buffer busy acquireRAC block contention

9. Check Whether SQL Ran Parallel or Serial

During month-end, many long-running SQLs are expected to run with parallelism. However, due to parallel server shortage or parameter limits, the SQL may run serially or with reduced DOP.

SELECT s.inst_id,
s.sid,
s.serial#,
s.sql_id,
s.qcsid,
s.qcserial#,
s.server_group,
s.server_set,
s.degree,
s.req_degree
FROM gv$px_session s
WHERE s.sql_id = '&SQL_ID'
ORDER BY s.inst_id, s.qcsid, s.sid;

Check parallel server availability:

SELECT *
FROM gv$px_process_sysstat
WHERE statistic LIKE 'Servers%';

Check parallel-related parameters:

SHOW PARAMETER parallel_max_servers;
SHOW PARAMETER parallel_servers_target;
SHOW PARAMETER parallel_degree_policy;
SHOW PARAMETER parallel_min_time_threshold;

Interpretation

FindingMeaning
No rows in GV$PX_SESSIONSQL is not running parallel
REQ_DEGREE greater than DEGREERequested DOP was not fully granted
Parallel servers exhaustedOther jobs consumed PX servers
QC waiting for PX slavesParallel execution bottleneck
High PX waitsParallel skew or resource pressure

10. Display Current Execution Plan

If the SQL is still available in cursor cache, use:

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(
sql_id => '&SQL_ID',
cursor_child_no => NULL,
format => 'ALLSTATS LAST +PEEKED_BINDS +OUTLINE +ALIAS +NOTE'
));

Look for the following:

Plan ObservationPossible Issue
Full table scan on large tableMissing index, stale stats, or expected full scan
Nested loop with huge rowsBad join method
Hash join with TEMP spillPGA/TEMP issue
Actual rows much higher than estimated rowsCardinality/statistics issue
Different plan hash valuePlan regression
No PX operationsSQL did not run parallel
PX operations but slowParallel skew or resource bottleneck

11. Display Historical Execution Plan from AWR

To check historical plans:

SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_AWR(
sql_id => '&SQL_ID',
plan_hash_value => NULL,
db_id => NULL,
format => 'ALL +OUTLINE +NOTE'
));

This is useful when the SQL is no longer available in memory but exists in AWR.


12. Check Plan Regression

Use the below query to check whether the same SQL_ID used multiple execution plans.

SELECT sql_id,
plan_hash_value,
COUNT(DISTINCT snap_id) snap_count,
SUM(executions_delta) executions,
ROUND(SUM(elapsed_time_delta)/1000000/60,2) elapsed_mins,
ROUND(SUM(elapsed_time_delta)/1000000 / NULLIF(SUM(executions_delta),0),2) avg_elapsed_sec
FROM dba_hist_sqlstat
WHERE sql_id = '&SQL_ID'
GROUP BY sql_id, plan_hash_value
ORDER BY avg_elapsed_sec DESC;

RCA Interpretation

FindingRCA Direction
Multiple plan hash valuesSQL plan instability
Old plan fast, new plan slowPlan regression
Same plan but higher data volumeMonth-end volume issue
Same plan but different waitsSystem load, I/O, TEMP, or blocking issue

13. Check Object Statistics

First identify the objects used by the SQL.

SELECT DISTINCT object_owner,
object_name,
object_type
FROM dba_hist_sql_plan
WHERE sql_id = '&SQL_ID'
AND object_owner IS NOT NULL
ORDER BY object_owner, object_name;

Then check table statistics:

SELECT owner,
table_name,
num_rows,
blocks,
last_analyzed,
stale_stats,
stattype_locked
FROM dba_tab_statistics
WHERE owner = '&OWNER'
AND table_name = '&TABLE_NAME';

Check index statistics:

SELECT owner,
index_name,
table_name,
blevel,
leaf_blocks,
clustering_factor,
num_rows,
last_analyzed,
stale_stats
FROM dba_ind_statistics
WHERE owner = '&OWNER'
AND table_name = '&TABLE_NAME';

Interpretation

FindingMeaning
LAST_ANALYZED is oldStats may be outdated
STALE_STATS = YESStats refresh may be required
STATS lockedOptimizer may use old stats
Bad clustering factorIndex access may be costly
Actual rows very different from estimated rowsCardinality issue

14. Check TEMP Usage

TEMP usage is common during month-end due to large sorting, hashing, reporting, and accounting jobs.

SELECT s.inst_id,
s.sid,
s.serial#,
s.sql_id,
u.tablespace,
ROUND(u.blocks * t.block_size / 1024 / 1024,2) temp_mb
FROM gv$tempseg_usage u
JOIN gv$session s
ON u.inst_id = s.inst_id
AND u.session_addr = s.saddr
JOIN dba_tablespaces t
ON u.tablespace = t.tablespace_name
WHERE s.sql_id = '&SQL_ID'
ORDER BY temp_mb DESC;

If TEMP usage is high, check the execution plan for hash joins, sorts, group by, order by, or parallel operations.


15. Check Blocking Sessions

Blocking can make a request appear like a SQL performance issue, but the actual cause may be another session holding locks.

SELECT s.inst_id,
s.sid,
s.serial#,
s.username,
s.sql_id,
s.event,
s.wait_class,
s.blocking_instance,
s.blocking_session,
s.seconds_in_wait
FROM gv$session s
WHERE s.blocking_session IS NOT NULL
ORDER BY s.seconds_in_wait DESC;

To check the blocker:

SELECT s.inst_id,
s.sid,
s.serial#,
s.username,
s.sql_id,
s.prev_sql_id,
s.module,
s.action,
s.program,
s.machine,
s.status
FROM gv$session s
WHERE s.sid = '&BLOCKING_SESSION';

Production advice:

Do not directly kill production sessions unless business impact is confirmed and approval is taken.

In EBS, it is better to first identify the request/user and ask the business team to terminate and rerun if that is safer.


16. Check Bind Sensitivity and Child Cursors

Some SQLs perform differently because of bind values, adaptive cursor sharing, or multiple child cursors.

SELECT sql_id,
child_number,
plan_hash_value,
is_bind_sensitive,
is_bind_aware,
is_shareable,
executions,
parsing_schema_name
FROM gv$sql
WHERE sql_id = '&SQL_ID';

To check child cursor reasons:

SELECT sql_id,
child_number,
reason
FROM gv$sql_shared_cursor
WHERE sql_id = '&SQL_ID';

Possible RCA wording:

SQL used multiple child cursors due to bind sensitivity or optimizer environment mismatch. This may have contributed to plan instability during month-end.


17. Compare Normal Runtime vs Month-End Runtime

This query helps compare normal request runtime against abnormal month-end runtime.

SELECT TRUNC(r.actual_start_date) run_date,
cpt.user_concurrent_program_name,
COUNT(*) total_runs,
ROUND(MIN((r.actual_completion_date-r.actual_start_date)*24*60),2) min_mins,
ROUND(AVG((r.actual_completion_date-r.actual_start_date)*24*60),2) avg_mins,
ROUND(MAX((r.actual_completion_date-r.actual_start_date)*24*60),2) max_mins
FROM apps.fnd_concurrent_requests r
JOIN apps.fnd_concurrent_programs cp
ON r.concurrent_program_id = cp.concurrent_program_id
AND r.program_application_id = cp.application_id
JOIN apps.fnd_concurrent_programs_tl cpt
ON cp.concurrent_program_id = cpt.concurrent_program_id
AND cp.application_id = cpt.application_id
AND cpt.language = 'US'
WHERE cpt.user_concurrent_program_name LIKE '%&PROGRAM_NAME%'
AND r.actual_start_date >= SYSDATE - 30
AND r.actual_completion_date IS NOT NULL
GROUP BY TRUNC(r.actual_start_date),
cpt.user_concurrent_program_name
ORDER BY run_date DESC;

Example RCA line:

The concurrent program normally completes in 2 minutes. During month-end, the same program took 4 hours because SQL_ID xxxx consumed most of the DB time and waited mainly on TEMP I/O.


18. Month-End SQL Deep Dive Summary Table

Use the below format for your RCA report.

SQL_IDRequest IDProgram NameRuntimePlan HashMain WaitRoot CauseAction
abc123123456Create Accounting4 hrs987654321direct path read tempTEMP spill / bad planReview stats and plan
def456123457Gather Schema Stats3 hrs123456789CPULarge stats jobRun with controlled DOP
ghi789123458AutoInvoice Import2 hrs555555555db file scattered readFull table scanStats/index review
jkl111123459Cost Manager1.5 hrs777777777enq: TX row lockBlocking transactionBusiness coordination

19. Root Cause Classification

A. Plan Regression

Evidence:

Same SQL_ID used multiple PLAN_HASH_VALUEs.
Bad plan consumed more elapsed time and buffer gets.

RCA statement:

The long runtime was caused by SQL plan regression. The SQL used a different plan hash value during month-end compared to normal execution. The new plan caused higher logical reads and elapsed time.

Corrective actions:

1. Compare good and bad execution plans.
2. Validate object statistics.
3. Use SQL Plan Management baseline if a known good plan exists.
4. Avoid blind hinting in seeded EBS code.

B. Stale or Inaccurate Statistics

Evidence:

STALE_STATS = YES
LAST_ANALYZED is old
Estimated rows and actual rows mismatch

RCA statement:

The optimizer selected an inefficient plan due to stale or inaccurate object statistics. The issue became visible during month-end because of increased transaction volume.

Corrective action example:

EXEC DBMS_STATS.GATHER_TABLE_STATS(
ownname => 'OWNER',
tabname => 'TABLE_NAME',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt => 'FOR ALL COLUMNS SIZE AUTO',
cascade => TRUE,
degree => 8
);

C. Parallelism Not Used or Not Available

Evidence:

No rows in GV$PX_SESSION
Requested DOP greater than actual DOP
Parallel servers exhausted

RCA statement:

SQL was expected to run with parallelism but executed serially or received reduced DOP due to parallel server shortage. This increased elapsed time during month-end processing.

Corrective actions:

1. Review parallel_max_servers.
2. Review parallel_servers_target.
3. Avoid uncontrolled DOP during peak month-end.
4. Use controlled DOP only after business and DBA approval.

D. TEMP Spill

Evidence:

direct path read temp
direct path write temp
High TEMP usage
Hash join or sort operation in plan

RCA statement:

SQL spent significant time on TEMP I/O due to large sort/hash operations. This indicates memory pressure, poor cardinality estimation, or large month-end data volume.

Corrective actions:

1. Review execution plan.
2. Check PGA pressure.
3. Check TEMP tablespace usage.
4. Validate object statistics.
5. Review join order and cardinality estimates.

E. Blocking or Locking

Evidence:

enq: TX - row lock contention
blocking_session populated

RCA statement:

The concurrent request was delayed due to row lock contention from another active transaction. The database was waiting on an application-level lock rather than executing the SQL actively.

Corrective actions:

1. Identify blocker session.
2. Map blocker to EBS user/request.
3. Coordinate with business.
4. Ask user/process owner to terminate or commit if appropriate.
5. Avoid killing sessions without approval.

20. Production Handling: Should We Kill the Session?

In production, killing a session should not be the first option.

For EBS month-end, the better approach is:

1. Identify SQL_ID, SID, serial#, and request ID.
2. Check whether the request is critical.
3. Check whether it is blocking other business-critical jobs.
4. Inform business users and application owners.
5. Ask business to terminate and rerun from the application if possible.
6. Kill from database only if approved and technically safe.

This protects the business process and avoids data inconsistency.


21. Final Management RCA Format

Executive Summary

During month-end processing, multiple concurrent requests experienced abnormal runtime. The major contributors were long-running SQL statements identified through AWR, ASH, and active session analysis.

The top SQL_IDs consumed significant DB time due to one or more of the following causes:

1. SQL plan regression
2. Stale or inaccurate statistics
3. TEMP spill
4. Blocking sessions
5. Parallel server shortage
6. Increased month-end data volume
7. High concurrent batch workload

Technical Summary

AreaFinding
Impact Window30-Apr-2026 18:00 to 01-May-2026 06:00
Affected AreaMonth-end concurrent processing
Main SQL_IDsSQL_ID_1, SQL_ID_2, SQL_ID_3
Main WaitsCPU, TEMP I/O, db file scattered read, PX waits
ImpactRequest runtime increased above normal baseline
Immediate ActionSessions monitored and mapped to requests
Permanent FixStats review, plan stabilization, DOP governance, scheduling improvement

22. Sample RCA Statement

During month-end, concurrent request <REQUEST_ID> / program <PROGRAM_NAME> ran longer than normal due to SQL_ID <SQL_ID>.

The request normally completes in approximately <NORMAL_RUNTIME>, but during the issue window it took <ACTUAL_RUNTIME>.

ASH analysis showed that most of the time was spent on <WAIT_EVENT>. AWR showed plan hash value <PLAN_HASH_VALUE>. Further review confirmed that the root cause was <PLAN_REGRESSION / STALE_STATS / TEMP_SPILL / BLOCKING / PARALLEL_SHORTAGE / DATA_VOLUME>.

No database instance-level outage was observed.

Corrective action:
<ACTION>

Preventive action:
<PREVENTIVE_ACTION>

23. Proactive Month-End Apps DBA Checklist

Before Month-End

CheckAction
Top month-end programsReview last 3 months runtime
SQL plan stabilityCheck SQLs with multiple plan hash values
Object statisticsGather stats for volatile tables
TEMP capacityVerify free TEMP space
Parallel serversValidate DOP availability
Concurrent managersCheck target and actual processes
Critical requestsPrepare monitoring list
Business communicationInform users not to rerun blindly

During Month-End

CheckAction
Running requestsMonitor FND concurrent requests
Active SQL_IDsMonitor GV$SESSION and GV$SQL
Wait eventsCheck ASH and active waits
BlockingMonitor blocking sessions
TEMP usageMonitor GV$TEMPSEG_USAGE
Parallel usageMonitor GV$PX_SESSION
CPU/I/O pressureMonitor AWR/OEM/OS level load

After Month-End

DeliverableContent
RCA reportSQL_ID, request ID, root cause, evidence
Trend reportNormal runtime vs month-end runtime
Fix planStats, SPM, SQL profile, scheduling, DOP
Preventive actionMonitoring and business coordination plan

24. Final Apps DBA Action Plan

PriorityActionOwner
P1Identify top SQL_IDs by elapsed timeApps DBA
P1Map SQL_IDs to concurrent requestsApps DBA
P1Capture ASH wait profileApps DBA
P1Capture current and historical execution plansApps DBA
P1Check blocking, TEMP, and parallelismApps DBA
P2Compare against previous successful month-endApps DBA
P2Validate object statisticsDBA
P2Recommend SPM baseline or SQL profile if requiredDBA
P3Prepare month-end monitoring dashboardDBA / SRE
P3Optimize scheduling of heavy requestsApps Team / Business
P3Create safe terminate/rerun communication processDBA / Business

Conclusion

Month-end performance troubleshooting in Oracle EBS should be evidence-based. As an Apps DBA, we should not only report that a request was slow. We should clearly prove:

Which SQL_ID caused the issue?
Which request and program triggered it?
What was the wait event?
Was the plan changed?
Was parallelism used?
Was TEMP heavily used?
Was there blocking?
What is the permanent fix?

A strong Apps DBA RCA connects database evidence with business impact.

The best production approach is:

SQL_ID → Session → Request ID → Program → Wait Event → Plan → Root Cause → Corrective Action → Preventive Action

This approach helps us handle month-end issues professionally, avoid unnecessary killing of sessions, and build confidence with business users and management.