SQL_ID: <SQL_ID> Incident: Month-End Performance Degradation Bad Plan Hash: <BAD_PLAN_HASH> Good Plan Hash (if known): <GOOD_PLAN_HASH>
1. Purpose
This runbook provides a standardized, production-safe procedure to diagnose and resolve SQL plan regression for the identified SQL statement during month-end processing.
2. Scope
Applies when this SQL_ID shows significantly higher elapsed time due to plan change from good to bad plan hash value.
3. Prerequisites
- Access to the database as a privileged user (with SELECT on AWR views, DBA_* views).
- AWR snapshots available for the affected period.
- SQL Tuning Advisor and SQL Plan Management licenses (Enterprise Edition).
4. Step-by-Step Diagnosis
Step 4.1: Confirm Multiple Plans Exist
SELECT DISTINCT plan_hash_value,
MIN(begin_interval_time) first_seen,
MAX(begin_interval_time) last_seen,
COUNT(*) snaps
FROM dba_hist_sqlstat
WHERE sql_id = '<SQL_ID>'
GROUP BY plan_hash_value
ORDER BY first_seen;Step 4.2: Compare Performance by Plan
SELECT
plan_hash_value,
SUM(executions_delta) executions,
ROUND(SUM(elapsed_time_delta)/1e6 / NULLIF(SUM(executions_delta),0), 2) "AVG_ELA_SEC",
ROUND(SUM(cpu_time_delta)/1e6 / NULLIF(SUM(executions_delta),0), 2) "AVG_CPU_SEC",
ROUND(SUM(buffer_gets_delta) / NULLIF(SUM(executions_delta),0), 0) "AVG_BUF_GETS",
ROUND(SUM(disk_reads_delta) / NULLIF(SUM(executions_delta),0), 0) "AVG_DISK_READS"
FROM dba_hist_sqlstat s
JOIN dba_hist_snapshot sn USING (snap_id, instance_number)
WHERE sql_id = '<SQL_ID>'
AND executions_delta > 0
GROUP BY plan_hash_value
ORDER BY "AVG_ELA_SEC" DESC;Decision: If <BAD_PLAN_HASH> has significantly higher AVG_ELA_SEC (typically ≥ 3x), proceed.
Step 4.3: Compare Execution Plans
-- Current / Bad Plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<SQL_ID>', 0, 'ADVANCED ALLSTATS LAST'));
-- Historical Good Plan (AWR)
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('<SQL_ID>', <GOOD_PLAN_HASH>, NULL, 'ADVANCED'));Look for differences in: Full Table Scan vs Index, Join order/method, Cardinality estimates.
Step 4.4: Check Stale Statistics
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
SELECT owner, table_name, partition_name, last_analyzed, stale_stats
FROM dba_tab_statistics
WHERE stale_stats = 'YES' OR last_analyzed IS NULL;Step 4.5: Check Bind Sensitivity & Child Cursors
SELECT child_number, plan_hash_value, is_bind_sensitive, is_bind_aware, executions,
elapsed_time/1e6 avg_ela_sec
FROM v$sql
WHERE sql_id = '<SQL_ID>';
SELECT * FROM v$sql_shared_cursor WHERE sql_id = '<SQL_ID>';5. Corrective Actions (Production-Safe Order)
| Priority | Action | Command / Steps | Risk Level |
|---|---|---|---|
| 1 (Recommended) | SQL Plan Baseline (fixes plan stability) | ```sql | |
| 2 | SQL Profile (if no good plan in cache) | Run SQL Tuning Advisor → Accept Profile | Low-Medium |
| 3 | Gather fresh statistics (maintenance window) | EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname=>'OWNER', tabname=>'TABLE', degree=>8); | Medium |
Do NOT use OPTIMIZER_GATHER_STATS hints or drop plans in production without testing.
6. Verification Steps
After applying fix:
- Re-run performance comparison query (Step 4.2).
- Confirm new executions are using <GOOD_PLAN_HASH>.
- Monitor real-time:SQL
SELECT sql_id, plan_hash_value, executions, elapsed_time/1e6 FROM v$sql WHERE sql_id = '<SQL_ID>'; - Check AWR report for next snapshot.
Success Criteria: Average elapsed time returns to normal/good plan levels.
7. Rollback / Backout Plan
- Drop Baseline:SQL
EXEC DBMS_SPM.DROP_SQL_PLAN_BASELINE(plan_name => '<PLAN_NAME>'); - Drop SQL Profile:SQL
EXEC DBMS_SQLTUNE.DROP_SQL_PROFILE(profile_name => '<PROFILE_NAME>');
8. Root Cause Analysis (RCA) Template for Management
Root Cause: The SQL with ID <SQL_ID> experienced a plan regression during month-end from Plan Hash <GOOD_PLAN_HASH> (efficient) to <BAD_PLAN_HASH> (sub-optimal). This was triggered by [stale statistics / bind peeking issue / data volume skew / cardinality misestimation].
Business Impact: Elapsed time increased by ~X times, causing month-end batch delay of XX minutes/hours.
Corrective Action: Implemented SQL Plan Baseline (or Profile) to enforce the good plan. Statistics gathering scheduled.
Prevention Measures:
- Enable SQL Plan Management for critical SQLs.
- Review statistics gathering strategy for month-end tables.
- Monitor plan changes via AWR / custom alerts.
9. Escalation Path
- Level 1: DBA performing analysis
- Level 2: Senior DBA / Performance Architect
- Level 3: Application Owner + Architecture Team
10. References
- Oracle Documentation: DBMS_SPM, DBMS_SQLTUNE, DBMS_XPLAN
- AWR Views: DBA_HIST_SQLSTAT, DBA_HIST_SNAPSHOT
No comments:
Post a Comment