Oracle Apps DBA stuff: Troubleshooting a Database Tablespace Used(%) Alert problem [ID 403264.1]

Troubleshooting a Database Tablespace Used(%) Alert problem [ID 403264.1]

--------------------------------------------------------------------------------

Modified 07-JAN-2010 Type TROUBLESHOOTING Status PUBLISHED

In this Document
Purpose
Last Review Date
Instructions for the Reader
Troubleshooting Details
Background
Determining the type of alert that the problem is related to
Investigation steps for 10g and above Locally Managed Tablespaces
Investigation steps for pre-10g tablespaces and Dictionary Managed Tablespaces
References

--------------------------------------------------------------------------------

Applies to:
Oracle Server - Enterprise Edition - Version: 9.0.1.5
Enterprise Manager for RDBMS - Version: 10.1.0.2
Enterprise Manager Grid Control - Version: 10.1.0.2
Information in this document applies to any platform.

Purpose
This article intends to describe the steps necessary to troubleshoot a problem with a Database Tablespace Used(%) Metric in Enterprise Manager 10g (Grid Control or Database Control) that is either not raising or clearing when it should.

NOTE: For a summary of Oracle Recommended Patches For Tablespace Monitoring using the Tablespace Space Used % metric, refer to Note 849498.1
Last Review Date
January 7, 2010
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
Troubleshooting Details
Background
Beginning with the 10g Database, for Locally Managed Tablespaces, the Oracle Management Agent no longer raises its own alerts for Tablespace Used(%) metrics like it does for a 9i Database. Instead, raising/clearing of these alerts is the responsibility of the Database via its 'Server-Generated Alert System' (see NOTE 240965.1 10g NEW FEATURE on TABLESPACE ADVISORY). Enterprise Manager (Grid or DBControl) simply synchronizes itself with the state of the alert that is raised in the Database. Enterprise Manager still collects the metric data but only uses this data for historical reporting purposes, i.e the graphs in the console.

There are therefore 2 different mechanisms being used for raising and clearing Tablespace Used(%) alerts in Enterprise Manager and, before starting to troubleshoot why any given alert is not clearing or raising when expected the first step is to determine which type of alert we are dealing with.

Determining the type of alert that the problem is related to
1. If the tablespace is in a pre-10g database, follow the Investigation steps for pre-10g tablespaces and Dictionary Managed Tablespaces.
2. Run the following sql as a DBA user in sqlplus:

select extent_management
from dba_tablespaces
where tablespace_name = '&tablespace_name';

If the query returns 'LOCAL', follow the Investigation steps for 10g and above Locally Managed Tablespaces.
If the query returns 'DICTIONARY', follow the Investigation steps for pre-10g tablespaces and Dictionary Managed Tablespaces.
Investigation steps for 10g and above Locally Managed Tablespaces

1. Get the current data from the relevant database views
Perform the following step to determine if the information displayed by Enterprise Manager accurately reflects the values stored in the 10g Database. Query the database for outstanding alerts and specifically tablespace alerts by issuing the following SQL commands while logged on as the SYS user, spooling the output to a file. (This information will be very useful to Oracle Support should an SR be necessary):

-- QUERY 1: The following query shows the outstanding alerts that the RDBMS is aware of:

SELECT REASON
, METRIC_VALUE
, MESSAGE_TYPE
, TO_CHAR(CREATION_TIME,'DD-MON-YYYY HH24:MI:SS')
, HOST_ID
FROM SYS.DBA_OUTSTANDING_ALERTS;

-- QUERY 2: The following query shows the current thresholds settings for the RDBMS tablespace full metric:

SELECT METRICS_NAME
, WARNING_OPERATOR WARN_OP
, WARNING_VALUE WARN_VAL
, CRITICAL_OPERATOR CRIT_OP
, CRITICAL_VALUE CRIT_VAL
, OBJECT_TYPE OBJ_TYPE
, OBJECT_NAME OBJ_NAME
, STATUS
FROM SYS.DBA_THRESHOLDS
WHERE metrics_name LIKE ‘%Tablespace%’; -- Do not replace the string ‘Tablespace’

-- QUERY 3: The following query shows the current values for the Tablespace Used(%) metric from the perspective of the RDBMS:

SELECT TABLESPACE_NAME TBSP_NAME
, USED_SPACE
, TABLESPACE_SIZE TBSP_SIZE
, USED_PERCENT
FROM SYS.DBA_TABLESPACE_USAGE_METRICS;

2. If the output of QUERY1 shows that there is an outstanding alert in the RDBMS but this alert is not shown in Grid Control

2.1. Try a clearstate of the Agent while the agent is running. The RDBMS only attempts to inform Grid Control of an alert once. If the message is lost on the way to Grid Control for any reason the RDBMS will not try again. A clearstate tells the agents to check whether there are any tablespace full alerts, without the RDBMS resending the alert.

/bin/emctl clearstate agent
2.2 Check that the OEM_MONITOR role has been granted privileges to dequeue messages from SYS.ALERT_QUE. If this is not the case, the RDBMS will be sending the alerts to the ALERT_QUE for the Agent to pick up and send on to the OMS but the Agent will not have privileges to read from the QUE.
select grantee, dequeue_privilege from sys.queue_privileges where owner = 'SYS' and name = 'ALERT_QUE';

-- This should return:

GRANTEE DEQUEUE_PRIVILEGE
------------------------------ -----------------
OEM_MONITOR 1

-- If it does not, grant the privilege using the following SQL as the SYS user:

exec dbms_aqadm.grant_queue_privilege('dequeue','alert_que','OEM_MONITOR',false);

2.3 Check that the user specified in 'Monitoring Configuration' for the database targets (usually dbsnmp) has the role 'OEM_MONITOR' and that it is a DEFAULT role for that user (see NOTE 397680.1 Errors In Emagent.Trc: Pls-00201: Identifier 'Dbms_aqadm') .
select grantee
, default_role
from dba_role_privs
where granted_role = 'OEM_MONITOR';
2.4 Check that the Agent has registered with the Server Generated Alerts system.

select agent_name from SYSTEM.AQ$_INTERNET_AGENTS;

-- This should return at least a row that contains the hostname, agent port and sid of the database, just like the one highlighted in bold in the example output. The name of the agent contains the (possibly truncated) hostname of the Agent, the Agent port and the Database SID

AGENT_NAME
------------------------------
MYHOST_MYDOMA_3872_ORCL102
HAE_SUB
SCHEDULER$_EVENT_AGENT
SERVER_ALERT

-- If it the query results do not contain an entry for your agent/port/DB, try restarting the agent, it should re-register. If it does not, this is a good time to raise an SR with Oracle Customer Support (see step 4 below).

2.5 Check that the HOST_ID in dba_outstanding_alerts matches the MachineName property of the oracle_database target in targets.xml. If the two values do not match, this can break the chain of events that lead to the alert being raised in Grid Control. This cause is predominantly seen in RAC environments when the oracle_database targets have been discovered with the virtual hostname while the RAC database is raising alerts with the physical hostname.

If the values do not match, a workaround is to update targets.xml so that the oracle_database uses the value seen for HOST_ID in dba_outstanding_alerts for the MachineName property:

2.5.1 Update /sysman/emd/targets.xml (or //sysman/emd/targets.xml in a RAC env.), change the MachineName for the oracle_database target to the value that is used for HOST_ID in dba_outstanding_alerts

2.5.2 Reload the Agent and then perform a clearstate.

/bin/emctl reload agent
/bin/emctl clearstate agent
2.6 (RAC only). You may be experiencing a known issue (unpublished Bug 4572724) with RAC database targets where Alerts are not raised due to the rac_database target in targets.xml not having the AssocTargetInstance property.

The workaround for this issue is to go to the Monitoring Configuration page for the rac_database target in Grid Control and complete the configuration steps (even if the target is already configured) and then restart the agent.

3. If there is no outstanding alert in the RDBMS (QUERY 1 returns no relevant rows) but an alert is shown in Grid Control

Scenario 1 : Raise and Clear occuring within 1 second. Bug 5013368 SEVERITIES FOR SERVER GENERATED METRICS INCORRECTLY REJECTED

In this rare situation, documented in Bug 5013368 SEVERITIES FOR SERVER GENERATED METRICS INCORRECTLY REJECTED, the alert does not clear even though the clear message has made it into the repository. This happens when the RDBMS raises and clears the alert within one second. Because the repository tables that store the records of the raise and clear use a DATE column to store the timestamp recording when the raise and clear occurs, the raise and clear, as far as the repository is concerned, happened at the same time and therefore we cannot determine which came first. This results in the Database Instance: > All Metrics > Tablespace Space Used (%) page for the tablespace showing both the raise and the clear in the Alert History table but the Alert not actually clearing.

NOTE: You may not see the clear record by default unless you select an option under 'View Data' that covers the period of the raise and clear occurring.
Workaround: To avoid this problem from occurring, Edit the Metric via 'Metric and Policy Settings' and specify a 'Number of Occurrences' greater than 1.

NOTE: There is currently no way of clearing such alerts without manual intervention under the direction of support. Please raise an SR via My Oracle Support to progress such issues.
Scenario 2 : Clear record not making it to the repository. Bug 7046253 ORPHANED PUSH ALERTS (E.G TABLESPACE FULL) IN GRID CONTROL

The OMS has not received the message from the RBDMS, via the Agent, to clear the alert. The RDBMS sends the message only once and therefore the alert will not clear in Enterprise Manager without manual intervention if, for some reason, it has not managed to get to the OMS.

Workaround: Use emctl clearstate agent, while the agent is running, to resynchronize the RDBMS and the Agent.

Unpublished Bug 4222570 - fixed in 10.2 agents, allows the RDBMS and Agent/OMS to re-synchronize if they become out of synch. The fix to Bug 4222570 allows an 'emctl clearstate agent' to force a resynch and clear the alert.

Patch 4222570 is available for 10.1.0.4.0 agents.

Unpublished Bug 4923126 fixes a similar issue specific to RAC databases. This Bug is fixed in both 11.1 and 10.2.0.3.0 (and above) Agents

Patch 4923126 is available on limited platform/versions (the link will take you to My Oracle Support, where you can check the availability for your platform/Agent version).

NOTE: Unpublished Bug 5913097, now fixed and backported to selected platform/versions of the Agent, deals with the fact that an Agent clearstate will only resolve such issues if there is a record of the clear in dba_alert_history. Unfortunately, this information is timed out of dba_alert_history based on the AWR retention period. An agent that has PATCH 5913097 no longer requires a record of the clear to be in dba_alert_history for an Agent clearstate to clear the alert.

Use the following query to return relevant rows from dba_alert_history:

SELECT REASON
, TO_CHAR(CREATION_TIME,'dd-mon-yyy hh24:mi:ss') ctime
, TO_CHAR(TIME_SUGGESTED,'dd-mon-yyy hh24:mi:ss') rtime
FROM DBA_ALERT_HISTORY
WHERE OBJECT_TYPE = 'TABLESPACE'
AND UPPER(RESOLUTION) = 'CLEARED'
ORDER BY CREATION_TIME ASC
A record of a clear for a Tablespace Used(%) alert will have a REASON with the new threshold that is below the Warning or Critical threshold, e.g:

Tablespace [SYSTEM] is [25 percent] full
If it is clear that a tablespace full alert is not clearing for any of the 2 scenarios, above, check the current status of the relevant Bug. If there is no fix currently available and/or you need a patch for your specific platform/agent version, raise an SR with Oracle Support for further assistance.

4. If the view dba_tablespace_usage_metrics shows that the USED_PERCENT has crossed the threshold defined for your tablespace in DBA_THRESHOLDS but there is no alert record in dba_outstanding_alerts (and therefore, by definition, Grid Control) you may be hitting the following Bug:

Bug 7462407 TABLESPACE ALERT NOT TRIGGERED FOR 10.2.0.4.0 RAC DATABASE- Results in some tablespace alerts not being raised by the Server Generated Alert system after another tablespace (a tablespace other than the tablespace that the alert should have been raised for) has been dropped.
This Bug is fixed in version 12.1 and Patch 7462407 is available from My Oracle Support for selected platforms. If the patch is not available for your platform and database version, please raise an SR via My Oracle Support to request a backport.
NOTE 1: Despite the Abstract for this Bug, this is NOT specific to RAC databases.
NOTE 2: Bug 8898153 TABLESPACE ALERT NOT TRIGGERED FOR 10.2.0.4.0 RAC DATABASE has been raised to investigate this problem further as it seems that the original fix did not fix all instances of this problem. To verify whether you might be hitting this problem, issue the following SQL on the target database:

select ts#, name from v$tablespace order by ts#;
If you are hitting the problem, the problem tablespace will have a TS# above a 'gap' in the set of TS#. Alerts will be successfully raised and cleared for tablespaces below the 'gap', e.g:

SQL> select ts#, name from v$tablespace order by ts#;

TS# NAME
---------- ------------------------------
0 SYSTEM
1 UNDOTBS1
2 SYSAUX
3 TEMP
4 DATA01
6 DATA03
7 DATA04

In the example above, tablespaces SYSTEM, UNDOTBS1, SYSAUX, TEMP and DATA01 would raise and clear alerts successfully but tablespaces DATA03 and DATA04 would not.

Unpublished Bug 5245039 APPSST GSI 10G: TS SPACE USED % ALERT NOT TRIGGERED WHEN USED SPACE IS >320GB - Results in tablespaces that are over 320GB in size not triggering tablespace used (%) alerts.
This Bug is fixed in version 10.2.0.4.0 of the database and has been backported to 10.2.0.3.0. Patch 5245039 is available from My Oracle Support for selected platforms.

5. If the view dba_tablespace_usage_metrics (QUERY 3) does not return the value you expect for USED_PERCENT and, hence, no alert is being raised or an alert is raised when it should not be, you may be hitting one of the following Bugs:

Unpublished Bug 7660028 TS FULL (%) ALERT GENERATION DOES NOT CONSIDER AVAILABLE DISK FREE SPACE. Basically, this Bug deals with the fact that an unpatched Database Server does not take into account the amount of space that is on the filesystem when calculating the Tablespace used (%) but only considers how much the datafiles that are part of the tablespace can theoretically extend (assuming there is space on the filesystem).

For example (assuming the filesystem is used for nothing else other than the data file):

Tablespace TEST_TBS has 1 datafile with 2.5GB of space allocated to it
Tablespace TEST_TBS has 1.25GB of space used
The size of the filesystem that the datafile is created on is 5GB
- 2.5GB of space is available after the tablespace is created

If the Max File Size of the datafile is set to 5GB, USED_PERCENT will be reported correctly as 25%
If the Max File Size of the datafile is set to 10GB, USED_PERCENT will be reported incorrectly as 12.5%. This is because the calculation of USED_PERCENT assumes that the datafile can extend to 10GB even though there is only 2.5GB of disk space available and therefore the tablespace can only extend to 5GB. The 'Actual' USED_PERCENT should still be 25% as the tablespace can still only extend to 5GB.

Once Patch 7660028 is applied to the Database Server, USED_PERCENT is calculated taking into account the space available on the filesystem.

Bug 7250921 USED_PERCENT INCORRECT ON DBA_TABLESPACE_USAGE_METRICS FOR
PARTITIONED OBJECTS. This Bug deals with inaccurate values for USED_PERCENT, believed to be specific to tablespaces that have partitioned objects. - Duplicate of <<4685811>> USED PERCENT OF RBS TABLESPACE DOES NOT DECREASE PROPERLY WHEN SHRINK OCCURS

Bug 6629893 INCONSISTENCIES IN TABLESPACE USAGE CALCUALTION./ NOT GETTING TABLESPACE ALERTS. Patch 6629893 is available to fix this issue.

Unpublished Bug 5527591 TABLESPACE USAGE PERCENTAGE DIFFERS DEPENDING ON TABLES(VIEWS) USED. Fixed in 11g and backports available via Patch 5527591

Bug 4566763 VALUE FOR TS USED (%) IS INCONSISTENT FROM OBJECT SEARCH PAGE AND METRIC PAGE

Unpublished Bug 5549540 DBA_TABLESPACE_USAGE_METRICS USAGE SHOULD NOT INCLUD OBJECTS IN RECYCLE BIN. Fixed in 11g and backports available via Patch 5549540

Bug 4472437 USED PERCENT OF SYSTEM TABLESPACE DOES NOT INCREASE WITHOUT SHUTDOWN INSTANCE.. Fixed in 11g and backports available via PATCH 4472437

Bug 6759910 DBA_TABLESPACE_USAGE_METRICS USED_PERCENT SHOWS OVER 100%
6. If you have disabled the monitoring of a Tablespace and an Alert has not been cleared

See Note 392268.1 'How to - Exclude a Tablespace From The Tablespace Used (%)
Metric' for known issues with the disabling of tablespace monitoring'
7. If you have dropped a tablespace and an Alert is still showing in either dba_outstanding_alerts, Grid Control, or both

See Note 740340.1 'Metric Alerts are being Raised for Tablespaces Which no
Longer Exist / Dropped in the Database'

8. If the results from QUERY 2 show different thresholds than those that are reported in Grid Control

Cases have been seen where the thresholds configured in Grid Control have become out of sync. with the thresholds in the database targets's DBA_THRESHOLDS. When this is the case, alerts are always generated based on the values in DBA_THRESHOLDS and this can result in very confusing results in the Grid Control console with alerts raised even though the thresholds do not appear to have been crossed.

Bug 7163538 TABLESPACE SPACE USED THRESHOLDS NOT SYNCHRONIZED BETWEEN AGENT AND DATABASE, fixed in 10.2.0.5.0, provides the agent with a mechanism to automatically re-sync thresholds between the Agent and the target database at a configurable interval.

9. If none of the resolutions listed in steps 2 - 8 resolve your issue, raise an SR with Oracle Support

Upload the results of the queries in Step 1 and any evidence you collected while following this Note to the SR. Without this evidence, support will need to verify that all these known issues have been ruled out.

Upload an emagent.trc that shows the activity before and after issuing the command 'emctl clearstate agent'. Wait at least 15 minutes after issuing the clearstate command before uploading the files Because the emagent.trc files get cycled, this time period may be in an emagent.trc.n file. To be safe, simply upload all files that start with emagent.trc in /sysman/log (//sysman/log for RAC)
Set the property tracelevel.recvlets.aq=DEBUG and tracelevel.recvlets=DEBUG in /sysman/config/emd.properties (//sysman/config/emd.properties for RAC) then reload the agent to get more detailed trace information.
Investigation steps for pre-10g tablespaces and Dictionary Managed Tablespaces
1. Check the value of the problemTbsp(Tablespaces Full) metric for the tablespace using the Metrics Browser

Follow Note 276350.1 'How to Enable the Metrics Browser/Agent Browser for the Oracle Management Agent' to enable the Metric Browser for the agent.
Open the Metric Browser in a browser by going to the following URL:

http|https://:/emd/browser/main

Click on the database target and then on the problemTbsp(Tablespaces Full) link

2. If the value shown for the tablespace for pctUsed(Tablespace Space Used (%)) is incorrect for the tablespace concerned

The problem is Agent side. This means that the script that is used to return the Tablespace Space Used(%) values is not working correctly (or is not working as you expect it to at least) and should be investigated further by Oracle Customer Support. Examine the list of known issues below and if you cannot find and issue related to your problem, log an SR with Oracle Customer Support to investigate further. Upload a screenshot of the Metrics Browser to the SR. Unless there is an obvious error, explain why you believe the values to be incorrect. Also, upload the output of the following statements:

/bin/emctl status agent
/bin/emctl status agent_scheduler

3. If the value shown for the tablespace for pctUsed(Tablespace Space Used (%)) is correct for the tablespace concerned
The problem is related to the process of scheduling, collecting, uploading and processing the metric data and could be related to a problem either on the Agent side or on the OMS side.

3.1 Try a clearstate of the Agent

/bin/emctl clearstate agent

NOTE: Do not confuse a clearstate with a clean start of an Agent. A clearstate simply tells the Agent to re-evaluate all metrics. A clean start involves removing all state files (among other things) from the Agent and is potentially destructive. Do not perform a clean start of an Agent without a complete understanding/justification of why doing so will resolve your issue
3.2 If the alert does not raise/clear within 10 minutes of running a clearstate and the problem is not described in the list of known issues, below, raise an SR with Oracle Customer Support to investigate the issue further. Upload the output of the following command and refer to this Note to explain what you have done so far to troubleshoot the issue.

/bin/emctl status agent
4. Known Issues

Note 317261.1 Problem - Tablespace Metrics - Incorrect values for Tablespace full metric for 8i and 9i databases are firing Critical and Warning alerts.

References
BUG:4472437 - USED PERCENT OF SYSTEM TABLESPACE DOES NOT INCREASE WITHOUT SHUTDOWN INSTANCE.
BUG:4566763 - VALUE FOR TS USED (%) IS INCONSISTENT FROM OBJECT SEARCH PAGE AND METRIC PAGE
BUG:4685811 - USED PERCENT OF RBS TABLESPACE DOES NOT DECREASE PROPERLY WHEN SHRINK OCCURS
BUG:5013368 - SEVERITIES FOR SERVER GENERATED METRICS INCORRECTLY REJECTED
BUG:5734870 - SERVER ALERT THRESHOLDS NOT REMOVED WHEN TABLESPACE IS DROPPED
BUG:6629893 - INCONSISTENCIES IN TABLESPACE USAGE CALCUALTION./ NOT GETTING TABLESPACE ALERTS
BUG:6759910 - DBA_TABLESPACE_USAGE_METRICS USED_PERCENT SHOWS OVER 100%
BUG:7046253 - ORPHANED PUSH ALERTS (E.G TABLESPACE FULL) IN GRID CONTROL
BUG:7250921 - USED_PERCENT INCORRECT ON DBA_TABLESPACE_USAGE_METRICS FOR PARTITIONED OBJECTS
BUG:7593670 - DBA_TABLESPACE_USAGE_METRICS INCORRECT, NO ALERTS FIRE.
BUG:8898153 - TABLESPACE ALERT NOT TRIGGERED PROPERLY
NOTE:240965.1 - 10g NEW FEATURE on TABLESPACE ADVISORY
NOTE:317261.1 - Problem - Tablespace Metrics - Incorrect values for Tablespace full metric for 8i and 9i databases are firing Critical and Warning alerts.
NOTE:392268.1 - How to - Exclude a Tablespace From The Tablespace Used (%) Metric
NOTE:397680.1 - Errors In Emagent.Trc: Pls-00201: Identifier 'Dbms_aqadm'
NOTE:740340.1 - Metric Alerts are Raised by the Database for Tablespaces Which no Longer Exist / Dropped
NOTE:849498.1 - Oracle Recommended Patches For Tablespace Monitoring using the Tablespace Space Used % metric

Oracle Apps DBA stuff

Thursday, May 6, 2010

Troubleshooting a Database Tablespace Used(%) Alert problem [ID 403264.1]

No comments:

Post a Comment

Search This Blog

Blog Archive