I have some complex queries that are usually fast (< 1 sec), but occasionally slow (> 10 sec, even minutes).
How do I track this down or identify what condition is causing it?
One theory was caching - perhaps the query is fast when the blocks cached in memory. I tried flushing both the shared pool and the buffer cache ("alter system flush buffer_cache") but the query is still fast after doing that.
Is there a way to nail down precisely what was going on during a specific execution that was bad, and which part of query execution took the longest?
-
1Is the query itself "complex"? If it is, try also flushing the shared pool to force a parse on top of the data block I/O.Mat– Mat2013年08月20日 17:12:04 +00:00Commented Aug 20, 2013 at 17:12
-
As one of the possible solutions - configure tiny recycle pool and assign tables used in SQL to it.Mindaugas Riauba– Mindaugas Riauba2013年08月20日 17:20:00 +00:00Commented Aug 20, 2013 at 17:20
-
3Just flushing the buffer cache may not help if the OS also has the blocks cached. The only 100% way of doing this is a machine reboot. Even that may not work if you're connected to a SAN with a separate cache.Philᵀᴹ– Philᵀᴹ2013年08月20日 17:39:00 +00:00Commented Aug 20, 2013 at 17:39
-
1You can flush OS cache on linux. "echo 1 > /proc/sys/vm/drop_caches". Also if you are using direct IO (filesystemio_options=setall) or ASM or RAW devices OS is not caching Oracle blocks.Mindaugas Riauba– Mindaugas Riauba2013年08月21日 06:37:23 +00:00Commented Aug 21, 2013 at 6:37
-
Yes the query is "complex" - I flushed the shared pool as well. It still runs fast (0.5 sec or less) after flushing shared pool and cache. Yet randomly it will take 10+ sec or even a few minutes, and I can't figure out why. Disk reads are not totally insane - 2000 blocks at most.wrschneider– wrschneider2013年08月25日 01:44:20 +00:00Commented Aug 25, 2013 at 1:44
2 Answers 2
As you're on 11gR2 and suitably licensed (make sure you check!), I would recommend taking a look at the SQL monitor. This gives a detailed breakdown of all SQL statements that took more ~5s to execute in the past 24 hours.
Using this, you can see the duration of each step in the execution plan, along with I/O and wait details. This will enable you to see why it's taking so long.
The quick SQL executions won't appear by default, but you can add the /*+ MONITOR */
hint to force them in.
You can get a graphical view of the plan using Enterprise Manager or SQL developer. You can also get a text version using SQL. Have a read of this oracle-base article for more details on using the SQL monitor.
It's possible that the reason for the differences is because the SQL is flipping between two execution plans. You can spot this by looking in the AWR to see this. To do this you'll need to find the sql_id for the statement(s) that are causing you issues. You can do this with:
select sql_id
from v$sql
where sql_text like '%your query here%'
and sql_text not like '%not this%';
To find if there's changes, have a read of Kerry Osborne's unstable plans article and look for the "awr_plan_change" script.
If you are switching between plans, you can force it to keep the good plan using SQL profiles. Again, Kerry Osborne has a detailed article on this.
Note that it's possible that the SQL you're interested in isn't stored in the AWR, as only the worst performing statements are kept. We can force details of it to be kept by "coloring" it however, as discussed by Dion Cho.
I would find the long running execution and then see what it is doing in ASH. ASH can be used to find the longest execution. To find the longest execution run
set linesize 150
define v_dbid=NULL;
select &v_dbid from dual;
col f_dbid new_value v_dbid
select &database_id f_dbid from dual;
select &v_dbid from dual;
select nvl(&v_dbid,dbid) f_dbid from v$database;
select &v_dbid from dual;
col 1 for 99999
col 2 for 99999
col 3 for 9999
col 4 for 999
col 5 for 99
col av for 99999
col ct for 99999
col mn for 999
col av for 99999.9
col longest_sql_exec_id for A10
col max_run_time for A35
WITH pivot_data AS (
SELECT
sql_id,
ct,
mxdelta mx,
mndelta mn,
round(avdelta) av,
WIDTH_BUCKET(delta_in_seconds,mndelta,mxdelta+.1,5) AS bucket ,
SUBSTR(times,12) max_run_time,
SUBSTR(longest_sql_exec_id, 12) longest_sql_exec_id
FROM (
SELECT
sql_id,
delta_in_seconds,
COUNT(*) OVER (PARTITION BY sql_id) ct,
MAX(delta_in_seconds) OVER (PARTITION BY sql_id) mxdelta,
MIN(delta_in_seconds) OVER (PARTITION BY sql_id) mndelta,
AVG(delta_in_seconds) OVER (PARTITION BY sql_id) avdelta,
MAX(times) OVER (PARTITION BY sql_id) times,
MAX(longest_sql_exec_id) OVER (PARTITION BY sql_id) longest_sql_exec_id
FROM (
SELECT
sql_id,
sql_exec_id,
MAX(delta_in_seconds) delta_in_seconds ,
LPAD(ROUND(MAX(delta_in_seconds),0),10) || ' ' ||
TO_CHAR(MIN(start_time),'YY-MM-DD HH24:MI:SS') || ' ' ||
TO_CHAR(MAX(end_time),'YY-MM-DD HH24:MI:SS') times,
LPAD(ROUND(MAX(delta_in_seconds),0),10) || ' ' ||
TO_CHAR(MAX(sql_exec_id)) longest_sql_exec_id
FROM ( SELECT
sql_id,
sql_exec_id,
CAST(sample_time AS DATE) end_time,
CAST(sql_exec_start AS DATE) start_time,
((CAST(sample_time AS DATE)) -
(CAST(sql_exec_start AS DATE))) * (3600*24) delta_in_seconds
FROM
dba_hist_active_sess_history
WHERE sql_exec_id IS NOT NULL
and dbid=&v_dbid
)
GROUP BY sql_id,sql_exec_id
)
)
where ct > &min_repeat_executions_filter
and mxdelta > &min_elapsed_time
)
SELECT * FROM pivot_data
PIVOT ( COUNT(*) FOR bucket IN (1,2,3,4,5))
ORDER BY mx DESC,av DESC
;
output will look like
SQL_ID CT MX MN AV MAX_RUN_TIME LONGEST_SQ 1 2 3 4 5
------------- ------ ---------- ---- -------- ----------------------------------- ---------- ------ ------ ----- ---- ---
2spgk3k0f7quz 251 29607 0 546.0 11-04-12 12:11:47 11-04-12 20:25:14 16781748 247 2 0 0 2
990m08w8xav7s 591 7681 0 52.0 11-04-13 00:39:27 11-04-13 02:47:28 16786685 587 0 0 2 2
64dqhdkkw63fd 1083 7147 0 7.0 11-03-07 04:01:01 11-03-07 06:00:08 16777218 1082 0 0 0 1
0bujgc94rg3fj 604 4929 0 25.0 11-04-08 10:53:34 11-04-08 12:15:43 16814628 601 1 1 0 1
0hbv80w9ypy0n 161 4089 0 1184.0 11-03-02 04:36:10 11-04-12 23:34:18 16777290 27 116 9 6 3
bzyny95313u12 114 2599 0 47.0 11-03-03 03:06:18 11-03-03 03:49:37 16781191 113 0 0 0 1
ds8cz0fb8w147 161 2531 13 274.0 11-03-01 23:11:48 11-04-12 16:10:37 16777285 136 18 5 1 1
5k7vccwjr5ahd 2653 1963 0 33.0 11-03-01 23:10:12 11-04-12 09:06:12 16778244 2623 15 8 4 3
4d6m2q3ngjcv9 320 1701 3 485.0 11-03-01 23:10:53 11-04-10 18:01:26 16777261 92 168 50 9 1
g5u58zpg0tuk8 97 1359 1 62.0 11-04-12 02:23:37 11-04-13 02:51:09 16777217 92 3 1 0 1
34cgtc9xkgxny 61 1272 978 1163.0 11-03-02 10:06:24 11-03-02 10:27:36 16777250 4 4 14
find the query you want. The SQL_ID can be translated into the SQL TEXT as Chris Saxon pointed out here. Once you have your query, get the "LONGEST_SQ" which is the sql_exec_id for the longest execution of that query. You can then go into ASH and get all the activity for that query, for example if my sql_exec_id is 16777217, then
Select
decode(session_state,'ON CPU','ON CPU’, event)
From
v$active_session_history
Where
sql_exec_id = 16777217
Order by sample_id
and the output will look like
DECODE(SESSION_STATE,'ONCPU','ONCPU',EVENT)
------------------------------------------------
db file parallel read
db file sequential read
ON CPU
ON CPU
ON CPU
ON CPU
This can be extended to include where in the execution plan the execution is spending time by adding those columns to the above query which are available starting in 11g. This will give you and idea of where the query is spending it's time, like I/O, CPU, locks, or other stuff, but then the question is why? first thing to check is how many distinct execution plans the query has and see if fast and slow execution times relate to the plan. Chris Saxon has given links to finding the different execution plans.
More info on the method outlined here at http://www.oraclerealworld.com/finding-the-slowest-sql-execution-of-the-same-query/
If the plans are indeed the same, and the cursor activity doesn't point out anything strange like locks or such, then you probably want to check the bind variables. There is no reliable way to get bind variables other than tracing, but you can look at some example values using this method http://jonathanlewis.wordpress.com/2008/07/24/bind-capture/ like
variable x varchar2(10);
exec :x:='X';
col f_sqlid new_value v_sqlid
col f_sqlcn new_value v_sqlcn
col f_binddata new_value v_binddata
select * from dual where dummy=:x;
select
PREV_SQL_ID f_sqlid,
PREV_CHILD_NUMBER f_sqlcn
from v$session
where sid = ( Select sid from v$mystat where rownum=1 ) ;
select bind_data f_binddata from v$sql
where sql_id = '&v_sqlid'
and child_number = &v_sqlcn
/
SELECT * FROM TABLE ( DBMS_SQLTUNE.EXTRACT_BINDS ( '&v_binddata' )) ;
SELECT xmltype(other_xml) AS xmlval
FROM v$sql_plan
WHERE
sql_id = '&v_sqlid' AND
child_number = &v_sqlcn AND
other_xml IS NOT NULL
/
PS this is pretty far out but you can trace, starting on 11, the execution of a particular SQL_ID across all users.
Example
SQL> alter session set events 'sql_trace [707wu2umpfas7],wait=true,bind=true';