0

I have some complex queries that are usually fast (< 1 sec), but occasionally slow (> 10 sec, even minutes).

How do I track this down or identify what condition is causing it?

One theory was caching - perhaps the query is fast when the blocks cached in memory. I tried flushing both the shared pool and the buffer cache ("alter system flush buffer_cache") but the query is still fast after doing that.

Is there a way to nail down precisely what was going on during a specific execution that was bad, and which part of query execution took the longest?

asked Aug 20, 2013 at 16:45
7
  • 1
    Is the query itself "complex"? If it is, try also flushing the shared pool to force a parse on top of the data block I/O. Commented Aug 20, 2013 at 17:12
  • As one of the possible solutions - configure tiny recycle pool and assign tables used in SQL to it. Commented Aug 20, 2013 at 17:20
  • 3
    Just flushing the buffer cache may not help if the OS also has the blocks cached. The only 100% way of doing this is a machine reboot. Even that may not work if you're connected to a SAN with a separate cache. Commented Aug 20, 2013 at 17:39
  • 1
    You can flush OS cache on linux. "echo 1 > /proc/sys/vm/drop_caches". Also if you are using direct IO (filesystemio_options=setall) or ASM or RAW devices OS is not caching Oracle blocks. Commented Aug 21, 2013 at 6:37
  • Yes the query is "complex" - I flushed the shared pool as well. It still runs fast (0.5 sec or less) after flushing shared pool and cache. Yet randomly it will take 10+ sec or even a few minutes, and I can't figure out why. Disk reads are not totally insane - 2000 blocks at most. Commented Aug 25, 2013 at 1:44

2 Answers 2

1

As you're on 11gR2 and suitably licensed (make sure you check!), I would recommend taking a look at the SQL monitor. This gives a detailed breakdown of all SQL statements that took more ~5s to execute in the past 24 hours.

Using this, you can see the duration of each step in the execution plan, along with I/O and wait details. This will enable you to see why it's taking so long.

The quick SQL executions won't appear by default, but you can add the /*+ MONITOR */ hint to force them in.

You can get a graphical view of the plan using Enterprise Manager or SQL developer. You can also get a text version using SQL. Have a read of this oracle-base article for more details on using the SQL monitor.

It's possible that the reason for the differences is because the SQL is flipping between two execution plans. You can spot this by looking in the AWR to see this. To do this you'll need to find the sql_id for the statement(s) that are causing you issues. You can do this with:

select sql_id
from v$sql
where sql_text like '%your query here%'
and sql_text not like '%not this%';

To find if there's changes, have a read of Kerry Osborne's unstable plans article and look for the "awr_plan_change" script.

If you are switching between plans, you can force it to keep the good plan using SQL profiles. Again, Kerry Osborne has a detailed article on this.

Note that it's possible that the SQL you're interested in isn't stored in the AWR, as only the worst performing statements are kept. We can force details of it to be kept by "coloring" it however, as discussed by Dion Cho.

answered Aug 26, 2013 at 12:56
1

I would find the long running execution and then see what it is doing in ASH. ASH can be used to find the longest execution. To find the longest execution run

 set linesize 150
 define v_dbid=NULL;
 select &v_dbid from dual;
 col f_dbid new_value v_dbid
 select &database_id f_dbid from dual;
 select &v_dbid from dual;
 select nvl(&v_dbid,dbid) f_dbid from v$database;
 select &v_dbid from dual;
 col 1 for 99999
 col 2 for 99999
 col 3 for 9999
 col 4 for 999
 col 5 for 99
 col av for 99999
 col ct for 99999
 col mn for 999
 col av for 99999.9
 col longest_sql_exec_id for A10
 col max_run_time for A35
 WITH pivot_data AS (
 SELECT
 sql_id,
 ct,
 mxdelta mx,
 mndelta mn,
 round(avdelta) av,
 WIDTH_BUCKET(delta_in_seconds,mndelta,mxdelta+.1,5) AS bucket ,
 SUBSTR(times,12) max_run_time,
 SUBSTR(longest_sql_exec_id, 12) longest_sql_exec_id
 FROM (
 SELECT
 sql_id,
 delta_in_seconds,
 COUNT(*) OVER (PARTITION BY sql_id) ct,
 MAX(delta_in_seconds) OVER (PARTITION BY sql_id) mxdelta,
 MIN(delta_in_seconds) OVER (PARTITION BY sql_id) mndelta,
 AVG(delta_in_seconds) OVER (PARTITION BY sql_id) avdelta,
 MAX(times) OVER (PARTITION BY sql_id) times,
 MAX(longest_sql_exec_id) OVER (PARTITION BY sql_id) longest_sql_exec_id
 FROM (
 SELECT
 sql_id,
 sql_exec_id,
 MAX(delta_in_seconds) delta_in_seconds ,
 LPAD(ROUND(MAX(delta_in_seconds),0),10) || ' ' ||
 TO_CHAR(MIN(start_time),'YY-MM-DD HH24:MI:SS') || ' ' ||
 TO_CHAR(MAX(end_time),'YY-MM-DD HH24:MI:SS') times,
 LPAD(ROUND(MAX(delta_in_seconds),0),10) || ' ' ||
 TO_CHAR(MAX(sql_exec_id)) longest_sql_exec_id
 FROM ( SELECT
 sql_id,
 sql_exec_id,
 CAST(sample_time AS DATE) end_time,
 CAST(sql_exec_start AS DATE) start_time,
 ((CAST(sample_time AS DATE)) -
 (CAST(sql_exec_start AS DATE))) * (3600*24) delta_in_seconds
 FROM
 dba_hist_active_sess_history
 WHERE sql_exec_id IS NOT NULL
 and dbid=&v_dbid
 )
 GROUP BY sql_id,sql_exec_id
 )
 )
 where ct > &min_repeat_executions_filter
 and mxdelta > &min_elapsed_time
 )
 SELECT * FROM pivot_data
 PIVOT ( COUNT(*) FOR bucket IN (1,2,3,4,5))
 ORDER BY mx DESC,av DESC
 ;

output will look like

 SQL_ID CT MX MN AV MAX_RUN_TIME LONGEST_SQ 1 2 3 4 5
 ------------- ------ ---------- ---- -------- ----------------------------------- ---------- ------ ------ ----- ---- ---
 2spgk3k0f7quz 251 29607 0 546.0 11-04-12 12:11:47 11-04-12 20:25:14 16781748 247 2 0 0 2
 990m08w8xav7s 591 7681 0 52.0 11-04-13 00:39:27 11-04-13 02:47:28 16786685 587 0 0 2 2
 64dqhdkkw63fd 1083 7147 0 7.0 11-03-07 04:01:01 11-03-07 06:00:08 16777218 1082 0 0 0 1
 0bujgc94rg3fj 604 4929 0 25.0 11-04-08 10:53:34 11-04-08 12:15:43 16814628 601 1 1 0 1
 0hbv80w9ypy0n 161 4089 0 1184.0 11-03-02 04:36:10 11-04-12 23:34:18 16777290 27 116 9 6 3
 bzyny95313u12 114 2599 0 47.0 11-03-03 03:06:18 11-03-03 03:49:37 16781191 113 0 0 0 1
 ds8cz0fb8w147 161 2531 13 274.0 11-03-01 23:11:48 11-04-12 16:10:37 16777285 136 18 5 1 1
 5k7vccwjr5ahd 2653 1963 0 33.0 11-03-01 23:10:12 11-04-12 09:06:12 16778244 2623 15 8 4 3
 4d6m2q3ngjcv9 320 1701 3 485.0 11-03-01 23:10:53 11-04-10 18:01:26 16777261 92 168 50 9 1
 g5u58zpg0tuk8 97 1359 1 62.0 11-04-12 02:23:37 11-04-13 02:51:09 16777217 92 3 1 0 1
 34cgtc9xkgxny 61 1272 978 1163.0 11-03-02 10:06:24 11-03-02 10:27:36 16777250 4 4 14

find the query you want. The SQL_ID can be translated into the SQL TEXT as Chris Saxon pointed out here. Once you have your query, get the "LONGEST_SQ" which is the sql_exec_id for the longest execution of that query. You can then go into ASH and get all the activity for that query, for example if my sql_exec_id is 16777217, then

 Select
 decode(session_state,'ON CPU','ON CPU’, event)
 From
 v$active_session_history
 Where
 sql_exec_id = 16777217
 Order by sample_id

and the output will look like

 DECODE(SESSION_STATE,'ONCPU','ONCPU',EVENT)
 ------------------------------------------------
 db file parallel read
 db file sequential read
 ON CPU
 ON CPU
 ON CPU
 ON CPU

This can be extended to include where in the execution plan the execution is spending time by adding those columns to the above query which are available starting in 11g. This will give you and idea of where the query is spending it's time, like I/O, CPU, locks, or other stuff, but then the question is why? first thing to check is how many distinct execution plans the query has and see if fast and slow execution times relate to the plan. Chris Saxon has given links to finding the different execution plans.

More info on the method outlined here at http://www.oraclerealworld.com/finding-the-slowest-sql-execution-of-the-same-query/

If the plans are indeed the same, and the cursor activity doesn't point out anything strange like locks or such, then you probably want to check the bind variables. There is no reliable way to get bind variables other than tracing, but you can look at some example values using this method http://jonathanlewis.wordpress.com/2008/07/24/bind-capture/ like

 variable x varchar2(10);
 exec :x:='X';
 col f_sqlid new_value v_sqlid
 col f_sqlcn new_value v_sqlcn
 col f_binddata new_value v_binddata
 select * from dual where dummy=:x;
 select
 PREV_SQL_ID f_sqlid,
 PREV_CHILD_NUMBER f_sqlcn
 from v$session
 where sid = ( Select sid from v$mystat where rownum=1 ) ;
 select bind_data f_binddata from v$sql
 where sql_id = '&v_sqlid'
 and child_number = &v_sqlcn
 /
 SELECT * FROM TABLE ( DBMS_SQLTUNE.EXTRACT_BINDS ( '&v_binddata' )) ;
 SELECT xmltype(other_xml) AS xmlval
 FROM v$sql_plan
 WHERE
 sql_id = '&v_sqlid' AND
 child_number = &v_sqlcn AND
 other_xml IS NOT NULL
 /

PS this is pretty far out but you can trace, starting on 11, the execution of a particular SQL_ID across all users.

Example

 SQL> alter session set events 'sql_trace [707wu2umpfas7],wait=true,bind=true';

see http://blog.tanelpoder.com/2010/06/23/the-full-power-of-oracles-diagnostic-events-part-2-oradebug-doc-and-11g-improvements/

answered May 7, 2014 at 23:07

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.