Friday, April 4, 2014

Oracle. Statistics gathering issue.


While looking at an Oracle AWR report and trying to understand why calls to the DB (Oracle 11g) were so slow, I found that one query was particularly expensive in terms of time consumption.
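(The AWR report is what pointed me to it. Just as a side note, a sketch like the one below against V$SQL can also surface the top statements by total elapsed time directly from the cursor cache; ELAPSED_TIME there is in microseconds.)

  select *
    from ( select sql_id,
                  executions,
                  round(elapsed_time / 1e6, 1) as elapsed_sec,
                  substr(sql_text, 1, 80)      as sql_text
             from v$sql
            order by elapsed_time desc )
   where rownum <= 10;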

Here is the query:
  select distinct
      evt_map_tbl.UNIQUE_EVENT_MAPPING_ID as UNIQUE1_43_
    , evt_map_tbl.ACCOUNT_FK              as ACCOUNT9_43_
    , evt_map_tbl.EVENT_MAPPING_ID        as EVENT2_43_
    , evt_map_tbl.EXT_ID_TYPE_CODE_ID     as EXT3_43_
    , evt_map_tbl.LAST_MOD_DATE           as LAST4_43_
    , evt_map_tbl.OL_VERSION              as OL5_43_
    , evt_map_tbl.VALID_FROM              as VALID6_43_
    , evt_map_tbl.VALID_TO                as VALID7_43_
    , evt_map_tbl.VALUE                   as VALUE43_
  from BASTST1.EVENT_MAPPING_TBL evt_map_tbl,
       BASTST1.EVENT_MAPPING_ASSOC_TBL evt_map_assoc,
       BASTST1.SOLD_COMPONENT_TBL sold_comp,
       BASTST1.SOLD_PRODUCT_TBL sold_prod,
       BASTST1.ACCOUNT_TBL acc_tbl
  where evt_map_tbl.EVENT_MAPPING_ID = evt_map_assoc.EVENT_MAPPING_REF
    and evt_map_assoc.SOLD_COMPONENT_FK = sold_comp.SOLD_COMPONENT_ID
    and sold_comp.SOLD_PRODUCT_FK = sold_prod.SOLD_PRODUCT_ID
    and sold_prod.ACCOUNT_FK = acc_tbl.ACCOUNT_ID
    and acc_tbl.ACCOUNT_ID = :acc
    and evt_map_tbl.VALID_FROM <= :from_dt
    and evt_map_tbl.VALID_TO > :to_dt;


and its execution plan was:

------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                         |     1 |   289 |    20   (5)| 00:00:01 |
|   1 |  SORT UNIQUE                     |                         |     1 |   289 |    20   (5)| 00:00:01 |
|   2 |   NESTED LOOPS                   |                         |       |       |            |          |
|   3 |    NESTED LOOPS                  |                         |     1 |   289 |    19   (0)| 00:00:01 |
|   4 |     MERGE JOIN CARTESIAN         |                         |  1414 |   363K|    19   (0)| 00:00:01 |
|   5 |      NESTED LOOPS                |                         |       |       |            |          |
|   6 |       NESTED LOOPS               |                         |    15 |  3555 |    19   (0)| 00:00:01 |
|*  7 |        TABLE ACCESS FULL         | EVENT_MAPPING_TBL       |    15 |  3165 |    19   (0)| 00:00:01 |
|*  8 |        INDEX RANGE SCAN          | EMASS_MAPREF_IDX        |     1 |       |     0   (0)| 00:00:01 |
|   9 |       TABLE ACCESS BY INDEX ROWID| EVENT_MAPPING_ASSOC_TBL |     1 |    26 |     0   (0)| 00:00:01 |
|  10 |      BUFFER SORT                 |                         |    97 |  2522 |    19   (0)| 00:00:01 |
|  11 |       TABLE ACCESS BY INDEX ROWID| SOLD_PRODUCT_TBL        |    97 |  2522 |     0   (0)| 00:00:01 |
|* 12 |        INDEX RANGE SCAN          | SP_ACCFK_IDX            |    39 |       |     0   (0)| 00:00:01 |
|* 13 |     INDEX RANGE SCAN             | SC_SPFK_IDX             |     1 |       |     0   (0)| 00:00:01 |
|* 14 |    TABLE ACCESS BY INDEX ROWID   | SOLD_COMPONENT_TBL      |     1 |    26 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
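(Side note: a plan like the one above can be pulled in several ways. The sketch below assumes you already have the statement's SQL_ID from the AWR report and uses DBMS_XPLAN, either against the cursor in the shared pool or against the plan captured in AWR.)

  SQL> -- plan of the cursor currently in the shared pool
  SQL> select * from table(dbms_xplan.display_cursor('&sql_id'));

  SQL> -- or the plan captured in AWR for the same SQL_ID
  SQL> select * from table(dbms_xplan.display_awr('&sql_id'));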


The most expensive step here is the MERGE JOIN CARTESIAN.

Tom Kyte notes that a MERGE JOIN CARTESIAN can actually be efficient in some cases. But for the CBO (cost-based optimizer) to choose the most efficient execution plan, it needs statistics about the data.
The DBA_TAB_STATISTICS view displays optimizer statistics for all tables in the database:

 SQL> select TABLE_NAME , nvl(NUM_ROWS,0), nvl(BLOCKS,0), nvl(AVG_ROW_LEN,0) , nvl(SAMPLE_SIZE,0) ,LAST_ANALYZED from DBA_TAB_STATISTICS  
  2 where owner = '&tab_owner'  
  3 and table_name in ( 'EVENT_MAPPING_TBL', 'EVENT_MAPPING_ASSOC_TBL','SOLD_COMPONENT_TBL','SOLD_PRODUCT_TBL','ACCOUNT_TBL');  



TABLE_NAME                     NVL(NUM_ROWS,0) NVL(BLOCKS,0) NVL(AVG_ROW_LEN,0) NVL(SAMPLE_SIZE,0) LAST_ANALYZED
------------------------------ --------------- ------------- ------------------ ------------------ -------------------
ACCOUNT_TBL                                  0             0                  0                  0
EVENT_MAPPING_ASSOC_TBL                      0             0                  0                  0
EVENT_MAPPING_TBL                            0             0                  0                  0
SOLD_COMPONENT_TBL                           0             0                  0                  0
SOLD_PRODUCT_TBL                             0             0                  0                  0

Statistics had not been gathered at all. That means the CBO has no information about the data and falls back on default guesses, ending up with plans that are only suitable for a very small number of rows.

Statistics can be gathered using the DBMS_STATS package.

 SQL> exec dbms_stats.gather_schema_stats( 'BASTST1');  

This procedure takes many parameters, but only the first one is mandatory: the name of the schema for which statistics should be gathered.
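For reference, a more explicit call could look like the sketch below; the parameter values are just common choices, not something this particular case strictly needed:

  begin
    dbms_stats.gather_schema_stats(
      ownname          => 'BASTST1',
      estimate_percent => dbms_stats.auto_sample_size,  -- let Oracle pick the sample size
      method_opt       => 'FOR ALL COLUMNS SIZE AUTO',  -- histograms based on column usage
      cascade          => true,                         -- gather index statistics as well
      degree           => dbms_stats.auto_degree        -- parallel degree for big schemas
    );
  end;
  /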

Checking DBA_TAB_STATISTICS one more time: now the statistics are collected, and the CBO is aware of values such as NUM_ROWS, BLOCKS and AVG_ROW_LEN.

TABLE_NAME                     NVL(NUM_ROWS,0) NVL(BLOCKS,0) NVL(AVG_ROW_LEN,0) NVL(SAMPLE_SIZE,0) LAST_ANALYZED
------------------------------ --------------- ------------- ------------------ ------------------ -------------------
ACCOUNT_TBL                               5967           244                236               5967 2014-04-03 17:09:50
EVENT_MAPPING_ASSOC_TBL                   5966            20                 14               5966 2014-04-03 17:11:33
EVENT_MAPPING_TBL                         5966            65                 91               5966 2014-04-03 17:11:52
SOLD_COMPONENT_TBL                        9571           244                123               9571 2014-04-03 17:12:03
SOLD_PRODUCT_TBL                          9571            80                 49               9571 2014-04-03 17:12:15

Finally, the problematic query got a different, much more efficient execution plan:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                              |     1 |   115 |    17   (6)| 00:00:01 |
|   1 |  SORT UNIQUE                    |                              |     1 |   115 |    17   (6)| 00:00:01 |
|   2 |   NESTED LOOPS                  |                              |     1 |   115 |    16   (0)| 00:00:01 |
|   3 |    NESTED LOOPS                 |                              |     2 |    48 |    12   (0)| 00:00:01 |
|   4 |     NESTED LOOPS                |                              |     3 |    48 |     9   (0)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID| SOLD_PRODUCT_TBL             |     3 |    24 |     3   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | SP_ACCFK_IDX                 |     3 |       |     1   (0)| 00:00:01 |
|   7 |      TABLE ACCESS BY INDEX ROWID| SOLD_COMPONENT_TBL           |     1 |     8 |     2   (0)| 00:00:01 |
|*  8 |       INDEX RANGE SCAN          | SC_SPFK_IDX                  |     1 |       |     1   (0)| 00:00:01 |
|*  9 |     INDEX RANGE SCAN            | EMASS_SCEMREF_IDX            |     1 |     8 |     1   (0)| 00:00:01 |
|* 10 |    TABLE ACCESS BY INDEX ROWID  | EVENT_MAPPING_TBL            |     1 |    91 |     2   (0)| 00:00:01 |
|* 11 |     INDEX RANGE SCAN            | EVTMAPP_EVTID_VALID_FROM_IDX |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------

No expensive MERGE JOIN CARTESIAN any more: a very clean and efficient plan.
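One more note: by default DBMS_STATS invalidates dependent cursors gradually (AUTO_INVALIDATE), so an already cached bad plan can survive for a while after gathering. If the new plan is needed immediately, the call can be repeated with explicit invalidation (again, just a sketch):

  SQL> exec dbms_stats.gather_schema_stats('BASTST1', no_invalidate => FALSE);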

I was curious why statistics had not been collected at all, because without statistics the CBO is just a blind tool, good for nothing. I found some information on this topic, and soon there will be a new post here about the differences between the default statistics-gathering jobs in Oracle 10g and Oracle 11g.




Intro

Hi all.

Here I will write about things from my IT experience that I feel are useful to share.
I'm on my way to getting better in the following areas and languages:

C++/Python
Oracle PL/SQL, Optimization
Linux/RedHat
Regular expressions
Arduino/AVR projects

And I hope that the things that were hard for me to find will help you on your own way.
Feel free to ask questions and to argue if you know a better approach.

Stay tuned!