Friday, May 23, 2008

Automatic statistics gathering during Index Creation and Rebuilds

Dear all,

While testing dynamic sampling in Oracle 10g, I came to learn a 10g new feature (very late though).

My test case for dynamic sampling goes like this:

(a) Create table with data (b) Create an index, and (c) Execute the query and note the execution plan
SQL> Create table t as select * from scott.emp;

Table created.

SQL> Create index t_idx on t(empno);

Index created.

SQL> set autotrace traceonly explain
SQL> Select * from t where empno = 7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 1020776977

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    87 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    87 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"=7788)

Note
-----
   - dynamic sampling used for this statement

SQL> Set autotrace off

Yes, the optimizer did dynamic sampling and picked up the index plan as the optimal execution plan. Now, let me delete the statistics and re-run the same SQL.

SQL> Exec dbms_stats.delete_table_stats(user, 't');

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> Select * from t where empno = 7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 2153619298

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    87 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    87 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMPNO"=7788)

Note
-----
   - dynamic sampling used for this statement

SQL> Set autotrace off

Oops, optimizer did consider dynamic sampling but this time chose a Full Table Scan. How’ come the same optimizer picked index access plan on the first run?

Ok, let me do it again and this time being very careful.

(a) Create table with data
SQL> Drop table t purge;

Table dropped.

SQL> Create table t as select * from scott.emp;

Table created.

SQL> Select num_rows, last_analyzed from user_tables where table_name = 'T';

  NUM_ROWS LAST_ANALYZED
---------- --------------


SQL>

(b) Create an index, and

SQL> Create index t_idx on t(empno);

Index created.

SQL> column index_name format a10
SQL> set line 10000

SQL> select blevel, leaf_blocks, distinct_keys, clustering_factor, 
  2  num_rows, last_analyzed, user_stats, global_stats
  3  from user_indexes
  4  where index_name = 'T_IDX';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS LAST_ANALYZED        USE GLO
---------- ----------- ------------- ----------------- ---------- -------------------- --- ---
         0           1            14                 1         14 05-May-2008 10:14:48 NO  NO

SQL> 

Oh! I did not gather statistics.

(c) Execute the query and note the execution plan

SQL> set autotrace traceonly explain
SQL> Select * from t where empno = 7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 1020776977

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    87 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    87 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"=7788)

Note
-----
   - dynamic sampling used for this statement

SQL> Set autotrace off

Now it’s clear why optimizer finds Index Access Path to be best plan. Let’s continue and delete the statistics and then execute the query again:

SQL> Exec dbms_stats.delete_table_stats(user, 't');

PL/SQL procedure successfully completed.

SQL> select blevel, leaf_blocks, distinct_keys, clustering_factor, 
  2  num_rows, last_analyzed, user_stats, global_stats
  3  from user_indexes
  4  where index_name = 'T_IDX';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR   NUM_ROWS LAST_ANALYZED        USE GLO
---------- ----------- ------------- ----------------- ---------- -------------------- --- ---
                                                                                       NO  NO

SQL> set autotrace traceonly explain
SQL> Select * from t where empno = 7788;

Execution Plan
----------------------------------------------------------
Plan hash value: 2153619298

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    87 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    87 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMPNO"=7788)

Note
-----
   - dynamic sampling used for this statement

SQL>  Set autotrace off

This time the optimizer did dynamic sampling and FTS seems to be less expensive.

In Oracle 10g, when an index is created, Oracle automatically gathers statistics on the newly created index. This is also true in case of index rebuild. Because of this reason, optimizer picked Index Path to be the best plan.

References:

Oracle® Database SQL Reference 10g Release 2 (10.2)

Regards

No comments: