Understanding Oracle Histograms

Histograms are a feature of the Cost Based Optimizer (CBO) that describes the data distribution within a column. Histograms should be created on columns that contain highly skewed data and that are used in the filter clause (WHERE) of SQL statements.
Histograms play a very important role in the creation of the execution plan, so we must be careful about the method we use to gather statistics on database objects.
Let me show you how histograms work. For this test I have created a table T_HIST with some data in it:
SQL> CREATE TABLE T_HIST AS SELECT LEVEL C1, mod(round(dbms_random.value(1,100000)),254) c2, mod(round(dbms_random.value(1,100000)),254) c3, 'C' C4 from dual connect by level < 529101;

SQL> INSERT INTO T_HIST SELECT LEVEL C1, mod(round(dbms_random.value(1,100000)),254) c2,
mod(round(dbms_random.value(1,100000)),254) c3, 'P' from dual connect by level < 301;

300 rows created.

SQL> INSERT INTO T_HIST SELECT LEVEL C1, mod(round(dbms_random.value(1,100000)),254) c2,
mod(round(dbms_random.value(1,100000)),254) c3, 'R' from dual connect by level < 301;

300 rows created.

SQL> INSERT INTO T_HIST SELECT LEVEL C1, mod(round(dbms_random.value(1,100000)),254) c2,
mod(round(dbms_random.value(1,100000)),254) c3, 'T' from dual connect by level < 301;

300 rows created.

SQL> INSERT INTO T_HIST SELECT LEVEL C1, mod(round(dbms_random.value(1,100000)),254) c2,
  2  mod(round(dbms_random.value(1,100000)),254) c3, 'A' from dual connect by level < 500001;

500000 rows created.

SQL> COMMIT;

Commit complete.

SQL>

We will now create an index on column C4:
SQL> CREATE INDEX T_HIST_IDX1 ON T_HIST(C4);

Index created.

SQL>

Let us see how the data is distributed inside column C4:
SQL> SELECT C4,COUNT(*) FROM T_HIST GROUP BY C4 ORDER BY C4;
C   COUNT(*)
- ----------
A     500000
C     529100
P        300
R        300
T        300
You can see that we have 5 distinct values in column C4 (A, C, P, R and T) and the number of rows that correspond to each. Let us take the value 'R' as an example: if we query the table, the expected execution plan is to use the index we created, because we need only 300 rows out of 1,030,000 total. Let's query:
SQL> explain plan for select /* test23 */ *
  2  from T_HIST
  3  where      C4 = 'R'
  4  order by
  5      C1
  6  ;
Explained.
SQL>
SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> SELECT *
  2  FROM   TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3146893614

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |    30 |  1260 |   387   (1)| 00:00:05 |
|   1 |  SORT ORDER BY               |             |    30 |  1260 |   387   (1)| 00:00:05 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_HIST      |    30 |  1260 |   386   (0)| 00:00:05 |
|*  3 |    INDEX RANGE SCAN          | T_HIST_IDX1 |  3607 |       |   375   (0)| 00:00:05 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("C4"='R')

Note
-----
   - dynamic sampling used for this statement (level=2)

What happened? The execution plan says that our query will return only 30 rows, which is obviously wrong; also look at the estimated elapsed time (5 seconds). OK, we are being unfair to the CBO because we have not given it all the information it needs, so let's gather statistics, but without histograms:
SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'ALFRED', tabname=>'T_HIST', estimate_percent=>dbms_stats.auto_sample_size, method_opt=>'for all columns size 1');
PL/SQL procedure successfully completed.
SQL> explain plan for select /* test23 */ *
  2  from T_HIST
  3  where      C4 = 'R'
  4  order by
  5      C1
  6  ;
Explained.
SQL>
SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> SELECT *
  2  FROM   TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 2911980760
-------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |   206K|  3017K|       |  1832   (2)| 00:00:22 |
|   1 |  SORT ORDER BY     |        |   206K|  3017K|  5672K|  1832   (2)| 00:00:22 |
|*  2 |   TABLE ACCESS FULL| T_HIST |   206K|  3017K|       |   774   (2)| 00:00:10 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C4"='R')
What happened? Now the CBO decided to choose a full table scan; it estimates that the query will return 206K rows, and the estimated elapsed time is 22 seconds. That's because we used AUTO_SAMPLE_SIZE, so Oracle decided how many rows to sample, and the values 'A' and 'C' represent more than 99% of the sample. But how can we tell the CBO that our data distribution is skewed? That's where histograms come into play. Let's now gather statistics using a histogram for column C4:
SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'ALFRED', tabname=>'T_HIST', estimate_percent=>dbms_stats.auto_sample_size, method_opt=>'for columns C4 size 254');
PL/SQL procedure successfully completed.
SQL> explain plan for select /* test23 */ *
  2  from T_HIST
  3  where      C4 = 'R'
  4  order by
  5      C1
  6  ;
Explained.
SQL>
SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> SELECT *
  2  FROM   TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3146893614
--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |   941 | 14115 |     8  (13)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |   941 | 14115 |     8  (13)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_HIST      |   941 | 14115 |     7   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T_HIST_IDX1 |   941 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("C4"='R')
Now the CBO is using the index again; it estimates 941 rows, but look how the cost decreased and the estimated elapsed time dropped to 1 second. Even though the CBO was not able to calculate the real number of rows returned, the cost was corrected. To get the most accurate execution plan we must increase the sample size, so let us run a test with a 100% sample:
SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'ALFRED', tabname=>'T_HIST', estimate_percent=>100, method_opt=>'for columns C4 size 254');
PL/SQL procedure successfully completed.
SQL> explain plan for select /* test23 */ *
  2  from T_HIST
  3  where      C4 = 'R'
  4  order by
  5      C1
  6  ;
Explained.
SQL>
SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> SELECT *
  2  FROM   TABLE(DBMS_XPLAN.DISPLAY);
Plan hash value: 3146893614
--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |   300 |  4500 |     5  (20)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |   300 |  4500 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_HIST      |   300 |  4500 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T_HIST_IDX1 |   300 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("C4"='R')
Now it is totally accurate, right? The execution plan generated by the CBO uses the index, the estimated row count is 300, which is correct, and the cost is even better. This is an example with a very small table, but is it really an option to gather statistics with a 100% estimate on a big table? I don't think so; that's why AUTO_SAMPLE_SIZE can be a good balance between sample size and accuracy.
Column C4 now has a frequency histogram because it has fewer than 254 distinct values (NDV), only 5. A frequency histogram is limited to 254 buckets, which means it can only be built on columns with up to 254 NDVs. Seriously? Let's test it: we will gather statistics on all columns of the table using SIZE AUTO for histograms:
SQL> exec dbms_stats.gather_table_stats(ownname=>'ALFRED', tabname=>'T_HIST', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE AUTO');
PL/SQL procedure successfully completed.
SQL> select COLUMN_NAME, NUM_DISTINCT, DENSITY, NUM_BUCKETS, HISTOGRAM from user_tab_columns where table_name = 'T_HIST';
C1                                   529100  .00000189           1 NONE
C2                                      254 .003937008           1 NONE
C3                                      254 4.8544E-07         254 FREQUENCY
C4                                        5 4.8544E-07           5 FREQUENCY
What happened? Column C3 was also a candidate for a histogram because of its density (it is skewed), but notice that its number of buckets is 254; that means we are right at the limit for having a frequency histogram on this column. Don't believe me? Let's make a quick test: we will insert a new distinct value into this column, gather statistics again and see what happens:
SQL> insert into T_HIST values (100001,42, 99999999999,'B');
1 row created.
SQL> COMMIT;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(ownname=>'ALFRED', tabname=>'T_HIST', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE AUTO');
PL/SQL procedure successfully completed.
SQL> select COLUMN_NAME, NUM_DISTINCT, DENSITY, NUM_BUCKETS, HISTOGRAM from user_tab_columns where table_name = 'T_HIST';
C1                                   529100  .00000189           1 NONE
C2                                      254 .003937008           1 NONE
C3                                      255 .003921569           1 NONE
C4                                        6 4.8544E-07           6 FREQUENCY
Can you see how the number of distinct values (NDV) changed? Now we can't have a frequency histogram on that column. Really? Well, we can force a histogram, but look how low our density is. Let's try to force it and tell the gather-stats procedure that we want all columns with a maximum of 254 buckets:
SQL> exec dbms_stats.gather_table_stats(ownname=>'ALFRED', tabname=>'T_HIST', estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 254');
PL/SQL procedure successfully completed.
SQL> select COLUMN_NAME, NUM_DISTINCT, DENSITY, NUM_BUCKETS, HISTOGRAM from user_tab_columns where table_name = 'T_HIST';
C1                                   529100 1.9186E-06         254 HEIGHT BALANCED
C2                                      254 4.8544E-07         254 FREQUENCY
C3                                      255 .003935563         254 HEIGHT BALANCED
C4                                        6 4.8544E-07           6 FREQUENCY
See how it works? Oracle creates a frequency histogram on columns with up to 254 NDVs and a height-balanced histogram on columns with more than 254 NDVs. Also note that in the first example we used a size of 1 (one bucket), which means no histogram, because all values fall inside a single bucket.
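If you want to see the actual buckets behind these histograms, you can query USER_TAB_HISTOGRAMS. A quick sketch for the demo table; depending on the Oracle version and data type, ENDPOINT_ACTUAL_VALUE may or may not be populated:

-- Sketch: list the histogram buckets for column C4 of T_HIST.
-- For a frequency histogram, ENDPOINT_NUMBER is a cumulative row count,
-- so the per-value count is the difference between consecutive endpoints.
SELECT ENDPOINT_VALUE,
       ENDPOINT_ACTUAL_VALUE,
       ENDPOINT_NUMBER
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME  = 'T_HIST'
AND    COLUMN_NAME = 'C4'
ORDER  BY ENDPOINT_NUMBER;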
This is a basic example of how histograms work. Honestly, I had a lot of doubts before running this test, and some of them were clarified by reading Jonathan Lewis's recent post on allthingsoracle.com. However, there is still a lot of ground to cover, and it gets even harder because 12c adds two more types of histograms to help the CBO.

Thanks,
Alfredo  

Why did the execution plan change? A case study

Overview

At some point in time the execution plan of SQL_ID etp65bryqtd2d changed. With the new execution plan, performance dropped from an average of 0.01 seconds per execution to an average of 1.5 seconds per execution.
Because of this increase in elapsed time per execution, the application could not process work orders fast enough; the work queue started to grow drastically, leading the application team to escalate a severity 1 issue.

Analysis

Execution plan changes

The execution plan changes only when a hard parse is performed, because the hidden parameter _optim_peek_user_binds is at its default value of TRUE (bind peeking enabled). A hard parse happens when the cursor is invalidated in the shared pool; a cursor can be invalidated for several reasons, but the most common one is an object change (index rebuild, DBMS_STATS, etc.).
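A quick way to see whether a cursor has been reloaded or invalidated, and how many plan versions exist, is to look at V$SQL. A sketch, using the SQL_ID from this case:

-- Sketch: check loads, invalidations and plan versions for a given SQL_ID.
SELECT SQL_ID,
       CHILD_NUMBER,
       PLAN_HASH_VALUE,
       LOADS,
       INVALIDATIONS,
       PARSE_CALLS,
       EXECUTIONS,
       LAST_LOAD_TIME
FROM   V$SQL
WHERE  SQL_ID = 'etp65bryqtd2d';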

Rolling Cursor Invalidation

Starting with Oracle 10g, cursors are marked for rolling invalidation instead of being marked INVALID immediately. On the next execution of the query, the Oracle server generates a random number between 0 and the value of the _optimizer_invalidation_period parameter, which has a default value of 18000, and the cursor remains valid for that number of seconds. On every following execution Oracle checks whether this randomly selected timeout has expired; if so, the cursor is hard parsed again.
Things will change without notice! Cursor invalidation has changed in the past and the reason for this change is that previously all dependent cursors were invalidated immediately after gathering new object statistics. This massive invalidation might cause a serious degradation in performance right after statistics gathering due to a high number of hard parses. Rolling cursor invalidation spreads the invalidation of cursors out over a longer period of time thereby avoiding the performance degradation caused by hard parsing the invalidated cursors.

Reference: Metalink note:557661.1
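When you want the old behavior for a specific table, for instance while troubleshooting a plan change, DBMS_STATS lets you control this with the no_invalidate parameter. A minimal sketch, reusing the demo table from the first part of this post; the default, DBMS_STATS.AUTO_INVALIDATE, uses rolling invalidation:

BEGIN
  -- no_invalidate => FALSE invalidates dependent cursors right away,
  -- forcing a hard parse on their next execution.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname       => 'ALFRED',
    tabname       => 'T_HIST',
    no_invalidate => FALSE);
END;
/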

Bind variable peeking

The query optimizer peeks at the values of user-defined bind variables on the first invocation of a cursor. This feature lets the optimizer determine the selectivity of any WHERE clause condition, as well as if literals have been used instead of bind variables. On subsequent invocations of the cursor, no peeking takes place, and the cursor is shared, based on the standard cursor-sharing criteria, even if subsequent invocations use different bind values.
In this case we can clearly see that the elapsed time per execution does not vary because of bind peeking alone; once Oracle peeks at a value and builds a bad execution plan, every execution is slow.
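To see bind peeking in action, you can hard parse a statement with one bind value and then reuse the cursor with a very different one. A sketch against the T_HIST demo table from the first part (on recent versions adaptive cursor sharing may eventually create a second child cursor):

-- Sketch: the plan built while peeking 'R' (300 rows) is reused for
-- 'A' (500,000 rows) because no peeking happens on later executions.
variable B1 VARCHAR2(10)

exec :B1 := 'R';
select /* peek_demo */ count(*) from T_HIST where C4 = :B1;

exec :B1 := 'A';
select /* peek_demo */ count(*) from T_HIST where C4 = :B1;

-- Confirm the shared cursor and its single plan:
select sql_id, child_number, plan_hash_value, executions
from   v$sql
where  sql_text like 'select /* peek_demo */%';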

CBO trace event 10053

What is the CBO doing behind the scenes when it builds an execution plan? This is where the 10053 trace event comes into play. Other tools and settings show us WHAT the CBO comes up with; the 10053 trace tells us HOW the CBO arrived at its decision (the final execution plan).

conn / as sysdba
oradebug setmypid
oradebug unlimit
oradebug event 10053 trace name context forever, level 1
…your statement here…
oradebug event 10053 trace name context off
oradebug tracefile_name
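If you prefer not to use oradebug, the same trace can be enabled for your own session with ALTER SESSION. A sketch; the trace file is written to the session's diagnostic trace directory:

-- Sketch: enable the 10053 trace for the current session only.
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- ...run the hard-parsed statement here...
ALTER SESSION SET EVENTS '10053 trace name context off';
-- Locate the trace file for this session:
SELECT value FROM v$diag_info WHERE name = 'Default Trace File';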

We performed this CBO trace both when the query was performing badly and when it was performing well. This gives us two trace files to compare in order to find out what makes the CBO choose the bad plan.
Note that this method produces a trace file only during a hard parse. So, to force the CBO to hard parse the SQL, we ran it with a comment added to the statement text:

variable N1 VARCHAR2(128)
exec :N1 := '125';
select /* comment*/…

After comparing both trace files, we can clearly see where the CBO chose the good plan:
***********************
Best so far: Table#: 1  cost: 7.3522  card: 3973.0000  bytes: 218515
             Table#: 7  cost: 16.2850  card: 9.0178  bytes: 612
             Table#: 2  cost: 21.6884  card: 33.7888  bytes: 3094
             Table#: 5  cost: 48.9215  card: 3565.0298  bytes: 406410
             Table#: 4  cost: 1475.7155  card: 3565.4846  bytes: 556140
             Table#: 0  cost: 1483.2886  card: 3559.8976  bytes: 644360
             Table#: 6  cost: 2195.7103  card: 3599.4022  bytes: 770186
             Table#: 3  cost: 2197.9000  card: 3599.4022  bytes: 820572
***********************
On the other hand, the trace of the bad plan shows a huge cost:
***********************
Best so far: Table#: 1  cost: 7.3510  card: 3928.0000  bytes: 216040
             Table#: 7  cost: 16.2836  card: 8.9798  bytes: 612
             Table#: 2  cost: 21.6870  card: 32.6678  bytes: 3003
             Table#: 5  cost: 153.8561  card: 20168.2755  bytes: 2299152
             Table#: 4  cost: 8225.5481  card: 20170.8485  bytes: 3146676
             Table#: 0  cost: 8233.2391  card: 20139.2414  bytes: 3645159
             Table#: 6  cost: 12263.4248  card: 20362.7290  bytes: 4357682
             Table#: 3  cost: 12266.6610  card: 20362.7290  bytes: 4642764
***********************
In the trace files we can clearly see that the cost of Table#5 messed up our total cost. Why is that?
***************
Now joining: TABLE5[TL2]#5
***************
NL Join
  Outer table: Card: 32.67  Cost: 21.69  Resp: 21.69  Degree: 1  Bytes: 91
  Inner table: TABLE5  Alias: TL2
  Access Path: TableScan
Index: TABLE5_IDX3
    resc_io: 20.00  resc_cpu: 360839
    ix_sel: 3.9216e-04  ix_sel_with_filters: 3.9216e-04
    NL Join: Cost: 153.86  Resp: 153.86  Degree: 1
      Cost_io: 152.55  Cost_cpu: 18392646
      Resp_io: 152.55  Resp_cpu: 18392646
****** finished trying bitmap/domain indexes ******
  Best NL cost: 153.86
          resc: 153.86 resc_io: 152.55 resc_cpu: 18392646
          resp: 153.86 resp_io: 152.55 resp_cpu: 18392646
Join Card:  20168.28 = outer (32.67) * inner (1574304.00) * sel (3.9216e-04)
Join Card – Rounded: 20168 Computed: 20168.28
The index selectivity on TABLE5_IDX3 is much poorer in this trace (ix_sel = 3.9216e-04) than in the good plan (6.7020e-05), so the join cardinality explodes.

Index selectivity

After looking into the index selectivity, the question arises: why is the index selectivity poor?
B*Tree indexes improve the performance of queries that select a small percentage of rows from a table. As a general guideline, we should create indexes on tables that are often queried for less than 15% of the table's rows. This threshold may be higher in situations where all data can be retrieved from the index itself, or where the indexed columns can be used for joining to other tables.
The ratio of the number of distinct values in the indexed column (or columns) to the number of records in the table represents the selectivity of an index. The ideal selectivity is 1; such selectivity can only be reached by unique indexes on NOT NULL columns.

Selectivity = Distinct Values / Total Number of Rows

This measure shows how much the statistics on related objects matter. If we gather statistics only on the table, the total number of rows changes while the index figures stay stale, which can drastically change the computed index selectivity. In the same manner, if we gather statistics only on the index, the number of distinct keys changes and the selectivity changes again.
Apart from that, the accuracy of the statistics also plays an important role: the more accurate the statistics, the more accurate the index selectivity.
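A rough way to check this from the data dictionary is to compare the index's distinct keys with the table's row count. A sketch, using the anonymized object name from this case:

-- Sketch: approximate index selectivity = DISTINCT_KEYS / NUM_ROWS,
-- both taken from optimizer statistics in the dictionary.
SELECT i.index_name,
       i.distinct_keys,
       t.num_rows,
       i.distinct_keys / NULLIF(t.num_rows, 0) AS selectivity
FROM   dba_indexes i
       JOIN dba_tables t
         ON  t.owner      = i.table_owner
         AND t.table_name = i.table_name
WHERE  i.index_name = 'TABLE5_IDX3';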

Suggestions – How to fix this issue?

The suggestions come from the finding that index selectivity is what hurts this particular SQL_ID.
- Because a table and its indexes are closely related in how the CBO calculates the cost, gather statistics on the table and its indexes at the same time (see the sketch after this list). This prevents the index selectivity from changing drastically.
- When an index is rebuilt, gather statistics on the table as well.
- Review the accuracy of the statistics. In some cases AUTO_SAMPLE_SIZE does not produce a good sample.
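A minimal sketch of the first suggestion, assuming the ALFRED schema and the anonymized TABLE5 name from this case; cascade gathers index statistics together with the table, and no_invalidate => FALSE invalidates the dependent cursors immediately:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'ALFRED',   -- assumed schema
    tabname          => 'TABLE5',   -- anonymized table from this case
    estimate_percent => 50,         -- the sample size that worked here
    cascade          => TRUE,       -- also gather stats on the indexes
    no_invalidate    => FALSE);     -- hard parse dependent cursors now
END;
/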

Tests on this SQL_ID have shown that gathering statistics on the TABLE5 table and its indexes with an estimate percent of 50, plus invalidating the cursor immediately, solved the issue.
The CBO was able to pick the good execution plan because the index selectivity was good enough.

Thanks,
Alfred