Optimizer Statistics Gathering - pending and history

Submitted by fpachot on
Blog article:

How do you manage when you need to gather statistics on some tables in a critical environment? Some queries are too long because of stale statistics. But other queries on the same tables are ok. You cannot leave the inital problem without fixing it. Adding hints or SQL Profiles for the identified queries is not the right solution when you identified that stale statistics are the problem. But you want to reduce the risk of regression on other queries at maximum. This is where two features are very useful, Statistics History and Pending Statistics, which are actually implemented as one feature: versionning the object statistics. here is a small demo.

Demo table

I create a table for the demo. The CTAS gathers statistics (12c online statistics gathering) with one row and then I insert more rows:


10:33:56 SQL> create table DEMO as select rownum n from dual;
Table DEMO created.
10:33:56 SQL> insert into DEMO select rownum n from xmltable('1 to 41');
41 rows inserted.
10:33:56 SQL> commit;
Commit complete.

The estimations are stale: estimates 1 row (E-Rows) but 42 actual rows (A-Rows)


10:33:56 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 

10:33:57 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |      1 |     42 |   
--------------------------------------------------------------   

Pending Statistics

Here we are: I want to gather statistics on this table. But I will lower all risks by not publishing them immediately. Current statistics preferences are set to PUBLISH=TRUE:


10:33:58 SQL> select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_statistics where owner='DEMO' and table_name in ('DEMO');

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
         1 12-SEP-18 10.33.56.000000000 AM   TRUE     
                                          

I set it to FALSE:


10:33:59 SQL> exec dbms_stats.set_table_prefs('DEMO','DEMO','publish','false');

PL/SQL procedure successfully completed.

10:34:00 SQL> select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_statistics where owner='DEMO' and table_name in ('DEMO');

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
         1 12-SEP-18 10.33.56.000000000 AM   FALSE  
                                            

I'm now gathering stats as I want to:


10:34:01 SQL> exec dbms_stats.gather_table_stats('DEMO','DEMO');
PL/SQL procedure successfully completed.

Test Pending Statistics

They are not published. But to test my queries with those new stats, I can set my session to use pending statistics:


10:34:02 SQL> alter session set optimizer_use_pending_statistics=true;
Session altered.

Running my query again, I can see the good estimations (E-Rows=A-Rows)


10:34:03 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 

10:34:04 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |     42 |     42 |   
--------------------------------------------------------------   

The published statistics still show 1 row:


10:34:05 SQL> select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_statistics where owner='DEMO' and table_name in ('DEMO');

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
         1 12-SEP-18 10.33.56.000000000 AM   FALSE            
                                  

But I can query the pending ones before publishing them:


10:34:05 SQL> c/dba_tab_statistics/dba_tab_pending_stats
  1* select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_pending_stats where owner='DEMO' and table_name in ('DEMO');
10:34:05 SQL> /

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
        42 12-SEP-18 10.34.01.000000000 AM   FALSE          
                                    

I've finished my test with pending statistics:


10:34:05 SQL> alter session set optimizer_use_pending_statistics=false;
Session altered.

Note that if you have Real Application Testing, you can use SQL Performance Analyzer to test the pending statistics on a whole SQL Tuning Set representing the critical queries of your application. Of course, the more you test there, the better it is.

Delete Pending Statistics

Now let's say that my test shows that the new statistics are not good, I can simply delete the pending statistics:


10:34:06 SQL> exec dbms_stats.delete_pending_stats('DEMO','DEMO');
PL/SQL procedure successfully completed.

Then all queries are still using the previous statistics:


10:34:07 SQL> show parameter pending
NAME                             TYPE    VALUE
-------------------------------- ------- -----
optimizer_use_pending_statistics boolean FALSE

10:34:07 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 

10:34:08 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |      1 |     42 |   
--------------------------------------------------------------   

Accept Pending Statistics

Now I'll show the second case where my tests show that the new statistics gathering is ok. I gather statistics again:


10:34:09 SQL> exec dbms_stats.gather_table_stats('DEMO','DEMO');
PL/SQL procedure successfully completed.

10:34:09 SQL> alter session set optimizer_use_pending_statistics=true;
Session altered.

10:34:11 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 


10:34:12 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |     42 |     42 |   
--------------------------------------------------------------   
                                                                 
10:34:12 SQL> alter session set optimizer_use_pending_statistics=false;
Session altered.

When I'm ok with the new statistics I can publish them so that other sessions can see them. As doing this in production is probably a fix for a critical problem, I want the effects to take immediately, invalidating all cursors:


10:34:13 SQL> exec dbms_stats.publish_pending_stats('DEMO','DEMO',no_invalidate=>false);
PL/SQL procedure successfully completed.

The default NO_INVALIDATE value is probably to avoid in those cases because you want to see the side effects, if any, as soon as possible. Not within a random window of 5 hours later where you have left the office. I set back the table preference to PUBLISH=TRUE and check that the new statistics are visible in DBA_TAB_STATISTICS (and no more in DBA_TAB_PENDING_STATS):


10:34:14 SQL> exec dbms_stats.set_table_prefs('DEMO','DEMO','publish','true');
PL/SQL procedure successfully completed.

10:34:15 SQL> select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_statistics where owner='DEMO' and table_name in ('DEMO');

  NUM_ROWS CAST(LAST_ANALYZEDASTIMESTAMP)    DBMS_STATS.GET_PREFS('PUBLISH',OWNER,TABLE_NAME)   
  -------- ------------------------------    ------------------------------------------------   
        42 12-SEP-18 10.34.09.000000000 AM   TRUE                                               


10:34:15 SQL> c/dba_tab_statistics/dba_tab_pending_stats
  1* select num_rows,cast(last_analyzed as timestamp),dbms_stats.get_prefs('PUBLISH',owner,table_name) from dba_tab_pending_stats where owner='DEMO' and table_name in ('DEMO');
10:34:15 SQL> /

no rows selected

Report Differences

Then what if a citical regression is observed later? I still have the possibility to revert to the old statistics. First I can check in detail what has changed:


10:34:16 SQL> select report from table(dbms_stats.diff_table_stats_in_history('DEMO','DEMO',sysdate-1,sysdate,0));

REPORT
------

###############################################################################

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE         : DEMO
OWNER         : DEMO
SOURCE A      : Statistics as of 11-SEP-18 10.34.16.000000 AM EUROPE/ZURICH
SOURCE B      : Statistics as of 12-SEP-18 10.34.16.000000 AM EUROPE/ZURICH
PCTTHRESHOLD  : 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME                  TYP SRC ROWS       BLOCKS     ROWLEN     SAMPSIZE
...............................................................................

DEMO                        T   A   1          4          3          1
                                B   42         8          3          42
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:
.............................

COLUMN_NAME     SRC NDV     DENSITY    HIST NULLS   LEN  MIN   MAX   SAMPSIZ
...............................................................................

N               A   1       1          NO   0       3    C102  C102  1
                B   41      .024390243 NO   0       3    C102  C12A  42
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


NO DIFFERENCE IN INDEX / (SUB)PARTITION STATISTICS
###############################################################################

Restore Previous Statistics

If nothing is obvious and the regression is more critical than the original problem, I still have the possibility to revert back to the old statistics:


10:34:17 SQL> exec dbms_stats.restore_table_stats('DEMO','DEMO',sysdate-1,no_invalidate=>false);
PL/SQL procedure successfully completed.

Again, invalidating all cursors immediately is probably required as I solve a critical problem here. Immediately, the same query uses the old statistics:


10:34:17 SQL> select /*+ gather_plan_statistics */ count(*) from DEMO;

  COUNT(*) 
  -------- 
        42 


10:34:17 SQL> select * from table(dbms_xplan.display_cursor(format=>'basic +rows +rowstats last'));

PLAN_TABLE_OUTPUT                                                
-----------------                                                
EXPLAINED SQL STATEMENT:                                         
------------------------                                         
select /*+ gather_plan_statistics */ count(*) from DEMO          
                                                                 
Plan hash value: 2180342005                                      
                                                                 
--------------------------------------------------------------   
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   
--------------------------------------------------------------   
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |   
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |   
|   2 |   TABLE ACCESS FULL| DEMO |      1 |      1 |     42 |
--------------------------------------------------------------   

If I want to see what happened recently on this table, I can query the history of operations (you can replace my ugly regexp_replace with XQuery):


10:34:18 SQL> select end_time,end_time-start_time,operation,target,regexp_replace(regexp_replace(notes,'" val="','=>'),'(||)',' '),status from DBA_OPTSTAT_OPERATIONS where regexp_like(target,'"?'||'DEMO'||'"?."?'||'DEMO'||'"?') order by end_time desc fetch first 10 rows only;

END_TIME                                 END_TIME-START_TIME   OPERATION             TARGET          REGEXP_REPLACE(REGEXP_REPLACE(NOTES,'"VAL="','=>'),'(||)','')                                                                                                                                                                                                                                         STATUS      
--------                                 -------------------   ---------             ------          ----------------------------------------------------------------------------------------------                                                                                                                                                                                                                                         ------      
12-SEP-18 10.34.17.718800000 AM +02:00   +00 00:00:00.017215   restore_table_stats   "DEMO"."DEMO"     as_of_timestamp=>09-11-2018 10:34:17  force=>FALSE  no_invalidate=>FALSE  ownname=>DEMO  restore_cluster_index=>FALSE  tabname=>DEMO                                                                                                                                                                                                 COMPLETED   
12-SEP-18 10.34.13.262234000 AM +02:00   +00 00:00:00.010021   restore_table_stats   "DEMO"."DEMO"     as_of_timestamp=>11-30-3000 01:00:00  force=>FALSE  no_invalidate=>FALSE  ownname=>DEMO  restore_cluster_index=>FALSE  tabname=>DEMO                                                                                                                                                                                                 COMPLETED   
12-SEP-18 10.34.09.974873000 AM +02:00   +00 00:00:00.032513   gather_table_stats    "DEMO"."DEMO"     block_sample=>FALSE  cascade=>NULL  concurrent=>FALSE  degree=>NULL  estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE  force=>FALSE  granularity=>AUTO  method_opt=>FOR ALL COLUMNS SIZE AUTO  no_invalidate=>NULL  ownname=>DEMO  partname=>  reporting_mode=>FALSE  statid=>  statown=>  stattab=>  stattype=>DATA  tabname=>DEMO     COMPLETED   
12-SEP-18 10.34.01.194735000 AM +02:00   +00 00:00:00.052087   gather_table_stats    "DEMO"."DEMO"     block_sample=>FALSE  cascade=>NULL  concurrent=>FALSE  degree=>NULL  estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE  force=>FALSE  granularity=>AUTO  method_opt=>FOR ALL COLUMNS SIZE AUTO  no_invalidate=>NULL  ownname=>DEMO  partname=>  reporting_mode=>FALSE  statid=>  statown=>  stattab=>  stattype=>DATA  tabname=>DEMO     COMPLETED   

We can see here that the publishing of pending stats was actually a restore of stats as of Nov 30th of Year 3000. This is probably because the pending status is hardcoded as a date in the future. Does that mean that all pending stats will become autonomously published at that time? I don't think we have to worry about Y3K bugs for the moment...

Here is the full receipe I've given to an application owner who needs to gather statistics on his tables on a highly critical database. Then he has all the info to limit the risks. My recommendation is to prepare this fallback scenario before doing any change, and test it as I did, on a test environment, in order to be ready to react on any unexpected side effect. Be careful, the pending statsitics do not work correctly with system statistics and can have very nasty side effects (Bug 21326597), but restoring from history is possible.

 


The comments are closed on this post, but do not hesitate to give feedback, comment or questions on Twitter (@FranckPachot)

Disclaimer

The views expressed in this blog are those of the authors and cannot be regarded as representing CERN’s official position.

CERN Social Media Guidelines

 

Blogroll