Importance of Statistics

​Download a PDF of this Article

 

 

Statistics are a key part of getting solid performance. In this article we will go over the basics of the statistics and various best practices related to Statistics. We will go over various frequently asked questions like when to update statistics and difference between sync and async update of statistics. We will also discuss the pros and cons of the statistics update.

Let us do a quick exercise with your production server before even beginning the article.

  1. 1. Launch SQL Server Management Studio.
    2. Connect to your SQL Server Instance.
    3. Right click on your main production database, and click on Properties.
    4. On the Select a page pane, click on Options.
    5. Record the values for the following options (note the values with be either
  2.     TRUE or FALSE):
    • Auto Create Statistics
    • Auto Update Statistics
    • Auto Update Statistics Asynchronously

 

Figure1.png 

 

Answer the following questions based on the values you previously recorded in Step 5.

Note:  It is strongly recommended not to use “It Depends”.

  1. 1) On your server, what is the significance of keeping the current statistics
  2.     state?
    2) Who changed the statistics status to what they are now?
    3) Do you know how your database will behave if you change the options?

Statistics will be able to help us understand the questions above.

Statistics are heart and blood vain of the SQL Server Engine. Without Statistics, the SQL Server Engine cannot decide the most optimal execution plan for the query. Statistics are used for SELECT, INSERT, UPDATE and DELETE operations.

You can check the statistics on any table using following command.Listing1.png 

Figure2.png 

 

You can also get the similar information from joining sys.stats with the sys.stats_columns view. Statistics catalog views provide much more information than earlier sp_helpstats. Now you can see whether the Index was created by a user or whether it was automatically created by the system and other options. Here is the query for the using  catalog views.Listing2.png

 

To look further down in the statistics, you can run DBCC SHOW_STATISTICS to see the details of the statistics. Let us run show statistics on one of the index retrieved from earlier query of sp_helpstats.Listing3.png

 

If you run the above query, the results are shown in the figure below. The query will provide details of various aspects of the query. 

 

Figure3.png 

 

There are three result sets generated from executing DBCC SHOW_STATISTICS. The first result set provides various information regarding when last statistics were last updated. It also shows how much data is sampled to create statistics as well few other granular details about statistics.

The second result set of the statistics shows the columns involved in the creating statistics and its density. The all density value represents the distinct values per column prefix. In other words density is 1 / distinct values.

The third result set of the statistics shows the histogram of the statistics. Based on distribution and data type of the column on which statistics were created, it displays row level aggregated details for the table. This histogram is very important for SQL Server Engine to figure out what kind of execution plan it will use. Based on the data distribution in the histogram, it will decide the optimal query plan. There will be no more than 200 steps in the histogram.

Let us try to understand one of the rows of the result set. Looking at the third row in the first column there is key value of “Alexander”. Second column RANGE_ROWS has 28 value, which means between key value “Alexander” and previous range key value “Adams” there are 28 total values (note that in this range key values themselves are excluded). Third column EQ_ROWS which means there are total of 123 rows equal to key value of “Alexander”. The fourth suggests that between key values “Adam” and “Alexander” there are total of 15 distinct values. The fifth column AVG_RANGE_ROWS is simple division of DISTINCT_RANGE_ROWS/ RANGE_ROWS.

Let’s now review our original question:

Should keep Auto Create Statistics and Auto Update Statistics settings true/on?

Yes!

Now let us try to answer this question without using the word “It depends”. If we keep Auto Create Statistics true, if SQL Server needs new statistics, it will create it and update it if the considerable data is changed in the table. When there is new table created in database, no statistics were created. If Auto Create Statistics is enabled, when the object is accessed first time the relevant statistics is created on the table. SQL Server by default creates the statistics for the columns used as the predicates of the statistics.

After a while when there are plenty of INSERT, UPDATE or DELETE on the table, the statistics on the table will have expired and they need to be updated again. If the option Auto Update Statistics is enabled, it will auto update the statistics at frequent intervals. This interval is not based on the time but based on data updated in the table.

By default the UPDAE of the table is synchronous, which means as soon as the table meets the needs to upgrade the statistics it is marked to update statistics for the next time SQL Server needs these statistics. As soon as SQL Server Query Optimizer needs to use statistics, it will start updating the statistics right away and when it is updated the query uses the same updated statistics to execute the query. This may result in the query running a lot slower for the first time. Sometimes when large amount of data has been updated on a table, this update process creates lots of resource usages. We can change the process to update synchronously by setting Auto Upgrade Statistics Asynchronously to TRUE.  If Auto Upgrade Statistics Asynchronously is set to TRUE, SQL Server will optimize that plan with the old statistic and schedules the update of the statistics asynchronously.

Let us see quick example where for a very small sized table how statistics makes a big difference in the performance by rendering a proper execution plan.
In this example we will firstly create two different databases and the settings for both the database are shown below:

Database 1: StatusON
Auto Create Statistics: True
Auto Update Statistics: True

Database 2: StatusOff
Auto Create Statistics: False
Auto Update Statistics: False

We will create the same structured table in both of the databases. Once the table is created we will populate the table with identical data in both of the table. Now we will run the same SELECT statement in both of the database. We will observe the execution plan (you can enable them using CTRL+M).

Listing4.png 

 

Let us observe the IO statistics and it is very clear that in second case when Statistics updates are turned off, in order to retrieve the data, SQL Server reads more pages.

 

Figure4.png 

 

If we observe the execution plan as well, it is very clear that in second case when Statistics updates are turned off, it is much more expensive than the first example. 

 

Figure5.png 

 

Based on the result above, it is very clear that a simple query can benefit tremendously from enabling statistics. Notice that the counters for IO, when Statistics are turned off require more pages to be read when data is being retrieved.
We will end this article here however, we will continue discussing the following topics in the future editions of the articles.

  1. • Analysis of DBCC SHOW_STATISTICS
    • Sampled Statistics and Non Sampled Statistics
    • Filtered Statistics
    • Updated Frequency of Statistics and Update Logic
    • Optimal Method to Update Statistics
    • Best Practices of Statistics
  • Additional References: White Paper - Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 (Link: http://msdn.microsoft.com/en-us/library/dd535534%28SQL.100%29.aspx
  •  
  • About the Author
  •  

  • Pinal Dave works as a Technology Evangelist (Database and BI) with Microsoft India. He is a SQL Server MVP and was also awarded the Community Impact Award – Individual Contributor. Pinal has worked on many performance tuning and
    optimization projects for high transactional systems. He has been a regular speaker at many international events like TechEd, SQL PASS, MSDN, TechNet and countless user groups. Pinal frequently writes on his blog http://blog.sqlauthority.com on various subjects regarding SQL Server technology and Business Intelligence.  
Follow us on: