This article is Part 1 of the “Tuning SQL Azure database” series. The scope of this article is to explain how to improve the performance of SQL Azure database by inspecting the execution plan and statistics. The next article will explain how to improve SQL Azure database performance by using information from dynamic management views.
Inspecting execution plan and statistics
A unique thing about SQL Azure’s multi-tenant architecture is the way it tackles the queries that tend to throttle the SQL Azure’s (shared) server. Therefore, a longer running query is more likely to get prematurely terminated. The rule of thumb is to keep queries as short as possible. To do so, you need to tune the query so that it runs for shorter duration of time. Inspecting the execution plan and statistics of a particular query is one way to do so. Also with SQL Azure database, you need to factor in performance delay due to network latency and bandwidth for an application that is not near to the database. If it’s an application hosted on the Azure platform that connects to SQL Azure database, the best practice is to keep SQL Azure database and the Azure application in the same data center. Apart from performance gain, doing so will help you save on data transfer costs. With SQL Azure database, a data transfer that takes place in the same data center is not charged.
Traditionally, tuning a database involved some or all of the following steps:
Designing databases properly
Analyzing and optimizing queries
Changing hardware configuration or SQL Server configuration to minimize bottlenecks
Altering system software parameters for efficient resource allocation
Because SQL Azure does not provide control over underlying VM’s, system software or hardware; to tune a SQL Azure database; we must focus on tuning the application, database design, and query optimization. In fact, considerable performance gains are obtained by employing correct development techniques and using efficient data access code as shown in Figure 1: performance gain vs. methodology employed to improve performance:
Since we know that tuning a database provides considerable performance gains, in this article, we will discuss how to improve SQL Azure database performance by inspecting the information available in the execution plan and statistics.
One of the best ways to analyze the performance of a particular query is to inspect the query’s execution plan. SQL Azure does not currently support SQL Profiler and Index Tuning Wizard; however, we can view query’s execution plan and tune it based on the information available through the execution plan. In addition to execution plans, it is possible to inspect CPU time and I/O information of a query execution using STATISTICS TIME and STATISTICS IO, which we will discuss later in the article.
If you are already familiar with using the execution plan and statistics for query optimization with SQL Server, then you will see that it’s not different with SQL Azure.
I use SQL Server Management Studio 2008 R2 to connect to SQL Azure server and inspect the execution plan and statistics.
Let’s explore the execution plan first. By default, a query’s execution plan is not displayed. To view a query’s execution plan, we need to explicitly include it before executing the query. Right-Click the query window and select Include Actual Execution Plan, as Figure 2 shows.
Alternatively, you can select the option to include the actual execution plan by clicking the icon in the SQL Editor Taskbar, as Figure 3 shows:
For the purpose of this article, we will view the execution plan in graphical format, but you can also view it in text or XML format. In addition, you can view estimated execution plans for a query. Viewing estimated execution plans can be beneficial with SQL Azure because the data is not returned, so we save on the data transfer cost. Note that the query is not executed for the estimated plan, whereas the query is executed for the actual execution and the plan displayed is the output from the engine showing the plan used. Analyzing estimated execution plans instead of actual plans can be handy in a lot of scenarios because you save the time to execute the query. However, there are cases in which inspecting the actual execution plan is preferred—for example, when cardinality estimates are not accurate or when the code uses either dynamic T-SQL code or temporary tables. The other approach is to tune the queries for the on-premises version of the database and later migrate the database to SQL Azure. This approach can also save on the data transfer cost. In addition, after the database is migrated, the actual performance can be inspected using dynamic management views, which we will explore in Part 2 of this article.
For the demonstrations in this article, I used the AdventureWorksLT sample database, which is the official sample database for SQL Azure. You can download this database from the Microsoft SQL Azure download page, at http://msftdbprodsamples.codeplex.com/releases/view/37304.
Let’s run a query:
Figure 4 shows the execution plan for this query.
Note a couple of things in the execution plan:
Clustered Index Scan means that server scanned all records to answer the query
Suggestion to create an index, including specifications
A table scan is required when we are displaying every record, but when we have filtered our query using the where clause and only a fraction of rows are returned, we can create an index to improve performance.
One thing that should be reviewed in creating an index for SQL Azure database is the space occupied by the index. You might not mind a single index occupying few kilobytes of space. But what if your database has a lot of indexes? This will add to the database size. And with SQL Azure, we are charged based on the size of the database. So it becomes vital to keep a tab on the space that indexes occupy. Also, creating an index will impact the performance of DML queries (which I discuss later in this part of the article)
Following is the T-SQL code to review the size of indexes in a table for SQL Azure database:
Note that SQL Azure does not support tables without a clustered index. Every Table should have exactly one clustered index.
Now, Let’s create the index based on the index suggestion and inspect the performance gain, if any.
Note the “include” clause with the index. By using an index with included columns, we can cover more queries. Later in the article, I demonstrate this fact.
Now run the query again:
Figure 5 shows the execution plan:
Note that the execution plan now shows “Index Seek” which means that particular rows are found using a seek operation on the index. This improves performance because not all rows are supposed to be scanned to find the relevant information. Now, let’s run three separate queries to see if the index impacts other queries as well.
Figure 6 shows the execution plan for each of the three queries:
Recall that Index with included columns can cover more than one query. We can see In Figure 6 that a non-clustered index can improve performance for multiple queries. For more information about indexes with included columns, see the Microsoft article “Index with Included Columns”. You can drop the non-clustered index named nonclustered_index_cust_phone created earlier by using the following code:
Before creating indexes, we also have to factor in the performance cost that will be incurred from creating, updating, and deleting queries on the table. Figure 7 shows the results before index creation; Figure 8 shows the results after index creation.
As you can see in Figure 7, I inserted a batch of 100 rows in the “before index creation” scenario. The elapsed time is 53 seconds.
In the “after index creation” scenario, I create a non-clustered index with included columns. I then insert the batch of 100 rows again, with other parameters constant. The elapsed time is 55 seconds.
Thus, you need to keep the following considerations in mind with index creation:
Performance cost for create, update, and delete statements after index creation
Size of the index vs. performance gain; because we are charged for the size of the database, it is important to analyze the performance gain vs. the additional cost due to the increase in size of the database with index creation
Indexing is an art as well as science, and it is important to balance the performance gain vs. cost for a particular scenario. Here, in this article, my purpose was to point out the fact that Execution plan is available in SQL Azure and the Information available through the execution plan is useful in tuning SQL Azure database.
Statistics are a handy tool for tuning queries. Information such as CPU time and I/O information for query execution is accessible through STATISTICS IO and STATISTICS TIME. Based on this information available from it, we can see whether or not query was tuned.
By default, the STATISTICS IO and STATISTICS TIME are OFF. The following T-SQL code turns them ON:
You can see the statistics in the ‘Message’ tab of the output window. Now, let’s try to tune a query and see how the information available from STATISTICS IO and STATISTICS TIME can be used during the performance tuning process.
If you inspect the message tab, you will see the following output:
“CPU time” shows the amount of CPU time it took to execute a query; and “elapsed time” shows the total time it took to run the query. Because of varying loads on the server, the elapsed time will show varying times, but the CPU time is invariably consistent. Because CPU time is consistent, this parameter can help in determining whether the changes made in the query, during performance tuning, are acually helping (or hurting). Because it a simple query, we encounter Zero’s.
When I ran “select * from saleslt.product,” I encountered the following output:
As discussed earlier, the ‘elapsed time’ will show inconsistent results.
The information available from STATISTICS, that interest me is the ‘logical reads’. Logical reads are the number of pages server had to read from the data cache in order to produce the results of the specified query. In addtion, logical reads remain constant from one execution of a query to an other execution of the same query. If this number goes down when you are tunign a query, it is an indication that the query is using less resources and that its performace will improve.
Now, let’s create an index:
Then, lets run the query again and inspect the Messages tab:
Note that after index creation, the logical reads decreased from 103 to 2. As discussed earlier, this is an indication that the server is using less resources and therefore, perofrmance will improve. Based on the information avaialable from STATISTICS IO and STATISTICS TIME, we verified that our query is getting tuned.
Thus, Information available from execution plan and statistics can be used to tune SQL Azure database. In Part 2 f this article, we will see how to tune SQL Azure database using dynamic management views.