Excessive Query Compilation and Optimization – CPU Bottleneck

Let’s say you are experiencing High CPU load or longer transaction execution time. And you are seeing excessive query compilations. To troubleshoot we need to find which queries are causing the trouble.

Query compilation and optimization is a CPU-intensive process. The cost of optimization increases as the complexity of the query and the underlying schema increases, but even a relatively simply query can take 10-20 milliseconds of CPU time to parse and compile. To mitigate this cost, SQL Server caches and reuses compiled query plans.. Each time a new query is received from the client, SQL Server first searches the plan cache (sometimes referred to as the procedure cache) to see whether there is already a compiled plan that can be reused. If a matching query plan cannot be found, SQL Server parses and compiles the incoming query before running it.

Using parameterized queries or stored procedures for OLTP-based applications substantially increases the chance of reusing a cached plan and can result in substantial reductions in SQL Server CPU consumption. You can enable parameterization at the database or query level by using the PARAMETERIZATION FORCED database option or query hint, respectively. However, the best place to parameterize queries is within the application itself (at design time), which also helps mitigate the risk of SQL injection by avoiding string concatenation using parameter values.

DETECTION

During compilation, SQL Server 2008 computes a “signature” of the query and exposes this as the query_hash column in sys.dm_exec_requests and sys.dm_exec_query_stats, and the QueryHash attribute in Showplan/Statistics XML. Entities with the same query_hash value have a high probability of referring to the same query text if it had been written in a query_hash parameterized form. Queries that vary only in literal values should have the same value. For example, the first two queries share the same query hash, while the third query has a different query hash, because it is performing a different operation.

select * from sys.objects where object_id = 100
select * from sys.objects where object_id = 101
select * from sys.objects where name = 'sysobjects'

The query hash is computed from the tree structure produced during compilation. Whitespace is ignored, as are differences in the use of explicit column lists compared to using an asterisk (*) in the SELECT list. Furthermore, it does not matter if one query uses fully qualified name and another uses just the table name as long as they both refer to the same object. All of the following should produce the same query_hash value.

select * from Employee e
select * from Sales.Employee e
select * from employeeid, name, designation from Sales.Employee e

You can check this by turning on the showplan_xml

set showplan_xml on
go

An easy way to detect applications that submit lots of ad hoc queries is by grouping on the sys.dm_exec_query_stats.query_hash column as follows.

select q.query_hash, 
 q.number_of_entries, 
 t.text as sample_query, 
 p.query_plan as sample_plan
from (select top 20 query_hash, 
 count(*) as number_of_entries, 
 min(sql_handle) as sample_sql_handle, 
 min(plan_handle) as sample_plan_handle
 from sys.dm_exec_query_stats
 group by query_hash
 having count(*) > 1
 order by count(*) desc) as q
 cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
 cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go

Queries that have a number_of_entries value in the hundreds or thousands are excellent candidates for parameterization. If you look at the CompileTime and CompileCPU attributes under the <QueryPlan> tag of the sample XML query plan and multiply those values times the number_of_entries value for that query, you can get an estimate of how much compile time and CPU you can eliminate by parameterizing the query (which means that the query is compiled once, and then it is cached and reused for subsequent executions). Eliminating these unnecessary cached plans has other intangible benefits as well, such as freeing memory to cache other compiled plans (thereby further reducing compilation overhead) and leaving more memory for the buffer cache.

RESOLUTION

Use the query_hash and query_plan_hash values together to determine whether a set of ad hoc queries with the same query_hash value resulted in query plans with the same or different query_plan_hash values, or access path. This is done via a small modification to the earlier query.

select q.query_hash, 
 q.number_of_entries, 
 q.distinct_plans,
 t.text as sample_query, 
 p.query_plan as sample_plan
from (select top 20 query_hash, 
 count(*) as number_of_entries, 
 count(distinct query_plan_hash) as distinct_plans,
 min(sql_handle) as sample_sql_handle, 
 min(plan_handle) as sample_plan_handle
 from sys.dm_exec_query_stats
 group by query_hash
 having count(*) > 1
 order by count(*) desc) as q
 cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
 cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go

Note that this new query returns a count of the number of distinct query plans (query_plan_hash values) for a given query_hash value. Rows that return a large number for number_of_entries and a distinct_plans count of 1 are good candidates for parameterization. Even if the number of distinct plans is more than one, you can use sys.dm_exec_query_plan to retrieve the different query plans and examine them to see whether the difference is important and necessary for achieving optimal performance.

After you determine which queries should be parameterized, the best place to parameterize them is the client application. The details of how you do this vary slightly depending on which client API you use, but the one consistent thing across all of the APIs is that instead of building the query string with literal predicates, you build a string with a question mark (?) as a parameter marker.

-- Submitting as parameterized
select * from Sales.SalesOrderHeader where SalesOrderID = ?

The client driver or provider then submits the query in its parameterized form using sp_executesql.

exec sp_executesql N’select * from Sales.SalesOrderHeader where SalesOrderID = @P1’, N’@P1 int’, 100

Because the query is parameterized, it matches and reuses an existing cached plan.

More on Troubleshooting CPU Bottlenecks

 

Leave a comment