Troubleshooting CPU Bottlenecks

CPU bottleneck can be caused due to insufficient hardware resource for the load. But before you rush to buy more hardware consider other factors. Most commonly excessive CPU utilization can be reduced by query tuning, application design changes and optimizing system configuration.

Identify the largest consumer of CPU and see if they can be tuned. Performance Monitor is generally one of the easiest means to determine whether the server is CPU bound. You should look to see whether the Processor:% Processor Time counter is high; sustained values in excess of 80% of the processor time per CPU are generally deemed to be a bottleneck. Within SQL Server, you can also check for CPU bottlenecks by checking the DMVs. Requests waiting with the SOS_SCHEDULER_YIELD wait type or a high number of runnable tasks can indicate that runnable threads are waiting to be scheduled and that there might be a CPU bottleneck on the processor. The below query gives currently cached batches or procedures that are using the most CPU. The query aggregates CPU consumed by statements having same plan_handle.

If you have enabled the data collector, the SQL Server Waits chart on the Server Activity report is a very easy way to monitor for CPU bottlenecks over time. Consumed CPU and SOS_SCHEDULER_YIELD waits are rolled up into the CPU Wait Category in this report, and if you do see high CPU utilization, you can drill through to find the queries that are consuming the most resources.

select top 50
    sum(qs.total_worker_time) as total_cpu_time,
    sum(qs.execution_count) as total_execution_count,
    count(*) as  number_of_statements,
    qs.plan_handle
from
    sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc

Common CPU resource intensive operations are discussed below. Detection and Resolution of the issues are discussed for each.

  1. Excessive query compilation and optimization.
  2. Unnecessary Recompilation.
  3. Inefficient query plan.
  4. Intraquery Parallelism.
  5. Poor cursor usage.

Check this for more troubleshooting performance issues in Sql server

Wait Statistics: the Basis for Troubleshooting

One of the first items that needs to be checked when troubleshooting performance problems on a SQL Server, is the wait statistics.

We can capture this information from a DMV sys.dm_os_wait_stats. DMV’s are cleared after a restart or manually. Ideally you would need 2 weeks data to analyze.

As a part of the normal operations of SQL Server, a number of wait conditions exist which are non-problematic in nature and generally expected on the server. These wait conditions can generally be queried from the sys.dm_os_waiting_tasks DMV for the system sessions, as shown below

SELECT DISTINCT
wt.wait_type
FROM sys.dm_os_waiting_tasks AS wt
JOIN sys.dm_exec_sessions AS s ON wt.session_id = s.session_id
WHERE s.is_user_process = 0

When looking at the wait statistics being tracked by SQL Server, it’s important that these wait types are eliminated from the analysis, allowing the more problematic waits in the system to be identified.

Below is a handy query that I use, which filters out the non problematic wait stats and points us to where the problem lies.

SELECT TOP 10
wait_type ,
max_wait_time_ms wait_time_ms ,
signal_wait_time_ms ,
wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms ,
100.0 * wait_time_ms / SUM(wait_time_ms) OVER ( )
AS percent_total_waits ,
100.0 * signal_wait_time_ms / SUM(signal_wait_time_ms) OVER ( )
AS percent_total_signal_waits ,
100.0 * ( wait_time_ms - signal_wait_time_ms )
/ SUM(wait_time_ms) OVER ( ) AS percent_total_resource_waits
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0 -- remove zero wait_time
AND wait_type NOT IN -- filter out additional irrelevant waits
( 'SLEEP_TASK', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH',
'SQLTRACE_BUFFER_FLUSH','CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT',
'LAZYWRITER_SLEEP', 'SLEEP_SYSTEMTASK', 'SLEEP_BPOOL_FLUSH',
'BROKER_EVENTHANDLER', 'XE_DISPATCHER_WAIT', 'FT_IFTSHC_MUTEX',
'CHECKPOINT_QUEUE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'BROKER_TRANSMITTER', 'FT_IFTSHC_MUTEX', 'KSOURCE_WAKEUP',
'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',
'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BAD_PAGE_PROCESS',
'DBMIRROR_EVENTS_QUEUE', 'BROKER_RECEIVE_WAITFOR',
'PREEMPTIVE_OS_GETPROCADDRESS', 'PREEMPTIVE_OS_AUTHENTICATIONOPS',
'WAITFOR', 'DISPATCHER_QUEUE_SEMAPHORE', 'XE_DISPATCHER_JOIN',
'RESOURCE_QUEUE' )
ORDER BY wait_time_ms DESC

More on Wait Statistics

An excellent article on How to Tune Microsoft SQL Server for Performance