Workload analysis using query statistics

Earlier posts described how you can collect and analyse waits to identify resource limitations and constraints for your SQL Servers. This post outlines a similar approach for identifying workload impact.

In many cases we use server-side traces to answer questions such as ‘What is running against my SQL Server?’. Such questions normally arise as a result of performance issues affecting a front-end application. When used with event and column filters, SQL traces are very good at identifying specific troublesome queries. Extended events, as an alternative to traces, also meet this requirement. But to answer a very high-level question such as ‘Has my workload on this server increased or decreased?’ you’d have to capture practically all queries in a trace and analyse the results using tools such as SQL Nexus and the RML utilities.

A much easier way to answer overall workload questions is to collect and analyse the query statistics that SQL Server automatically saves for you. They can be found in the sys.dm_exec_query_stats dynamic management view. By sampling this view at regular intervals you can plot trends for execution counts, worker times, durations, reads, writes, row counts and so on. The statistics are available at statement-level granularity. This means you have performance data for ad hoc statements, prepared statements and the individual statements within stored procedures, functions and triggers. By saving historic figures at this level of granularity you can answer questions not just at the overall workload level but also at the individual statement level, for example ‘has the performance of a specific statement in a stored procedure degraded over time?’
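
For a quick feel for the data, a simple query such as the following, which should run on any SQL 2008 or later instance, lists the most recently executed statements and a few of their cumulative counters:

SELECT TOP 10
qs.execution_count,
qs.total_worker_time,
qs.total_elapsed_time,
qs.total_logical_reads,
qs.last_execution_time
FROM sys.dm_exec_query_stats qs
ORDER BY qs.last_execution_time DESC;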

There are caveats to using sys.dm_exec_query_stats. Because the statistics are associated with cached plans, certain executions will not have their statistics stored. For example, last_execution_time, execution_count and so on won’t be updated if you execute a stored procedure WITH RECOMPILE. Certain commands will not have query statistics entries at all – BEGIN TRAN, ROLLBACK TRAN, DBCC, BACKUP etc. The only way to capture these missing entries is to trace the entire workload, with the downside of potentially tens of gigabytes worth of trace files on busy systems. So, by using query statistics for the analysis we trade the commands we may miss for savings in disk space and in the overhead of processing a potentially large number of trace files.
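
You can see the WITH RECOMPILE caveat for yourself on a test instance. In this sketch dbo.SomeProc is a hypothetical procedure; its execution counters are not touched because no plan is cached:

EXEC dbo.SomeProc WITH RECOMPILE;

-- Statistics for the procedure's statements are not updated by the execution above
SELECT qs.execution_count, qs.last_execution_time
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.objectid = OBJECT_ID('dbo.SomeProc');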

Gauging the impact of the missing commands is fairly straightforward: compare a trace analysis with a query statistics analysis. If total reads, total duration and total CPU from the trace analysis match the corresponding totals from the query statistics analysis then, for the given workload, you’ve validated the query statistics approach.
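
The query statistics side of that comparison can be as simple as summing the cumulative counters, as in this sketch (note that worker time and elapsed time are reported in microseconds):

SELECT
SUM(execution_count) AS total_executions,
SUM(total_worker_time) AS total_cpu,
SUM(total_elapsed_time) AS total_duration,
SUM(total_logical_reads) AS total_reads
FROM sys.dm_exec_query_stats;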

Having set the background, we’ll move on to describe one approach to collecting and analysing data from sys.dm_exec_query_stats.

In SQL 2008 and above we have the query_hash column. Grouping on query_hash, statement_start_offset and statement_end_offset, we can aggregate the values for execution count, total worker time, last reads and so on to measure the workload hitting your SQL Server. To minimise the impact of statistics collection, the approach listed here does not fetch the actual query text; that can easily be obtained once the overall workload picture is presented.
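
The core of the collection is therefore an aggregate of this shape (the full version, with all the counters, appears in the collection script later in this post):

SELECT
query_hash,
statement_start_offset,
statement_end_offset,
COUNT(*) AS query_hash_count,
SUM(execution_count) AS execution_count,
SUM(total_worker_time) AS total_worker_time
FROM sys.dm_exec_query_stats
GROUP BY query_hash, statement_start_offset, statement_end_offset;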

Just as for the collection of wait, latch and spinlock statistics (see the earlier posts), the approach here is to collect the query statistics from target servers and store the data in a warehouse on another server. Compared to the earlier post, the only dimension table that changes is the dimServer table. We add a column to store the date and time that query statistics were last fetched, and use it to avoid resampling data we’ve already collected. Here is the modified dimServer table:

-- Server table
IF OBJECT_ID('dbo.dimServer','U') IS NOT NULL DROP TABLE dbo.dimServer;
GO

CREATE TABLE dbo.dimServer
(
ServerID int NOT NULL IDENTITY(1,1),
ServerName sysname NOT NULL,
QueryStatsLastFetch datetime NOT NULL CONSTRAINT df_dimServer_QueryStatsLastFetch DEFAULT ('19900101'),
CONSTRAINT pk_dimServer PRIMARY KEY CLUSTERED (ServerID),
CONSTRAINT uq_dimServer UNIQUE NONCLUSTERED (ServerName)
);

-- Pre-populate with your list of servers
INSERT INTO dbo.dimServer (ServerName)
SELECT CAST(SERVERPROPERTY('ServerName') AS sysname);

The procedure to get and set the QueryStatsLastFetch column is here:

IF OBJECT_ID('dbo.procServerQueryStatsLastFetch','P') IS NOT NULL DROP PROCEDURE dbo.procServerQueryStatsLastFetch;
GO

CREATE PROCEDURE dbo.procServerQueryStatsLastFetch
@GetSet char(3),
@ServerName sysname,
@QueryStatsLastFetch datetime OUTPUT
AS
BEGIN

SET NOCOUNT ON

IF UPPER(@GetSet) = 'SET'
BEGIN
UPDATE dbo.dimServer SET QueryStatsLastFetch = @QueryStatsLastFetch WHERE ServerName = @ServerName
END

IF UPPER(@GetSet) = 'GET'
BEGIN
SELECT @QueryStatsLastFetch = QueryStatsLastFetch FROM dbo.dimServer WHERE ServerName = @ServerName
END

RETURN
END;
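
For example, to read the watermark for a server (TARGETSRV here is a hypothetical server name):

DECLARE @LastFetch datetime;
EXEC dbo.procServerQueryStatsLastFetch 'GET', 'TARGETSRV', @LastFetch OUTPUT;
SELECT @LastFetch AS QueryStatsLastFetch;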

The query_hash, statement_start_offset and statement_end_offset are stored in another dimension table and are referenced through the QueryID column:

IF OBJECT_ID('dbo.dimQuery','U') IS NOT NULL DROP TABLE dbo.dimQuery;
GO

CREATE TABLE dbo.dimQuery
(
QueryID int IDENTITY(1,1) NOT NULL,
ServerID int NOT NULL,
query_hash binary(8) NOT NULL,
statement_start_offset int NOT NULL,
statement_end_offset int NOT NULL,
CONSTRAINT pk_dimQuery PRIMARY KEY CLUSTERED (QueryID),
CONSTRAINT uq_dimQuery UNIQUE NONCLUSTERED (ServerID,query_hash,statement_start_offset,statement_end_offset)
);

The query statistics are stored in this fact table:

IF OBJECT_ID('dbo.factQueryStats','U') IS NOT NULL DROP TABLE dbo.factQueryStats;
GO

CREATE TABLE dbo.factQueryStats(
ServerID int NOT NULL,
DateID int NOT NULL,
TimeID int NOT NULL,
QueryID int NOT NULL,
query_hash_count int NULL,
plan_generation_num bigint NULL,
creation_time datetime NULL,
last_execution_time datetime NULL,
execution_count bigint NULL,
total_worker_time bigint NULL,
last_worker_time bigint NULL,
min_worker_time bigint NULL,
max_worker_time bigint NULL,
total_physical_reads bigint NULL,
last_physical_reads bigint NULL,
min_physical_reads bigint NULL,
max_physical_reads bigint NULL,
total_logical_writes bigint NULL,
last_logical_writes bigint NULL,
min_logical_writes bigint NULL,
max_logical_writes bigint NULL,
total_logical_reads bigint NULL,
last_logical_reads bigint NULL,
min_logical_reads bigint NULL,
max_logical_reads bigint NULL,
total_clr_time bigint NULL,
last_clr_time bigint NULL,
min_clr_time bigint NULL,
max_clr_time bigint NULL,
total_elapsed_time bigint NULL,
last_elapsed_time bigint NULL,
min_elapsed_time bigint NULL,
max_elapsed_time bigint NULL,
total_rows bigint NULL,
last_rows bigint NULL,
min_rows bigint NULL,
max_rows bigint NULL,
CONSTRAINT pk_factQueryStats PRIMARY KEY CLUSTERED (ServerID,QueryID,DateID,TimeID)
);
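
With samples accumulating in the fact table, simple trend queries become possible. The counters are cumulative since plan creation, so the high-water mark per period is a reasonable first approximation. A sketch of a daily execution trend per query (the ServerID value is a hypothetical one; remember that counters reset if a plan is recompiled or evicted, or if the server restarts):

SELECT QueryID, DateID, MAX(execution_count) AS executions
FROM dbo.factQueryStats
WHERE ServerID = 1
GROUP BY QueryID, DateID
ORDER BY QueryID, DateID;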

To complete the setup we have the table type and stored procedure that inserts into the fact table:

-- Drop the stored procedure that uses the table-valued parameter
IF OBJECT_ID('dbo.procInsertQueryStats','P') IS NOT NULL DROP PROCEDURE dbo.procInsertQueryStats;
GO

-- Table type for the table-valued parameter
IF EXISTS (SELECT * FROM sys.types WHERE name = 'typeQueryStats') DROP TYPE dbo.typeQueryStats;
GO

CREATE TYPE dbo.typeQueryStats AS TABLE
(
ServerName sysname NOT NULL,
SampleDateTime datetime NOT NULL,
query_hash binary(8) NOT NULL,
statement_start_offset int NOT NULL,
statement_end_offset int NOT NULL,
query_hash_count int NULL,
plan_generation_num bigint NULL,
creation_time datetime NULL,
last_execution_time datetime NULL,
execution_count bigint NULL,
total_worker_time bigint NULL,
last_worker_time bigint NULL,
min_worker_time bigint NULL,
max_worker_time bigint NULL,
total_physical_reads bigint NULL,
last_physical_reads bigint NULL,
min_physical_reads bigint NULL,
max_physical_reads bigint NULL,
total_logical_writes bigint NULL,
last_logical_writes bigint NULL,
min_logical_writes bigint NULL,
max_logical_writes bigint NULL,
total_logical_reads bigint NULL,
last_logical_reads bigint NULL,
min_logical_reads bigint NULL,
max_logical_reads bigint NULL,
total_clr_time bigint NULL,
last_clr_time bigint NULL,
min_clr_time bigint NULL,
max_clr_time bigint NULL,
total_elapsed_time bigint NULL,
last_elapsed_time bigint NULL,
min_elapsed_time bigint NULL,
max_elapsed_time bigint NULL,
total_rows bigint NULL,
last_rows bigint NULL,
min_rows bigint NULL,
max_rows bigint NULL
);
GO

CREATE PROCEDURE dbo.procInsertQueryStats
@NearestSeconds int,
@tblStats typeQueryStats READONLY
AS
BEGIN

SET NOCOUNT ON;

DECLARE @ServerID int;
DECLARE @ServerName sysname;
DECLARE @SampleDateTime datetime;
DECLARE @QueryStatsLastFetch datetime;
DECLARE @DateID int;
DECLARE @TimeID int;

-- Summary dimension data from stats table
SELECT TOP 1
@ServerID = ds.ServerID,
@ServerName = ds.ServerName,
@SampleDateTime = s.SampleDateTime,
@QueryStatsLastFetch = s.last_execution_time
FROM @tblStats s
INNER JOIN dbo.dimServer ds
ON ds.ServerName = s.ServerName
ORDER BY last_execution_time DESC;

-- Normalise the sample time - round down to a multiple of @NearestSeconds (the style 100 CONVERT drops the seconds)
SET @SampleDateTime = DATEADD(second,@NearestSeconds*(DATEPART(second,@SampleDateTime)/@NearestSeconds),CONVERT(varchar(40),@SampleDateTime,100));

-- Date and time IDs
SELECT @DateID = DateID
FROM dbo.dimDate
WHERE DateValue = CAST(@SampleDateTime AS date);

SELECT @TimeID = TimeID
FROM dbo.dimTime
WHERE TimeValue = CAST(@SampleDateTime AS time);

-- Insert queries that are new since last collection
INSERT INTO dimQuery(ServerID,query_hash,statement_start_offset,statement_end_offset)
SELECT @ServerID,query_hash,statement_start_offset,statement_end_offset
FROM @tblStats s
WHERE NOT EXISTS (
SELECT * FROM dimQuery dq
WHERE dq.ServerID = @ServerID
AND dq.query_hash = s.query_hash
AND dq.statement_start_offset = s.statement_start_offset
AND dq.statement_end_offset = s.statement_end_offset
);

INSERT INTO dbo.factQueryStats
SELECT
@ServerID,
@DateID,
@TimeID,
dq.QueryID,
s.query_hash_count,
s.plan_generation_num,
s.creation_time,
s.last_execution_time,
s.execution_count,
s.total_worker_time,
s.last_worker_time,
s.min_worker_time,
s.max_worker_time,
s.total_physical_reads,
s.last_physical_reads,
s.min_physical_reads,
s.max_physical_reads,
s.total_logical_writes,
s.last_logical_writes,
s.min_logical_writes,
s.max_logical_writes,
s.total_logical_reads,
s.last_logical_reads,
s.min_logical_reads,
s.max_logical_reads,
s.total_clr_time,
s.last_clr_time,
s.min_clr_time,
s.max_clr_time,
s.total_elapsed_time,
s.last_elapsed_time,
s.min_elapsed_time,
s.max_elapsed_time,
s.total_rows,
s.last_rows,
s.min_rows,
s.max_rows
FROM @tblStats s
INNER JOIN dbo.dimQuery dq
ON dq.ServerID = @ServerID
AND dq.query_hash = s.query_hash
AND dq.statement_start_offset = s.statement_start_offset
AND dq.statement_end_offset = s.statement_end_offset

-- Update the last collection time
EXEC dbo.procServerQueryStatsLastFetch 'SET',@ServerName,@QueryStatsLastFetch OUTPUT

END;
GO

As with the wait, latch and spinlock statistics, there are many ways we could populate the fact table. A PowerShell job step using the following script is one possibility:

# Query stats

# Get last collection datetime from data warehouse server
$targetserver = "DWHSRV"
$dbname = "DBAdwh"
$procname = "dbo.procServerQueryStatsLastFetch"
$getset = "GET"
$servername = "TARGETSRV"
[System.DateTime]$querystatslastfetch = "01 January 1990"

$conn = New-Object System.Data.SqlClient.SqlConnection("Data Source=$targetserver;Initial Catalog=$dbname;Integrated Security=SSPI;")
$conn.Open()
$cmd = New-Object System.Data.SqlClient.SqlCommand($procname,$conn)
$cmd.CommandType = [System.Data.CommandType]'StoredProcedure'
$cmd.Parameters.Add("@GetSet",$getset) | Out-Null
$cmd.Parameters.Add("@ServerName",$servername) | Out-Null
$cmd.Parameters.Add("@QueryStatsLastFetch",$querystatslastfetch) | Out-Null
$cmd.Parameters["@QueryStatsLastFetch"].Direction = [System.Data.ParameterDirection]::Output
$cmd.ExecuteNonQuery() | Out-Null
[System.DateTime]$querystatslastfetch = $cmd.Parameters["@QueryStatsLastFetch"].Value
$conn.Close()
$conn.Dispose()

# Fetch stats from target server
$targetserver = "TARGETSRV"
$sql = @"
EXEC sp_executesql
N'SELECT CAST(SERVERPROPERTY(''ServerName'') AS sysname) AS ServerName,GETDATE() AS SampleDateTime,

query_hash,
statement_start_offset,
statement_end_offset,
COUNT(*) AS query_hash_count,

MAX(plan_generation_num) as plan_generation_num,
MAX(creation_time) as creation_time,
MAX(last_execution_time) as last_execution_time,
SUM(execution_count) as execution_count,

SUM(total_worker_time) as total_worker_time,
MAX(last_worker_time) as last_worker_time,
MIN(min_worker_time) as min_worker_time,
MAX(max_worker_time) as max_worker_time,

SUM(total_physical_reads) as total_physical_reads,
MAX(last_physical_reads) as last_physical_reads,
MIN(min_physical_reads) as min_physical_reads,
MAX(max_physical_reads) as max_physical_reads,

SUM(total_logical_writes) as total_logical_writes,
MAX(last_logical_writes) as last_logical_writes,
MIN(min_logical_writes) as min_logical_writes,
MAX(max_logical_writes) as max_logical_writes,

SUM(total_logical_reads) as total_logical_reads,
MAX(last_logical_reads) as last_logical_reads,
MIN(min_logical_reads) as min_logical_reads,
MAX(max_logical_reads) as max_logical_reads,

SUM(total_clr_time) as total_clr_time,
MAX(last_clr_time) as last_clr_time,
MIN(min_clr_time) as min_clr_time,
MAX(max_clr_time) as max_clr_time,

SUM(total_elapsed_time) as total_elapsed_time,
MAX(last_elapsed_time) as last_elapsed_time,
MIN(min_elapsed_time) as min_elapsed_time,
MAX(max_elapsed_time) as max_elapsed_time,

0 AS total_rows,
0 AS last_rows,
0 AS min_rows,
0 AS max_rows

FROM sys.dm_exec_query_stats
WHERE last_execution_time > @querystatslastfetch

GROUP BY
query_hash,
statement_start_offset,
statement_end_offset',
N'@querystatslastfetch datetime',
'$querystatslastfetch'
"@

$conn = New-Object System.Data.SqlClient.SqlConnection("Data Source=$targetserver;Initial Catalog=master;Integrated Security=SSPI;")
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($sql,$conn)
$dataset = New-Object System.Data.DataSet
$adapter.Fill($dataset) | Out-Null
$conn.Dispose()

# Save results to data warehouse
$targetserver = "DWHSRV"
$dbname = "DBAdwh"
$procname = "dbo.procInsertQueryStats"
$nearestseconds = 60

$conn = New-Object System.Data.SqlClient.SqlConnection("Data Source=$targetserver;Initial Catalog=$dbname;Integrated Security=SSPI;")
$conn.Open()
$cmd = New-Object System.Data.SqlClient.SqlCommand($procname,$conn)
$cmd.CommandType = [System.Data.CommandType]'StoredProcedure'
$cmd.Parameters.Add("@NearestSeconds",$nearestseconds) | Out-Null
$cmd.Parameters.Add("@tblStats",$dataset.tables[0]) | Out-Null
$cmd.ExecuteNonQuery() | Out-Null
$conn.Close()
$conn.Dispose()

Notice in the sys.dm_exec_query_stats prepared statement I’m setting the total_rows, last_rows, min_rows and max_rows columns to zero. These columns are only available in SQL 2008 R2 and higher. The code can be changed to dynamically include or exclude these columns based on the version of the target server being examined.
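
One way to drive that decision is a version test against the target. The following sketch shows the idea; versions 10.50 (2008 R2) and later expose the row columns:

DECLARE @version varchar(20)
DECLARE @major int
DECLARE @minor int

SET @version = CAST(SERVERPROPERTY('ProductVersion') AS varchar(20))
SET @major = CAST(PARSENAME(@version,4) AS int)
SET @minor = CAST(PARSENAME(@version,3) AS int)

IF (@major > 10) OR (@major = 10 AND @minor >= 50)
SELECT 'Include total_rows, last_rows, min_rows, max_rows' AS ColumnChoice
ELSE
SELECT 'Substitute zeros for the row columns' AS ColumnChoice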

For a busy system a potentially large number of rows will be returned for each collection. It is important to put in place a purge routine (scheduled job) for the removal of historic data. The retention period can be tailored for each target server by mapping this attribute to a new column in the dimServer table. The following script takes a simpler, ‘one retention period for all servers’ approach:


SET NOCOUNT ON

DECLARE @RetainDays int
SET @RetainDays = 7

DECLARE @MinDateID int

SET @MinDateID = CONVERT(varchar(8),DATEADD(day,-1*@RetainDays,GETDATE()),112)

SELECT 'Removing statistics with DateID less than or equal to ',@MinDateID

SELECT 'Deleting historic QueryStats...'

WHILE @@ROWCOUNT <> 0
BEGIN
DELETE TOP (1000)
FROM dbo.factQueryStats
WHERE DateID <= @MinDateID
END

SELECT 'Deleting Queries which have no QueryStats...'

WHILE @@ROWCOUNT <> 0
BEGIN
DELETE TOP (1000) q
FROM dbo.dimQuery q
WHERE NOT EXISTS (
SELECT *
FROM dbo.factQueryStats qs
WHERE qs.QueryID = q.QueryID
)
END

In terms of presenting the collected data, PowerPivot once again is the best tool. All tables with the exception of dimQuery are imported into PowerPivot as tables. The binary query_hash column in dimQuery generates a data type error if the import table option is chosen. To overcome this, create the following view in the data warehouse and import the data through it:


CREATE VIEW dbo.vwdimQuery
AS
SELECT
QueryID,
ServerID,
master.sys.fn_varbintohexstr(query_hash) AS query_hash,
statement_start_offset,
statement_end_offset
FROM dbo.dimQuery

After creating the appropriate relationships between the fact and dimension ID columns, the final data model should appear as follows:

Below I chart the historic execution count. We can see it increasing for QueryID 14. Note that I stopped my SQL Server at about 13:25 and restarted it at about 19:30; the DMV statistics are reset when SQL Server is restarted:

As mentioned earlier, to minimise the impact on the target server the collection process does not fetch the actual query text. Once you’ve identified a query of interest, you use its QueryID to fetch the query_hash from dimQuery. You then examine the target server’s DMVs to get the full query details (the text, the plan, the plan attributes and so on).
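
A sketch of that last step, run on the target server – the hash here is a hypothetical value copied out of dimQuery:

DECLARE @query_hash binary(8)
SET @query_hash = 0x1A2B3C4D5E6F7081 -- hypothetical value from dimQuery

SELECT
SUBSTRING(st.text, (qs.statement_start_offset/2) + 1,
((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text) ELSE qs.statement_end_offset END - qs.statement_start_offset)/2) + 1) AS statement_text,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
OUTER APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE qs.query_hash = @query_hash;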

If you’re only interested in statistics at the procedure level, and not at the individual statement level, then a similar collection and analysis can be based on sys.dm_exec_procedure_stats.
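
That DMV is already keyed per object, so the collection query is simpler. A sketch:

SELECT
DB_NAME(database_id) AS database_name,
OBJECT_NAME(object_id, database_id) AS procedure_name,
execution_count,
total_worker_time,
total_elapsed_time,
total_logical_reads
FROM sys.dm_exec_procedure_stats;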

This simple approach to collecting and analysing query statistics will reveal interesting details about your SQL Server workloads. Regular patterns will emerge. Together with the wait, latch and spinlock analysis you should now have a complete picture of your SQL Servers.
