Archive for April, 2012

Extended event for inaccurate cardinality estimate

An extended event for inaccurate cardinality estimate is available to those of us using SQL Server 2012. The description for the event makes clear that the event can have significant performance overhead.

It’s best to heed this warning. I’d recommend conducting any analysis on a non-production server.

One of the event data columns that appears for this event is named actual_rows. Testing this functionality for the first time, I wasn’t able to make sense of the values returned. I populated my demo table with two rows, ran a stored procedure to select from the table (as normal, the plan is cached after the first execution), inserted ninety-eight more rows and ran the stored procedure again. The output of the extended event showed me an estimated_rows column with a value of two (which made sense) but the actual_rows wasn’t one hundred or ninety-eight. Here’s the result set:

After some research it appears that the actual_rows value represents row count thresholds. This post gives a full explanation:

http://blogs.msdn.com/b/ialonso/archive/2012/04/16/how-the-query-execution-engine-calculates-the-threshold-it-uses-to-detect-inaccurate-cardinality-estimation-and-fire-the-inaccurate-cardinality-estimate-extended-event.aspx

According to this post the extended event is fired by a subset of the plan operators (those using the GetRow method) when predefined row count thresholds have been exceeded. This makes sense from a performance impact point of view. The post mentions thresholds based on five times the base cardinality. To confirm this, the following script was run and the result sets were examined for varying numbers of inserts:


IF OBJECT_ID('dbo.demotable','U') IS NOT NULL DROP TABLE dbo.demotable;

CREATE TABLE dbo.demotable
(
id int NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
val char(3) NOT NULL
);

CREATE INDEX idxdemotableval ON dbo.demotable (val);

IF OBJECT_ID('dbo.fetchfromdemotable','P') IS NOT NULL DROP PROCEDURE dbo.fetchfromdemotable;
GO

CREATE PROCEDURE dbo.fetchfromdemotable @val char(3)
AS
BEGIN

DECLARE @id int

SET NOCOUNT ON

SELECT    @id = id
FROM dbo.demotable
WHERE val = @val

END;
GO

-- Insert two rows into the table
INSERT INTO dbo.demotable (val) VALUES ('abc')
GO 2

-- Run the stored procedure for the first time. It will fetch just the 2 rows
-- The query plan shows node_id 0 being the nonclustered index seek on the val column
EXEC dbo.fetchfromdemotable 'abc'
GO

-- Set up the extended event session to catch inaccurate cardinality estimate events
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'extevInaccurateCardinalityEstimate')
DROP EVENT SESSION extevInaccurateCardinalityEstimate
ON SERVER;
GO

-- Specify your database_id and session_id in the WHERE clause
CREATE EVENT SESSION extevInaccurateCardinalityEstimate
ON SERVER
ADD EVENT sqlserver.inaccurate_cardinality_estimate
(
ACTION (sqlserver.session_id,sqlserver.database_id,sqlserver.sql_text,sqlserver.plan_handle)
WHERE sqlserver.database_id = 8
AND sqlserver.session_id = 54
)
ADD TARGET package0.ring_buffer
(SET max_memory = 4096)
WITH (max_dispatch_latency = 1 seconds);
GO

ALTER EVENT SESSION extevInaccurateCardinalityEstimate
ON SERVER
STATE = START
GO

-- Insert more rows in addition to the 2 that we already have
INSERT INTO dbo.demotable (val) VALUES ('abc')
GO 7        -- one less than first threshold
--GO 8        -- at first threshold
--GO 57        -- one less than second threshold
--GO 58        -- at second threshold
--GO 357    -- one less than third threshold
--GO 358    -- at third threshold
--GO 2157    -- one less than fourth threshold
--GO 2158    -- at fourth threshold
GO

-- Run proc
EXEC dbo.fetchfromdemotable 'abc'
GO

-- estimated_rows will always be 2 as that's what we had when the procedure was first executed
SELECT
event_data.value('(@name)[1]','varchar(50)') AS event_name,
DATEADD(hh,DATEDIFF(hh,GETUTCDATE(),CURRENT_TIMESTAMP),event_data.value('(@timestamp)[1]','datetime2')) AS [timestamp],
event_data.value('(action[@name="sql_text"]/value)[1]','nvarchar(max)') AS sql_text,
event_data.value('(data[@name="node_id"]/value)[1]','int') AS node_id,
event_data.value('(data[@name="estimated_rows"]/value)[1]','int') AS estimated_rows,
event_data.value('(data[@name="actual_rows"]/value)[1]','int') AS actual_rows,
event_data.value('(data[@name="fire_count"]/value)[1]','int') AS fire_count,
event_data.value('xs:hexBinary((action[@name="plan_handle"]/value)[1])','varbinary(64)') AS plan_handle
FROM (
SELECT CAST(target_data AS XML) target_data
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON s.address = t.event_session_address
WHERE s.name = 'extevInaccurateCardinalityEstimate'
AND t.target_name = 'ring_buffer'
) AS Data
CROSS APPLY target_data.nodes ('RingBufferTarget/event') AS p(event_data)
GO

ALTER EVENT SESSION extevInaccurateCardinalityEstimate
ON SERVER
STATE = STOP
GO

-- SELECT * FROM sys.dm_exec_query_plan (0x05000800CCB4027E10B402710100000001000000000000000000000000000000000000000000000000000000)

Run this script in a test database, changing the database_id and session_id to appropriate values. The script can be run as a whole. The first run specifies GO 7, giving a total row count of 9 for the demo table. You won’t see a result set for the final select, indicating that the event has not been fired: the row count threshold has not been reached.

If you comment out the GO 7, uncomment the GO 8 and rerun the entire script you’ll see a result set:

Five times the estimated_rows gives you the threshold seen in the actual_rows value. We added eight rows to the existing two so the event fired.

The next threshold is for a total row count of the ten existing rows plus five times the latest threshold (ten), i.e. sixty. If you comment out the GO 8 and uncomment GO 57 you’ll be one row short of this next threshold. Run the script in its entirety and you’ll see no change to the result set: we’re still under the second threshold so only one event fired. Comment out GO 57 and uncomment GO 58, run the script and you’ll see the second threshold has been met:

The third threshold is the 60 rows we have plus five times the current threshold of 60, i.e. 360.

The fourth threshold is 360 plus five times 360, i.e. 2160. Carry on changing the commented-out lines and you’ll see the change in the number of events fired. The result set for the final run is as follows:

So the result set now makes sense. Events are fired when row count thresholds are exceeded. The combination of the plan handle and node_id allows you to identify the plan and operator (see the commented out SELECT FROM sys.dm_exec_query_plan at the bottom of the script).
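
As a footnote, the threshold arithmetic is easy to sanity-check. The throwaway script below (my own sketch, not part of the demo) prints the four totals at which the event fires for a base estimate of two rows: the first threshold is five times the estimate, and each subsequent threshold is the previous total plus five times that total.

SET NOCOUNT ON;

-- Sanity check of the threshold arithmetic (a sketch; not part of the demo above)
DECLARE @estimated_rows int = 2;
DECLARE @threshold int = 5 * @estimated_rows;
DECLARE @level int = 1;

WHILE @level <= 4
BEGIN
PRINT 'Threshold ' + CAST(@level AS varchar(10)) + ': total row count ' + CAST(@threshold AS varchar(10));
SET @threshold = @threshold + 5 * @threshold;    -- 10, 60, 360, 2160
SET @level = @level + 1;
END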


Case of exclusive lock not blocking a share lock

Disregarding any of the higher isolation levels and using the out-of-the-box read committed isolation level, you’d expect an exclusive lock to block a share lock. Here’s a case, admittedly not that common, where blocking does not occur.


-- Create a demo table

CREATE TABLE dbo.t (c int NOT NULL);

INSERT INTO dbo.t VALUES (1);

-- In one session run this

BEGIN TRAN;

UPDATE dbo.t SET c = c;

-- Examine the locks. You'll see the RID X lock

SELECT * FROM sys.dm_tran_locks WHERE request_session_id = @@SPID

-- In another session run this

SELECT c FROM dbo.t

-- The SELECT is not blocked

This is a contrived example. The column value is being updated to itself, but this can happen in practice: for example, in an ETL load where your transaction begins, the daily update value hasn’t changed, and an update trigger on the table fires that updates other tables. Within the transaction’s time frame you rely on SQL Server’s locking mechanisms to prevent other users from reading data while transactions involving it are taking place. As the example shows, for this data scenario your users won’t be blocked.

Real-world cases aside, there’s also a SQL Server inconsistency. If the table definition is changed so that a clustered primary key is placed on the column then blocking does occur. You’ll see a granted exclusive key lock from the updating spid and a waiting share key lock from the selecting spid. So the behaviour is dependent on whether the table is a heap or a btree.
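
For reference, here is a minimal variation of the demo illustrating the btree case (a sketch; the table and constraint names are my own):

-- Same demo with a clustered primary key (a sketch of the btree case; names are my own)

CREATE TABLE dbo.t2 (c int NOT NULL CONSTRAINT pk_t2 PRIMARY KEY CLUSTERED);

INSERT INTO dbo.t2 VALUES (1);

-- In one session run this
BEGIN TRAN;

UPDATE dbo.t2 SET c = c;

-- Examine the locks. You'll see the granted KEY X lock
SELECT * FROM sys.dm_tran_locks WHERE request_session_id = @@SPID;

-- In another session run this. This time the SELECT waits on a KEY S lock
-- until the first session commits or rolls back
SELECT c FROM dbo.t2;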

Realistically speaking there’s little chance of this being addressed. Nevertheless, the following CONNECT item has been raised:

https://connect.microsoft.com/SQLServer/feedback/details/739700/row-id-rid-exclusive-lock-x-not-blocking-a-select

UPDATE: Martin Smith kindly replied to the CONNECT item. He cited a couple of posts by Paul White that explain the behaviour described in this post:

http://sqlblog.com/blogs/paul_white/archive/2010/11/01/read-committed-shared-locks-and-rollbacks.aspx

http://sqlblog.com/blogs/paul_white/archive/2010/08/11/the_2D00_impact_2D00_of_2D00_update_2D00_statements_2D00_that_2D00_don_2D00_t_2D00_change_2D00_data.aspx

Using fn_dblog for the two scenarios mentioned here shows no log entries for the heap case compared to a delete and an insert operation for the btree case. This explains the blocking in the btree case.
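
To see this for yourself you can inspect the log records generated inside the open transaction. Something along these lines works (a sketch using the undocumented fn_dblog function, so treat it as illustrative only):

-- Run in the updating session after the UPDATE, before committing (a sketch; fn_dblog is undocumented)
-- Heap case: no insert/delete records are generated by the non-updating UPDATE
-- Btree case: an LOP_DELETE_ROWS and an LOP_INSERT_ROWS appear against the clustered index
SELECT [Current LSN], Operation, Context, AllocUnitName
FROM fn_dblog(NULL, NULL)
WHERE AllocUnitName LIKE 'dbo.t%';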


Using Powershell to read trace files

We know that trace files can be opened in Profiler and manually examined. They can also be imported into SQL Server database tables using fn_trace_gettable and then queried. The problem with these approaches is that they are time-consuming and, for fn_trace_gettable, require database storage before queries can be run.
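
For reference, the fn_trace_gettable route looks something like this (a sketch; the file path and table name are placeholders):

-- Import the trace file (DEFAULT also reads any rollover files) into a table, then query it
-- The path and table name are placeholders
SELECT *
INTO dbo.ImportedTrace
FROM sys.fn_trace_gettable('D:\traces-in\mytrace.trc', DEFAULT);

SELECT TextData, CPU, Duration, Reads, Writes, EndTime
FROM dbo.ImportedTrace
WHERE EventClass = 12;    -- SQL:BatchCompleted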

An alternative is to use Powershell. Credit for this approach goes to the authors of this post:

http://www.solidq.com/sqj/Documents/2010_December_Issue/sqj_006_pag._33-38.pdf

As mentioned in that post, the trace objects are not supported on 64-bit platforms. To avoid run-time errors you need to run the commands or script in a Powershell x86 command-line shell.

Here is an example of using Powershell to count the number of occurrences of a SQL statement within a set of trace files. The count is returned in terms of the number of occurrences per minute. A possible use case would be where you have a very frequently running SQL statement and you wish to trend its execution:


# Credit: http://www.solidq.com/sqj/Documents/2010_December_Issue/sqj_006_pag._33-38.pdf

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.ConnectionInfoExtended") | Out-Null;

$InputTraceDirectory = "D:\traces-in";
$TraceFilePattern = "*.trc";

# Hash to store occurrences per minute
$Occurrences = @{};

Get-ChildItem $InputTraceDirectory -Filter $TraceFilePattern | ForEach-Object {

$InputFilename = $_.FullName;

Write-Host ("Processing file " + $InputFilename);

$InputTraceFile = New-Object "Microsoft.SqlServer.Management.Trace.Tracefile";
$InputTraceFile.InitializeAsReader($InputFilename);

While ($InputTraceFile.Read()) {

If ($InputTraceFile.GetValue($InputTraceFile.GetOrdinal("EventClass")) -eq "SQL:BatchCompleted") {

If ($InputTraceFile.GetValue($InputTraceFile.GetOrdinal("TextData")) -match "select columns from dbo.mytable") {

$EndTime = $InputTraceFile.GetValue($InputTraceFile.GetOrdinal("EndTime"));

# Date doesn't matter. Start time set to 00:00 as we're measuring minutes since midnight
$TimeDiff = New-TimeSpan -start "1 April 2012 00:00" -end $EndTime;

# The hash key is the minute value. The hash value is the count.
$Occurrences[($TimeDiff.Hours)*60 + $TimeDiff.Minutes]++;
};
};
};

$InputTraceFile.Close();
};

$Occurrences.GetEnumerator() | Sort-Object Key | Select-Object @{Label="Time";Expression={([System.DateTime]"1 April 2012 00:00").AddMinutes($_.Key)}},Value | ConvertTo-CSV;

As commented in the source code, the date part is irrelevant. This would be appropriate where you are processing, for example, an afternoon’s worth of trace files. The CSV output can be saved as a file, opened in Excel and charted.


Incorrect query worker times for SQL Server 2005

The worker time columns in sys.dm_exec_query_stats report incorrect values for queries that use parallelism. By comparison, SQL traces report correct values in the CPU column. Microsoft do not provide a fix for this for SQL 2005. See this Connect post:

http://connect.microsoft.com/SQLServer/feedback/details/487838/cpu-time-sys-dm-exec-query-stats-vs-set-statistics-time-on

The worker time column values are correct for non-parallel queries.
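
A simple way to see the figures side by side (my own sketch) is to pull the per-statement worker and elapsed times from the DMV and compare them with the CPU column captured by a trace for the same statements. Bear in mind the DMV reports microseconds whereas the trace CPU column is in milliseconds:

-- Worker time versus elapsed time per statement (a sketch; times converted to milliseconds)
-- On SQL 2005 the worker time will be understated for statements that ran with parallel plans
SELECT TOP (20)
qs.execution_count,
qs.total_worker_time / 1000 AS total_worker_time_ms,
qs.total_elapsed_time / 1000 AS total_elapsed_time_ms,
SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
    (CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
     ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;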


Workload analysis using query statistics

Earlier posts described how you can collect and analyse waits to identify resource limitations and constraints for your SQL Servers. This post outlines a similar approach for identifying workload impact.

In many cases we use server-side traces to answer questions such as ‘what is running against my SQL Server?’. Such questions normally arise as a result of performance issues affecting a front-end application. When used with event and column filters SQL traces are very good at identifying specific troublesome queries. Extended events, as an alternative to traces, also meet this requirement. But to answer a very high-level question such as ‘has my workload on this server increased or decreased?’ using traces you’d have to capture practically all queries and analyse the results using tools such as SQL Nexus and the RML utilities.

A much easier way to answer overall workload questions is to collect and analyse the query statistics that SQL Server automatically saves for you. They can be found in the sys.dm_exec_query_stats DMV. By sampling this view at regular intervals you can plot trends for execution counts, worker times, durations, reads, writes, row counts etc. The statistics are available at statement-level granularity. This means you have performance data for ad hoc statements, prepared statements and the individual statements of stored procedures, functions and triggers. By saving historic figures at this level of granularity you can answer questions not just at the overall workload level but also at the individual statement level. For example, a question such as ‘has the performance of a specific statement in a stored procedure degraded over time?’

There are caveats to using sys.dm_exec_query_stats. Because the statistics are associated with cached plans, certain executions will not have their statistics stored. For example, last_execution_time, execution_count etc won’t be updated if you execute a stored procedure WITH RECOMPILE. Certain commands will not have query statistics entries at all – BEGIN TRAN, ROLLBACK TRAN, DBCC, BACKUP etc. The only way to deal with these missing entries is to capture the entire workload using traces. The downside is the potentially tens of gigabytes worth of trace files that can be generated for busy systems. So, by using query statistics for the analysis we trade off the commands we may miss against the saving in disk space and the overhead of processing a potentially large number of trace files.

Gauging the impact of missing commands is fairly straightforward. You compare a trace analysis with a query statistics analysis. If total reads, total duration and total CPU from a trace analysis match the corresponding totals from a query statistics analysis then, for the given workload, you’ve validated the query statistics approach.
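
The query statistics side of that comparison can be as simple as the totals below (a sketch; in practice you’d compare the deltas between two snapshots taken at the start and end of the trace window):

-- Overall workload totals from the DMV (a sketch; worker and elapsed times are in microseconds)
SELECT
SUM(execution_count) AS total_executions,
SUM(total_worker_time) / 1000 AS total_cpu_ms,
SUM(total_elapsed_time) / 1000 AS total_duration_ms,
SUM(total_logical_reads) AS total_logical_reads,
SUM(total_logical_writes) AS total_logical_writes
FROM sys.dm_exec_query_stats;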

Having set the background we’ll move on to describe one approach to collecting and analysing data from sys.dm_exec_query_stats.

In SQL 2008 and above we have the query_hash column. Grouping on the query_hash, statement start offset and statement end offset we can aggregate the values for execution count, total worker time, last reads etc to measure the workload hitting your SQL Server. To minimise the impact of statistics collection the approach listed here does not fetch the actual query text. This can easily be obtained once the overall workload picture is presented.

Just as for the collection of wait, latch and spinlock statistics (see earlier post), the approach here is to collect the query statistics from target servers and store the data in a warehouse on another server. Compared to the earlier post the only dimension table that changes is the dimServer table. We add a column to store the date and time that query statistics were last fetched. We avoid resampling all the data by making use of this column. Here is the modified dimServer table:

-- Server table
IF OBJECT_ID('dbo.dimServer','U') IS NOT NULL DROP TABLE dbo.dimServer;
GO

CREATE TABLE dbo.dimServer
(
ServerID int NOT NULL IDENTITY(1,1),
ServerName sysname NOT NULL,
QueryStatsLastFetch datetime NOT NULL CONSTRAINT df_dimServer_QueryStatsLastFetch DEFAULT ('19900101'),
CONSTRAINT pk_dimServer PRIMARY KEY CLUSTERED (ServerID),
CONSTRAINT uq_dimServer UNIQUE NONCLUSTERED (ServerName)
);

-- Pre-populate with your list of servers
INSERT INTO dbo.dimServer (ServerName)
SELECT CAST(SERVERPROPERTY('ServerName') AS sysname);

The procedure to get and set the QueryStatsLastFetch column is here:

IF OBJECT_ID('dbo.procServerQueryStatsLastFetch','P') IS NOT NULL DROP PROCEDURE dbo.procServerQueryStatsLastFetch;
GO

CREATE PROCEDURE [dbo].[procServerQueryStatsLastFetch]
@GetSet char(3),
@ServerName sysname,
@QueryStatsLastFetch datetime OUTPUT
AS
BEGIN

SET NOCOUNT ON

IF UPPER(@GetSet) = 'SET'
BEGIN
UPDATE dbo.dimServer SET QueryStatsLastFetch = @QueryStatsLastFetch WHERE ServerName = @ServerName
END

IF UPPER(@GetSet) = 'GET'
BEGIN
SELECT @QueryStatsLastFetch = QueryStatsLastFetch FROM dbo.dimServer WHERE ServerName = @ServerName
END

RETURN
END;

The query_hash, statement_start_offset and statement_end_offset are stored in another dimension table and are referenced through the QueryID column:

IF OBJECT_ID('dbo.dimQuery','U') IS NOT NULL DROP TABLE dbo.dimQuery;
GO

CREATE TABLE dbo.dimQuery
(
QueryID int IDENTITY(1,1) NOT NULL,
ServerID int NOT NULL,
query_hash binary(8) NOT NULL,
statement_start_offset int NOT NULL,
statement_end_offset int NOT NULL,
CONSTRAINT pk_dimQuery PRIMARY KEY CLUSTERED (QueryID),
CONSTRAINT uq_dimQuery UNIQUE NONCLUSTERED (ServerID,query_hash,statement_start_offset,statement_end_offset)
);

The query statistics are stored in this fact table:

IF OBJECT_ID('dbo.factQueryStats','U') IS NOT NULL DROP TABLE dbo.factQueryStats;
GO

CREATE TABLE dbo.factQueryStats(
ServerID int NOT NULL,
DateID int NOT NULL,
TimeID int NOT NULL,
QueryID int NOT NULL,
query_hash_count int NULL,
plan_generation_num bigint NULL,
creation_time datetime NULL,
last_execution_time datetime NULL,
execution_count bigint NULL,
total_worker_time bigint NULL,
last_worker_time bigint NULL,
min_worker_time bigint NULL,
max_worker_time bigint NULL,
total_physical_reads bigint NULL,
last_physical_reads bigint NULL,
min_physical_reads bigint NULL,
max_physical_reads bigint NULL,
total_logical_writes bigint NULL,
last_logical_writes bigint NULL,
min_logical_writes bigint NULL,
max_logical_writes bigint NULL,
total_logical_reads bigint NULL,
last_logical_reads bigint NULL,
min_logical_reads bigint NULL,
max_logical_reads bigint NULL,
total_clr_time bigint NULL,
last_clr_time bigint NULL,
min_clr_time bigint NULL,
max_clr_time bigint NULL,
total_elapsed_time bigint NULL,
last_elapsed_time bigint NULL,
min_elapsed_time bigint NULL,
max_elapsed_time bigint NULL,
total_rows bigint NULL,
last_rows bigint NULL,
min_rows bigint NULL,
max_rows bigint NULL,
CONSTRAINT pk_factQueryStats PRIMARY KEY CLUSTERED (ServerID,QueryID,DateID,TimeID)
);

To complete the setup we have the table type and stored procedure that inserts into the fact table:

-- Drop the stored procedure that use the table valued parameter
IF OBJECT_ID('dbo.procInsertQueryStats','P') IS NOT NULL DROP PROCEDURE dbo.procInsertQueryStats;
GO

-- Table type for table valued parameter
IF EXISTS (SELECT * FROM sys.types WHERE name = 'typeQueryStats') DROP TYPE dbo.typeQueryStats;
GO

CREATE TYPE dbo.typeQueryStats AS TABLE
(
ServerName sysname NOT NULL,
SampleDateTime datetime NOT NULL,
query_hash binary(8) NOT NULL,
statement_start_offset int NOT NULL,
statement_end_offset int NOT NULL,
query_hash_count int NULL,
plan_generation_num bigint NULL,
creation_time datetime NULL,
last_execution_time datetime NULL,
execution_count bigint NULL,
total_worker_time bigint NULL,
last_worker_time bigint NULL,
min_worker_time bigint NULL,
max_worker_time bigint NULL,
total_physical_reads bigint NULL,
last_physical_reads bigint NULL,
min_physical_reads bigint NULL,
max_physical_reads bigint NULL,
total_logical_writes bigint NULL,
last_logical_writes bigint NULL,
min_logical_writes bigint NULL,
max_logical_writes bigint NULL,
total_logical_reads bigint NULL,
last_logical_reads bigint NULL,
min_logical_reads bigint NULL,
max_logical_reads bigint NULL,
total_clr_time bigint NULL,
last_clr_time bigint NULL,
min_clr_time bigint NULL,
max_clr_time bigint NULL,
total_elapsed_time bigint NULL,
last_elapsed_time bigint NULL,
min_elapsed_time bigint NULL,
max_elapsed_time bigint NULL,
total_rows bigint NULL,
last_rows bigint NULL,
min_rows bigint NULL,
max_rows bigint NULL
);
GO

CREATE PROCEDURE dbo.procInsertQueryStats
@NearestSeconds int,
@tblStats typeQueryStats READONLY
AS
BEGIN

SET NOCOUNT ON;

DECLARE @ServerID int;
DECLARE @ServerName sysname;
DECLARE @SampleDateTime datetime;
DECLARE @QueryStatsLastFetch datetime;
DECLARE @DateID int;
DECLARE @TimeID int;

-- Summary dimension data from stats table
SELECT TOP 1
@ServerID = ds.ServerID,
@ServerName = ds.ServerName,
@SampleDateTime = s.SampleDateTime,
@QueryStatsLastFetch = s.last_execution_time
FROM @tblStats s
INNER JOIN dbo.dimServer ds
ON ds.ServerName = s.ServerName
ORDER BY last_execution_time DESC;

-- Normalise the sample time - round off to nearest seconds
SET @SampleDateTime = DATEADD(second,@NearestSeconds*(DATEPART(second,@SampleDateTime)/@NearestSeconds),CONVERT(varchar(40),@SampleDateTime,100));

-- Date and time IDs
SELECT @DateID = DateID
FROM dbo.dimDate
WHERE DateValue = CAST(@SampleDateTime AS date);

SELECT @TimeID = TimeID
FROM dbo.dimTime
WHERE TimeValue = CAST(@SampleDateTime AS time);

-- Insert queries that are new since last collection
INSERT INTO dimQuery(ServerID,query_hash,statement_start_offset,statement_end_offset)
SELECT @ServerID,query_hash,statement_start_offset,statement_end_offset
FROM @tblStats s
WHERE NOT EXISTS (
SELECT * FROM dimQuery dq
WHERE dq.ServerID = @ServerID
AND dq.query_hash = s.query_hash
AND dq.statement_start_offset = s.statement_start_offset
AND dq.statement_end_offset = s.statement_end_offset
);

INSERT INTO dbo.factQueryStats
SELECT
@ServerID,
@DateID,
@TimeID,
dq.QueryID,
s.query_hash_count,
s.plan_generation_num,
s.creation_time,
s.last_execution_time,
s.execution_count,
s.total_worker_time,
s.last_worker_time,
s.min_worker_time,
s.max_worker_time,
s.total_physical_reads,
s.last_physical_reads,
s.min_physical_reads,
s.max_physical_reads,
s.total_logical_writes,
s.last_logical_writes,
s.min_logical_writes,
s.max_logical_writes,
s.total_logical_reads,
s.last_logical_reads,
s.min_logical_reads,
s.max_logical_reads,
s.total_clr_time,
s.last_clr_time,
s.min_clr_time,
s.max_clr_time,
s.total_elapsed_time,
s.last_elapsed_time,
s.min_elapsed_time,
s.max_elapsed_time,
s.total_rows,
s.last_rows,
s.min_rows,
s.max_rows
FROM @tblStats s
INNER JOIN dbo.dimQuery dq
ON dq.ServerID = @ServerID
AND dq.query_hash = s.query_hash
AND dq.statement_start_offset = s.statement_start_offset
AND dq.statement_end_offset = s.statement_end_offset

-- Update the last collection time
EXEC dbo.procServerQueryStatsLastFetch 'SET',@ServerName,@QueryStatsLastFetch OUTPUT

END;
GO

Similarly to the case for wait, latch and spinlock statistics, there are many ways in which we could populate the fact table. A Powershell job step using the following script is one possibility:

# Query stats

# Get last collection datetime from data warehouse server
$targetserver = "DWHSRV"
$dbname = "DBAdwh"
$procname = "dbo.procServerQueryStatsLastFetch"
$getset = "GET"
$servername = "TARGETSRV"
[System.DateTime]$querystatslastfetch = "01 January 1990"

$conn = New-Object System.Data.SqlClient.SqlConnection("Data Source=$targetserver;Initial Catalog=$dbname;Integrated Security=SSPI;")
$conn.Open()
$cmd = New-Object System.Data.SqlClient.SqlCommand($procname,$conn)
$cmd.CommandType = [System.Data.CommandType]'StoredProcedure'
$cmd.Parameters.Add("@GetSet",$getset) | Out-Null
$cmd.Parameters.Add("@ServerName",$servername) | Out-Null
$cmd.Parameters.Add("@QueryStatsLastFetch",$querystatslastfetch) | Out-Null
$cmd.Parameters["@QueryStatsLastFetch"].Direction = [System.Data.ParameterDirection]::Output
$cmd.ExecuteNonQuery() | Out-Null
[System.DateTime]$querystatslastfetch = $cmd.Parameters["@QueryStatsLastFetch"].Value
$conn.Close()
$conn.Dispose()

# Fetch stats from target server
$targetserver = "TARGETSRV"
$sql = @"
EXEC sp_executesql
N'SELECT CAST(SERVERPROPERTY(''ServerName'') AS sysname) AS ServerName,GETDATE() AS SampleDateTime,

query_hash,
statement_start_offset,
statement_end_offset,
COUNT(*) AS query_hash_count,

MAX(plan_generation_num) as plan_generation_num,
MAX(creation_time) as creation_time,
MAX(last_execution_time) as last_execution_time,
SUM(execution_count) as execution_count,

SUM(total_worker_time) as total_worker_time,
MAX(last_worker_time) as last_worker_time,
MIN(min_worker_time) as min_worker_time,
MAX(max_worker_time) as max_worker_time,

SUM(total_physical_reads) as total_physical_reads,
MAX(last_physical_reads) as last_physical_reads,
MIN(min_physical_reads) as min_physical_reads,
MAX(max_physical_reads) as max_physical_reads,

SUM(total_logical_writes) as total_logical_writes,
MAX(last_logical_writes) as last_logical_writes,
MIN(min_logical_writes) as min_logical_writes,
MAX(max_logical_writes) as max_logical_writes,

SUM(total_logical_reads) as total_logical_reads,
MAX(last_logical_reads) as last_logical_reads,
MIN(min_logical_reads) as min_logical_reads,
MAX(max_logical_reads) as max_logical_reads,

SUM(total_clr_time) as total_clr_time,
MAX(last_clr_time) as last_clr_time,
MIN(min_clr_time) as min_clr_time,
MAX(max_clr_time) as max_clr_time,

SUM(total_elapsed_time) as total_elapsed_time,
MAX(last_elapsed_time) as last_elapsed_time,
MIN(min_elapsed_time) as min_elapsed_time,
MAX(max_elapsed_time) as max_elapsed_time,

0,    -- total_rows (not available before SQL 2008 R2)
0,    -- last_rows
0,    -- min_rows
0     -- max_rows

FROM sys.dm_exec_query_stats
WHERE last_execution_time > @querystatslastfetch

GROUP BY
query_hash,
statement_start_offset,
statement_end_offset',
N'@querystatslastfetch datetime',
'$querystatslastfetch'
"@

$conn = New-Object System.Data.SqlClient.SqlConnection("Data Source=$targetserver;Initial Catalog=master;Integrated Security=SSPI;")
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($sql,$conn)
$dataset = New-Object System.Data.DataSet
$adapter.Fill($dataset) | Out-Null
$conn.Dispose()

# Save results to data warehouse
$targetserver = "DWHSRV"
$dbname = "DBAdwh"
$procname = "dbo.procInsertQueryStats"
$nearestseconds = 60

$conn = New-Object System.Data.SqlClient.SqlConnection("Data Source=$targetserver;Initial Catalog=$dbname;Integrated Security=SSPI;")
$conn.Open()
$cmd = New-Object System.Data.SqlClient.SqlCommand($procname,$conn)
$cmd.CommandType = [System.Data.CommandType]'StoredProcedure'
$cmd.Parameters.Add("@NearestSeconds",$nearestseconds) | Out-Null
$cmd.Parameters.Add("@tblStats",$dataset.tables[0]) | Out-Null
$cmd.ExecuteNonQuery() | Out-Null
$conn.Close()
$conn.Dispose()

Notice in the sys.dm_exec_query_stats prepared statement I’m setting the total_rows, last_rows, min_rows and max_rows columns to zero. These columns are only available in SQL 2008 R2 and higher. The code can be changed to dynamically include or exclude these columns based on the target server being examined.
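
One way to do that (a sketch; the variable names and version arithmetic are my own) is to branch on the version reported by SERVERPROPERTY and splice the appropriate column list into the dynamic statement:

-- A sketch of a version check that could drive the column list (SQL 2008 R2 is version 10.50)
DECLARE @version_number int;
DECLARE @row_columns nvarchar(200);

SELECT @version_number =
CAST(PARSENAME(CAST(SERVERPROPERTY('ProductVersion') AS nvarchar(30)), 4) AS int) * 100 +
CAST(PARSENAME(CAST(SERVERPROPERTY('ProductVersion') AS nvarchar(30)), 3) AS int);

IF @version_number >= 1050
SET @row_columns = N'SUM(total_rows), MAX(last_rows), MIN(min_rows), MAX(max_rows)';
ELSE
SET @row_columns = N'0, 0, 0, 0';

-- @row_columns would replace the four literal zeros when the dynamic SELECT is built
SELECT @row_columns AS row_count_columns;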

For a busy system a potentially large number of rows will be returned for each collection. It is important to put in place a purge routine (scheduled job) for the removal of historic data. The retention period can be tailored for each target server by mapping this attribute to a new column in the dimServer table. The following script takes a simpler, ‘one retention period for all servers’ approach:


SET NOCOUNT ON

DECLARE @RetainDays int
SET @RetainDays = 7

DECLARE @MinDateID int

SET @MinDateID = CONVERT(varchar(8),DATEADD(day,-1*@RetainDays,GETDATE()),112)

SELECT 'Remove statistics less than or equal to ',@MinDateID

SELECT 'Deleting historic QueryStats...'

WHILE @@ROWCOUNT <> 0
BEGIN
DELETE TOP (1000)
FROM dbo.factQueryStats
WHERE DateID <= @MinDateID
END

SELECT 'Deleting Queries which have no QueryStats...'

WHILE @@ROWCOUNT <> 0
BEGIN
DELETE TOP (1000) q
FROM dbo.dimQuery q
WHERE NOT EXISTS (
SELECT *
FROM dbo.factQueryStats qs
WHERE qs.QueryID = q.QueryID
)
END
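
If you did want a per-server retention period, a minimal sketch of the schema change and the corresponding delete is shown below (the column and constraint names are my own; the delete would replace the one in the loop above):

-- A sketch of the per-server retention alternative (names are my own)
ALTER TABLE dbo.dimServer
ADD RetainDays int NOT NULL CONSTRAINT df_dimServer_RetainDays DEFAULT (7);
GO

-- Inside the batched delete loop, apply each server's own cut-off
DELETE TOP (1000) f
FROM dbo.factQueryStats f
INNER JOIN dbo.dimServer s
ON s.ServerID = f.ServerID
WHERE f.DateID <= CAST(CONVERT(varchar(8), DATEADD(day, -1 * s.RetainDays, GETDATE()), 112) AS int);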

In terms of presentation of the collected data, Powerpivot once again is the best tool. All tables with the exception of dimQuery are imported into Powerpivot as tables. The varbinary query_hash column in dimQuery generates a data type error if the import table option is chosen. To overcome this, create the following view in the data warehouse and import the data through it:


CREATE VIEW dbo.vwdimQuery
AS
SELECT
QueryID,
ServerID,
master.sys.fn_varbintohexstr(query_hash) AS query_hash,
statement_start_offset,
statement_end_offset
FROM dbo.dimQuery

After creating the appropriate relationships between the fact and dimension ID columns, the final data model should appear as follows:

Below I chart the historic execution count. We can see it increasing for QueryID 14. Note that I stopped my SQL Server at about 13:25 and restarted it at about 19:30. The DMV statistics are reset when SQL Server is restarted:

As mentioned earlier, to minimise the impact on the target server the collection process does not fetch the actual query text. Once you’ve identified a query of interest you use its QueryID to fetch the query_hash from dimQuery. You then examine the target server’s DMVs to get the full query details (the text, the plan, the plan attributes etc).
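
On the target server that lookup can be as simple as this (a sketch; the query_hash value is a placeholder for the one you pulled from dimQuery):

-- A sketch: fetch the statement text, plan and related details for a given query_hash
DECLARE @query_hash binary(8) = 0x1A2B3C4D5E6F7A8B;    -- placeholder value taken from dimQuery

SELECT
SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
    (CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
     ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2 + 1) AS statement_text,
qp.query_plan,
qs.plan_handle,
qs.execution_count,
qs.total_worker_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qs.query_hash = @query_hash;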

If you’re only interested in statistics at the procedure level and not at the individual statement level then a similar collection and analysis based on sys.dm_exec_procedure_stats can be performed.
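
The collection query for that variant would be along these lines (a sketch; SQL 2008 and above):

-- A sketch of the procedure-level equivalent
SELECT
DB_NAME(database_id) AS database_name,
OBJECT_NAME(object_id, database_id) AS procedure_name,
cached_time,
last_execution_time,
execution_count,
total_worker_time,
total_elapsed_time,
total_logical_reads,
total_logical_writes
FROM sys.dm_exec_procedure_stats;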

This simple approach to collecting and analysing query statistics will reveal interesting details about your SQL Server workloads. Regular patterns will emerge. Together with the wait, latch and spinlock analysis you should now have a complete picture of your SQL Servers.


Identifying rolled back transactions

In SQL 2008 R2 and before there are very few ways to identify transactions that have been rolled back. A server-side trace capturing the SQLTransaction events is one way. You’d filter on EventSubClass = 2 to capture just the rollbacks. Or you could try reading the transaction log using fn_dblog(), filtering on Operation = LOP_ABORT_XACT. Yet another way, which provides very limited data, is to periodically query the syscacheobjects sql column for any text which is like ‘%rollback%’. Each of these methods has its own issues and overheads. Ideally we’d like a lightweight solution.
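
For reference, the fn_dblog route mentioned above looks something like this (a sketch using an undocumented function; run it in the database of interest):

-- A sketch of the fn_dblog approach (undocumented function)
-- Find the aborted transactions and join back to their begin records for the name and start time
SELECT a.[Transaction ID], b.[Transaction Name], b.[Begin Time]
FROM fn_dblog(NULL, NULL) AS a
INNER JOIN fn_dblog(NULL, NULL) AS b
ON b.[Transaction ID] = a.[Transaction ID]
AND b.Operation = 'LOP_BEGIN_XACT'
WHERE a.Operation = 'LOP_ABORT_XACT';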

Fortunately, in SQL Server 2012 extended events come to the rescue. You can either use extended events that map to a trace definition for SQLTransaction events or you can roll your own event session. Here’s one possible roll-your-own solution:


-- Create the event session to catch rollbacks

CREATE EVENT SESSION evtsRollback
ON SERVER
ADD EVENT sqlserver.sql_transaction (
ACTION (sqlserver.session_id,sqlserver.database_id,sqlserver.sql_text)    -- Could add more actions
WHERE transaction_state = 2                        -- Rollbacks only
AND sqlserver.database_id = 8                        -- Just this one database
)
ADD TARGET package0.ring_buffer;
GO

-- Start the session
ALTER EVENT SESSION evtsRollback ON SERVER STATE=START;

-- Begin a transaction, make a change, then rollback the transaction

-- Drop the event
ALTER EVENT SESSION evtsRollback ON SERVER DROP EVENT sqlserver.sql_transaction;

-- Query the results
SELECT
event.value('(event/@name)[1]','varchar(50)') AS event,
DATEADD(hh,DATEDIFF(hh,GETUTCDATE(),CURRENT_TIMESTAMP),event.value('(event/@timestamp)[1]','datetime2')) AS [timestamp],
event.value('(event/action[@name="session_id"])[1]','int') AS session_id,
event.value('(event/action[@name="database_id"])[1]','int') AS database_id,
event.value('(event/data[@name="duration"])[1]','bigint') AS duration_microseconds,
event.value('(event/data[@name="transaction_id"])[1]','bigint') AS transaction_id,
event.value('(event/data[@name="transaction_state"]/text)[1]','nvarchar(max)') AS transaction_state,
event.value('(event/data[@name="transaction_type"]/text)[1]','nvarchar(max)') AS transaction_type,
event.value('(event/action[@name="sql_text"])[1]','nvarchar(max)') AS sql_text
FROM
(
SELECT n.query('.') AS event
FROM
(
SELECT CAST(target_data as XML) AS target_data
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
ON s.address = t.event_session_address
WHERE s.name = 'evtsRollback'
AND t.target_name = 'ring_buffer'
) AS s
CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(n)
) AS t;

-- Tidy up
DROP EVENT SESSION evtsRollback ON SERVER;

The results of the select statement for my simple test case were as follows:

One scenario in which you’d want to run such a diagnostic is when you have thousands of transactions a minute. For the vast majority of the week they all succeed, but you encounter a few occasions when the front-end application deliberately rolls back a transaction, for example when thresholds for acceptable levels of performance have been exceeded. Without a lightweight diagnostic tool like this, identifying infrequent rollbacks becomes problematic.

Analysing wait type, latch and spinlock statistics

The previous blog post outlined an approach for collecting wait type, latch and spinlock statistics. This post shows how the collected statistics can be analysed and graphed.

SQL Server retains cumulative figures for waiting task counts and wait times. These figures are reset to zero when SQL Server is restarted or when the DBCC SQLPERF(‘sys.dm_os_xxx_stats’,CLEAR) command is run. These raw cumulative figures can be easily graphed. You’ll see steady increases in most of them as well as sharp increases when resource issues arise. As well as the cumulative figures we’d also like to see the values for waiting tasks and wait times over each sampling interval. These can be obtained by calculating the delta of the cumulative figures for consecutive sampling times. Using the factWaitStats table from the previous post as an example, we could use the SQL Server 2012 LAG function to obtain the deltas:

SELECT *,
waiting_tasks_count - LAG(waiting_tasks_count,1,0) OVER (PARTITION BY ServerID,WaitTypeID ORDER BY DateID,TimeID) AS delta_waiting_tasks_count,
wait_time_ms - LAG(wait_time_ms,1,0) OVER (PARTITION BY ServerID,WaitTypeID ORDER BY DateID,TimeID) AS delta_wait_time_ms,
max_wait_time_ms - LAG(max_wait_time_ms,1,0) OVER (PARTITION BY ServerID,WaitTypeID ORDER BY DateID,TimeID) AS delta_max_wait_time_ms,
signal_wait_time_ms - LAG(signal_wait_time_ms,1,0) OVER (PARTITION BY ServerID,WaitTypeID ORDER BY DateID,TimeID) AS delta_signal_wait_time_ms,
TimeID - LAG(TimeID,1,0) OVER (PARTITION BY ServerID,WaitTypeID ORDER BY DateID,TimeID) AS delta_sample_time_sec
FROM dbo.factWaitStats

In versions prior to SQL 2012 you could use CTEs, views, subqueries or CROSS APPLYs to obtain the deltas.
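
For example, one pre-2012 variant (a sketch using ROW_NUMBER and a self-join; it returns the same deltas as the LAG query above):

-- A pre-2012 sketch: number the rows per server and wait type, then join each row to its predecessor
WITH ordered AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ServerID, WaitTypeID ORDER BY DateID, TimeID) AS rownum
FROM dbo.factWaitStats
)
SELECT cur.*,
cur.waiting_tasks_count - COALESCE(prev.waiting_tasks_count, 0) AS delta_waiting_tasks_count,
cur.wait_time_ms - COALESCE(prev.wait_time_ms, 0) AS delta_wait_time_ms,
cur.max_wait_time_ms - COALESCE(prev.max_wait_time_ms, 0) AS delta_max_wait_time_ms,
cur.signal_wait_time_ms - COALESCE(prev.signal_wait_time_ms, 0) AS delta_signal_wait_time_ms,
cur.TimeID - COALESCE(prev.TimeID, 0) AS delta_sample_time_sec
FROM ordered AS cur
LEFT JOIN ordered AS prev
ON prev.ServerID = cur.ServerID
AND prev.WaitTypeID = cur.WaitTypeID
AND prev.rownum = cur.rownum - 1;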

Having obtained the deltas you can use the relationships between the dimension tables and the fact tables to graph the figures that you are interested in.

Powerpivot is a wonderful analysis tool and can be used to great effect here. You’d import the dimServer, dimDate, dimTime and dimWaitType dimension tables. You’d set the date format of the dimTime TimeValue column to HH:MM (so that you don’t see today’s date appearing along with the time). You’d import the factWaitStats data using the SQL 2012 LAG statement above as the source SQL query. If you’re not using SQL 2012 you’d use the following SQL query to import factWaitStats data…

-- If you swap rownum and partitionnum in the SELECT column list you incur a SORT operation
-- in the query plan
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ServerID,WaitTypeID ORDER BY DateID,timeID) AS rownum,
DENSE_RANK() OVER (ORDER BY ServerID,WaitTypeID) AS partitionnum
FROM dbo.factWaitStats;

…and then add the following calculated columns to derive the deltas:

delta_waiting_tasks_count:
=IF([rownum]=1,0,[waiting_tasks_count]-MAXX(FILTER(factWaitStats,[partitionnum] = EARLIER([partitionnum]) && [rownum]=EARLIER([rownum]) - 1),[waiting_tasks_count]))

delta_wait_time_ms:
=IF([rownum]=1,0,[wait_time_ms]-MAXX(FILTER(factWaitStats,[partitionnum] = EARLIER([partitionnum]) && [rownum]=EARLIER([rownum]) - 1),[wait_time_ms]))

delta_max_wait_time_ms:
=IF([rownum]=1,0,[max_wait_time_ms]-MAXX(FILTER(factWaitStats,[partitionnum] = EARLIER([partitionnum]) && [rownum]=EARLIER([rownum]) - 1),[max_wait_time_ms]))

delta_signal_wait_time_ms:
=IF([rownum]=1,0,[signal_wait_time_ms]-MAXX(FILTER(factWaitStats,[partitionnum] = EARLIER([partitionnum]) && [rownum]=EARLIER([rownum]) - 1),[signal_wait_time_ms]))

delta_sample_time_sec:
=IF([rownum]=1,0,[TimeID]-MAXX(FILTER(factWaitStats,[partitionnum] = EARLIER([partitionnum]) && [rownum]=EARLIER([rownum]) - 1),[TimeID]))

I won’t go into the details of these DAX statements but you can see they are all of the same form. The only difference between them is one column name. The single partitionnum column saves us from having to type [ServerID]=EARLIER([ServerID]) && [WaitTypeID]=EARLIER([WaitTypeID]) in each definition. In other words, we import partitionnum using our source SQL statement and trade off a very minor storage cost for code simplification.

You then define the relationships between factWaitStats and dimServer, dimDate, dimTime and dimWaitType across ServerID, DateID, TimeID and WaitTypeID respectively. These are prerequisites for charting the data. The final Powerpivot worksheet should look something like this:

You can then use a pivot table and chart to graph the data. Here I’ve selected a day and am charting my WRITELOG waits:

In my case the statistics are read every 60 seconds so the deltas are relative to this sampling interval. You could use calculated measures in the pivot tables and charts to account for the sampling interval.
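
Alternatively the normalisation can be pushed into the source query itself rather than into a calculated measure (a sketch derived from the LAG query above), producing a per-second rate that is independent of the sampling interval:

-- A sketch: express the wait time delta as a per-second rate using the interval between samples
SELECT d.*,
CASE WHEN d.delta_sample_time_sec > 0
     THEN d.delta_wait_time_ms * 1.0 / d.delta_sample_time_sec
     ELSE 0 END AS wait_time_ms_per_sec
FROM (
SELECT *,
wait_time_ms - LAG(wait_time_ms,1,0) OVER (PARTITION BY ServerID,WaitTypeID ORDER BY DateID,TimeID) AS delta_wait_time_ms,
TimeID - LAG(TimeID,1,0) OVER (PARTITION BY ServerID,WaitTypeID ORDER BY DateID,TimeID) AS delta_sample_time_sec
FROM dbo.factWaitStats
) AS d;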

All sorts of simple dashboards could be created using the approach listed in this post.

Collecting wait type, latch and spinlock statistics and analysing them in this way gives you a true picture of the workloads your SQL Servers have to deal with and the resource limitations they face.
