Archive for July, 2013
The scenario for this post is that you have a table containing rows holding character delimited strings. The requirement is to normalise the data by splitting the delimited strings into their constituent elements.
Here’s the data and the steps that take us to the final solution:
SET NOCOUNT ON

DECLARE @mytable TABLE (id int NOT NULL PRIMARY KEY, csvstring varchar(8000) NOT NULL)
DECLARE @separator char(1)
SET @separator = ','

-- Insert raw data. Last row can be added to test edge case where string reaches its maximum size.
INSERT INTO @mytable VALUES (1,'str1,str2,str3,str4,str5')
INSERT INTO @mytable VALUES (2,'str6,str7,str8')
INSERT INTO @mytable VALUES (3,'str9,str10,str11,str12')
--INSERT INTO @mytable SELECT 4,REPLICATE('strX,',1600)

-- Show the raw data
SELECT * FROM @mytable

-- Now show the same raw data including a new column for the csv string converted to an XML document
-- Have to convert csv string to varchar(max) to get the string functions to handle large strings
-- If you use the commented out SELECT and have the large strX string in the table you'll receive the
-- following error:
-- Msg 9400, Level 16, State 1, Line 21
-- XML parsing: line 1, character 8000, unexpected end of input
SELECT t.id, t.csvstring, x.xmldoc
FROM @mytable t
CROSS APPLY (
--  SELECT CAST ('<T>' + REPLACE(t.csvstring,@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
    SELECT CAST ('<T>' + REPLACE(CAST (t.csvstring AS varchar(max)),@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
) AS x

-- Finally use the XQuery nodes method to extract the constituent parts
SELECT t.id,
       t.csvstring,
       x.xmldoc,
       nodelist.frag.query('.') AS xmlfragment,
       nodelist.frag.value('.','varchar(8000)') AS strval
FROM @mytable t
CROSS APPLY (
    SELECT CAST ('<T>' + REPLACE(CAST (t.csvstring AS varchar(max)),@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
) AS x
CROSS APPLY x.xmldoc.nodes('/T') AS nodelist (frag)
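One caveat worth noting (my addition, not part of the original script): the CAST to xml will fail if the delimited strings themselves contain XML special characters such as &, < or >. A minimal sketch of the derived-table SELECT with those characters escaped first:

```sql
-- Sketch only: escape XML special characters before building the <T> fragments.
-- The & must be escaped first so it doesn't re-escape the entities for < and >.
SELECT CAST ('<T>' +
       REPLACE(
           REPLACE(REPLACE(REPLACE(CAST (t.csvstring AS varchar(max)),
                   '&','&amp;'), '<','&lt;'), '>','&gt;'),
           @separator, '</T><T>') +
       '</T>' AS xml) AS xmldoc
```

The value() method decodes the entities again, so the strval column still returns the original strings.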
The results are as follows:
The first and last columns of the final result set are typically all that would actually be required; the intermediate columns are shown for debugging purposes, to illustrate the steps that lead up to the final solution.
From a monitoring and troubleshooting perspective we’re all probably familiar with the dynamic management objects referred to in these very simple select statements:
select @@servername as servername,getdate() as sampletime,* from sys.dm_os_wait_stats
select @@servername as servername,getdate() as sampletime,* from sys.dm_os_latch_stats
select @@servername as servername,getdate() as sampletime,* from sys.dm_os_spinlock_stats
select @@servername as servername,getdate() as sampletime,* from sys.dm_io_virtual_file_stats(null,null)
select @@servername as servername,getdate() as sampletime,* from sys.dm_db_index_operational_stats(null,null,null,null)
select @@servername as servername,getdate() as sampletime,* from sys.dm_db_index_usage_stats
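Before any deltas can be calculated, the cumulative figures need to be captured at regular intervals. A minimal sketch of a snapshot approach for the wait statistics, assuming a hypothetical history table populated by a scheduled job (the table and column names are my own):

```sql
-- Hypothetical history table for periodic wait statistics snapshots
CREATE TABLE dbo.waitstats_history (
    servername   sysname      NOT NULL,
    sampletime   datetime     NOT NULL,
    wait_type    nvarchar(60) NOT NULL,
    wait_time_ms bigint       NOT NULL,
    CONSTRAINT pk_waitstats_history PRIMARY KEY (servername, wait_type, sampletime)
)

-- Run on a schedule (e.g. from a SQL Agent job) to capture the cumulative figures
INSERT INTO dbo.waitstats_history (servername, sampletime, wait_type, wait_time_ms)
SELECT @@SERVERNAME, GETDATE(), wait_type, wait_time_ms
FROM sys.dm_os_wait_stats
```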
These queries return cumulative statistics accumulated since SQL Server was last restarted, since the statistics were last manually cleared, or since index metadata type actions were last run. Plotting the cumulative figures over sample time we'd see steadily increasing slopes, and a sudden increase in slope would indicate a point of interest. For example, a sudden increase in share level lock wait time would indicate blocking. These points of interest are more easily revealed if we plot the differences, or deltas, between consecutive sample times. Many blog posts about cumulative statistics (for example, https://sqlscope.wordpress.com/2012/04/09/analysing-wait-type-latch-and-spinlock-statistics/) use SQL constructs such as windowing functions (LAG), subqueries and CROSS APPLY to calculate these deltas. Although these approaches are valid, relying on SQL constructs can be limiting: you're using SQL Server itself to calculate the deltas for every column that you wish to graph.
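As an illustration of the windowing function approach, a delta query using LAG (available from SQL Server 2012 onwards) might look like the following. It assumes a hypothetical waitstats_history table holding periodic snapshots of sys.dm_os_wait_stats, with columns servername, sampletime, wait_type and wait_time_ms:

```sql
-- Delta between consecutive samples; the first sample for each wait type yields 0
-- because LAG's third argument supplies the current value as the default.
SELECT servername,
       sampletime,
       wait_type,
       wait_time_ms
       - LAG(wait_time_ms, 1, wait_time_ms)
             OVER (PARTITION BY servername, wait_type ORDER BY sampletime) AS wait_time_ms_delta
FROM dbo.waitstats_history
ORDER BY wait_type, sampletime
```

This is exactly the per-column hand rolling described above: every additional counter you want to graph needs its own LAG expression.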
If a difference or delta calculating function existed in your reporting tool then the same generic approach could be applied to all your cumulative statistics columns rather than you having to hand roll SQL to do the calculations. In effect you are offloading the business intelligence function to your reporting tool.
The reporting tool that I tend to use for ad hoc reporting and charting is Excel. In terms of this product’s functionality we are all probably familiar with the simple line charts and possibly the pivot tables and charts that show sum, min, max, average aggregations. One option that exists for the aggregations is to show the summarised values based on a calculation. One of the calculations that’s available is the difference from previous value:
When Difference From is chosen with SampleTime as the base field, this is the resulting chart:
Moving WaitTypeName to the Legend (Series) area, you can now make use of the Top N filter to show the highest waits (here I've not filtered out the system level waits; in a real world analysis you'd filter these out in the SELECT statement):