Archive for category SQL code

LAG and LEAD for SQL Server 2008

The LAG and LEAD analytic functions in SQL Server 2012 are very useful. Unfortunately, these functions don’t exist prior to the 2012 version. Most pre-2012 solutions resort to using self-joins to identify the lagging or leading row. Here’s a solution that references the source table only once:


WITH balance_details
AS
(
SELECT * FROM (
VALUES
('Tom','20140101',100),
('Tom','20140102',120),
('Tom','20140103',150),
('Tom','20140104',140),
('Tom','20140105',160),
('Tom','20140106',180),
('Jerry','20140101',210),
('Jerry','20140102',240),
('Jerry','20140103',230),
('Jerry','20140104',270),
('Jerry','20140105',190),
('Jerry','20140106',200),
('David','20140101',170),
('David','20140102',230),
('David','20140103',240),
('David','20140104',210),
('David','20140105',160),
('David','20140106',200)
) AS t (customer, balancedate,balance)
),
balance_cte AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY customer ORDER BY balancedate) rn
,(ROW_NUMBER() OVER (PARTITION BY customer ORDER BY balancedate))/2 rndiv2
,(ROW_NUMBER() OVER (PARTITION BY customer ORDER BY balancedate) + 1)/2 rnplus1div2
/*
,COUNT(*) OVER (PARTITION BY customer) partitioncount
*/
,customer
,balancedate
,balance
FROM balance_details
)
SELECT
rn
,rndiv2
,rnplus1div2
,customer,balancedate,balance
,CASE WHEN rn%2=1
      THEN MAX(CASE WHEN rn%2=0 THEN balance END) OVER (PARTITION BY customer,rndiv2)
      ELSE MAX(CASE WHEN rn%2=1 THEN balance END) OVER (PARTITION BY customer,rnplus1div2)
 END AS balance_lag
,CASE WHEN rn%2=1
      THEN MAX(CASE WHEN rn%2=0 THEN balance END) OVER (PARTITION BY customer,rnplus1div2)
      ELSE MAX(CASE WHEN rn%2=1 THEN balance END) OVER (PARTITION BY customer,rndiv2)
 END AS balance_lead
/*
,MAX(CASE WHEN rn=1 THEN balance END) OVER (PARTITION BY customer) AS first_value
,MAX(CASE WHEN rn=partitioncount THEN balance END) OVER (PARTITION BY customer) AS last_value
,MAX(CASE WHEN rn=4 THEN balance END) OVER (PARTITION BY customer) AS fourth_value
*/
FROM balance_cte
ORDER BY customer,balancedate

The results are as follows:

[Image: laglead - query output]

(The commented-out code shows how to obtain the equivalent of the SQL Server 2012 FIRST_VALUE and LAST_VALUE functions. It also shows how to obtain the Nth value, where I’ve specified N as 4.)

The LAG code relies on the fact that integer division of the ROW_NUMBER by 2 can be used in the partitioning function. For example, both 2 div 2 and 3 div 2 give 1. When the current row is 3 (ie rn mod 2 is 1) we examine the window frame rows where rndiv2 is 1 and choose the balance value where the frame row is even (ie rn mod 2 is 0). That is, the value for row 2. When the current row is 2 (ie rn mod 2 is 0) we examine the window frame rows where rnplus1div2 is 1 and choose the balance value where the frame row is odd (ie rn mod 2 is 1). That is, the value for row 1.

For LEAD the principle is the same; we just use the opposite ‘division by 2’ expression in the partition clause.
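To make the pairing arithmetic concrete, here is a short Python sketch (illustrative only, not part of the SQL solution; the function name is mine) that reproduces the lag/lead pairing for a single customer’s ordered balances:

```python
def lag_lead(balances):
    """Reproduce the rndiv2 / rnplus1div2 pairing for one ordered partition."""
    rows = list(enumerate(balances, start=1))  # (rn, balance), rn starts at 1

    def partner(rn, want_even, keyfunc):
        # find the other row sharing the same partition key with the wanted parity
        key = keyfunc(rn)
        for other, value in rows:
            if other != rn and keyfunc(other) == key and (other % 2 == 0) == want_even:
                return value
        return None  # corresponds to NULL in SQL

    rndiv2 = lambda rn: rn // 2            # integer division, as in SQL
    rnplus1div2 = lambda rn: (rn + 1) // 2

    lags, leads = [], []
    for rn, _ in rows:
        if rn % 2 == 1:
            # odd row: lag partner is the even row sharing rndiv2,
            # lead partner is the even row sharing rnplus1div2
            lags.append(partner(rn, True, rndiv2))
            leads.append(partner(rn, True, rnplus1div2))
        else:
            # even row: lag partner is the odd row sharing rnplus1div2,
            # lead partner is the odd row sharing rndiv2
            lags.append(partner(rn, False, rnplus1div2))
            leads.append(partner(rn, False, rndiv2))
    return lags, leads
```

For Tom’s balances, lag_lead([100, 120, 150, 140, 160, 180]) yields lags of [None, 100, 120, 150, 140, 160] and leads of [120, 150, 140, 160, 180, None], matching the SQL output.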


Adding or removing the IDENTITY property

The obvious methods that you use to add or remove the IDENTITY property involve ‘size of data’ operations. For example, if you use the SQL Server Management Studio table designer to change this property you will find that a brand new table is created and data moved from the existing table to a new table. If your table contains a huge number of rows this can be a very expensive operation. You can avoid these ‘size of data’ operations by making use of the ALTER TABLE… SWITCH command. Here’s a script that illustrates its use:


CREATE TABLE dbo.mytable
(
id INT NOT NULL CONSTRAINT pk_mytable PRIMARY KEY,
smallstr VARCHAR(200) NOT NULL,
largestr VARCHAR(MAX) NOT NULL
)
GO

DECLARE @i INT
SET @i = 1

SET NOCOUNT ON

BEGIN TRAN
WHILE @i <= 20000
BEGIN
INSERT INTO dbo.mytable
SELECT @i, 'Small string - no identity', 'Large string - no identity'
SET @i = @i + 1
END
COMMIT TRAN
GO

-- Rename the existing table and its constraints
EXEC sp_rename 'pk_mytable','pk_mytable_old','OBJECT'
EXEC sp_rename 'dbo.mytable', 'mytable_old'
GO

-- Create the new table making the id column an identity column
CREATE TABLE dbo.mytable
(
id INT NOT NULL IDENTITY CONSTRAINT pk_mytable PRIMARY KEY,
smallstr VARCHAR(200) NOT NULL,
largestr VARCHAR(MAX) NOT NULL
)
GO

-- Now switch to the new table - metadata operation that adds the IDENTITY property
ALTER TABLE dbo.mytable_old SWITCH TO dbo.mytable
GO

-- Run CHECKIDENT to ensure newly added rows have the correct values for their id values
DBCC CHECKIDENT('dbo.mytable')
GO

-- Now let's add the new rows
INSERT INTO dbo.mytable(smallstr,largestr)
SELECT 'Small string - with identity', 'Large string - with identity'
GO 3

-- Check the results
SELECT TOP 10 * FROM dbo.mytable ORDER BY id DESC
GO

-- Switch back to a table with no identity
DROP TABLE dbo.mytable_old
GO

EXEC sp_rename 'pk_mytable','pk_mytable_old','OBJECT'
EXEC sp_rename 'dbo.mytable', 'mytable_old'
GO

CREATE TABLE dbo.mytable
(
id INT NOT NULL CONSTRAINT pk_mytable PRIMARY KEY,
smallstr VARCHAR(200) NOT NULL,
largestr VARCHAR(MAX) NOT NULL
)
GO

ALTER TABLE dbo.mytable_old SWITCH TO dbo.mytable
GO

INSERT INTO dbo.mytable(id,smallstr,largestr)
SELECT 20004,'Small string - no identity', 'Large string - no identity'
UNION
SELECT 20005,'Small string - no identity', 'Large string - no identity'
UNION
SELECT 20006,'Small string - no identity', 'Large string - no identity'

-- Check the results
SELECT TOP 10 * FROM dbo.mytable ORDER BY id DESC
GO

The final results are as follows:

[Image: addremoveidentity - query output]


Round time to nearest x minutes or x seconds

If you collect time series data and need to normalise the times (for example, for grouping purposes) here’s some code that implements rounding to the nearest x minutes:


-- Round times to the nearest 5 minutes
DECLARE @nearestminutes int = 5

SELECT DATEADD( minute, 
                ( DATEDIFF(minute, CONVERT(char(8),create_date,112), DATEADD(second, @nearestminutes * 60 / 2, create_date)) / @nearestminutes ) * @nearestminutes,
                CONVERT(char(8),create_date,112) ) as create_date_rounded,
       DATEADD( minute, 
                ( DATEDIFF(minute, CONVERT(char(8),create_date,112), create_date) / @nearestminutes ) * @nearestminutes,
                CONVERT(char(8),create_date,112) ) as create_date_truncated,
       create_date,
       name
FROM sys.objects
ORDER BY name

The results show the difference between rounding and truncating:

[Image: roundtime - query output]

Change @nearestminutes to whatever you like. For example, use 60 to round to the nearest hour or 120 to round to the nearest two hours.
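The same ‘add half the interval, then truncate’ arithmetic can be sketched in Python (hypothetical helper names; this mirrors the SQL’s minutes-since-midnight calculation rather than using SQL Server itself):

```python
from datetime import datetime, timedelta

def round_to_nearest(dt, nearest_minutes):
    """Round dt to the nearest multiple of nearest_minutes past midnight."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    # adding half the interval before truncating turns truncation into rounding,
    # just as DATEADD(second, @nearestminutes * 60 / 2, create_date) does in SQL
    shifted = dt + timedelta(seconds=nearest_minutes * 60 // 2)
    whole_minutes = int((shifted - midnight).total_seconds()) // 60
    return midnight + timedelta(minutes=(whole_minutes // nearest_minutes) * nearest_minutes)

def truncate_to_nearest(dt, nearest_minutes):
    """Truncate dt down to the previous multiple of nearest_minutes."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    whole_minutes = int((dt - midnight).total_seconds()) // 60
    return midnight + timedelta(minutes=(whole_minutes // nearest_minutes) * nearest_minutes)
```

For example, 10:02:31 rounds to 10:05:00 with nearest_minutes = 5 (it is more than half way through the interval), while 10:02:29 rounds down to 10:00:00; truncation gives 10:00:00 in both cases.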

To round to the nearest x seconds you can use this:


-- Round times to the nearest 5 seconds
DECLARE @nearestseconds int = 5

SELECT DATEADD( second,
                ( DATEDIFF(second, CONVERT(char(8),create_date,112), DATEADD(millisecond, @nearestseconds * 1000 / 2, create_date)) / @nearestseconds ) * @nearestseconds,
                CONVERT(char(8),create_date,112) ) as create_date_rounded,
       DATEADD( second,
                ( DATEDIFF(second, CONVERT(char(8),create_date,112), create_date) / @nearestseconds ) * @nearestseconds,
                CONVERT(char(8),create_date,112) ) as create_date_truncated,
       create_date,
       name
FROM sys.objects
ORDER BY name


Splitting character delimited strings

The scenario for this post is that you have a table containing rows holding character delimited strings. The requirement is to normalise the data by splitting the delimited strings into their constituent elements.

Here’s the data and the steps that take us to the final solution:

SET NOCOUNT ON

DECLARE @mytable TABLE (id int NOT NULL PRIMARY KEY, csvstring varchar(8000) NOT NULL)

DECLARE @separator char(1)
SET @separator = ','

-- Insert raw data. Last row can be added to test edge case where string reaches its maximum size.
INSERT INTO @mytable VALUES (1,'str1,str2,str3,str4,str5')
INSERT INTO @mytable VALUES (2,'str6,str7,str8')
INSERT INTO @mytable VALUES (3,'str9,str10,str11,str12')
--INSERT INTO @mytable SELECT 4,REPLICATE('strX,',1600)

-- Show the raw data
SELECT * FROM @mytable

-- Now show the same raw data including a new column for the csv string converted to an XML document
-- Have to convert csv string to varchar(max) to get the string functions to handle large strings
-- If you use the commented out SELECT and have the large strX string in the table you'll receive the
-- following error:
-- Msg 9400, Level 16, State 1, Line 21
-- XML parsing: line 1, character 8000, unexpected end of input

SELECT
t.id,
t.csvstring,
x.xmldoc
FROM @mytable t
CROSS APPLY (
--    SELECT CAST ('<T>' + REPLACE(t.csvstring,@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
SELECT CAST ('<T>' + REPLACE(CAST (t.csvstring AS varchar(max)),@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
) AS x

-- Finally use the XQuery nodes method to extract the constituent parts
SELECT
t.id,
t.csvstring,
x.xmldoc,
nodelist.frag.query('.') AS xmlfragment,
nodelist.frag.value('.','varchar(8000)') AS strval
FROM @mytable t
CROSS APPLY (
SELECT CAST ('<T>' + REPLACE(CAST (t.csvstring AS varchar(max)),@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
) AS x
CROSS APPLY x.xmldoc.nodes('/T') AS nodelist (frag)

The results are as follows:

[Image: Split string - query output]

The first and last columns in the final result set are typically all that would actually be required; for debugging purposes I show the intermediate steps that lead up to the final solution. Note that the REPLACE-into-XML approach assumes the strings contain no characters that need XML escaping, such as < or &.
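The tag-wrapping trick translates directly to other languages. Here’s a Python sketch of the same idea (purely illustrative, since Python’s str.split does this natively; note that ElementTree needs a single root element, so an extra wrapper is added, whereas SQL Server accepts a fragment list):

```python
import xml.etree.ElementTree as ET

def split_via_xml(csvstring, separator=','):
    """Split by turning each separator into a tag boundary and parsing as XML.
    Mirrors the SQL REPLACE(...) / nodes('/T') technique; assumes the input
    contains no characters that need XML escaping (e.g. < or &)."""
    xmldoc = '<R><T>' + csvstring.replace(separator, '</T><T>') + '</T></R>'
    return [node.text for node in ET.fromstring(xmldoc).findall('T')]
```

For the first sample row, split_via_xml('str1,str2,str3,str4,str5') returns ['str1', 'str2', 'str3', 'str4', 'str5'].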
