Archive for July 31st, 2013

Splitting character delimited strings

The scenario for this post is that you have a table containing rows holding character delimited strings. The requirement is to normalise the data by splitting the delimited strings into their constituent elements.

Here’s the data and the steps that take us to the final solution:

SET NOCOUNT ON

DECLARE @mytable TABLE (id int NOT NULL PRIMARY KEY, csvstring varchar(8000) NOT NULL)

DECLARE @separator char(1)
SET @separator = ','

-- Insert raw data. Last row can be added to test edge case where string reaches its maximum size.
INSERT INTO @mytable VALUES (1,'str1,str2,str3,str4,str5')
INSERT INTO @mytable VALUES (2,'str6,str7,str8')
INSERT INTO @mytable VALUES (3,'str9,str10,str11,str12')
--INSERT INTO @mytable SELECT 4,REPLICATE('strX,',1600)

-- Show the raw data
SELECT * FROM @mytable

-- Now show the same raw data including a new column for the csv string converted to an XML document
-- Have to convert csv string to varchar(max) to get the string functions to handle large strings
-- If you use the commented out SELECT and have the large strX string in the table you'll receive the
-- following error:
-- Msg 9400, Level 16, State 1, Line 21
-- XML parsing: line 1, character 8000, unexpected end of input

SELECT
t.id,
t.csvstring,
x.xmldoc
FROM @mytable t
CROSS APPLY (
--    SELECT CAST ('<T>' + REPLACE(t.csvstring,@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
SELECT CAST ('<T>' + REPLACE(CAST (t.csvstring AS varchar(max)),@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
) AS x

-- Finally use the XQuery nodes method to extract the constituent parts
SELECT
t.id,
t.csvstring,
x.xmldoc,
nodelist.frag.query('.') AS xmlfragment,
nodelist.frag.value('.','varchar(8000)') AS strval
FROM @mytable t
CROSS APPLY (
SELECT CAST ('<T>' + REPLACE(CAST (t.csvstring AS varchar(max)),@separator,'</T><T>') + '</T>' AS xml) AS xmldoc
) AS x
CROSS APPLY x.xmldoc.nodes('/T') AS nodelist (frag)

The results are as follows:

Split string

The first and last columns in the final result set are typically what would actually be required. For debugging purposes I show the steps that lead up to the final solution.

Advertisements

Leave a comment