Split two delimited strings in same order without function

Question 1

I am trying to split two columns of delimited strings into rows. The positions of the values in each string are related, so I want to split them such that related values end up in the same row. I cannot use a function because I cannot create objects in the database.

Here is a sample table and data:

CREATE TABLE #temp
(id INT,
 keys VARCHAR(50),
 vals VARCHAR(50)
);
INSERT INTO #temp
VALUES
(1, '1,2,3', 'one,two,three'),
(2, '4,5,6', 'four,five,six'),
(3, '7,8,9', 'seven,eight,nine');

and my desired output would be

ID key val
1 1 one
1 2 two
1 3 three
2 4 four
2 5 five
2 6 six
3 7 seven
3 8 eight
3 9 nine

I got the query to work when splitting only one column, so I defined two CTEs, each with a row_number, and joined them on id and row_number. This gives the desired output, but my live table is very large and I was hoping for a way to pass through the table only once instead of twice.

with keys as(
SELECT id,keys,vals,
 keys.keyid.value('.', 'VARCHAR(8000)') AS keyid,
 row_number() over(order by (select null)) as rn
FROM
(SELECT id,keys,vals,
 CAST('<Keys><key>'+REPLACE(keys, ',', '</key><key>')+'</key></Keys>' AS XML) AS tempkeys
 FROM #temp
) AS temp
CROSS APPLY tempkeys.nodes('/Keys/key') AS keys(keyid)),
vals as(
SELECT id,keys,vals,
 vals.val.value('.', 'VARCHAR(8000)') AS valid,
 row_number() over(order by (select null)) as rn
FROM
(SELECT id,keys,vals,
 CAST('<vals><val>'+REPLACE(vals, ',', '</val><val>')+'</val></vals>' AS XML) AS tempvals
 FROM #temp
) AS temp
CROSS APPLY tempvals.nodes('/vals/val') AS vals(val))
SELECT k.id, k.keyid, v.valid
FROM keys AS k
 INNER JOIN vals AS v
 ON k.id = v.id
 AND k.rn = v.rn;

Question 2

Why are you not allowed to create objects in the database? Can't you ask someone to do this for you? There are tons of good functions out there that have been proven to be about as efficient as you're going to be able to do this kind of thing (without fixing the design). So what is the actual roadblock, and can you work on that?

Question 3

@AaronBertrand I have the ability to create functions, but the database is a Dynamics CRM database and schema changes to it are not supported. I am not sure whether a function would also violate the supportability of the database, so we just don't make changes to it. My other thought was to create a separate database containing just a split function, or to put the function in master/msdb, and run the query against it that way. But wouldn't I have the same issue of having to apply the function to the table twice, once for each column?

Question 4

I don't see how a basic string splitting function could violate supportability in any way, or why it couldn't be in any other database. If you are open to a function (regardless of where it lives), I am fairly confident that we can do it in a tidier and hopefully more efficient way than your current approach. My only concern is what to do when the number of keys and the number of values don't match.
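
The mismatch concern can be checked before any splitting is done. A minimal sketch, assuming the #temp table from the question, varchar columns, and a single-character comma delimiter: it counts the delimiters in each column and lists the rows where the two counts differ.

```sql
-- Sketch: list rows whose keys and vals hold a different number of elements.
-- Assumes the #temp table from the question, varchar columns, and a
-- one-character ',' delimiter. DATALENGTH is used instead of LEN because
-- LEN ignores trailing spaces, which would skew the difference.
SELECT id,
       key_count = DATALENGTH(keys) - DATALENGTH(REPLACE(keys, ',', '')) + 1,
       val_count = DATALENGTH(vals) - DATALENGTH(REPLACE(vals, ',', '')) + 1
FROM #temp
WHERE DATALENGTH(keys) - DATALENGTH(REPLACE(keys, ',', ''))
   <> DATALENGTH(vals) - DATALENGTH(REPLACE(vals, ',', ''));
```

Rows where either column is NULL are not reported, because the comparison evaluates to unknown; they would need a separate check.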

Question 5

Put the function into another database and call it via a three-part object name. Your constraint is quite arbitrary: it forces massively complex code with XML instead of a tidy CLR or numbers-table function.

Question 6

Create the function in msdb or somewhere else.

CREATE FUNCTION dbo.SplitTwoStringsWithSameOrder
(
 @List1 varchar(50),
 @List2 varchar(50),
 @Delim varchar(10)
)
RETURNS TABLE
AS
 RETURN
 (
 WITH src(r) AS 
 (
 SELECT 1 UNION ALL SELECT r + 1 FROM src WHERE r < 10
 ),
 Numbers(Number) AS 
 (
 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
 FROM src AS s1, src AS s2 -- add more if you need longer strings
 ),
 parsed(s1,s2,r1,r2)
 AS
 (
 SELECT
 SUBSTRING(@List1, n1.Number, CHARINDEX(@Delim, @List1 
 + @Delim, n1.Number) - n1.Number),
 SUBSTRING(@List2, n2.Number, CHARINDEX(@Delim, @List2 
 + @Delim, n2.Number) - n2.Number),
 r1 = ROW_NUMBER() OVER (ORDER BY n1.Number),
 r2 = ROW_NUMBER() OVER (ORDER BY n2.Number)
 FROM Numbers AS n1
 INNER JOIN Numbers AS n2
 ON n1.Number <= LEN(@List1)
 AND n2.Number <= LEN(@List2)
 AND SUBSTRING(@Delim + @List1, n1.Number, LEN(@Delim)) = @Delim
 AND SUBSTRING(@Delim + @List2, n2.Number, LEN(@Delim)) = @Delim
 )
 SELECT s1, s2, r1, r2 FROM parsed WHERE r1 = r2
 );

Then, as @gbn noted, reference it by 3-part name wherever your query has to run.

CREATE TABLE #temp
(id INT,
 keys VARCHAR(50),
 vals VARCHAR(50)
);
INSERT INTO #temp
VALUES
(1, '1,2,3', 'one,two,three'),
(2, '4,5,6', 'four,five,six'),
(3, '7,8,9', 'seven,eight,nine');
SELECT t.id, f.s1, f.s2 FROM #temp AS t
 CROSS APPLY msdb.dbo.SplitTwoStringsWithSameOrder(keys, vals, ',') AS f
 ORDER BY t.id, f.r1;
GO
DROP TABLE #temp;

Results:

[image: query results]

The resulting plan, shown in Plan Explorer (disclaimer: I'm the Product Manager), is not the prettiest thing I've ever seen:

[image: execution plan]

But there is exactly one scan of #temp (4% cost). The biggest costs are two sorts and a spool, and there is some I/O due to a worktable which I am not sure is avoidable.
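
One way to see the worktable I/O for yourself is to turn on the session statistics before running the query. This is a sketch that assumes the function was created in msdb as above and that #temp is still populated (run it before the DROP TABLE).

```sql
-- Sketch: report logical reads per table (including any worktable) and CPU time.
-- Assumes msdb.dbo.SplitTwoStringsWithSameOrder exists and #temp is populated.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT t.id, f.s1, f.s2
FROM #temp AS t
CROSS APPLY msdb.dbo.SplitTwoStringsWithSameOrder(t.keys, t.vals, ',') AS f
ORDER BY t.id, f.r1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The per-table read counts and the CPU and elapsed times appear in the messages output rather than in the result grid.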

If you KNOW you will only ever have 50 characters in either of these strings, then you can get a much simpler plan with a permanent Numbers table (people object to these, but they're very useful, and they are almost always in memory if you reference them enough). This doesn't help I/O, but removing the recursive CTE and the other constructs that build the numbers inside the function is quite helpful for CPU.

First, the numbers table:

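-- Drop the table only if it already exists, so the script also runs the first time
IF OBJECT_ID(N'dbo.Numbers', N'U') IS NOT NULL
 DROP TABLE dbo.Numbers;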
;WITH n AS
(
 SELECT
 TOP (50) rn = ROW_NUMBER() OVER
 (ORDER BY [object_id])
 FROM sys.all_columns 
 ORDER BY [object_id]
)
SELECT [Number] = rn - 1
INTO dbo.Numbers
FROM n;
CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);

Then a second version of the function:

CREATE FUNCTION dbo.SplitTwoStringsWithSameOrder2
(
 @List1 varchar(50),
 @List2 varchar(50),
 @Delim nvarchar(10)
)
RETURNS TABLE
AS
 RETURN
 (
 WITH parsed(s1,s2,r1,r2)
 AS
 (
 SELECT
 SUBSTRING(@List1, n1.Number, CHARINDEX(@Delim, @List1 
 + @Delim, n1.Number) - n1.Number),
 SUBSTRING(@List2, n2.Number, CHARINDEX(@Delim, @List2 
 + @Delim, n2.Number) - n2.Number),
 r1 = ROW_NUMBER() OVER (ORDER BY n1.Number),
 r2 = ROW_NUMBER() OVER (ORDER BY n2.Number)
 FROM dbo.Numbers AS n1
 INNER JOIN dbo.Numbers AS n2
 ON n1.Number <= LEN(@List1)
 AND n2.Number <= LEN(@List2)
 AND SUBSTRING(@Delim + @List1, n1.Number, LEN(@Delim)) = @Delim
 AND SUBSTRING(@Delim + @List2, n2.Number, LEN(@Delim)) = @Delim
 )
 SELECT s1, s2, r1, r2 FROM parsed WHERE r1 = r2
 );
GO
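
Calling the second version looks the same as before. A minimal sketch, assuming dbo.Numbers and dbo.SplitTwoStringsWithSameOrder2 were both created in the current database (use a three-part name otherwise) and that #temp is populated as in the earlier example:

```sql
-- Sketch: assumes dbo.Numbers and dbo.SplitTwoStringsWithSameOrder2 exist in the
-- current database and that #temp holds the sample rows from the question.
SELECT t.id, f.s1, f.s2
FROM #temp AS t
CROSS APPLY dbo.SplitTwoStringsWithSameOrder2(t.keys, t.vals, ',') AS f
ORDER BY t.id, f.r1;
```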

Here is the simpler plan that results:

[image: execution plan]

The plan still has two sort operations, but the spool is gone, there is still only one scan of #temp, and in my limited tests the cost numbers (absolute cost numbers, not %) were better every time.

I don't know precisely how either of these will scale with a lot more rows, but it's worth testing, and if you weigh this against other solutions and it can't scale well, that may be the point where you reconsider the design (store these relationally instead of as comma-separated sets).
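
To make the relational alternative concrete, here is a rough sketch. The table name, column names, and the ordinal column are hypothetical, and a temp table is used only so the example runs without creating permanent objects.

```sql
-- Sketch of a relational layout; the table and column names are hypothetical.
CREATE TABLE #pairs
(
    id      int         NOT NULL,
    ordinal int         NOT NULL,  -- position of the pair within the original list
    keyid   varchar(50) NOT NULL,
    val     varchar(50) NOT NULL,
    PRIMARY KEY CLUSTERED (id, ordinal)
);

INSERT INTO #pairs (id, ordinal, keyid, val)
VALUES (1, 1, '1', 'one'),  (1, 2, '2', 'two'),  (1, 3, '3', 'three'),
       (2, 1, '4', 'four'), (2, 2, '5', 'five'), (2, 3, '6', 'six');

-- The desired output becomes an ordered read with no string parsing at all.
SELECT id, keyid, val
FROM #pairs
ORDER BY id, ordinal;

DROP TABLE #pairs;
```

With one row per pair, a key can never end up without its value, so the mismatched-count problem disappears as well.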

Question 7

I ran into the same problem with a large number of rows and four list columns.

The previous solutions didn't work for me.

The solution from @AaronBertrand has a problem when the two lists contain different numbers of elements. That can be fixed by adding a PARTITION BY to the ROW_NUMBER:

r1 = ROW_NUMBER() OVER (PARTITION BY n2.Number ORDER BY n1.Number)
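
A self-contained sketch of that idea follows. The line above shows only r1; the mirror-image partition for r2 is an assumption here, as are the sample lists and the inline recursive Numbers CTE, which stands in for the numbers source used in the function.

```sql
-- Sketch: pair elements by ordinal when the lists have different lengths.
-- The r2 line is an assumed mirror image of the r1 line shown above.
DECLARE @List1 varchar(50) = '1,2,3',
        @List2 varchar(50) = 'one,two',
        @Delim varchar(10) = ',';

WITH Numbers(Number) AS
(
    -- 1..50 covers the varchar(50) inputs; stays under the default recursion limit
    SELECT 1 UNION ALL SELECT Number + 1 FROM Numbers WHERE Number < 50
),
parsed(s1, s2, r1, r2) AS
(
    SELECT
        SUBSTRING(@List1, n1.Number, CHARINDEX(@Delim, @List1 + @Delim, n1.Number) - n1.Number),
        SUBSTRING(@List2, n2.Number, CHARINDEX(@Delim, @List2 + @Delim, n2.Number) - n2.Number),
        -- ordinal of the List1 element, counted within each List2 element
        r1 = ROW_NUMBER() OVER (PARTITION BY n2.Number ORDER BY n1.Number),
        -- ordinal of the List2 element, counted within each List1 element
        r2 = ROW_NUMBER() OVER (PARTITION BY n1.Number ORDER BY n2.Number)
    FROM Numbers AS n1
    INNER JOIN Numbers AS n2
        ON  n1.Number <= LEN(@List1)
        AND n2.Number <= LEN(@List2)
        AND SUBSTRING(@Delim + @List1, n1.Number, LEN(@Delim)) = @Delim
        AND SUBSTRING(@Delim + @List2, n2.Number, LEN(@Delim)) = @Delim
)
SELECT s1, s2, r1
FROM parsed
WHERE r1 = r2
ORDER BY r1;
```

With these sample lists the query pairs 1 with one and 2 with two, and the unmatched third key is dropped. Because each ROW_NUMBER is partitioned by the other list's position, the ordinals no longer depend on how ties are broken across the cross product.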

However, it still didn't work for me because of my large number of rows and elements.

I created the following script to solve my problem without using a function:

DROP TABLE IF EXISTS #temp
CREATE TABLE #temp
(
 id INT,
 keys VARCHAR(4000),
 vals VARCHAR(4000)
);
INSERT INTO #temp
VALUES
(1, '1,2,3', 'one,two,three'),
(2, '4,5,6', 'four,five,six'),
(3, '7,8,9', 'seven,eight,nine');
DECLARE @delim VARCHAR(4000) = ',';
WITH split AS (
 SELECT
 id
 ,CONVERT(VARCHAR(4000), CONCAT(keys, @delim)) AS keys
 ,CONVERT(VARCHAR(4000), CONCAT(vals, @delim)) AS vals
 ,1 AS iniciokeys
 -- Search the delimiter-terminated copy so a list without any delimiter keeps its last character
 ,CHARINDEX(@delim, CONCAT(keys, @delim), 1) AS fimkeys
 ,CONVERT(VARCHAR(4000), RTRIM(LTRIM(SUBSTRING(keys, 1, CHARINDEX(@delim, CONCAT(keys, @delim), 1) - 1)))) AS vkeys
 ,1 AS iniciovals
 ,CHARINDEX(@delim, CONCAT(vals, @delim), 1) AS fimvals
 ,CONVERT(VARCHAR(4000), RTRIM(LTRIM(SUBSTRING(vals, 1, CHARINDEX(@delim, CONCAT(vals, @delim), 1) - 1)))) AS vvals
 FROM #temp
 WHERE LEN(keys) > 0
 AND LEN(vals) > 0
 UNION ALL
 SELECT
 id
 ,CONVERT(VARCHAR(4000), keys) AS keys
 ,CONVERT(VARCHAR(4000), vals) AS vals
 ,CONVERT(INT, fimkeys) + 1 AS iniciokeys
 ,COALESCE(NULLIF(CHARINDEX(@delim, keys, fimkeys + 1), 0), LEN(keys)) AS fimkeys
 ,CONVERT(VARCHAR(4000), RTRIM(LTRIM(SUBSTRING(keys, fimkeys + 1, COALESCE(NULLIF(CHARINDEX(@delim, keys, fimkeys + 1), 0), LEN(keys))-fimkeys-1)))) AS vkeys
 ,CONVERT(INT, fimvals) + 1 AS iniciovals
 ,COALESCE(NULLIF(CHARINDEX(@delim, vals, fimvals + 1), 0), LEN(vals)) AS fimvals
 ,CONVERT(VARCHAR(4000), RTRIM(LTRIM(SUBSTRING(vals, fimvals + 1, COALESCE(NULLIF(CHARINDEX(@delim, vals, fimvals + 1), 0), LEN(vals))-fimvals-1)))) AS vvals
 FROM split
 WHERE fimkeys < LEN(keys)
 AND fimvals < LEN(vals)
)
SELECT
 id
 ,vkeys
 ,vvals
FROM split
ORDER BY id
 ,vkeys
OPTION(MAXRECURSION 32767)

Results:

[image: query results]

Query plan:

[image: query plan]

As you can see, the query plan is very simple and has only one table scan on #temp.

The solution is also quite scalable.

score 2 · Accepted Answer · 2018-01-09 20:51:00Z

