Combine column from multiple rows into single row

Question 1

I've got some customer_comments split out into multiple rows due to database design, and for a report I need to combine the comments from each unique id into one row. I previously tried something working with this delimited list from SELECT clause and COALESCE trick but I can't recall it and must not have saved it. I can't seem to get it to work in this case either, only seems to work on a single row.

The data looks like this:

id row_num customer_code comments
-----------------------------------
1 1 Dilbert Hard
1 2 Dilbert Worker
2 1 Wally Lazy

My results need to look like this:

id customer_code comments
------------------------------
1 Dilbert Hard Worker
2 Wally Lazy

So for each row_num there's really only one row of results; the comments should be combined in the order of row_num. The above linked SELECT trick works to get all the values for a specific query as one row, but I can't figure out how to make it work as part of a SELECT statement that spits all these rows out.

My query has to go through the whole table on its own and output these rows. I'm not combining them into multiple columns, one for each row, so PIVOT doesn't seem applicable.

Question 2

This is relatively trivial to do with a correlated subquery. You can't use the COALESCE method highlighted in the blog post you mention unless you extract that to a user-defined function (or unless you only want to return one row at a time). Here is how I typically do this:

DECLARE @x TABLE 
(
 id INT, 
 row_num INT, 
 customer_code VARCHAR(32), 
 comments VARCHAR(32)
);
INSERT @x SELECT 1,1,'Dilbert','Hard'
UNION ALL SELECT 1,2,'Dilbert','Worker'
UNION ALL SELECT 2,1,'Wally','Lazy';
SELECT id, customer_code, comments = STUFF((SELECT ' ' + comments 
 FROM @x AS x2 WHERE id = x.id
 ORDER BY row_num
 FOR XML PATH('')), 1, 1, '')
FROM @x AS x
GROUP BY id, customer_code
ORDER BY id;

If you have a case where the data in comments could contain unsafe-for-XML characters (>, <, &), you should change this:

 FOR XML PATH('')), 1, 1, '')

To this more elaborate approach:

 FOR XML PATH(''), TYPE).value(N'(./text())[1]', N'varchar(max)'), 1, 1, '')

(Be sure to use the right destination data type, varchar or nvarchar, and the right length, and prefix all string literals with N if using nvarchar.)

Question 3

+1 I creadted a fiddle for that for a quick look sqlfiddle.com/#!3/e4ee5/2

Question 4

Yep, this works like a charm. @MarlonRibunal SQL Fiddle's really shaping up!

Question 5

@NickChammas - I am going to stick my neck out and say that the order is guaranteed using the order by in the sub query. This is building XML using for xml and that is the way to build XML using TSQL. Order of elements in an XML files is an important matter and can be relied upon. So if this technique does not guarantee order then XML support in TSQL is severely broken.

Question 6

I've validated that the query will return results in the correct order regardless of the clustered index on the underlying table (even a clustered index on row_num desc must obey the order by as Mikael suggested). I'm going to remove comments suggesting otherwise now that the query contains the right order by and hope that @JonSeigel considers doing the same.

Question 7

If you're allowed to use CLR in your environment, this is a tailor-made case for a user-defined aggregate.

In particular, this is probably the way to go if the source data is non-trivially large and/or you need to do this type of thing a lot in your application. I strongly suspect the query plan for Aaron's solution will not scale well as the input size grows. (I tried adding an index to the temp table, but that didn't help.)

This solution, like many other things, is a tradeoff:

Politics/policy for even using CLR Integration in your, or your client's, environment.
CLR function is likely faster, and will scale better given a real set of data.
CLR function will be reusable in other queries, and you won't have to duplicate (and debug) a complex subquery every time you need to do this type of thing.
Straight T-SQL is simpler than writing and managing a piece of external code.
Perhaps you don't know how to program in C# or VB.
etc.

EDIT: Well, I went to try to see if this actually was better, and it turns out the requirement that the comments be in a specific order is currently not possible to satisfy using an aggregate function. :(

See SqlUserDefinedAggregateAttribute.IsInvariantToOrder. Basically, what you need to do is OVER(PARTITION BY customer_code ORDER BY row_num) but ORDER BY is not supported in the OVER clause when aggregating. I'm assuming adding this functionality to SQL Server opens a can of worms, because what would need to be changed in the execution plan is trivial. The aforementioned link says this is reserved for future use, so this could be implemented in the future (on 2005 you're probably out of luck, though).

This could still be accomplished by packing and parsing the row_num value into the aggregated string, and then doing the sort within the CLR object... which seems pretty hackish.

In any event, below is the code I used in case anyone else finds this useful even with the limitation. I'll leave the hacking part as an exercise for the reader. Note that I used AdventureWorks (2005) for test data.

Aggregate assembly:

using System;
using System.IO;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
namespace MyCompany.SqlServer
{
 [Serializable]
 [SqlUserDefinedAggregate
 (
 Format.UserDefined,
 IsNullIfEmpty = false,
 IsInvariantToDuplicates = false,
 IsInvariantToNulls = true,
 IsInvariantToOrder = false,
 MaxByteSize = -1
 )]
 public class StringConcatAggregate : IBinarySerialize
 {
 private string _accum;
 private bool _isEmpty;
 public void Init()
 {
 _accum = string.Empty;
 _isEmpty = true;
 }
 public void Accumulate(SqlString value)
 {
 if (!value.IsNull)
 {
 if (!_isEmpty)
 _accum += ' ';
 else
 _isEmpty = false;
 _accum += value.Value;
 }
 }
 public void Merge(StringConcatAggregate value)
 {
 Accumulate(value.Terminate());
 }
 public SqlString Terminate()
 {
 return new SqlString(_accum);
 }
 public void Read(BinaryReader r)
 {
 this.Init();
 _accum = r.ReadString();
 _isEmpty = _accum.Length == 0;
 }
 public void Write(BinaryWriter w)
 {
 w.Write(_accum);
 }
 }
}

T-SQL for testing (CREATE ASSEMBLY, and sp_configure to enable CLR omitted):

CREATE TABLE [dbo].[Comments]
(
 CustomerCode int NOT NULL,
 RowNum int NOT NULL,
 Comments nvarchar(25) NOT NULL
)
INSERT INTO [dbo].[Comments](CustomerCode, RowNum, Comments)
 SELECT
 DENSE_RANK() OVER(ORDER BY FirstName),
 ROW_NUMBER() OVER(PARTITION BY FirstName ORDER BY ContactID),
 Phone
 FROM [AdventureWorks].[Person].[Contact]
GO
CREATE AGGREGATE [dbo].[StringConcatAggregate]
(
 @input nvarchar(MAX)
)
RETURNS nvarchar(MAX)
EXTERNAL NAME StringConcatAggregate.[MyCompany.SqlServer.StringConcatAggregate]
GO
SELECT
 CustomerCode,
 [dbo].[StringConcatAggregate](Comments) AS AllComments
 FROM [dbo].[Comments]
 GROUP BY CustomerCode

Question 8

Here's a cursor-based solution that guarantees the order of the comments by row_num. (See my other answer for how the [dbo].[Comments] table was populated.)

SET NOCOUNT ON
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
 SELECT
 CustomerCode,
 Comments
 FROM [dbo].[Comments]
 ORDER BY
 CustomerCode,
 RowNum
DECLARE @curCustomerCode int
DECLARE @lastCustomerCode int
DECLARE @curComment nvarchar(25)
DECLARE @comments nvarchar(MAX)
DECLARE @results table
(
 CustomerCode int NOT NULL,
 AllComments nvarchar(MAX) NOT NULL
)
OPEN cur
FETCH NEXT FROM cur INTO
 @curCustomerCode, @curComment
SET @lastCustomerCode = @curCustomerCode
WHILE @@FETCH_STATUS = 0
BEGIN
 IF (@lastCustomerCode != @curCustomerCode)
 BEGIN
 INSERT INTO @results(CustomerCode, AllComments)
 VALUES(@lastCustomerCode, @comments)
 SET @lastCustomerCode = @curCustomerCode
 SET @comments = NULL
 END
 IF (@comments IS NULL)
 SET @comments = @curComment
 ELSE
 SET @comments = @comments + N' ' + @curComment
 FETCH NEXT FROM cur INTO
 @curCustomerCode, @curComment
END
IF (@comments IS NOT NULL)
BEGIN
 INSERT INTO @results(CustomerCode, AllComments)
 VALUES(@curCustomerCode, @comments)
END
CLOSE cur
DEALLOCATE cur
SELECT * FROM @results

Question 9

-- solution avoiding the cursor ...
DECLARE @idMax INT
DECLARE @idCtr INT
DECLARE @comment VARCHAR(150)
SELECT @idMax = MAX(id)
FROM [dbo].[CustomerCodeWithSeparateComments]
IF @idMax = 0
 return
DECLARE @OriginalTable AS Table
(
 [id] [int] NOT NULL,
 [row_num] [int] NULL,
 [customer_code] [varchar](50) NULL,
 [comment] [varchar](120) NULL
)
DECLARE @FinalTable AS Table
(
 [id] [int] IDENTITY(1,1) NOT NULL,
 [customer_code] [varchar](50) NULL,
 [comment] [varchar](120) NULL
)
INSERT INTO @FinalTable 
([customer_code])
SELECT [customer_code]
FROM [dbo].[CustomerCodeWithSeparateComments]
GROUP BY [customer_code]
INSERT INTO @OriginalTable
 ([id]
 ,[row_num]
 ,[customer_code]
 ,[comment])
SELECT [id]
 ,[row_num]
 ,[customer_code]
 ,[comment]
FROM [dbo].[CustomerCodeWithSeparateComments]
ORDER BY id, row_num
SET @idCtr = 1
SET @comment = ''
WHILE @idCtr < @idMax
BEGIN
 SELECT @comment = @comment + ' ' + comment
 FROM @OriginalTable 
 WHERE id = @idCtr
 UPDATE @FinalTable
 SET [comment] = @comment
 WHERE [id] = @idCtr 
 SET @idCtr = @idCtr + 1
 SET @comment = ''
END 
SELECT @comment = @comment + ' ' + comment
 FROM @OriginalTable 
 WHERE id = @idCtr
UPDATE @FinalTable
 SET [comment] = @comment
WHERE [id] = @idCtr
SELECT *
FROM @FinalTable

Question 10

You haven't avoided a cursor. You've just called your cursor a while loop instead.

score 18 · Accepted Answer · 2012-05-15 22:02:47Z

This is relatively trivial to do with a correlated subquery. You can't use the COALESCE method highlighted in the blog post you mention unless you extract that to a user-defined function (or unless you only want to return one row at a time). Here is how I typically do this:

DECLARE @x TABLE 
(
 id INT, 
 row_num INT, 
 customer_code VARCHAR(32), 
 comments VARCHAR(32)
);
INSERT @x SELECT 1,1,'Dilbert','Hard'
UNION ALL SELECT 1,2,'Dilbert','Worker'
UNION ALL SELECT 2,1,'Wally','Lazy';
SELECT id, customer_code, comments = STUFF((SELECT ' ' + comments 
 FROM @x AS x2 WHERE id = x.id
 ORDER BY row_num
 FOR XML PATH('')), 1, 1, '')
FROM @x AS x
GROUP BY id, customer_code
ORDER BY id;

If you have a case where the data in comments could contain unsafe-for-XML characters (>, <, &), you should change this:

 FOR XML PATH('')), 1, 1, '')

To this more elaborate approach:

 FOR XML PATH(''), TYPE).value(N'(./text())[1]', N'varchar(max)'), 1, 1, '')

(Be sure to use the right destination data type, varchar or nvarchar, and the right length, and prefix all string literals with N if using nvarchar.)

+1 I creadted a fiddle for that for a quick look sqlfiddle.com/#!3/e4ee5/2
Yep, this works like a charm. @MarlonRibunal SQL Fiddle's really shaping up!
@NickChammas - I am going to stick my neck out and say that the order is guaranteed using the order by in the sub query. This is building XML using for xml and that is the way to build XML using TSQL. Order of elements in an XML files is an important matter and can be relied upon. So if this technique does not guarantee order then XML support in TSQL is severely broken.
I've validated that the query will return results in the correct order regardless of the clustered index on the underlying table (even a clustered index on row_num desc must obey the order by as Mikael suggested). I'm going to remove comments suggesting otherwise now that the query contains the right order by and hope that @JonSeigel considers doing the same.

Stack Exchange Network

Combine column from multiple rows into single row

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Combine column from multiple rows into single row

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions