I've got some customer_comments
split out into multiple rows due to database design, and for a report I need to combine the comments
from each unique id
into one row. I previously tried something working with this delimited list from SELECT clause and COALESCE trick but I can't recall it and must not have saved it. I can't seem to get it to work in this case either, only seems to work on a single row.
The data looks like this:
id row_num customer_code comments
-----------------------------------
1 1 Dilbert Hard
1 2 Dilbert Worker
2 1 Wally Lazy
My results need to look like this:
id customer_code comments
------------------------------
1 Dilbert Hard Worker
2 Wally Lazy
So for each row_num
there's really only one row of results; the comments should be combined in the order of row_num
. The above linked SELECT
trick works to get all the values for a specific query as one row, but I can't figure out how to make it work as part of a SELECT
statement that spits all these rows out.
My query has to go through the whole table on its own and output these rows. I'm not combining them into multiple columns, one for each row, so PIVOT
doesn't seem applicable.
4 Answers 4
This is relatively trivial to do with a correlated subquery. You can't use the COALESCE method highlighted in the blog post you mention unless you extract that to a user-defined function (or unless you only want to return one row at a time). Here is how I typically do this:
DECLARE @x TABLE
(
id INT,
row_num INT,
customer_code VARCHAR(32),
comments VARCHAR(32)
);
INSERT @x SELECT 1,1,'Dilbert','Hard'
UNION ALL SELECT 1,2,'Dilbert','Worker'
UNION ALL SELECT 2,1,'Wally','Lazy';
SELECT id, customer_code, comments = STUFF((SELECT ' ' + comments
FROM @x AS x2 WHERE id = x.id
ORDER BY row_num
FOR XML PATH('')), 1, 1, '')
FROM @x AS x
GROUP BY id, customer_code
ORDER BY id;
If you have a case where the data in comments could contain unsafe-for-XML characters (>
, <
, &
), you should change this:
FOR XML PATH('')), 1, 1, '')
To this more elaborate approach:
FOR XML PATH(''), TYPE).value(N'(./text())[1]', N'varchar(max)'), 1, 1, '')
(Be sure to use the right destination data type, varchar
or nvarchar
, and the right length, and prefix all string literals with N
if using nvarchar
.)
-
3+1 I creadted a fiddle for that for a quick look sqlfiddle.com/#!3/e4ee5/2MarlonRibunal– MarlonRibunal2012年05月16日 04:44:06 +00:00Commented May 16, 2012 at 4:44
-
3Yep, this works like a charm. @MarlonRibunal SQL Fiddle's really shaping up!Zelda– Zelda2012年05月16日 13:03:55 +00:00Commented May 16, 2012 at 13:03
-
@NickChammas - I am going to stick my neck out and say that the order is guaranteed using the
order by
in the sub query. This is building XML usingfor xml
and that is the way to build XML using TSQL. Order of elements in an XML files is an important matter and can be relied upon. So if this technique does not guarantee order then XML support in TSQL is severely broken.Mikael Eriksson– Mikael Eriksson2012年06月02日 19:22:30 +00:00Commented Jun 2, 2012 at 19:22 -
2I've validated that the query will return results in the correct order regardless of the clustered index on the underlying table (even a clustered index on
row_num desc
must obey theorder by
as Mikael suggested). I'm going to remove comments suggesting otherwise now that the query contains the rightorder by
and hope that @JonSeigel considers doing the same.Aaron Bertrand– Aaron Bertrand2012年06月03日 00:53:37 +00:00Commented Jun 3, 2012 at 0:53
If you're allowed to use CLR in your environment, this is a tailor-made case for a user-defined aggregate.
In particular, this is probably the way to go if the source data is non-trivially large and/or you need to do this type of thing a lot in your application. I strongly suspect the query plan for Aaron's solution will not scale well as the input size grows. (I tried adding an index to the temp table, but that didn't help.)
This solution, like many other things, is a tradeoff:
- Politics/policy for even using CLR Integration in your, or your client's, environment.
- CLR function is likely faster, and will scale better given a real set of data.
- CLR function will be reusable in other queries, and you won't have to duplicate (and debug) a complex subquery every time you need to do this type of thing.
- Straight T-SQL is simpler than writing and managing a piece of external code.
- Perhaps you don't know how to program in C# or VB.
- etc.
EDIT: Well, I went to try to see if this actually was better, and it turns out the requirement that the comments be in a specific order is currently not possible to satisfy using an aggregate function. :(
See SqlUserDefinedAggregateAttribute.IsInvariantToOrder. Basically, what you need to do is OVER(PARTITION BY customer_code ORDER BY row_num)
but ORDER BY
is not supported in the OVER
clause when aggregating. I'm assuming adding this functionality to SQL Server opens a can of worms, because what would need to be changed in the execution plan is trivial. The aforementioned link says this is reserved for future use, so this could be implemented in the future (on 2005 you're probably out of luck, though).
This could still be accomplished by packing and parsing the row_num
value into the aggregated string, and then doing the sort within the CLR object... which seems pretty hackish.
In any event, below is the code I used in case anyone else finds this useful even with the limitation. I'll leave the hacking part as an exercise for the reader. Note that I used AdventureWorks (2005) for test data.
Aggregate assembly:
using System;
using System.IO;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
namespace MyCompany.SqlServer
{
[Serializable]
[SqlUserDefinedAggregate
(
Format.UserDefined,
IsNullIfEmpty = false,
IsInvariantToDuplicates = false,
IsInvariantToNulls = true,
IsInvariantToOrder = false,
MaxByteSize = -1
)]
public class StringConcatAggregate : IBinarySerialize
{
private string _accum;
private bool _isEmpty;
public void Init()
{
_accum = string.Empty;
_isEmpty = true;
}
public void Accumulate(SqlString value)
{
if (!value.IsNull)
{
if (!_isEmpty)
_accum += ' ';
else
_isEmpty = false;
_accum += value.Value;
}
}
public void Merge(StringConcatAggregate value)
{
Accumulate(value.Terminate());
}
public SqlString Terminate()
{
return new SqlString(_accum);
}
public void Read(BinaryReader r)
{
this.Init();
_accum = r.ReadString();
_isEmpty = _accum.Length == 0;
}
public void Write(BinaryWriter w)
{
w.Write(_accum);
}
}
}
T-SQL for testing (CREATE ASSEMBLY
, and sp_configure
to enable CLR omitted):
CREATE TABLE [dbo].[Comments]
(
CustomerCode int NOT NULL,
RowNum int NOT NULL,
Comments nvarchar(25) NOT NULL
)
INSERT INTO [dbo].[Comments](CustomerCode, RowNum, Comments)
SELECT
DENSE_RANK() OVER(ORDER BY FirstName),
ROW_NUMBER() OVER(PARTITION BY FirstName ORDER BY ContactID),
Phone
FROM [AdventureWorks].[Person].[Contact]
GO
CREATE AGGREGATE [dbo].[StringConcatAggregate]
(
@input nvarchar(MAX)
)
RETURNS nvarchar(MAX)
EXTERNAL NAME StringConcatAggregate.[MyCompany.SqlServer.StringConcatAggregate]
GO
SELECT
CustomerCode,
[dbo].[StringConcatAggregate](Comments) AS AllComments
FROM [dbo].[Comments]
GROUP BY CustomerCode
Here's a cursor-based solution that guarantees the order of the comments by row_num
. (See my other answer for how the [dbo].[Comments]
table was populated.)
SET NOCOUNT ON
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
SELECT
CustomerCode,
Comments
FROM [dbo].[Comments]
ORDER BY
CustomerCode,
RowNum
DECLARE @curCustomerCode int
DECLARE @lastCustomerCode int
DECLARE @curComment nvarchar(25)
DECLARE @comments nvarchar(MAX)
DECLARE @results table
(
CustomerCode int NOT NULL,
AllComments nvarchar(MAX) NOT NULL
)
OPEN cur
FETCH NEXT FROM cur INTO
@curCustomerCode, @curComment
SET @lastCustomerCode = @curCustomerCode
WHILE @@FETCH_STATUS = 0
BEGIN
IF (@lastCustomerCode != @curCustomerCode)
BEGIN
INSERT INTO @results(CustomerCode, AllComments)
VALUES(@lastCustomerCode, @comments)
SET @lastCustomerCode = @curCustomerCode
SET @comments = NULL
END
IF (@comments IS NULL)
SET @comments = @curComment
ELSE
SET @comments = @comments + N' ' + @curComment
FETCH NEXT FROM cur INTO
@curCustomerCode, @curComment
END
IF (@comments IS NOT NULL)
BEGIN
INSERT INTO @results(CustomerCode, AllComments)
VALUES(@curCustomerCode, @comments)
END
CLOSE cur
DEALLOCATE cur
SELECT * FROM @results
-- solution avoiding the cursor ...
DECLARE @idMax INT
DECLARE @idCtr INT
DECLARE @comment VARCHAR(150)
SELECT @idMax = MAX(id)
FROM [dbo].[CustomerCodeWithSeparateComments]
IF @idMax = 0
return
DECLARE @OriginalTable AS Table
(
[id] [int] NOT NULL,
[row_num] [int] NULL,
[customer_code] [varchar](50) NULL,
[comment] [varchar](120) NULL
)
DECLARE @FinalTable AS Table
(
[id] [int] IDENTITY(1,1) NOT NULL,
[customer_code] [varchar](50) NULL,
[comment] [varchar](120) NULL
)
INSERT INTO @FinalTable
([customer_code])
SELECT [customer_code]
FROM [dbo].[CustomerCodeWithSeparateComments]
GROUP BY [customer_code]
INSERT INTO @OriginalTable
([id]
,[row_num]
,[customer_code]
,[comment])
SELECT [id]
,[row_num]
,[customer_code]
,[comment]
FROM [dbo].[CustomerCodeWithSeparateComments]
ORDER BY id, row_num
SET @idCtr = 1
SET @comment = ''
WHILE @idCtr < @idMax
BEGIN
SELECT @comment = @comment + ' ' + comment
FROM @OriginalTable
WHERE id = @idCtr
UPDATE @FinalTable
SET [comment] = @comment
WHERE [id] = @idCtr
SET @idCtr = @idCtr + 1
SET @comment = ''
END
SELECT @comment = @comment + ' ' + comment
FROM @OriginalTable
WHERE id = @idCtr
UPDATE @FinalTable
SET [comment] = @comment
WHERE [id] = @idCtr
SELECT *
FROM @FinalTable
-
2You haven't avoided a cursor. You've just called your cursor a while loop instead.Aaron Bertrand– Aaron Bertrand2016年02月12日 14:34:45 +00:00Commented Feb 12, 2016 at 14:34