SQL Row concatenation with XML PATH and STUFF giving aggregate sql error

Question 1

I am trying to query two tables and get results like the following:

Section Names
shoes AccountName1, AccountName2, AccountName3
books AccountName1

The tables are:

CREATE TABLE dbo.TableA(ID INT, Section varchar(64), AccountId varchar(64));
INSERT dbo.TableA(ID, Section, AccountId) VALUES
(1 ,'shoes','A1'),
(2 ,'shoes','A2'),
(3 ,'shoes','A3'),
(4 ,'books','A1');
CREATE TABLE dbo.TableB(AccountId varchar(20), Name varchar(64));
INSERT dbo.TableB(AccountId, Name) VALUES
('A1','AccountName1'),
('A2','AccountName2'),
('A3','AccountNAme3');

I saw a few questions answered saying to use "XML PATH" and "STUFF" to query the data to get the results I am looking for, but I think there is something missing. I have tried the below query and get the error message:

I don't have it in the SELECT clause of either query, but I assume the error is because AccountId is not unique in TableA.

Here is the query I am currently trying to get working correctly.

SELECT section, names= STUFF((
 SELECT ', ' + Name FROM TableB as b 
WHERE AccountId = b.AccountId
FOR XML PATH('')), 1, 1, '')
FROM TableA AS a
GROUP BY a.section

Question 2

Sorry, I missed a step in the relationship. Try this version (though Martin's will work as well):

SELECT DISTINCT o.section, names= STUFF((
 SELECT ', ' + b.Name 
 FROM dbo.TableA AS a
 INNER JOIN dbo.TableB AS b
 ON a.AccountId = b.AccountId
 WHERE a.Section = o.Section
 FOR XML PATH, TYPE).value(N'.[1]', N'varchar(max)'), 1, 2, '')
FROM dbo.TableA AS o;

An approach that is at least as good, but sometimes better, is switching from DISTINCT to GROUP BY:

SELECT o.section, names= STUFF((
 SELECT ', ' + b.Name 
 FROM dbo.TableA AS a
 INNER JOIN dbo.TableB AS b
 ON a.AccountId = b.AccountId
 WHERE a.Section = o.Section
 FOR XML PATH, TYPE).value(N'.[1]', N'varchar(max)'), 1, 2, '')
FROM dbo.TableA AS o
GROUP BY o.section;

At a high level, the reason DISTINCT applies to the entire column list. Therefore for any duplicates it has to perform the aggregate work for every duplicate before applying DISTINCT. If you use GROUP BY then it can potentially remove duplicates before doing any of the aggregation work. This behavior can vary by plan depending on a variety of factors including indexes, plan strategy, etc. And a direct switch to GROUP BY may not be possible in all cases.

In any case, I ran both of these variations in SentryOne Plan Explorer. The plans are different in a few minor, uninteresting ways, but the I/O involved with the underlying worktable is telling. Here is DISTINCT:

And here is GROUP BY:

When I made the tables larger (14,000+ rows mapping to 24 potential values), this difference is more pronounced. DISTINCT:

GROUP BY:

In SQL Server 2017, you can use STRING_AGG:

SELECT a.section, STRING_AGG(b.Name, ', ')
 FROM dbo.TableA AS a
 INNER JOIN dbo.TableB AS b
 ON a.AccountId = b.AccountId
 WHERE a.Section = a.Section
 GROUP BY a.section;

The I/O here is almost nothing:

But, if you're not on SQL Server 2017 (or Azure SQL Database), and can't use STRING_AGG, I have to give credit where credit is due... Paul White's answer below has very little I/O and kicks the pants off of both of the FOR XML PATH solutions above.

Tables

DECLARE @TableA AS table
(
 ID integer PRIMARY KEY,
 Section varchar(10) NOT NULL,
 AccountID char(2) NOT NULL
);
DECLARE @TableB AS table
(
 AccountID char(2) PRIMARY KEY,
 Name varchar(20) NOT NULL
);

Data

INSERT @TableA
 (ID, Section, AccountID)
VALUES
 (1, 'shoes', 'A1'),
 (2, 'shoes', 'A2'),
 (3, 'shoes', 'A3'),
 (4, 'books', 'A1');
INSERT @TableB
 (AccountID, Name)
VALUES
 ('A1', 'AccountName1'),
 ('A2', 'AccountName2'),
 ('A3', 'AccountName3');

Join and convert to XML

DECLARE @x xml =
(
 SELECT
 TA.Section,
 CA.Name
 FROM @TableA AS TA
 JOIN @TableB AS TB
 ON TB.AccountID = TA.AccountID
 CROSS APPLY
 (
 VALUES(',' + TB.Name)
 ) AS CA (Name)
 ORDER BY TA.Section
 FOR XML AUTO, TYPE, ELEMENTS, ROOT ('Root')
);

XML creation query

The XML in the variable looks like this:

<Root>
 <TA>
 <Section>shoes</Section>
 <CA>
 <Name>,AccountName1</Name>
 </CA>
 <CA>
 <Name>,AccountName2</Name>
 </CA>
 <CA>
 <Name>,AccountName3</Name>
 </CA>
 </TA>
 <TA>
 <Section>books</Section>
 <CA>
 <Name>,AccountName1</Name>
 </CA>
 </TA>
</Root>

Query

The final query shreds the XML into sections, and concatenates the names in each:

SELECT
 Section = 
 N.n.value('(./Section/text())[1]', 'varchar(10)'),
 Names = 
 STUFF
 (
 -- Consecutive text nodes collapse
 N.n.query('./CA/Name/text()')
 .value('./text()[1]', 'varchar(8000)'), 
 1, 1, ''
 )
-- Shred per section
FROM @x.nodes('Root/TA') AS N (n);

Result

Output

Execution plan

score 16 · Accepted Answer · 2016-04-30 03:55:02Z

Sorry, I missed a step in the relationship. Try this version (though Martin's will work as well):

SELECT DISTINCT o.section, names= STUFF((
 SELECT ', ' + b.Name 
 FROM dbo.TableA AS a
 INNER JOIN dbo.TableB AS b
 ON a.AccountId = b.AccountId
 WHERE a.Section = o.Section
 FOR XML PATH, TYPE).value(N'.[1]', N'varchar(max)'), 1, 2, '')
FROM dbo.TableA AS o;

An approach that is at least as good, but sometimes better, is switching from DISTINCT to GROUP BY:

SELECT o.section, names= STUFF((
 SELECT ', ' + b.Name 
 FROM dbo.TableA AS a
 INNER JOIN dbo.TableB AS b
 ON a.AccountId = b.AccountId
 WHERE a.Section = o.Section
 FOR XML PATH, TYPE).value(N'.[1]', N'varchar(max)'), 1, 2, '')
FROM dbo.TableA AS o
GROUP BY o.section;

At a high level, the reason DISTINCT applies to the entire column list. Therefore for any duplicates it has to perform the aggregate work for every duplicate before applying DISTINCT. If you use GROUP BY then it can potentially remove duplicates before doing any of the aggregation work. This behavior can vary by plan depending on a variety of factors including indexes, plan strategy, etc. And a direct switch to GROUP BY may not be possible in all cases.

In any case, I ran both of these variations in SentryOne Plan Explorer. The plans are different in a few minor, uninteresting ways, but the I/O involved with the underlying worktable is telling. Here is DISTINCT:

And here is GROUP BY:

When I made the tables larger (14,000+ rows mapping to 24 potential values), this difference is more pronounced. DISTINCT:

GROUP BY:

In SQL Server 2017, you can use STRING_AGG:

SELECT a.section, STRING_AGG(b.Name, ', ')
 FROM dbo.TableA AS a
 INNER JOIN dbo.TableB AS b
 ON a.AccountId = b.AccountId
 WHERE a.Section = a.Section
 GROUP BY a.section;

The I/O here is almost nothing:

But, if you're not on SQL Server 2017 (or Azure SQL Database), and can't use STRING_AGG, I have to give credit where credit is due... Paul White's answer below has very little I/O and kicks the pants off of both of the FOR XML PATH solutions above.

Stack Exchange Network

SQL Row concatenation with XML PATH and STUFF giving aggregate sql error

2 Answers 2

Tables

Data

Join and convert to XML

Query

Result

Execution plan

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

SQL Row concatenation with XML PATH and STUFF giving aggregate sql error

2 Answers 2

Tables

Data

Join and convert to XML

Query

Result

Execution plan

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions