I have a table where each record has a one-to-many relationship with 3 other tables (with further one-to-one branching). This produces many rows for each main record, with many columns of duplicated information. In PHP, I take the result set and flatten it into a multidimensional array.
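The flattening step can be sketched roughly as follows. This is Python rather than PHP and only an illustration: `flatten_rows` is a hypothetical name, and it assumes each result row arrives as a dict keyed by the column names/aliases of the first query below (only a subset of them is used). It keeps one nested record per `gemid` and collects each child value only once, in first-seen order.

```python
from typing import Any, Dict, List


def flatten_rows(rows: List[Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Collapse fanned-out join rows into one nested record per gemid.

    Hypothetical helper: gemid/title repeat on every joined row, so the
    child values (files, grade + page pairs, categories) are collected as
    distinct values in first-seen order.
    """
    gems: Dict[int, Dict[str, Any]] = {}
    for row in rows:
        gem = gems.setdefault(row["gemid"], {
            "title": row["title"],
            "files": [],
            "gradepages": [],
            "categories": [],
        })
        children = (
            ("files", (row["filename"], row["license"])),
            ("gradepages", (row["grade"], row["page"])),
            ("categories", row["category"]),
        )
        for key, value in children:
            # LEFT JOINs produce NULLs (None) when a gem has no child rows.
            if value is None or (isinstance(value, tuple) and all(v is None for v in value)):
                continue
            # The same child value repeats once per row of the other joins.
            if value not in gem[key]:
                gem[key].append(value)
    return gems
```

The `not in` list check is linear, which is fine for the handful of children per gem; a set alongside each list would be the usual choice if the child lists grow large.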
I am weighing the benefits of rewriting the query to let MySQL do the flattening using GROUP_CONCAT statements. I'd end up with one row per main record, with 3 fields of concatenated data (files, grades + pages, and categories). I am not using any GROUP BY statements; I'm only using GROUP_CONCAT to flatten.
I've done this before for a single GROUP_CONCAT, but am curious whether this is a "normal" use of the technology. I am asking from a design-standards and maintainability point of view, and whether there are any gotchas I'm overlooking. Is it personal preference? Performance appears to be about the same.
As I see it from a programming standpoint:
Benefits of GROUP_CONCAT:
- no duplicated data to send across the internet
- simplified processing in PHP: even though I have to massage the data afterwards using explode(), it seems less obtuse than the code I have to step through, compiling the distinct values of file, grade + page, and category for each record
- the query actually appears to better represent what is happening by putting the many joins in context
Downsides:
- There are multiple columns being combined within the GROUP_CONCAT output, so complexity is added with delimiters and nested explode() statements needed in PHP to separate out the fields.
- If it's not broken... I've been using the code without GROUP_CONCAT for many years. A pain to change, but I get there eventually.
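The nested explode() step could look something like this, sketched in Python with `str.split` standing in for explode(). `parse_filelist`, `parse_gradepage` and `GRADEPAGE_FIELDS` are hypothetical names; the field names reuse the aliases from the first query below. One thing worth checking in the query as written: the gradepage subquery uses "," both as the CONCAT_WS separator and as GROUP_CONCAT's default separator, so the result is one flat comma list that has to be cut into groups of nine values rather than exploded twice. Also, CONCAT_WS skips NULL arguments, so an unwrapped column (gp.grade, gp.page, mg.topid) that comes back NULL shifts every following field. The sketch raises an error when the value count is not a multiple of nine, which catches the obvious cases but not every possible shift.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical field names, in the order of the CONCAT_WS(",", ...) list in the gradepage subquery.
GRADEPAGE_FIELDS = (
    "grade", "page", "page2", "topid", "gradetitle",
    "pagelicense", "pagetitle", "page2license", "page2title",
)


def parse_filelist(filelist: Optional[str]) -> List[Tuple[str, str]]:
    """Split 'file:license,file:license' into (filename, license) pairs."""
    if not filelist:  # GROUP_CONCAT returns NULL (None) when there are no rows
        return []
    pairs = []
    for item in filelist.split(","):  # outer explode(): one item per gemdetail row
        filename, license_ = item.split(":", 1)  # inner explode(): the two fields
        pairs.append((filename, license_))
    return pairs


def parse_gradepage(gradepage: Optional[str]) -> List[Dict[str, str]]:
    """Cut the flat comma list into one dict per gempage row."""
    if not gradepage:
        return []
    values = gradepage.split(",")
    width = len(GRADEPAGE_FIELDS)
    if len(values) % width != 0:
        # A comma inside a value, or a NULL skipped by CONCAT_WS, shifts the fields.
        raise ValueError(f"expected a multiple of {width} values, got {len(values)}")
    return [
        dict(zip(GRADEPAGE_FIELDS, values[i:i + width]))
        for i in range(0, len(values), width)
    ]
```

Giving GROUP_CONCAT a SEPARATOR that differs from the CONCAT_WS separator would turn gradepage back into a true two-level explode.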
The query below is much simplified. The reason for the nested query is a calculation subquery I've removed.
Query without GROUP_CONCAT
SELECT
g.gemid,
g.title,
gd.filename,
gd.license,
gp.grade,
gp.page,
gp.page2,
gc.category,
mg.topid,
mg.title AS gradetitle,
mp.license AS pagelicense,
mp2.license AS page2license,
mp.title AS pagetitle,
mp2.title AS page2title
FROM (
SELECT DISTINCT
gems.gemid,
gems.title,
gp.sort
FROM
gems
LEFT JOIN gempage gp ON gems.gemid = gp.gemid
WHERE gp.grade = 1
ORDER BY gp.sort
) g
LEFT JOIN gempage gp ON g.gemid = gp.gemid
LEFT JOIN mgrade mg ON gp.grade = mg.name
LEFT JOIN mpage mp ON gp.page = mp.name AND mg.gradeid = mp.gradeid
LEFT JOIN mpage2 mp2 ON gp.page2 = mp2.name AND mp.pageid = mp2.pageid AND mg.gradeid = mp.gradeid
LEFT JOIN gemcategory gc ON g.gemid = gc.gemid
LEFT JOIN gemdetail gd ON g.gemid = gd.gemid
WHERE gp.grade = 1
ORDER BY gp.sort
Query with GROUP_CONCAT
SELECT
(SELECT GROUP_CONCAT(CONCAT_WS(":",IFNULL(filename,''), IFNULL(license,''))) FROM gemdetail gd WHERE g.gemid = gd.gemid) as filelist,
(SELECT GROUP_CONCAT(category ORDER BY sort, gemcategoryid SEPARATOR ', ') FROM gemcategory gc WHERE gc.gemid = g.gemid) as catlist,
(SELECT DISTINCT GROUP_CONCAT(CONCAT_WS(",", gp.grade, gp.page, IFNULL(gp.page2,''), mg.topid, IFNULL(mg.title,''), IFNULL(mp.license,''), IFNULL(mp.title,''), IFNULL(mp2.license,''), IFNULL(mp2.title,'')))
FROM gempage gp
LEFT JOIN mgrade mg ON gp.grade = mg.name
LEFT JOIN mpage mp ON gp.page = mp.name AND mg.gradeid = mp.gradeid
LEFT JOIN mpage2 mp2 ON gp.page2 = mp2.name AND mp.pageid = mp2.pageid AND mg.gradeid = mp.gradeid
WHERE g.gemid = gp.gemid AND gp.grade = 1) as gradepage,
g.gemid,
g.title
FROM (
SELECT DISTINCT
gems.gemid,
gems.title,
gp.sort
FROM
gems
LEFT JOIN gempage gp ON gems.gemid = gp.gemid
WHERE gp.grade = 1
ORDER BY gp.sort
) g
Answer
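- Can't use , in any of the values (nor : in the filelist values).
- You may need to set a larger value for the variable group_concat_max_len (the default is 1024 bytes; longer results are truncated to that length).
- You may need DISTINCT inside GROUP_CONCAT(). SELECT DISTINCT GROUP_CONCAT(...) applies DISTINCT to the single result row, not to the values being concatenated.
- FROM ( SELECT ... ORDER BY ): the ORDER BY will be ignored and should be removed. You probably want the ORDER BY on the outside.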
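The first two points can also be guarded against on the application side. A minimal Python sketch, with hypothetical helper names, assuming the connection returns UTF-8 strings and that group_concat_max_len is still at its default of 1024 bytes: a value whose byte length has reached the limit was probably cut, and any value containing a delimiter will be split incorrectly. MySQL also issues a warning when GROUP_CONCAT() cuts a row, so checking warnings after the query is a more direct test than this length heuristic.

```python
from typing import Iterable, List, Optional

# MySQL's default group_concat_max_len is 1024 bytes; adjust if you raise it.
GROUP_CONCAT_MAX_LEN = 1024


def maybe_truncated(value: Optional[str], max_len: int = GROUP_CONCAT_MAX_LEN) -> bool:
    """Heuristic: a GROUP_CONCAT result that reached the byte limit was probably cut."""
    if value is None:
        return False
    # Assumes UTF-8 text; the limit is counted in bytes, not characters.
    return len(value.encode("utf-8")) >= max_len


def values_with_delimiters(values: Iterable[str], delimiters: str = ",:") -> List[str]:
    """Return the values that would be split wrongly by the explode() step."""
    return [v for v in values if any(d in v for d in delimiters)]
```

Since the field data is under your control, something like values_with_delimiters could run when the data is saved rather than when it is read back.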
These indexes may speed it up:
g: INDEX(gemid, title)
gd: INDEX(gemid, filename, license)
gc: INDEX(gemid, category)
gp: INDEX(grade, gemid, sort)
gp: INDEX(grade, gemid, page, page2)
mg: INDEX(name, topid, title, gradeid)
mp: INDEX(name, license, title, gradeid, pageid)
mp2: INDEX(name, license, title, pageid)
gems: INDEX(gemid, title)
- Yes, thank you for mentioning not being able to use the chosen delimiters in the data. I forgot to mention that I have control over the field data. I also appreciate the list of indexes. It hadn't occurred to me that all fields being concatenated would benefit from indexes. Why is this? I thought that only fields which are part of a WHERE, ORDER BY, or GROUP BY would benefit from indexes. – mseifert, Dec 8, 2022 at 21:24
- @mseifert - Search for "covering index". – Rick James, Dec 8, 2022 at 23:54
- Thank you! That was a huge gap in my DB knowledge. Makes lots of sense after reading about it. A bit humbling that I wasn't aware of this, really, after 30 years in the industry (on and off, and now designing again). Thanks for taking the time to answer thoroughly, really. I may or may not switch to GROUP_CONCAT, but the indexing will help tremendously regardless. – mseifert, Dec 9, 2022 at 0:26
- @mseifert - Yeah, it took me years to digest indexing enough to write this Index Cookbook. – Rick James, Dec 9, 2022 at 1:11