How to GROUP_CONCAT DISTINCT values in a MySQL query that gets number of records and min/max values?
I'm running MySQL 5.0.88 (Coldfusion8)
I have a product search which I'm querying number-of-results
as well as min/max prices/rebates
across the product table. I also want to include a string of distinct sizes/colors
, so I can update my search criteria along with displaying the results.
However my GROUP_CONCAT does not return all expected values. I don't know what I'm missing, but it seems it is only returning Distinct min/max
sizes/colors versus returning all
Distinct sizes/colors across the recordset.
My product table:
CREATE TABLE dummy (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`iln` VARCHAR(13) NULL DEFAULT NULL,
`ean` VARCHAR(35) NULL DEFAULT NULL,
`artikelnummer` VARCHAR(35) NULL DEFAULT NULL,
`groesse` VARCHAR(10) NULL DEFAULT NULL,
`farbe` VARCHAR(35) NULL DEFAULT NULL,
`farbnummer` VARCHAR(10) NULL DEFAULT NULL,
`preis_ek` DECIMAL(12,2) NULL DEFAULT NULL,
`preis_vk` DECIMAL(12,2) NULL DEFAULT NULL,
`preis_aktuell` DECIMAL(12,2) NULL DEFAULT NULL,
`firma` VARCHAR(35) NULL DEFAULT NULL,
`marke` VARCHAR(35) NULL DEFAULT NULL,
`nos` VARCHAR(4) NULL DEFAULT NULL,
`nos_anzeige` VARCHAR(4) NULL DEFAULT NULL,
`aktiv` VARCHAR(4) NULL DEFAULT NULL,
`modus` VARCHAR(4) NULL DEFAULT NULL,
`bestand` DECIMAL(10,0) NULL DEFAULT '0'
)
Products are listed by EAN, so for example a Shirt in Size S,M,L,XL would have 4 entries like so:
style ean size price qty
123 111111111111 S 9.99 12
123 111111111112 M 9.99 1
123 111111111113 L 9.99 23
123 111111111114 XL 9.99 0
Here is my Query:
SELECT COUNT(recordcount) AS total_records
, MIN(min_price_ek) AS ek_min
, MAX(max_price_ek) AS ek_max
, MIN(min_price_vk) AS vk_min
, MAX(max_price_vk) AS vk_max
, MAX(max_reb) AS rb_max
, SUBSTRING_INDEX( GROUP_CONCAT( DISTINCT sizeRange ), ',', 10 ) AS sz_rng
, SUBSTRING_INDEX( GROUP_CONCAT( DISTINCT colorRange ), ',', 16 ) AS cl_rng
FROM (SELECT a.id AS recordcount
, a.nos
, a.nos_anzeige
, MAX(IFNULL(p.ek, a.preis_ek)) AS max_price_ek
, MIN(IFNULL(p.ek, a.preis_ek)) AS min_price_ek
, MAX(IFNULL(p.vk, a.preis_vk)) AS max_price_vk
, MIN(IFNULL(p.vk, a.preis_vk)) AS min_price_vk
, ROUND( MAX( ( IFNULL(p.ek, a.preis_ek) - IFNULL(p.vk ,a.preis_aktuell) ) / IFNULL(p.ek, a.preis_ek) ),2) AS max_reb
, a.groesse AS sizeRange
, zu.systemfarbe AS colorRange
FROM artikelstammdaten a
# currency join
LEFT JOIN preislisten p ON
p.iln = a.iln
AND p.ean = a.ean
AND ( ( p.preisliste = "Test" AND p.iln = "2222222222222" ) OR (1=0) )
# base color join
LEFT JOIN farbenzuordnung zu ON
a.farbe = zu.farbe
WHERE a.aktiv = "ja"
# include currency if applicable
AND ( IF( a.iln IN ( 2222222222222), p.onlinepreis IS NOT NULL,1 ) )
AND a.artikelnummer LIKE "%style_number%"
GROUP BY a.iln, a.artikelnummer
HAVING (( sum(a.bestand) != 0 ) OR (a.nos = "ja" AND a.nos_anzeige = "ja" ))
) AS temp
I need to group by seller (a.iln
) and style (a.artikelnummer
). The having
clause only selects products that have inventory (sum(a.bestand) != 0
) or are available never-out-of-stock (a.nos
set on product and a.nos_anzeige
feature active).
My problem is selecting distinct sizes/colors done in these lines:
, SUBSTRING_INDEX( GROUP_CONCAT( DISTINCT sizeRange ), ',', 10 ) AS sz_rng
, SUBSTRING_INDEX( GROUP_CONCAT( DISTINCT colorRange ), ',', 10 ) AS cl_rng
....
, a.groesse AS sizeRange
, zu.systemfarbe AS colorRange
This does not work properly. The above article would return only s,xl
vs s,m,l,xl
and I'm not really sure why. I'm using the SUBSTRING_INDEX
to only select the first 10 sizes/colors (TOP 10 would be nicer), but I'm not sure this is the problem.
Question:
Is there something wrong in the syntax which causes not all distinct sizes/colors to be returned? Is there a way to only retur
2 Answers 2
UPDATE
Now I see your error. The inner query uses aggregation, and takes out sizeRange
which is not a column you aggregate on. So you only get "samples" of that column. Strictly speaking, your query is not valid SQL but MySQL allows it given a relaxed sql_mode
.
So your query is inherently erroneous. Will see if I can help fix it.
ORIGINAL answer
My guess for you would be to check the value of group_concat_max_len
.
It is by default just 1024
, though you typically don't really want a limit for that.
The problem might be that you are only getting partial results, where, by chance or by order of evaluation, "S" and "XL" occupy first 1024 characters or more. I see no reason why "M" or "L" would not be there -- the GROUP_CONCAT
doesn't do such distinctions.
So, try out:
SET group_concat_max_len := 1000000;
And execute your query again. If this works, make sure to set said param in you MySQL configuration file.
You may find my related post useful.
-
Thanks for the input so far. Good idea with length, but I tested running the query on a single article, which should only return: "s,m,l,xl" and I get "s,xl". So you are probably correct about the
inherent error
... :-)frequent– frequent2012年08月30日 12:24:58 +00:00Commented Aug 30, 2012 at 12:24 -
Fixed it. See my answer. Thanks for helping out!frequent– frequent2012年09月06日 20:18:10 +00:00Commented Sep 6, 2012 at 20:18
Ok. After playing around with this for a while and keeping @Shlomi Noach comment in mind, I came up with the following:
- also group by
a.groesse
andzu.systemfarbe
- this throws off the
recordcount
(seems I'm now no longer grouping by style number (a.artikelnummer
) - remove the
COUNT(recordcount) AS total_records
and useCOUNT(DISTINCT styles) AS total_records
instead witha.artikelnummer AS styles
in the subquery.
So my query now looks like this:
SELECT
COUNT(DISTINCT styles) AS total_records
, MIN(min_price_ek) AS ek_min
, MAX(max_price_ek) AS ek_max
, MIN(min_price_vk) AS vk_min
, MAX(max_price_vk) AS vk_max
, MAX(max_reb) AS rb_max
, SUBSTRING_INDEX( GROUP_CONCAT( DISTINCT sizeRange ), ',', 10 ) AS sz_rng
, SUBSTRING_INDEX( GROUP_CONCAT( DISTINCT colorRange ), ',', 16 ) AS cl_rng
FROM (SELECT
a.artikelnummer AS styles
, a.nos
, a.nos_anzeige
, MAX(IFNULL(p.ek, a.preis_ek)) AS max_price_ek
, MIN(IFNULL(p.ek, a.preis_ek)) AS min_price_ek
, MAX(IFNULL(p.vk, a.preis_vk)) AS max_price_vk
, MIN(IFNULL(p.vk, a.preis_vk)) AS min_price_vk
, ROUND( MAX( ( IFNULL(p.ek, a.preis_ek) - IFNULL(p.vk ,a.preis_aktuell) ) / IFNULL(p.ek, a.preis_ek) ),2) AS max_reb
, a.groesse AS sizeRange
, zu.systemfarbe AS colorRange
...
GROUP BY a.iln, a.artikelnummera.groesse, zu.systemfarbe
HAVING (( sum(a.bestand) != 0 ) OR (a.nos = "ja" AND a.nos_anzeige = "ja" ))
) AS temp
Seems to work and the query processing time also is about the same. I'm open for improvement suggestions tough if this can be done any other/easier way.
Thanks!