7

Consider the following product table (which is highly trimmed down):

`id` int AUTO_INCREMENT
`category_id` int
`subcategory_id` int
`vendor_id` int
`price` decimal(6,2)
`inserted_at` timestamp

For a given category ID, I am attempting to retrieve a list containing the vendor with the lowest latest price for each subcategory. With "latest" I mean that vendors may have multiple prices for a given category ID/subcategory ID combination, so only the most recently inserted price for that category ID/subcategory ID/vendor ID should be used. If there's a tie between 2 or more vendor's prices, the lowest id should be used as the tie-breaker.

For example, with this data:

id | category_id | subcategory_id | vendor_id | price | inserted_at
---------------------------------------------------------------------------
1 | 1 | 2 | 3 | 16.00 | 2015年07月23日 04:00:00 
2 | 1 | 1 | 2 | 9.00 | 2015年07月26日 08:00:00 
3 | 1 | 2 | 4 | 16.00 | 2015年08月02日 10:00:00 
4 | 1 | 1 | 1 | 7.00 | 2015年08月04日 11:00:00
5 | 1 | 1 | 1 | 11.00 | 2015年08月09日 16:00:00

So, first find the most recent prices for every subcategory/vendor combination (row with price=7.00 would be removed because it's not the most recent for that vendor in that subcategory). Then for subcategory 1 the lowest price would be 9 (so vendor_id = 2) and for subcategory 2 the lowest price is 16 (two vendors tie ()ids 3 and 4) so we choose the one with lowest vendor_id = 3).

I would expect the following results for category_id = 1:

subcategory_id | vendor_id | price
----------------------------------
1 | 2 | 9.00
2 | 3 | 16.00

Here's what I have so far. I feel like it's already starting to get out of hand and this doesn't even account for ties between 2 or more vendor's prices.

SELECT c.subcategory_id, c.vendor_id, c.price
FROM products AS c
JOIN
(
 SELECT MIN(a.price) AS min_price,
 a.subcategory_id
 FROM products AS a
 JOIN
 (
 SELECT MAX(`inserted_at`) AS latest_price_time,
 vendor_id,
 subcategory_id
 FROM products
 WHERE category_id = 1
 GROUP BY vendor_id, subcategory_id
 ) AS b
 ON a.inserted_at = b.latest_price_time AND a.vendor_id = b.vendor_id AND a.subcategory_id = b.subcategory_id
 WHERE a.category_id = 1
 GROUP BY a.subcategory_id
) AS d
ON c.price = d.min_price AND c.subcategory_id = d.subcategory_id
WHERE c.category_id = 1

Before I go any further, I wanted to see if there was an easier way. When it comes to grouping/aggregating results of additional groupings/aggregations, is there a method that will give me the best performance (most important) and/or be easier to read (less important)?

Paul White
95.3k30 gold badges439 silver badges689 bronze badges
asked Aug 17, 2015 at 21:54
0

3 Answers 3

6

This is a "greatest-n-per-group" query and it's quite complicated to write in MySQL - first due to lack of window functions and second because you have 2 greatest-n-per-group specifications, first for the latest date per vendor and second for the lowest price per subcategory.

Here's one, rather complex way to write this:

SELECT 
 ps.subcategory_id, ps.vendor_id, ps.price -- , p.inserted_at
FROM
 ( SELECT DISTINCT subcategory_id
 FROM product
 WHERE category_id = 1
 ) AS s
 JOIN
 product AS ps
 ON ps.category_id = 1
 AND ps.subcategory_id = s.subcategory_id
 AND ps.id =
 ( SELECT psv.id
 FROM
 ( SELECT DISTINCT subcategory_id, vendor_id
 FROM product
 WHERE category_id = 1
 ) AS sv
 JOIN
 product AS psv
 ON psv.category_id = 1
 AND psv.subcategory_id = sv.subcategory_id
 AND psv.vendor_id = sv.vendor_id
 AND psv.inserted_at =
 ( SELECT pi.inserted_at
 FROM product AS pi
 WHERE pi.category_id = 1
 AND pi.subcategory_id = sv.subcategory_id
 AND pi.vendor_id = sv.vendor_id
 ORDER BY pi.inserted_at DESC
 LIMIT 1
 )
 WHERE sv.subcategory_id = s.subcategory_id
 ORDER BY psv.price,
 psv.vendor_id
 LIMIT 1
 ) ;

Tested in SQLfiddle-2. Not too bad of a plan either, with a proper index on (category_id, subcategory_id, vendor_id, inserted_at).

It may not be the most efficient and I would definitely experiment with indexes (see the Fiddle, where I have one more index. It may not be very useful but test t on bigger tables.)

(the first version of the query in SQLfidle-1)

answered Aug 18, 2015 at 0:07
2
  • Thanks for the response. I did quite a bit of research on greatest-n-per-group but I couldn't find much on nested versions. As my query got to a few layers deep, I decided to come here in hopes of finding some feature of which I wasn't aware. Turns out I was along the right track, I suppose. Thanks again! Commented Aug 18, 2015 at 1:31
  • I edited the query a bit. Both versions should produce the same result. Commented Aug 18, 2015 at 1:35
6

This should work:

SELECT
 d.subcategory_id,
 d.vendor_id,
 MIN(d.price) AS price,
 d.inserted_at
FROM product AS d
JOIN (SELECT
 b.category_id,
 b.subcategory_id,
 b.vendor_id,
 a.last_iat
 FROM product AS b 
 JOIN (SELECT
 a.category_id,
 a.subcategory_id,
 a.vendor_id,
 a.price,
 MAX(a.inserted_at) AS last_iat
 FROM product AS a
 GROUP BY a.category_id,a.subcategory_id,a.vendor_id
 ) AS a
 ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
 GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
 ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
WHERE d.category_id=1
GROUP BY d.category_id,d.subcategory_id;

Test:

mysql> SELECT
 -> d.subcategory_id,
 -> d.vendor_id,
 -> MIN(d.price) AS price,
 -> d.inserted_at
 -> FROM product AS d
 -> JOIN (SELECT
 -> b.category_id,
 -> b.subcategory_id,
 -> b.vendor_id,
 -> a.last_iat
 -> FROM product AS b 
 -> JOIN (SELECT
 -> a.category_id,
 -> a.subcategory_id,
 -> a.vendor_id,
 -> a.price,
 -> MAX(a.inserted_at) AS last_iat
 -> FROM product AS a
 -> GROUP BY a.category_id,a.subcategory_id,a.vendor_id
 -> ) AS a
 -> ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id)
 -> GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c
 -> ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at)
 -> WHERE d.category_id=1
 -> GROUP BY d.category_id,d.subcategory_id;
+----------------+-----------+-------+---------------------+
| subcategory_id | vendor_id | price | inserted_at |
+----------------+-----------+-------+---------------------+
| 1 | 2 | 9.00 | 2015年07月26日 08:00:00 |
| 2 | 3 | 16.00 | 2015年07月23日 04:00:00 |
+----------------+-----------+-------+---------------------+
2 rows in set (0.00 sec)
mysql> 

Explain:

I used the index recommendation from @ypercube.

mysql> EXPLAIN SELECT d.subcategory_id, d.vendor_id, MIN(d.price) AS price, d.inserted_at FROM product AS d JOIN (SELECT b.category_id, b.subcategory_id, b.vendor_id, a.last_iat FROM product AS b JOIN (SELECT a.category_id, a.subcategory_id, a.vendor_id, a.price, MAX(a.inserted_at) AS last_iat FROM product AS a GROUP BY a.category_id,a.subcategory_id,a.vendor_id ) AS a ON (a.category_id=b.category_id AND a.subcategory_id=b.subcategory_id AND a.vendor_id=b.vendor_id) GROUP BY b.category_id,b.subcategory_id,b.vendor_id) AS c ON (c.category_id=d.category_id AND c.subcategory_id=d.subcategory_id AND c.last_iat=d.inserted_at) WHERE d.category_id=1 GROUP BY d.category_id,d.subcategory_id;
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | d | ALL | q_ix | NULL | NULL | NULL | 5 | Using where; Using join buffer |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 4 | Using temporary; Using filesort |
| 2 | DERIVED | b | ref | q_ix | q_ix | 15 | a.category_id,a.subcategory_id,a.vendor_id | 1 | Using where; Using index |
| 3 | DERIVED | a | index | NULL | q_ix | 19 | NULL | 5 | |
+----+-------------+------------+-------+---------------+------+---------+--------------------------------------------+------+----------------------------------------------+
5 rows in set (0.00 sec)
mysql> 
answered Aug 17, 2015 at 22:12
1
  • I appreciate the response. While yours is actually easier to understand, I chose ypercube's answer as the correct one since his was more "technically" correct. I needed to be able to specify the category ID. I know it would be very easy to modify yours, but I can only choose 1 answer anyway. Thanks! Commented Aug 18, 2015 at 1:37
0

I tried to run the one by @oNare but I think it is not robust because if I change min to max, it outputs the wrong vendor.

My approach is to join two tables, table_date and table_price. The table_date is grouped by latest dates, and table_price is based on pricing that has been derived from a table similar to table_date.

SELECT subcategory_id, MIN(vendor_id) AS vendor_id, price
FROM (
 SELECT table_date.subcategory_id, table_date.vendor_id, price, table_date.latest_date
 FROM (SELECT b.subcategory_id, a.vendor_id, price, a.latest_date FROM product AS b
 JOIN (SELECT subcategory_id, vendor_id, MAX(inserted_at) AS latest_date
 FROM product AS a
 WHERE category_id=1
 GROUP BY category_id, subcategory_id, vendor_id
 ) AS a
 ON b.subcategory_id=a.subcategory_id AND b.inserted_at=a.latest_date
 WHERE b.category_id=1
 ) AS table_date
 JOIN (SELECT subcategory_id, MIN_price 
 FROM (SELECT b.subcategory_id, a.vendor_id, MIN(price) AS MIN_price, a.latest_date FROM product AS b
 JOIN (SELECT subcategory_id, vendor_id, MAX(inserted_at) AS latest_date
 FROM product AS a
 WHERE category_id=1
 GROUP BY category_id, subcategory_id, vendor_id
 ) AS a
 ON b.subcategory_id=a.subcategory_id AND b.inserted_at=a.latest_date
 WHERE b.category_id=1
 GROUP BY b.subcategory_id
 ) AS c
 ) AS table_price
 ON table_date.subcategory_id = table_price.subcategory_id AND table_date.price = table_price.MIN_price) AS final
GROUP BY subcategory_id;
answered May 26, 2022 at 8:48

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.