The current pandemic is making it very difficult to contact my teachers for assistance, so I'm hoping I can describe my problem well enough to try getting some help here.
The database used is the sakila example database
(I tried to create a fiddle, but the tables are just too big)
My data will be grouped by each store (only 2 video stores), and I will be getting
- The total sales of each store (teachers hint: store table has inventory. inventory can be rented, rental table has payments.
- count of customers per store
- inventory count per store
- count of movies per store
- count of movies rented that are not yet returned
Here is my incorrect output:
--------------------------------------------------------------------------------------------------------------------------------
| Jordan_Rasmussen | store_id | total_sales | num_customers | count_inventory | count_titles | inventory_cost | num_rentals_out |
--------------------------------------------------------------------------------------------------------------------------------
| (date&time) | 1 | 68359569.18 | 326 | 2270 | 759 | 952923.30 | 29992 |
----------------------------------------------------------------------------------------------------------------------------
| (date&time) | 2 | 56966647.92 | 273 | 2311 | 762 | 970134.69 | 24843 | |
--------------------------------------------------------------------------------------------------------------------------------
Here is the correct output:
--------------------------------------------------------------------------------------------------------------------------------
| First_and_last N | store_id | total_sales | num_customers | count_inventory | count_titles | inventory_cost | num_rentals_out |
--------------------------------------------------------------------------------------------------------------------------------
| (date&time) | 1 | 209691.93 | 326 | 2270 | 759 | 46205.30 | 92 |
--------------------------------------------------------------------------------------------------------------------------------
| (date&time) | 2 | 208669.04 | 273 | 2311 | 762 | 46415.89 | 91 |
--------------------------------------------------------------------------------------------------------------------------------
Here is my code for trying to get the above output:
SELECT NOW() AS 'Jordan_Rasmussen',
s.store_id,
-- Get the total sales
SUM(p.amount) AS total_sales,
-- Get the number of customers
COUNT(DISTINCT c.customer_id) AS num_customers,
-- Get the inventory count
COUNT(DISTINCT i.inventory_id) AS inventory_count,
-- Get the number of movie titles
COUNT(DISTINCT f.title) AS num_titles,
-- Get the inventory value
SUM(DISTINCT f.replacement_cost) * COUNT(DISTINCT i.inventory_id) AS inventory_value,
-- Get the number of movies rented that have not yet been returned
COUNT(r.rental_date) AS num_rentals_out
FROM store AS s
LEFT JOIN inventory AS i ON s.store_id = i.store_id
LEFT JOIN customer AS c ON s.store_id = c.store_id
INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
INNER JOIN payment AS p ON r.rental_id = p.rental_id
INNER JOIN film AS f ON i.film_id = f.film_id
GROUP BY store_id;
I can get the correct results by themselves, but I notice the more I JOIN tables together, the more the results change. I've been scratching my head about this for awhile now, but I'm just not sure what it is I'm missing.
Should I be using subqueries? Or am I just joining improperly?
Sorry for such a big question, but I'm clueless right now.
-
Take a look here it explains how joining works and especially when to use aggregate functionsnbk– nbk2020年03月29日 01:49:49 +00:00Commented Mar 29, 2020 at 1:49
-
Thanks @nbk , I'll check it out!Ras– Ras2020年03月29日 05:30:41 +00:00Commented Mar 29, 2020 at 5:30
1 Answer 1
The problem is you are joining unrelated tables. If you don't join them, as in
SELECT NOW() AS 'Dorjan Masrussen',
s.store_id,
(select sum(p.amount)
from payment p
where p.rental_id in (
select r.rental_id
from rental r
join inventory i
on i.inventory_id = r.inventory_id
where i.store_id = s.store_id)) total_sales,
(select count(*)
from customer c
where c.store_id = s.store_id) num_customers,
(select count(*)
from inventory i
where i.store_id = s.store_id) count_inventory,
(select count(distinct film_id)
from inventory i
where i.store_id = s.store_id) count_titles,
(select sum(f.replacement_cost)
from inventory i
join film f on f.film_id = i.film_id
where i.store_id = s.store_id) inventory_cost,
(select count(*)
from rental r
join inventory i
on i.inventory_id = r.inventory_id
where i.store_id = s.store_id) num_rentals_out
from store as s
you get:
+---------------------+----------+-------------+---------------+-----------------+--------------+----------------+-----------------+
| Dorjan Masrussen | store_id | total_sales | num_customers | count_inventory | count_titles | inventory_cost | num_rentals_out |
+---------------------+----------+-------------+---------------+-----------------+--------------+----------------+-----------------+
| 2020年03月31日 17:32:18 | 1 | 33679.79 | 326 | 2270 | 759 | 46205.30 | 7923 |
| 2020年03月31日 17:32:18 | 2 | 33726.77 | 273 | 2311 | 762 | 46415.89 | 8121 |
+---------------------+----------+-------------+---------------+-----------------+--------------+----------------+-----------------+
Total sales and num_rentals_out differ from your correct output. I wonder who is right. The Sakila structure diagram, if I'm not mistaken, has it's crow's feet inverted.
The same result can be obtained, while querying inventory only once, using a CTE, the Common Table Expression "WITH", as follows:
WITH idata as (
select i.store_id,
count(*) num,
count(distinct i.film_id) num_films,
sum(f.replacement_cost) repl_cost,
sum((select count(*)
from rental r
where r.inventory_id = i.inventory_id)) num_rentals,
sum((select sum(p.amount)
from payment p
where p.rental_id in (
select r.rental_id
from rental r
where r.inventory_id = i.inventory_id
))) sum_rentals
from inventory i
join film f on f.film_id = i.film_id
group by i.store_id
)
SELECT NOW() AS 'Dorjan Masrussen',
s.store_id,
idata.sum_rentals total_sales,
(select count(*)
from customer c
where c.store_id = s.store_id) num_customers,
idata.num count_inventory,
idata.num_films count_titles,
idata.repl_cost inventory_cost,
idata.num_rentals num_rentals_out
from store as s
join idata on idata.store_id = s.store_id
Querying the store table can also be dropped, since all we need is the store_id, which we also find in inventory.
-
Thank you very much for the response. I tried your code in my DB, and I received the correct output. I'm wondering if my instructors tweaked the sakila database around for the class. That being said though, You helped me understand what I was doing wrong. I took a closer look at some of the tables I was joining and found that for example: joining the rental table onto the inventory table actually messed up the results because inventory had the proper set of inventory IDs, when the rentals inventory was specifically the inventory ID of rented movies.Ras– Ras2020年03月31日 20:43:49 +00:00Commented Mar 31, 2020 at 20:43
-
also, the Dorjan Masrussen made me laugh, thanks for that too!Ras– Ras2020年03月31日 20:56:26 +00:00Commented Mar 31, 2020 at 20:56
-
My answer was done in a hurry, and is certainly not the most performant. See how many times inventory is queried for the same rows. But I wanted to show how to avoid joining unrelated data. I'll try to provide a better query tomorrow.Gerard H. Pille– Gerard H. Pille2020年03月31日 22:38:12 +00:00Commented Mar 31, 2020 at 22:38
-
I look forward to it! I appreciate the help, this whole pandemic is keeping me from sitting down with my teachers.Ras– Ras2020年03月31日 22:52:10 +00:00Commented Mar 31, 2020 at 22:52
-
The teachers can't afford internet?Gerard H. Pille– Gerard H. Pille2020年03月31日 22:59:27 +00:00Commented Mar 31, 2020 at 22:59