I'm using data from a database from the IMDB website. The database consists of five relevant tables.
- Actor (id, fname, lname, gender)
- Movie (id, name, year, rank)
- Director (id, fname, lname)
- Cast (pid, mid, role)
- Movie_Director (did, mid)
It's worth noting that the id
column in Actor, Movie & Director tables is a key for the respective table.
Cast.pid
refers to Actor.id
, and Cast.mid
refers to Movie.id
.
Here's the prompt and my initial attempt at solving it. Any help in improving/speeding up the query would be greatly appreciated.
/* For every pair of male and female actors that appear together in some film,
find the total number of films in which they appear together.
Sort the answers in decreasing order of the total number of films. */
SELECT a1.fname, a1.lname, a2.fname, a2.lname, count(*) AS num_films
FROM Actor AS a1, Actor AS a2, Cast AS c1, Cast AS c2
WHERE c1.mid = c2.mid AND
a1.id = c1.pid AND
a2.id = c2.pid AND
a1.gender = 'M' AND
a2.gender = 'F'
GROUP BY a1.id, a2.id, a1.fname, a1.lname, a2.fname, a2.lname
HAVING COUNT(*) > 0
ORDER BY COUNT(*) DESC;
1 Answer 1
Alias notation
Your table aliases a1/a2
and c1/c2
are not good. In this very short query it may not matter, however names for aliases, variables, parameters, etc. should be descriptive enough to say something about what they represent. Besides, with 4-5 character table names, is it really that much to type? I would use Actor1/Actor2
and Cast1/Cast2
.
In fact, given the nature of your query, you could give them even more useful names... how about MaleAct
/MaleCast
and FemAct
/FemCast
!
Old style JOIN
This JOIN
syntax should not be used:
FROM Actor AS a1, Actor AS a2, Cast AS c1, Cast AS c2
WHERE c1.mid = c2.mid AND
a1.id = c1.pid AND
a2.id = c2.pid AND
Instead:
FROM Actor AS MaleAct
INNER JOIN Cast AS MaleCast ON MaleAct.id = MaleCast.pid
INNER JOIN Cast AS FemCast ON MaleCast.id = FemCast.id
INNER JOIN Actor AS FemAct ON FemAct.id = FemCast.pid
Multiple counting
I see you COUNT(*)
3 times within your query. Since you're already selecting the COUNT(*)
into your result set, and MySQL will allow you to use column aliases in GROUP BY, ORDER BY, or HAVING clauses.
This:
HAVING COUNT(*) > 0
ORDER BY COUNT(*) DESC;
Instead:
HAVING num_films > 0
ORDER BY num_films DESC;
That should speed up your query a bit too.
Everything together
SELECT MaleAct.fname, MaleAct.lname, FemAct.fname, FemAct.lname, count(*) AS num_films
FROM Actor AS MaleAct
INNER JOIN Cast AS MaleCast ON MaleAct.id = MaleCast.pid
INNER JOIN Cast AS FemCast ON MaleCast.id = FemCast.id
INNER JOIN Actor AS FemAct ON FemAct.id = FemCast.pid
WHERE
MaleAct.gender = 'M' AND
FemAct.gender = 'F'
GROUP BY MaleAct.id, FemAct.id, MaleAct.fname, MaleAct.lname, FemAct.fname, FemAct.lname,
HAVING num_films > 0
ORDER BY num_films DESC;