Problem Statement:
I am working on this simple dataset from Kaggle. I have provided a snippet of data with only required columns in below table. Dataset is quite simple, it has all IPL (cricket) matches listed with teams who played each match (team1 and team2) along with winner of that match.
Now I am trying to get total matches played by all teams along with matches won by each team, I have again provided a snippet of output below the code. Same can be performed by "finding all occurrences of a particular team in column team1" + "finding all occurrences of a particular team in column team2".
While the code does give proper result, I can sense this is not the best approach. I would like to know some better way to do it along with good practices and naming conventions to follow.
Dataset:
team1 | team2 | winner |
---|---|---|
Royal Challengers Bangalore | Kolkata Knight Riders | Kolkata Knight Riders |
Kings XI Punjab | Chennai Super Kings | Chennai Super Kings |
Delhi Daredevils | Rajasthan Royals | Delhi Daredevils |
Mumbai Indians | Royal Challengers Bangalore | Royal Challengers Bangalore |
Kolkata Knight Riders | Deccan Chargers | Kolkata Knight Riders |
Rajasthan Royals | Kings XI Punjab | Rajasthan Royals |
Code:
SELECT t1.team1 AS team, c_t1 + c_t2 AS played, c_w AS won, CAST(c_w AS FLOAT) / (c_t1 + c_t2) * 100 AS won_percentage
FROM
(SELECT team1, count(team1) AS c_t1 FROM ipl_m GROUP BY team1) AS t1
JOIN
(SELECT team2, count(team2) AS c_t2 FROM ipl_m GROUP BY team2) AS t2
ON t1.team1 = t2.team2
JOIN
(SELECT winner, count(winner) AS c_w FROM ipl_m GROUP BY winner) AS w
ON t1.team1 = w.winner OR t2.team2 = w.winner
ORDER BY won_percentage DESC;
Resulting Table:
team | played | won | won_percentage |
---|---|---|---|
Chennai Super Kings | 178 | 106 | 59.55056179775281 |
Mumbai Indians | 203 | 120 | 59.11330049261084 |
Delhi Capitals | 33 | 19 | 57.57575757575758 |
Sunrisers Hyderabad | 124 | 66 | 53.2258064516129 |
Kolkata Knight Riders | 192 | 99 | 51.5625 |
Table Definition:
CREATE TABLE ipl_m (
id integer PRIMARY KEY,
match_id integer NOT NULL,
city VARCHAR(20) NOT NULL,
date DATE NOT NULL,
player_of_match VARCHAR(50),
venue VARCHAR(75) NOT NULL,
neutral_venue BOOLEAN NOT NULL,
team1 VARCHAR(50) NOT NULL,
team2 VARCHAR(50) NOT NULL,
toss_winner VARCHAR(50) NOT NULL,
toss_decision VARCHAR(5) NOT NULL,
winner VARCHAR(50),
result VARCHAR(10),
result_margin float,
eliminator CHAR(1) NOT NULL,
method VARCHAR(3),
umpire1 VARCHAR(50),
umpire2 VARCHAR(50)
);
1 Answer 1
Each row in ipl_m
table has one winner and one loser.
So first extract winners and set field result
(it will be used in counting) to 1
:
SELECT
winner AS team,
1 as result
FROM ipl_m
Next extract losers and set field result
to 0
:
SELECT
CASE
WHEN team1 = winner THEN team2
ELSE team1
AS team,
0 as result
FROM ipl_m
Combine two sets with UNION
. Now SELECT
from resulting set grouping by team
column.
SELECT t.team AS team
, COUNT(*) AS played
, SUM(t.result) AS won
FROM (
SELECT
winner AS team,
1 as result
FROM ipl_m
UNION
SELECT
CASE
WHEN team1 = winner THEN team2
ELSE team1
AS team,
0 as result
FROM ipl_m
) AS t
GROUP BY t.team
Your solution uses 4 SELECT
and 2 JOIN
operators. Mine uses 3 SELECT
and 1 UNION
. Using fewer operations is usually preferred.
-
\$\begingroup\$ Thanks for the answer. Initially your code didn't work as is, just needed to add an END to that CASE statement. Also UNION removes duplicates, so UNION ALL should be used instead. After making those changes, things work perfectly! \$\endgroup\$gautham– gautham2021年11月02日 11:13:28 +00:00Commented Nov 2, 2021 at 11:13
-
\$\begingroup\$ CASE ... END - my bad. UNION ALL - you are right ! \$\endgroup\$JulStrat– JulStrat2021年11月02日 11:43:14 +00:00Commented Nov 2, 2021 at 11:43