PostgreSQL: Grouping and Aggregating on multiple columns

Question 1

Problem Statement:

I am working on a simple dataset from Kaggle. The table below is a snippet showing only the columns that matter here. The dataset lists all IPL (cricket) matches, with the two teams that played each match (team1 and team2) and the winner of that match.

I want the total number of matches played by each team, along with the number of matches each team won. A snippet of the output is shown below the code. The number of matches played by a team is the number of times it appears in column team1 plus the number of times it appears in column team2.

While the code gives the correct result, I sense this is not the best approach. I would like to know a better way to do it, along with good practices and naming conventions to follow.

Dataset:

team1	team2	winner
Royal Challengers Bangalore	Kolkata Knight Riders	Kolkata Knight Riders
Kings XI Punjab	Chennai Super Kings	Chennai Super Kings
Delhi Daredevils	Rajasthan Royals	Delhi Daredevils
Mumbai Indians	Royal Challengers Bangalore	Royal Challengers Bangalore
Kolkata Knight Riders	Deccan Chargers	Kolkata Knight Riders
Rajasthan Royals	Kings XI Punjab	Rajasthan Royals

Code:

SELECT t1.team1 AS team, c_t1 + c_t2 AS played, c_w AS won, CAST(c_w AS FLOAT) / (c_t1 + c_t2) * 100 AS won_percentage
FROM 
 (SELECT team1, count(team1) AS c_t1 FROM ipl_m GROUP BY team1) AS t1 
JOIN 
 (SELECT team2, count(team2) AS c_t2 FROM ipl_m GROUP BY team2) AS t2 
ON t1.team1 = t2.team2
JOIN
 (SELECT winner, count(winner) AS c_w FROM ipl_m GROUP BY winner) AS w
ON t1.team1 = w.winner OR t2.team2 = w.winner
ORDER BY won_percentage DESC;

Resulting Table:

team	played	won	won_percentage
Chennai Super Kings	178	106	59.55056179775281
Mumbai Indians	203	120	59.11330049261084
Delhi Capitals	33	19	57.57575757575758
Sunrisers Hyderabad	124	66	53.2258064516129
Kolkata Knight Riders	192	99	51.5625

Table Definition:

CREATE TABLE ipl_m (
 id integer PRIMARY KEY,
 match_id integer NOT NULL,
 city VARCHAR(20) NOT NULL,
 date DATE NOT NULL,
 player_of_match VARCHAR(50),
 venue VARCHAR(75) NOT NULL,
 neutral_venue BOOLEAN NOT NULL,
 team1 VARCHAR(50) NOT NULL,
 team2 VARCHAR(50) NOT NULL,
 toss_winner VARCHAR(50) NOT NULL,
 toss_decision VARCHAR(5) NOT NULL,
 winner VARCHAR(50),
 result VARCHAR(10),
 result_margin float,
 eliminator CHAR(1) NOT NULL,
 method VARCHAR(3),
 umpire1 VARCHAR(50),
 umpire2 VARCHAR(50)
);

Answer 1

Each row in the ipl_m table has one winner and one loser. So first extract the winners and set a result field to 1 (it will be used for counting):

SELECT 
 winner AS team,
 1 as result
FROM ipl_m

Next extract losers and set field result to 0:

SELECT
 -- The loser is whichever team is not the winner.
 CASE
 WHEN team1 = winner THEN team2
 ELSE team1
 END AS team,
 0 AS result
FROM ipl_m

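Combine the two sets with UNION ALL. Plain UNION would discard duplicate rows, and every repeated win or loss by the same team produces an identical row. Then SELECT from the resulting set, grouping by the team column.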

SELECT t.team AS team
, COUNT(*) AS played
, SUM(t.result) AS won
FROM (
-- One row per match for the winner (result = 1)
SELECT
 winner AS team,
 1 AS result
FROM ipl_m
UNION ALL
-- One row per match for the loser (result = 0)
SELECT
 CASE
 WHEN team1 = winner THEN team2
 ELSE team1
 END AS team,
 0 AS result
FROM ipl_m
) AS t
GROUP BY t.team;
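To try the approach without loading the full dataset, here is a minimal sketch that runs the same query against the six sample rows from the question. The CTE named ipl_m shadows the real table, and the won_percentage column with round() is my assumption about the desired output format, carried over from the question's query.

```sql
-- Sample rows copied from the question; this CTE shadows the real ipl_m table.
WITH ipl_m (team1, team2, winner) AS (
    VALUES
        ('Royal Challengers Bangalore', 'Kolkata Knight Riders', 'Kolkata Knight Riders'),
        ('Kings XI Punjab', 'Chennai Super Kings', 'Chennai Super Kings'),
        ('Delhi Daredevils', 'Rajasthan Royals', 'Delhi Daredevils'),
        ('Mumbai Indians', 'Royal Challengers Bangalore', 'Royal Challengers Bangalore'),
        ('Kolkata Knight Riders', 'Deccan Chargers', 'Kolkata Knight Riders'),
        ('Rajasthan Royals', 'Kings XI Punjab', 'Rajasthan Royals')
),
results AS (
    -- Winners get result = 1, losers get result = 0.
    SELECT winner AS team, 1 AS result FROM ipl_m
    UNION ALL
    SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END AS team,
           0 AS result
    FROM ipl_m
)
SELECT team,
       count(*) AS played,
       sum(result) AS won,
       -- 100.0 forces numeric division instead of integer division.
       round(100.0 * sum(result) / count(*), 2) AS won_percentage
FROM results
GROUP BY team
ORDER BY won_percentage DESC, team;
```

On this sample, Kolkata Knight Riders come out with played = 2 and won = 2, which matches the two rows where they appear.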

Your solution uses 4 SELECTs and 2 JOINs. Mine uses 3 SELECTs and 1 set operation. Fewer operations usually make a query easier to read and maintain.
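If fewer operations is the goal, a different technique (not the one above) is to turn each match into two rows, one per team, with a LATERAL VALUES list and then count wins with an aggregate FILTER clause. This is only a sketch: the CTE stands in for the real table, and the third row is a hypothetical no-result match that I added because winner is nullable in the table definition. With a NULL winner, the UNION ALL query would credit a win to a NULL team and count only team1 as having played, whereas this version counts the match as played for both teams and won by neither.

```sql
WITH ipl_m (team1, team2, winner) AS (
    VALUES
        ('Royal Challengers Bangalore', 'Kolkata Knight Riders', 'Kolkata Knight Riders'),
        ('Kolkata Knight Riders', 'Deccan Chargers', 'Kolkata Knight Riders'),
        -- Hypothetical no-result match, included only to exercise the NULL case.
        ('Delhi Daredevils', 'Rajasthan Royals', NULL)
)
SELECT t.team,
       count(*) AS played,
       -- winner = team is NULL (not true) when winner is NULL, so no win is counted.
       count(*) FILTER (WHERE m.winner = t.team) AS won,
       round(100.0 * count(*) FILTER (WHERE m.winner = t.team) / count(*), 2)
           AS won_percentage
FROM ipl_m AS m
-- Emit one row per participating team for every match.
CROSS JOIN LATERAL (VALUES (m.team1), (m.team2)) AS t (team)
GROUP BY t.team
ORDER BY won_percentage DESC, t.team;
```

Whether this is actually faster than the UNION ALL version depends on the data and the planner, so compare the two with EXPLAIN ANALYZE on the real table before choosing.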

Comment 1

Thanks for the answer. Initially your code didn't work as is; it just needed an END to close the CASE expression. Also, UNION removes duplicates, so UNION ALL should be used instead. After making those changes, things work perfectly!

Comment 2

CASE ... END: my bad. UNION ALL: you are right!
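For anyone wondering why the choice between UNION and UNION ALL matters here, the following sketch counts the rows produced for one team under each variant. It uses two sample rows from the question in which Kolkata Knight Riders win both matches; the column names variant and kkr_rows are made up for this illustration.

```sql
WITH ipl_m (team1, team2, winner) AS (
    VALUES
        ('Royal Challengers Bangalore', 'Kolkata Knight Riders', 'Kolkata Knight Riders'),
        ('Kolkata Knight Riders', 'Deccan Chargers', 'Kolkata Knight Riders')
)
-- UNION collapses the two identical ('Kolkata Knight Riders', 1) rows into one.
SELECT 'UNION' AS variant, count(*) AS kkr_rows
FROM (
    SELECT winner AS team, 1 AS result FROM ipl_m
    UNION
    SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END, 0 FROM ipl_m
) AS u
WHERE team = 'Kolkata Knight Riders'
UNION ALL
-- UNION ALL keeps one row per match, so both wins survive.
SELECT 'UNION ALL', count(*)
FROM (
    SELECT winner AS team, 1 AS result FROM ipl_m
    UNION ALL
    SELECT CASE WHEN team1 = winner THEN team2 ELSE team1 END, 0 FROM ipl_m
) AS a
WHERE team = 'Kolkata Knight Riders';
```

The UNION variant reports 1 row and the UNION ALL variant reports 2. With plain UNION, every team ends up with at most one win row and one loss row, so played can never exceed 2.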

JulStrat · Accepted Answer · 2021-10-30 19:32:34Z
