I have a database with the following structure:
Date | role | type | duration |
---|---|---|---|
2022-04-16 | Nurse | Food preparation | 45 |
2022-04-17 | Nurse | Cleaning | 30 |
2022-04-17 | Volunteer | Cleaning | 20 |
2022-04-17 | Nurse | Food preparation | 60 |
Note: I don't know the values in the "type" column in advance, since they are defined by the user. Also, there can be multiple rows with the same date, role, and type.
I am using a charting library that expects the data to be grouped as follows:
role | Food preparation | Cleaning |
---|---|---|
Nurse | 105 | 30 |
Volunteer | Null | 20 |
So far, I am able to group the data using the following query:
select
role,
type,
sum(duration) as total_minutes
from work
group by role, type;
role | type | total_minutes |
---|---|---|
Nurse | Cleaning | 45 |
Nurse | Food preparation | 20 |
Volunteer | Cleaning | 15 |
Volunteer | Food preparation | 43 |
How can I "pivot"/"transpose" the data so that each row represents a role with one column containing the sum of minutes for each type of work?
In effect, I would like to transpose the data similar to the Pandas DataFrame.pivot_table function, but using only SQL.
2 Answers
First, install the tablefunc extension by running create extension tablefunc; otherwise the crosstab pivot function will not be available.
Even after reading this answer, I recommend reading the official PostgreSQL documentation on crosstab (part of the tablefunc module).
As for how to do this:
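select *
from crosstab(
  -- source query: rows for the same role must be adjacent, so order by role first
  'select
     role,
     type,
     sum(duration) as total_minutes
   from work
   group by role, type
   order by role, type',
  -- category query: its order defines the order of the output columns
  'select distinct type from work order by type'
) as ct(
  role text,
  "Cleaning" text,
  "Food preparation" text
);
Pay attention to the explicit order by clauses in both queries; they are a must, because SQL does not guarantee any row order without them. The source query has to be ordered by role so that all rows for one role are adjacent, otherwise crosstab can emit the same role more than once. The category query has to return the types in the same order as the columns in the alias, because values are mapped to output columns by position, not by name.
You have to list every possible value of the type column in the alias.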
A more dynamic version of the above (although not perfect by any means):
create or replace function get_dynamic_transpose()
  returns text
  language plpgsql
as
$$
declare
  v_output_columns text;
begin
  -- Build the column list in the same order as the category query below,
  -- so that each value lands in the column that carries its name.
  select string_agg(quote_ident(type) || ' text', E',\n' order by type)
  into v_output_columns
  from (select distinct type from work) as t;

  return format(
    'select *
from crosstab(
  ''select
      role,
      type,
      sum(duration) as total_minutes
    from work
    group by role, type
    order by role, type'',
  ''select distinct type from work order by type''
) as ct(
  role text,
  %s
);', v_output_columns
  );
end;
$$;
This function returns the text of the query you need to execute to get the desired result; it builds the list of output columns dynamically from the values currently in the type column. It can certainly be made more general-purpose, as others have done, but that is not a small amount of work, because PostgreSQL cannot return a set whose column definition it does not know beforehand.
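If building the query text in the application is preferable to a server-side function, the same idea can be sketched in Python. This is only an illustration under assumptions: the helper names quote_ident and build_crosstab_query are hypothetical, the table is assumed to be called work, and ordered_types is assumed to come from running select distinct type from work order by type on the database, so that the column order follows the database's own sort order rather than Python's.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_crosstab_query(ordered_types, table="work"):
    """Return the text of a crosstab query with one text column per type.

    ordered_types must already be in the order produced by the category
    query (order by type); it is deliberately not re-sorted here.
    """
    if not ordered_types:
        raise ValueError("at least one type is required")
    columns = ",\n    ".join(f"{quote_ident(t)} text" for t in ordered_types)
    tbl = quote_ident(table)
    # Dollar quoting ($q$...$q$) avoids doubling single quotes in the
    # two query strings passed to crosstab.
    return (
        "select *\n"
        "from crosstab(\n"
        f"  $q$select role, type, sum(duration) from {tbl}"
        " group by role, type order by role, type$q$,\n"
        f"  $q$select distinct type from {tbl} order by type$q$\n"
        ") as ct(\n"
        "    role text,\n"
        f"    {columns}\n"
        ");"
    )


query = build_crosstab_query(["Cleaning", "Food preparation"])
print(query)
```

The generated text still has to be executed as a second statement, just like the string returned by the function above.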
The other option is for the function to return, instead of a query string, an array of JSON objects, each representing one row; you would then split this JSON into normal rows and columns on the application side. If such a solution is acceptable, then this works fine:
create or replace function get_dynamic_transpose_jsonb()
  returns jsonb
  language plpgsql
as
$$
declare
  v_output_columns text;
  v_query text;
  v_result jsonb;
begin
  -- Build the column list in the same order as the category query below,
  -- so that each value lands in the column that carries its name.
  select string_agg(quote_ident(type) || ' text', E',\n' order by type)
  into v_output_columns
  from (select distinct type from work) as t;

  v_query := format(
    'select jsonb_agg(ct)
from crosstab(
  ''select
      role,
      type,
      sum(duration) as total_minutes
    from work
    group by role, type
    order by role, type'',
  ''select distinct type from work order by type''
) as ct(
  role text,
  %s
);', v_output_columns
  );

  -- Run the generated query and return its single jsonb value.
  execute v_query into v_result;
  return v_result;
end;
$$;
The result of this function would be something similar to the following:
[{"role": "Nurse", "Cleaning": "30", "Food preparation": null}, {"role": "Volunteer", "Cleaning": null, "Food preparation": "55"}]
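Splitting that JSON into rows and columns on the application side could look roughly like the following Python sketch. The function name json_to_table is hypothetical, and the sketch assumes every non-key value is either null or a string holding an integer, as in the sample output above.

```python
import json

# Sample payload copied from the result shown above.
result = (
    '[{"role": "Nurse", "Cleaning": "30", "Food preparation": null}, '
    '{"role": "Volunteer", "Cleaning": null, "Food preparation": "55"}]'
)


def json_to_table(payload, key="role"):
    """Turn the JSON array into (column names, list of row lists)."""
    # jsonb_agg yields JSON null when there are no rows at all.
    records = json.loads(payload) or []
    # Collect column names in first-seen order, keeping the key column first.
    columns = [key]
    for record in records:
        for name in record:
            if name not in columns:
                columns.append(name)
    rows = []
    for record in records:
        row = [record.get(key)]
        for name in columns[1:]:
            value = record.get(name)
            # The crosstab columns were declared as text, so convert here.
            row.append(int(value) if value is not None else None)
        rows.append(row)
    return columns, rows


columns, rows = json_to_table(result)
print(columns)
for row in rows:
    print(row)
```

Missing combinations stay as None, which most charting libraries treat as a gap rather than a zero.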
- Ok, that looks like a good solution. However, I neglected to mention the values are created by the end-user, so I can't know them in advance. – Brylie Christopher Oxley, Apr 18, 2022 at 8:29
- So tomorrow they might insert something other than "Cleaning" and "Food preparation"? In that case yes, this requires a change in the solution. I will update it once I am able. – Chessbrain, Apr 18, 2022 at 8:50
- Thanks for your help @Chessbrain. Yes, the underlying data are actually in separate tables (work_type, activity_type, worker_role) that can be defined when configuring the web application. I have simplified the question to remove the LEFT JOIN where I am getting the name value for each corresponding row, in order to focus on the aggregation. :-) – Brylie Christopher Oxley, Apr 18, 2022 at 9:02
- For what it's worth, I'm trying to achieve something similar to Pandas DataFrame.pivot_table using the database directly. pandas.pydata.org/docs/reference/api/… – Brylie Christopher Oxley, Apr 18, 2022 at 9:03
- I've marked the answer as the accepted answer. In this case, it turns out much easier to use pandas, hopefully without too much performance penalty moving the data from Postgres to Python for aggregation :-) – Brylie Christopher Oxley, Apr 20, 2022 at 13:28
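select
  role,
  sum(case when type = 'Cleaning' then duration else 0 end) as "Cleaning",
  sum(case when type = 'Food preparation' then duration else 0 end) as "Food preparation"
from work
group by role;
I think this does what you need: conditional aggregation produces one column per type without any extension. The alias "Food preparation" has to be double-quoted because it contains a space, and every type value must be listed explicitly in the query.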
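Since the type values are user-defined, the list of sum(case ...) columns can be generated in two steps: first read the distinct types, then build the pivot query from them. The sketch below is a rough illustration that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server; pivot_query and quote_ident are hypothetical helper names, and the rows are the sample data from the question. It leaves out else 0, so a missing combination comes back as NULL, as in the desired output.

```python
import sqlite3


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot_query(types):
    """Build a conditional-aggregation pivot with one column per type."""
    # Each type value is bound as a parameter (?); only the quoted alias
    # is interpolated into the SQL text.
    columns = ", ".join(
        f"sum(case when type = ? then duration end) as {quote_ident(t)}"
        for t in types
    )
    return f"select role, {columns} from work group by role order by role"


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table work (date text, role text, type text, duration integer)"
)
conn.executemany(
    "insert into work values (?, ?, ?, ?)",
    [
        ("2022-04-16", "Nurse", "Food preparation", 45),
        ("2022-04-17", "Nurse", "Cleaning", 30),
        ("2022-04-17", "Volunteer", "Cleaning", 20),
        ("2022-04-17", "Nurse", "Food preparation", 60),
    ],
)

# Step 1: discover the user-defined types. Step 2: run the generated pivot.
types = [r[0] for r in conn.execute("select distinct type from work order by type")]
cursor = conn.execute(pivot_query(types), types)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(header)
for row in rows:
    print(row)
```

With a PostgreSQL driver the structure would be the same, although the parameter placeholder syntax depends on the driver.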
- As it's currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Marcello Miorelli, Mar 22, 2023 at 15:31