In short: I would like to use this input:
+---+---+---+
| x | y | z |
+---+---+---+
| 1 | 1 | a |
| 1 | 2 | b |
| 1 | 3 | c |
| 2 | 1 | d |
| 2 | 2 | e |
| 2 | 3 | f |
| 3 | 1 | g |
| 3 | 2 | h |
| 3 | 3 | i |
| . | . | . |
| n | . | . |
+---+---+---+
to generate this output:
+---+---------+---------+---------+---------+
| y | z (x=1) | z (x=2) | z (x=3) | z (x=n) |
+---+---------+---------+---------+---------+
| 1 | a | d | g | . |
| 2 | b | e | h | . |
| 3 | c | f | i | . |
+---+---------+---------+---------+---------+
Table sample:
CREATE TABLE "public"."data" (
"x" text NOT NULL,
"y" text NOT NULL,
"z" text NOT NULL
);
- The goal is to generate the output, in the most efficient way possible.
- max(x) will increase over time (->n)
- max(y) should remain constant but may increase by ~10%
- dynamic creation of z(x) columns & names
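For context, crosstab() is not built in; it comes from the tablefunc extension, which has to be installed once per database (assuming it isn't already):

```sql
-- crosstab() lives in the tablefunc extension
CREATE EXTENSION IF NOT EXISTS tablefunc;
```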
So far I have the following:
select * from crosstab('select y, x, z from data order by 1,2')
as ct (y varchar, x1z varchar, x2z varchar, x3z varchar,
x4z varchar, x5z varchar, x6z varchar)
;
which seems to work well (so far):
+----+-----+-----+-----+-----+-----+-----+
| y | x1z | x2z | x3z | x4z | x5z | x6z |
+----+-----+-----+-----+-----+-----+-----+
| 10 | fo | ob | ar | fo | ob | ar |
| 20 | ob | ar | fo | ob | ar | fo |
| 30 | ar | fo | ob | ar | fo | ob |
+----+-----+-----+-----+-----+-----+-----+
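One caveat with the single-argument crosstab() form used above: if a (y, x) combination is missing in the data, the remaining values shift left into the wrong columns. The two-argument form from tablefunc takes a second query listing the categories and keeps columns aligned; a sketch against the same data table:

```sql
-- the 2-argument form pads missing (y, x) combinations with NULL
select *
from crosstab(
    'select y, x, z from data order by 1, 2',
    'select distinct x from data order by 1'  -- one row per output column
) as ct (y varchar, x1z varchar, x2z varchar, x3z varchar,
         x4z varchar, x5z varchar, x6z varchar);
```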
In the previous SQL snippet, I manually defined the static column names.
These should be based on the x values, hence the 'dynamic' requirement, matching:
select array (select distinct x from data order by x)
| x_campaigns |
| ------------------------- |
| ["1","2","3","4","5","6"] |
Another example to add clarity:
- using the same crosstab SQL snippet, with arbitrarily defined column names
- these column names should be dynamically defined; in this example they would be 'worldcup' + 'year'
- in the previous case only 'x' is required, as is
CREATE TABLE world_cup(
year varchar(5),
game varchar(5),
score varchar(5))
;
-- insert values ...
select * from crosstab('select game, year, score from world_cup order by 1,2')
as ct (game varchar, WorldCup17 varchar, WorldCup18 varchar,
WorldCup19 varchar, WorldCup20 varchar, WorldCup21 varchar, WorldCup22 varchar)
+-------+------------+------------+------------+------------+------------+------------+
| match | worldcup17 | worldcup18 | worldcup19 | worldcup20 | worldcup21 | worldcup22 |
+-------+------------+------------+------------+------------+------------+------------+
| DE_FR | 2-2 | 1-1 | 0-0 | 3-2 | 0-2 | 1-2 |
| EN_DE | 2-0 | 0-2 | 2-1 | 0-0 | 3-0 | 0-0 |
| ES_FR | 0-1 | 0-0 | 1-5 | 0-5 | 1-1 | 3-1 |
+-------+------------+------------+------------+------------+------------+------------+
Thoughts?
Version: PostgreSQL 13.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
-
"these should be based on x values & hence dynamic" - not possible. One fundamental restriction of the SQL language is that the number, names and data types of a query's columns must be known to the database engine while the query is parsed/analyzed. The columns can't be "defined" while the data is retrieved. SQL wasn't designed for this. Crosstab reports are much better done in the application displaying those results. – user1822, Nov 22, 2022 at 7:03
-
This is PIVOT (in PostgreSQL terms - crosstab). If your values list and, hence, output structure is dynamic, then you'd use dynamic SQL: build and execute the proper crosstab query. – Akina, Nov 22, 2022 at 7:25
-
@a_horse_with_no_name, only the column names; the structure remains constant, e.g. the next set is 100-103 – NorthernMonkey, Nov 22, 2022 at 7:45
-
@Akina, an example? – NorthernMonkey, Nov 22, 2022 at 7:50
-
If the column names change then this by definition changes the structure of the query. – user1822, Nov 22, 2022 at 7:50
1 Answer
As mentioned in the comments, it's impossible to create a query that returns a different number of columns each time you run it. In general I recommend doing this kind of pivot/crosstab in the frontend (UI).
Possible alternatives are to aggregate into a JSON value:
select y,
       jsonb_object_agg(concat('x', x, 'z'), z) as xz
from data
group by y
order by y;
This returns something like this:
 y  | xz
----+--------------------------------------------------------------------------------
 10 | {"x1z": "fo", "x2z": "ob", "x3z": "ar", "x4z": "fo", "x5z": "ob", "x6z": "ar"}
 20 | {"x1z": "ob", "x2z": "ar", "x3z": "fo", "x4z": "ob", "x5z": "ar", "x6z": "fo"}
 30 | {"x1z": "ar", "x2z": "fo", "x3z": "ob", "x4z": "ar", "x5z": "fo", "x6z": "ob"}
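The application can then pick values out of the JSON document directly, or you can do it in SQL with the ->> operator (the key names here match what the aggregation produces):

```sql
-- extract individual keys from the aggregated JSONB document
select y,
       xz ->> 'x1z' as x1z,
       xz ->> 'x2z' as x2z
from (
    select y,
           jsonb_object_agg(concat('x', x, 'z'), z) as xz
    from data
    group by y
) t
order by y;
```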
Another alternative is to write a procedure that dynamically creates a view that does the pivot/crosstab. I am not a fan of the crosstab() function and prefer filtered aggregation:
select y,
max(z) filter (where x = '1') as x1z,
max(z) filter (where x = '2') as x2z,
max(z) filter (where x = '3') as x3z,
max(z) filter (where x = '4') as x4z,
max(z) filter (where x = '5') as x5z,
max(z) filter (where x = '6') as x6z
from data
group by y
order by y;
This statement follows a pattern that can be automated to dynamically create a view based on the filtered aggregation.
create or replace procedure create_crosstab_view()
as
$$
declare
  l_sql text;
begin
  -- build a CREATE VIEW statement with one filtered aggregate per distinct x;
  -- %L quotes the literal, %I quotes the generated column name
  select 'create view crosstab_view as select y, '||
         string_agg(format('max(z) filter (where x = %L) as %I', x, concat('x', x, 'z')), ', ' order by x)||
         ' from data group by y'
  into l_sql
  from (
    select distinct x from data
  ) t;

  execute 'drop view if exists crosstab_view cascade';
  execute l_sql;
end;
$$
language plpgsql;
After calling the procedure with call create_crosstab_view(); you can simply run
select *
from crosstab_view;
If the source data changes, you re-create the view by running the procedure again. You could put that into a trigger if you want to.
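A minimal sketch of such a trigger (statement-level, so the view is rebuilt once per statement rather than once per row; the function and trigger names are my own choices):

```sql
-- trigger function that rebuilds the view via the procedure above
create or replace function refresh_crosstab_view()
returns trigger
language plpgsql
as $$
begin
    call create_crosstab_view();  -- re-generate crosstab_view from current data
    return null;                  -- return value of an AFTER trigger is ignored
end;
$$;

create trigger data_changed
after insert or update or delete on data
for each statement
execute function refresh_crosstab_view();
```

Note that rebuilding a view on every write is only sensible for small, rarely changing tables; otherwise run the procedure on demand.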
-
+1 I cannot upvote, I am too n00b & don't have enough points yet... – NorthernMonkey, Nov 23, 2022 at 9:15
-
ah but I can accept... game on – NorthernMonkey, Nov 23, 2022 at 9:15