I have the following data in a table:
ID Category Value
1234 Cat01 V001
1234 Cat02 V002
1234 Cat03 V003
1234 Cat03 V004
1234 Cat03 V005
I want to have the following output:
ID Cat01 Cat02 Cat03
1234 V001 V002 V003
1234 V001 V002 V004
1234 V001 V002 V005
The output I want is a kind of pivot table: the values are stored vertically in the table, and I want them laid out horizontally, with each category as a column. However, some categories have multiple values. In that case, I need to repeat the values of all the other categories and create one row per repeated value.
How can it be done in PostgreSQL?
3 Answers
This is a tricky one. crosstab() expects one (or no) value per category for each row_name.
We can work around this restriction like this:
SELECT id
     , COALESCE(cat01, max(cat01) OVER w) AS cat01  -- fill NULL gaps with the per-id max
     , COALESCE(cat02, max(cat02) OVER w) AS cat02
     , COALESCE(cat03, max(cat03) OVER w) AS cat03
FROM   crosstab(
   -- ext_id: '1234' || (1 * -1) = '1234-1'; one row_name per n-th value of a category
   'SELECT id::text || row_number() OVER (PARTITION BY id, category ORDER BY value) * -1 AS ext_id
         , id, category, value
    FROM   tbl
    ORDER  BY ext_id, category, value'
 , $$VALUES ('Cat01'::text), ('Cat02'), ('Cat03')$$  -- output categories, in column order
   ) AS ct (xid text, id int, cat01 text, cat02 text, cat03 text)
WINDOW w AS (PARTITION BY id);
Returns your desired result.
How?
Add an extended id, ext_id, built from the existing id and a row number for each value of a category per id. This way, each id gets as many rows as there are values in its largest category (three rows for 1234, because Cat03 has three values). We get a derived table like this to build our crosstab() on:
 ext_id   | id   | category | value
----------+------+----------+--------
 '1234-1' | 1234 | 'Cat01'  | 'V001'
 '1234-1' | 1234 | 'Cat02'  | 'V002'
 '1234-1' | 1234 | 'Cat03'  | 'V003'
 '1234-2' | 1234 | 'Cat03'  | 'V004'
 '1234-3' | 1234 | 'Cat03'  | 'V005'
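The ext_id step can be modeled in a few lines of plain Python. This is only a sketch of what the window function computes, not something PostgreSQL runs, and the names ROWS and add_ext_id are made up for illustration. It also shows the trick in id::text || row_number() ... * -1: multiplication binds tighter than ||, so the row number is negated first and its minus sign doubles as the separator.

```python
from collections import defaultdict

# Sample rows from the question: (id, category, value)
ROWS = [
    (1234, "Cat01", "V001"),
    (1234, "Cat02", "V002"),
    (1234, "Cat03", "V003"),
    (1234, "Cat03", "V004"),
    (1234, "Cat03", "V005"),
]


def add_ext_id(rows):
    """Mimic row_number() OVER (PARTITION BY id, category ORDER BY value)
    and build ext_id the way id::text || row_number() * -1 does."""
    counters = defaultdict(int)
    out = []
    # Sorting by (id, category, value) numbers each partition in value order.
    for id_, category, value in sorted(rows):
        counters[(id_, category)] += 1
        rn = counters[(id_, category)]
        ext_id = str(id_) + str(rn * -1)  # '1234' + '-1' -> '1234-1'
        out.append((ext_id, id_, category, value))
    # Same ordering as the source query: ORDER BY ext_id, category, value
    return sorted(out, key=lambda r: (r[0], r[2], r[3]))


for row in add_ext_id(ROWS):
    print(row)
```

The printed rows correspond to the derived table above.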
Now we can feed it to crosstab(), using the 2-parameter form, which is safe when some categories are missing for a row_name. Read up on the basics of crosstab() first if you are not familiar with it.
The original id is carried over as an "extra column".
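To make the 2-parameter behaviour concrete, here is a small pure-Python model of it. It is not PostgreSQL code; the name crosstab2 and the extra id 5678 with its W values are hypothetical, added only to show a missing Cat02. Each distinct ext_id becomes one output row, and every value goes into the column of its own category, so a missing category leaves NULL (None) in the right place instead of shifting later values left. The sketch copies the extra id column from the first row of each group, since PostgreSQL expects it to be the same for all rows with the same row_name.

```python
# Output categories, like the second crosstab() parameter:
# $$VALUES ('Cat01'::text), ('Cat02'), ('Cat03')$$
CATEGORIES = ["Cat01", "Cat02", "Cat03"]

# Rows as produced by the source query: (ext_id, id, category, value).
# The 5678 rows are hypothetical and lack a Cat02 value on purpose.
SOURCE = [
    ("1234-1", 1234, "Cat01", "V001"),
    ("1234-1", 1234, "Cat02", "V002"),
    ("1234-1", 1234, "Cat03", "V003"),
    ("1234-2", 1234, "Cat03", "V004"),
    ("1234-3", 1234, "Cat03", "V005"),
    ("5678-1", 5678, "Cat01", "W001"),
    ("5678-1", 5678, "Cat03", "W003"),
]


def crosstab2(rows, categories):
    """One output row per consecutive group of equal ext_id values.
    Values are placed by category, missing categories stay None."""
    slot = {cat: i for i, cat in enumerate(categories)}
    result = []
    current = None
    for ext_id, id_, category, value in rows:  # rows must be grouped by ext_id
        if current is None or current[0] != ext_id:
            # New group: row_name, extra column, then one slot per category
            current = [ext_id, id_] + [None] * len(categories)
            result.append(current)
        if category in slot:  # categories not in the list are ignored
            current[2 + slot[category]] = value
    return [tuple(r) for r in result]


for row in crosstab2(SOURCE, CATEGORIES):
    print(row)
```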
Your question leaves room for interpretation. My solution pairs the lowest values per category first and keeps filling the following rows until there are no values left. (Multiple values per category could be combined in other ways; the question does not define it.) If a category is short of values for a given id, the rest is filled in with NULL values. In the final step I replace those NULL values with the maximum value of each category per id:
COALESCE(cat01, max(cat01) OVER (PARTITION BY id))
Plain max(cat01) OVER (PARTITION BY id) would return the same result only for categories with at most one value per id. For a category with several values, like Cat03, it would replace every value with the largest one (V005). COALESCE keeps each row's own value and falls back to the window maximum only where the value is NULL.
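A pure-Python sketch of the final step makes the difference visible on the question's data. The helper names are hypothetical and None stands in for NULL. COALESCE keeps V003, V004 and V005 in their own rows, while the plain window max puts V005 in all three.

```python
from collections import defaultdict

# crosstab() output for the question's data: (xid, id, cat01, cat02, cat03)
CT = [
    ("1234-1", 1234, "V001", "V002", "V003"),
    ("1234-2", 1234, None, None, "V004"),
    ("1234-3", 1234, None, None, "V005"),
]


def window_max(rows):
    """Per id, the max of each category column, skipping None like SQL max() skips NULL."""
    best = defaultdict(lambda: [None, None, None])
    for _xid, id_, *cats in rows:
        for i, v in enumerate(cats):
            if v is not None and (best[id_][i] is None or v > best[id_][i]):
                best[id_][i] = v
    return best


def fill_coalesce(rows):
    """COALESCE(catN, max(catN) OVER w): keep the row's own value, fill only None."""
    best = window_max(rows)
    return [
        (id_, *(v if v is not None else best[id_][i] for i, v in enumerate(cats)))
        for _xid, id_, *cats in rows
    ]


def fill_plain_max(rows):
    """max(catN) OVER w alone: every row gets the per-id maximum."""
    best = window_max(rows)
    return [(id_, *best[id_]) for _xid, id_, *_cats in rows]


print(fill_coalesce(CT))
print(fill_plain_max(CT))
```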
Take a look here for an example of how to use the crosstab() function. Also, take a good look at Erwin Brandstetter's post in the same thread and the links within it (especially the "Basics for crosstab():" link).
Be careful with NULLs (see the discussion in that link).
If you're not using a PostgreSQL version compiled from source, all you have to do to make the crosstab() function available is to run
CREATE EXTENSION tablefunc;
once per database, for example at the psql prompt (see EB's link).
I'm not sure that I've fully grasped your supplementary information, but the CTE approach I used to cross join status and slots (your equivalents of category and value) might help. If not, please expand on your comment. EB's code might also help.
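For comparison, the cross-join reading of the question can be modeled in plain Python. This is not the CTE from the linked answer, which isn't reproduced here, and cross_join_pivot is a made-up name. It emits every combination of category values per id. On the question's data it gives the same three rows as the pairing approach in the first answer, but once two categories have several values it produces their product (for example 2 × 3 = 6 rows instead of 3).

```python
from itertools import product

# Sample rows from the question: (id, category, value)
ROWS = [
    (1234, "Cat01", "V001"),
    (1234, "Cat02", "V002"),
    (1234, "Cat03", "V003"),
    (1234, "Cat03", "V004"),
    (1234, "Cat03", "V005"),
]
CATEGORIES = ["Cat01", "Cat02", "Cat03"]


def cross_join_pivot(rows, categories):
    """Per id, one output row for every combination of the category values
    (a cross join of the per-category value lists). A category without any
    value for an id contributes a single None."""
    by_id = {}
    for id_, category, value in rows:
        per_cat = by_id.setdefault(id_, {c: [] for c in categories})
        if category in per_cat:
            per_cat[category].append(value)
    out = []
    for id_ in sorted(by_id):
        value_lists = [sorted(by_id[id_][c]) or [None] for c in categories]
        for combo in product(*value_lists):
            out.append((id_, *combo))
    return out


for row in cross_join_pivot(ROWS, CATEGORIES):
    print(row)
```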
What you are looking at is a crosstab. Assuming your table is called "Fact_Table", write:
select * from crosstab('select id, category, value from Fact_Table order by 1, 2')
as ct(id int, cat01 text, cat02 text, cat03 text);
Also see http://www.postgresql.org/docs/9.5/static/tablefunc.html if you are looking for other variants.
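The tablefunc documentation describes the 1-parameter form as filling the output value columns left to right, padding short groups with NULLs and skipping surplus input rows. A small pure-Python model of that rule (crosstab1 is a made-up name, not a real API) shows what the query above yields for the question's data: a single row per id, with the extra Cat03 values dropped (which Cat03 value survives depends on the row order). The hypothetical id 5678 without a Cat02 value shows the NULL pitfall of this form: W003 shifts into the cat02 column.

```python
def crosstab1(rows, n_value_cols):
    """Model of 1-parameter crosstab(): one output row per consecutive group of
    rows with the same row_name; the category column is not used for placement.
    Values fill the output columns left to right, surplus rows are skipped and
    short groups are padded with None."""
    out = []
    current = None
    for row_name, _category, value in rows:
        if current is None or current[0] != row_name:
            current = [row_name]
            out.append(current)
        if len(current) - 1 < n_value_cols:  # surplus values are dropped
            current.append(value)
    return [tuple(r + [None] * (n_value_cols + 1 - len(r))) for r in out]


# The question's data, already in ORDER BY 1, 2 order: (id, category, value)
QUESTION = [
    (1234, "Cat01", "V001"),
    (1234, "Cat02", "V002"),
    (1234, "Cat03", "V003"),
    (1234, "Cat03", "V004"),
    (1234, "Cat03", "V005"),
]
print(crosstab1(QUESTION, 3))

# Hypothetical id with no Cat02 value: W003 lands in the cat02 column
print(crosstab1([(5678, "Cat01", "W001"), (5678, "Cat03", "W003")], 3))
```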
Comment: 'tablefunc' is an extension. – Sahap Asci, Apr 12, 2016 at 18:22