7

I have a table like this:

create table foo (foo_label text, foo_price int, foo_date date);
insert into foo (
 values
 ('aaa', 100, '2017-01-01'),
 ('aaa', NULL, '2017-02-01'),
 ('aaa', NULL, '2017-03-01'),
 ('aaa', NULL, '2017-04-01'),
 ('aaa', 140, '2017-05-01'),
 ('aaa', NULL, '2017-06-01'),
 ('aaa', 180, '2017-07-01')
 );

As you can see a few values on the foo_price column are missing.

What I need is that missing values get filled up with the "previous" available value in this way:

 foo_label | fixed_foo_price | foo_date
-----------+-----------------+------------
 aaa | 100 | 2017年01月01日
 aaa | 100 | 2017年02月01日
 aaa | 100 | 2017年03月01日
 aaa | 100 | 2017年04月01日
 aaa | 140 | 2017年05月01日
 aaa | 140 | 2017年06月01日
 aaa | 180 | 2017年07月01日

My attempt:

select 
 foo_label, 
 (case when foo_price is null then previous_foo_price else foo_price end) as fixed_foo_price,
 foo_date
from (
 select 
 foo_label, 
 lag(foo_price) OVER (PARTITION BY foo_label order by foo_date::date) as previous_foo_price, 
 foo_price,
 foo_date
 from foo
) T;

As you can see from here:

https://www.db-fiddle.com/#&togetherjs=s6giIonUxT

It doesn't fill completely the '100' series.

Any idea how can I get the wanted result?

Erwin Brandstetter
186k28 gold badges463 silver badges636 bronze badges
asked Sep 18, 2017 at 14:24
0

2 Answers 2

14

Form groups with the window function count() and then take the first value in each group:

SELECT foo_label
 , first_value(foo_price) OVER (PARTITION BY foo_label, grp ORDER BY foo_date) AS fixed_foo_price
 , foo_date
FROM (
 SELECT foo_label
 , count(foo_price) OVER (PARTITION BY foo_label ORDER BY foo_date) AS grp
 , foo_price
 , foo_date
 FROM foo
 ) sub;

This works because count() only counts non-null values. So all rows with NULL end up in the same group as the last row with an actual value. Exactly what you need.

Leading null values (effectively group "0") end up with NULL. Add a default with COALESCE if you want. For instance to fill in 0 instead of NULL:

 , COALESCE(first_value(foo_price) OVER (...), 0) AS fixed_foo_price
answered Sep 18, 2017 at 17:26
0
2

Backfill missing data in postgresql walkthrough demo

Create a table called my_money with an index, numeric with nulls, and a date then insert some rows.

drop table if exists my_money; 
CREATE TABLE my_money ( tick character varying(10), 
 cci_val numeric(5,2), 
 date_val date); 
insert into my_money values('BTC', 35.3, '2021-10-10'); 
insert into my_money values('BTC', null, '2021-10-9'); 
insert into my_money values('BTC', 9.9, '2021-10-8'); 
insert into my_money values('BTC', null, '2021-10-7'); 
insert into my_money values('BTC', null, '2021-10-6'); 
insert into my_money values('BTC', 3.0, '2021-10-5'); 
insert into my_money values('BTC', null, '2021-10-4'); 
select * from my_money;
┌──────┬─────────┬────────────┐ 
│ tick │ cci_val │ date_val │ 
├──────┼─────────┼────────────┤ 
│ BTC │ 35.30 │ 2021年10月10日 │ 
│ BTC │ ¤ │ 2021年10月09日 │ <-- want 9.90 to fill here.
│ BTC │ 9.90 │ 2021年10月08日 │ 
│ BTC │ ¤ │ 2021年10月07日 │ <-- want 3.00 to fill here.
│ BTC │ ¤ │ 2021年10月06日 │ <-- want 3.00 to fill here.
│ BTC │ 3.00 │ 2021年10月05日 │ 
│ BTC │ ¤ │ 2021年10月04日 │ <-- want a default begin val to fill here.
└──────┴─────────┴────────────┘ 

Make a temporary table to construct our backfilled column named: backfilled_cci_val

drop table if exists my_money2; 
CREATE TABLE my_money2 as ( 
 SELECT tick, cci_val, 
 first_value(cci_val) OVER (
 PARTITION BY tick, grp ORDER BY date_val) AS backfilled_cci_val, 
 date_val
 FROM ( 
 SELECT tick, 
 count(cci_val) OVER (PARTITION BY tick ORDER BY date_val) AS grp,
 cci_val, date_val
 FROM my_money where tick = 'BTC'
 ) sub order by date_val desc
);
select * from my_money2;
┌──────┬─────────┬────────────────────┬────────────┐ 
│ tick │ cci_val │ backfilled_cci_val │ date_val │ 
├──────┼─────────┼────────────────────┼────────────┤ 
│ BTC │ 35.30 │ 35.30 │ 2021年10月10日 │ 
│ BTC │ ¤ │ 9.90 │ 2021年10月09日 │ 
│ BTC │ 9.90 │ 9.90 │ 2021年10月08日 │ 
│ BTC │ ¤ │ 3.00 │ 2021年10月07日 │ 
│ BTC │ ¤ │ 3.00 │ 2021年10月06日 │ 
│ BTC │ 3.00 │ 3.00 │ 2021年10月05日 │ 
│ BTC │ ¤ │ ¤ │ 2021年10月04日 │ <-- set a default value
└──────┴─────────┴────────────────────┴────────────┘ 

So far so good but we can't backfill the first null, because there it has no prior, so you'll have to decide a default begin value manually with an update:

update my_money2 set backfilled_cci_val = -1 
where tick = 'BTC' 
and backfilled_cci_val is null; 
select * from my_money2; 
┌──────┬─────────┬────────────────────┬────────────┐ 
│ tick │ cci_val │ backfilled_cci_val │ date_val │ 
├──────┼─────────┼────────────────────┼────────────┤ 
│ BTC │ 35.30 │ 35.30 │ 2021年10月10日 │ 
│ BTC │ ¤ │ 9.90 │ 2021年10月09日 │ 
│ BTC │ 9.90 │ 9.90 │ 2021年10月08日 │ 
│ BTC │ ¤ │ 3.00 │ 2021年10月07日 │ 
│ BTC │ ¤ │ 3.00 │ 2021年10月06日 │ 
│ BTC │ 3.00 │ 3.00 │ 2021年10月05日 │ 
│ BTC │ ¤ │ -1 │ 2021年10月04日 │ 
└──────┴─────────┴────────────────────┴────────────┘

Final step is to substitute the backfilled_cci_val from the new table back into cci_val from the original table:

update my_money set cci_val = backfilled_cci_val 
from my_money2 where my_money.date_val = my_money2.date_val; 
select * from my_money;
┌──────┬─────────┬────────────┐ 
│ tick │ cci_val │ date_val │ 
├──────┼─────────┼────────────┤ 
│ BTC │ 35.30 │ 2021年10月10日 │ 
│ BTC │ 9.90 │ 2021年10月09日 │ 
│ BTC │ 9.90 │ 2021年10月08日 │ 
│ BTC │ 3.00 │ 2021年10月07日 │ 
│ BTC │ 3.00 │ 2021年10月06日 │ 
│ BTC │ 3.00 │ 2021年10月05日 │ 
│ BTC │ -1.00 │ 2021年10月04日 │ 
└──────┴─────────┴────────────┘ 

Alternatively, if you like to live dangerously and backfill one-shot and in-place:

This update statement does the same as all the above, except does it inplace on the existing table.

update my_money f1 set cci_val = backfilled_cci_val from (
 SELECT tick, 
 cci_val, 
 first_value(cci_val) OVER 
 (PARTITION BY tick, grp ORDER BY date_val) AS backfilled_cci_val,
 date_val
 FROM (
 SELECT tick,
 count(cci_val) OVER (PARTITION BY tick ORDER BY date_val) AS grp,
 cci_val, date_val
 FROM my_money where tick = 'BTC'
 ) sub order by date_val desc
) f2 where f1.date_val = f2.date_val and f1.tick = 'BTC' and f1.tick = f2.tick;
update my_money set cci_val = -1 where tick = 'BTC' and cci_val is null;

Which produces the same final result explained above.

answered Nov 20, 2021 at 17:46

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.