Performance issue using view vs. direct statement

Question 1

I have a table with structure similar to this:

table values
(
id1 int,
id2 int,
value real,
value_at timestamp,
constraint primary key (id1,id2,value_at)
)

I need to get the "current value" for a given combination (id1,id2) --- that is, the record with highest value_at among the records with a given combination id1,id2.

The "normal" SQL query would be (example for id1,id2 = 100,200):

select ... from values 
where (id1,id2) = (100,200) 
 and value_at = (select max(value_at) from values 
 where (id1,id2) = (100,200))

However, I want to have a view that hides this complexity:

select value from vw_current_value where (id1,id2) = (100,200)

My attempt is:

create view vw_current_value as 
select id1, id2, value from values as outer_v 
where value_at = (select max(value_at) from values as inner_v 
 where outer_v.id1 = inner_v.id1 and outer_v.id2 = inner_v.id2)

It works, but getting a current value for a given (id1,id2) is 12x slower than what I get using the "direct" SQL statement (without using the view) as reported by EXPLAIN ANALYZE (0.04 ms for the non-view, 0.48 ms for the view, with a table that has 250k records).

My question is: is there a more efficient way to write the view? Perhaps using some other SQL construct that I'm either overlooking or unfamiliar with?

Notice that I would like to:

Avoid a Pl/PgSQL procedure for this.
Use a view and not the "direct" select statement.
Stick to version 9.2.4 (however, if this is known to have much better performance in later versions, I would certainly like to know about it).

EDIT: As requested, here's the output of the two explain analyze for each of the queries (first one for the direct query, second for the query using the view):

Index Scan using pk_values on values (cost=0.09..8.43 rows=1 width=24) (actual time=0.018..0.019 rows=1 loops=1)
 Index Cond: ((id1 = 100) AND (id2 = 200) AND (value_at = 1ドル))
 InitPlan 2 (returns 1ドル)
 -> Result (cost=0.08..0.09 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1)
 InitPlan 1 (returns 0ドル)
 -> Limit (cost=0.00..0.08 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
 -> Index Only Scan Backward using pk_values on values (cost=0.00..5.98 rows=74 width=8) (actual time=0.011..0.011 rows=1 loops=1)
 Index Cond: ((id1 = 100) AND (id2 = 200) AND (value_at IS NOT NULL))
 Heap Fetches: 0
Total runtime: 0.036 ms
Bitmap Heap Scan on values outer_v (cost=5.05..262.46 rows=1 width=24) (actual time=0.457..0.458 rows=1 loops=1)
 Recheck Cond: ((id1 = 100) AND (id2 = 200))
 Filter: (value_at = (SubPlan 2))
 Rows Removed by Filter: 75
 -> Bitmap Index Scan on pk_values (cost=0.00..5.05 rows=74 width=0) (actual time=0.017..0.017 rows=76 loops=1)
 Index Cond: ((id1 = 100) AND (id2 = 200))
 SubPlan 2
 -> Result (cost=0.08..0.09 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=76)
 InitPlan 1 (returns 2ドル)
 -> Limit (cost=0.00..0.08 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=76)
 -> Index Only Scan Backward using pk_values on values inner_v (cost=0.00..6.02 rows=76 width=8) (actual time=0.003..0.003 rows=1 loops=76)
 Index Cond: ((id1 = outer_v.id1) AND (id2 = outer_v.id2) AND (value_at IS NOT NULL))
 Heap Fetches: 0
Total runtime: 0.478 ms

Question 2

Please add the execution plans using explain (analyze, verbose) for both statements.

Question 3

Irrelevant to the question but you shouldn't use reserved words like values fro table or column names.

Question 4

Good catch --- and no, that's not the actual name that I used in the table; I just thought I'd simplify the question.

Question 5

You're going to get slower execution times because you are aggregating the entire view and then querying for a part of it. With a direct query, you are getting the pieces of the table and then aggregating the results.

Question 6

Yes, I had more or less concluded this --- my guess was (is) that the query optimizer could (should?) realize that the two queries are the same and optimize away the "cost of abstraction", so to speak, introduced by the query. That, or as originally asked, whether there's a different SQL construct that could do the same more efficiently.

Question 7

You might want to try using a window function for that:

create view vw_current_value as 
select id1, id2, value from (
 select id1, id2, value, 
 row_number() over (partition by id1, id2 order by value_at desc) rn
 from your_values_table
) t where t.rn = 1;

sqlfiddle

EDIT:

The above has an advantage of being standard SQL. However (thanks to @a_horse_with_no_name) Postgres also offers a proprietary alternative: DISTINCT ON(something), which reportedly performs better:

create view vw_current_value as 
select distinct on (id1, id2) id1, id2, value 
from your_values_table
order by id1, id2, value_at desc;

sqlfiddle 2

Comparison of the execution plans of the alternatives is left as an excercise for the reader.

Question 8

distinct on() is usually faster on Postgres than the equivalent solution using a window function.

Question 9

@a_horse_with_no_name - interesting. Is it a Postgres SQL extension?

Question 10

yes: postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

Question 11

Interesting. I just tried the distinct on version, and it is reasonably fast --- it is still twice as slow as the direct (non-view) query, but I guess I can live with that, performance-wise, as a price to pay for the increased quality in the design that I get using the view. Thanks!

mustaccio mustaccio 28.6k24 gold badges60 silver badges77 bronze badges · Answer 1 · 2016-01-12 21:20:52Z

You might want to try using a window function for that:

create view vw_current_value as 
select id1, id2, value from (
 select id1, id2, value, 
 row_number() over (partition by id1, id2 order by value_at desc) rn
 from your_values_table
) t where t.rn = 1;

sqlfiddle

EDIT:

The above has an advantage of being standard SQL. However (thanks to @a_horse_with_no_name) Postgres also offers a proprietary alternative: DISTINCT ON(something), which reportedly performs better:

create view vw_current_value as 
select distinct on (id1, id2) id1, id2, value 
from your_values_table
order by id1, id2, value_at desc;

sqlfiddle 2

Comparison of the execution plans of the alternatives is left as an excercise for the reader.

distinct on() is usually faster on Postgres than the equivalent solution using a window function.
@a_horse_with_no_name - interesting. Is it a Postgres SQL extension?
yes: postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
Interesting. I just tried the distinct on version, and it is reasonably fast --- it is still twice as slow as the direct (non-view) query, but I guess I can live with that, performance-wise, as a price to pay for the increased quality in the design that I get using the view. Thanks!

Stack Exchange Network

Performance issue using view vs. direct statement

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Performance issue using view vs. direct statement

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions