I have a scenario where I need to run a payroll report. The report calculates the payroll amount, grouped by staff member, for a specific date range.
For example, when running the report for 2016-11-01 to 2016-11-30, I would see the following result:
Staff Id | Total
---------+--------
 1 | 123.00
 2 | 439.22
I'm using the following query for the above report:
select
 user_id as staff_id,
 sum(amount) as total
from transaction
where
 business_id = <business_id> and
 type = 'staff' and
 kind = 'commission' and
 created_at between <start_date> and <end_date>
group by
 user_id;
I'm trying to determine the best way to optimize the performance of this query given the following requirements:
- Results will vary based on the `business_id`, `start_date` and `end_date`
- Data should always be fresh
It appears both views and functions would do the job, but I'm not 100% sure which is the best approach given the requirements.
Sidenote: it would be great to cache the data based on the parameters mentioned above, but it seems like there isn't a great solution on the database side. Correct me if I'm wrong!
Additional information:
- I'm running Postgres 9.6
- I have indexes on the `business_id`, `type`, `kind`, `user_id` and `created_at` columns in the `transaction` table. These are all single-column B-tree indexes.
- To optimize your setup we would need to know the range of possible (common) queries, cardinalities and data distribution. Several different approaches are possible ... — Erwin Brandstetter, Dec 15, 2016
2 Answers
A view cannot help you produce an aggregate based on unknown parameters (`business_id`, `start_date` and `end_date`). It is nothing more than a given query, stored permanently in the database for later reuse. (Well, the implementation is more complicated, but that does not affect how views are used.)
Considering your query, the furthest you can get with a view is
CREATE VIEW staff_commission AS
SELECT
 user_id AS staff_id,
 business_id,
 amount,
 created_at
FROM transaction
WHERE
 type = 'staff' AND
 kind = 'commission';
Everything that is known beforehand is there, plus the columns you need for producing the desired output.
For obtaining the latter, you have to create a function in any case (optionally, working from the view). Building on the view makes sense when you have several queries that filter the data the same way.
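Such a function could look like the sketch below. It queries the `transaction` table directly using the column names from the question; the parameter names and the `integer`/`timestamptz`/`numeric` types are illustrative assumptions, so adjust them to your actual schema.

```sql
-- Sketch only: parameter names and types are assumptions, not taken from the question.
CREATE FUNCTION staff_commission_totals(
 p_business_id integer,
 p_start_date timestamptz,
 p_end_date timestamptz
) RETURNS TABLE (staff_id integer, total numeric)
LANGUAGE sql STABLE AS $$
 SELECT user_id, sum(amount)
 FROM transaction
 WHERE business_id = p_business_id
 AND type = 'staff'
 AND kind = 'commission'
 AND created_at BETWEEN p_start_date AND p_end_date
 GROUP BY user_id;
$$;

-- Usage (placeholder values):
-- SELECT * FROM staff_commission_totals(42, '2016-11-01', '2016-11-30');
```

Because the function always re-runs the query, the "data should always be fresh" requirement is satisfied automatically.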
The performance of all these will be the very same. Whether the indexes make sense depends very much on the actual data. (I am pretty sure you don't need all of them, though.) Without knowing anything about the data, it is very hard to guess which improvements are needed. Try what you have, check the `EXPLAIN ANALYZE` output and see if there is something missing.
Finally, about caching: PostgreSQL is very smart about this. Going into details would fill a chapter or two in a book, but I would not worry about this until I see too many disk reads (which can be seen from `EXPLAIN (ANALYZE, BUFFERS)`).
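For example, you could run the report query like this (a sketch; the literal values are placeholders):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id AS staff_id, sum(amount) AS total
FROM transaction
WHERE business_id = 42
 AND type = 'staff'
 AND kind = 'commission'
 AND created_at BETWEEN '2016-11-01' AND '2016-11-30'
GROUP BY user_id;
```

In the output, compare `shared hit` (pages served from PostgreSQL's buffer cache) against `shared read` (pages fetched from disk); a high hit ratio means the data is effectively already cached.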
Your SQL query (whether used as is, through a view or via a function) would most probably benefit from a partial index (`WHERE type = 'staff' AND kind = 'commission'`), using `business_id` as the first element of the index (you choose a single value) and `created_at` as the second (you choose a range of values).
That is, your index would be defined with:
CREATE INDEX
 ON transaction (business_id, created_at)
 WHERE type = 'staff' AND kind = 'commission';
This index covers only the part of the table with the `type` and `kind` you're interested in. Once you know your two parameters (`business_id` and `created_at`), those two can be searched quickly using the index.
If the vast majority of your data already has `type = 'staff' AND kind = 'commission'`, though, the partial index probably won't help much. As already mentioned by @dezso, the usefulness will depend very much on your actual data.
You can get more information on Partial indexes and Use of indexes when querying ranges.
- You need the `user_id` in the index as well, for this query. — ypercubeᵀᴹ, Dec 15, 2016
- @ypercubeᵀᴹ `user_id` is used for grouping, but not in the WHERE clause. Sometimes PostgreSQL might use that fact to avoid some sorting steps... but I would have to run a practical test in this case, because I'm not really sure. — joanolo, Dec 15, 2016