I am building an app where the server needs to select rows based on some criteria/filters. One of them is the location of the user and the radius at which the user want's to see posts and other filters such date range and filter for a value of another column. This is going to be for an ad-hoc event discovery app.
I have read about PostGIS, its geometry
,geography
types and I know there is a native point
datatype. Based on this answer I understood that it is better to order from equality to range columns, even though I feel like geo point column should be the first.
Suppose the following few rows of a simplified events table (disregard the validity position data):
id event_title event_position event_type is_public start_date
(varchar) (point lat/lon) (smallint) (boolean) (timestamptz)
-- --------------------------- --------------- --------- --------- ----
1 "John's Party" (122,35) 0 0 2020年07月05日
2 "Revolution then Starbucks" (123,30) 1 1 2020年07月06日
3 "Study for math exam" (120,36) 2 1 2020年07月07日
4 "Party after exam" (120,36) 1 1 2020年07月08日
5 "Hiking next to the city" (95,40) 3 1 2020年07月09日
6 "Football match" (-42,31) 4 1 2020年07月10日
Imagine the table contains several thousand records at least, obviously not only 6.
So in this table a user would be able to query public events close to (122,34) by 100km (suppose first three rows fall into this area) and of event types 0, 1 or 2 falling between dates 2020年07月05日 and 2020年07月07日. The user would get the rows with ID 2 and 3.
This is the query I want to optimize with an appropriate index. My question is, how is it possible to create such an index? I thought about GiST or GIN index but not sure how these could help. Thanks!
1 Answer 1
@jjanes' answer on stackoverflow that I accepted
With the help of the btree_gist extension, you can include the event_type and start_date columns into the GiST index along with the event_position. However, the event_type won't be very useful in the index as long as the restriction clause is something like
event_type in (0, 1, 2)
. (But if the list only contains one element, it will be rewritten into an equality, and in that case it can use that column in the index efficiently.) So using the other two columns would be my starting point. I would put the usually more selective one first. If you are going to leave past events in the table rather than clearing them out, then the date is probably going to eventually be the more selective one.If the location condition is not very selective at all (e.g. most of your events are in NYC, and most of your users are in NYC, and almost everything is within 100km of almost everything else), then you might want a different approach. Just make a default BTREE index on
(event_type, start_date)
. Unlike GiST, such a BTREE can effectively use a condition likeevent_type in (0, 1, 2)
along withAND start_date between x and y
.I don't think that there is a fundamental reason GiST can't use the in-list effectively while BTREE can. Maybe that will be fixed for GiST in some future version.
Explore related questions
See similar questions with these tags.