18

I have a raster database in postgresql/postgis with these columns:

(ID, rast, data_of_data).

'rast' is the column that has raster files in WKT format. An example query to find the DN value of a point in WGS84 system (30.424, -1.66) and for 2002年01月09日 is the following:

SELECT 
 st_value(rast,(st_GeomFromText('POINT(30.424 -1.66)', 4326))) as val
FROM 
 my_table
WHERE
 date_of_data='2002-01-09'

Is there a method (eg. spatial index) to speed up those kind of queries?

nickves
11.7k3 gold badges44 silver badges79 bronze badges
asked Dec 4, 2012 at 3:09
2
  • Perhaps you could help us out by providing some more details: How many records are in my_table? How big is the data in the raster column? How many distinct dates do you have in date_of_data? Commented Dec 11, 2012 at 11:43
  • Add to this: what is the SRID of the rast column? Commented Dec 11, 2012 at 11:45

3 Answers 3

13
+50

This is an exciting question! How big is the raster you want to query? WKTRaster is stored in the database as a BLOB. In order to find the value at a specific point, from a known (x_0, y_0) corner coordinate row/column indices (i, j) are computed using (dx, dy) steps and rotation. With (i, j) known, the ST_Value() function can access the actual data at the correct byte offset.

This means that the DB has to read on average at least half of the data blob when answering a query for a point (depending on the implementation it may actually read all of the data at all times). I would therefore guess that WKTRaster performance suffers when the data BLOBs get too large. Tiling the dataset should speed up queries. Have a look at how SRTM data (coming in 6000x6000 pixel chunks) is handled in this tutorial. They actually tile the data into really small 50x50 pixels, which is a clear hint that my guessing may be not too far from the truth.

Spatially indexing raster data will probably just index the bounding box, which is no real help for your problem.

answered Dec 6, 2012 at 9:22
1
  • 3
    The tiling thing seems to be the way to go - see this link. You'll also need to add an index like this: CREATE INDEX srtm_tiled_rast_gist_idx ON srtm_tiled USING GIST (ST_ConvexHull(rast)); (source) Commented Dec 11, 2012 at 11:55
6

Two aspects that I found sped up my PostGIS raster calculations, were using integer values in the raster, and using multi-band rasters where possible. In this case, can the DN value be stored as integers, if this is not already being done?

The other thought (and I'm not certain it is relevant here) is to use multi-band rasters. For example, if you are looking at monthly slices of data, each month could be a raster layer. Then you can retrieve multiple values of a point at different time slices by querying the layered raster. I found this approach to be much quicker than querying separate rasters.

Finally, when you load your data there is the -t flag for TILE_SIZE. You could explore if the tile size that you are using works well for your query.

answered Dec 11, 2012 at 2:31
2
  • Multiband rasters will likely help if you need to query the same pixel's value for several months at the same time (to stick with your example), e.g. to analyse time series. The query in the question only retrieves one specific date. If the date was contained in one band, the DBMS would need to read all the other bands as well, even though they are of no interest for answering the query. This would probably deteriorate performance. Commented Dec 11, 2012 at 8:04
  • I agree - perhaps I did not emphasize that it is only useful if several values are needed at the same time; I'll clarify this. Commented Dec 11, 2012 at 15:03
3

Depending on the distribution of your data, you might get some very good speedups just by indexing the date_of_data column.

You can use the EXPLAIN ANALYZE syntax to figure out if your indexes are being used or not.

answered Dec 10, 2012 at 21:34
5
  • what kind of index? could you be more specific? Commented Dec 10, 2012 at 22:10
  • Just a standard btree index: create index tbl_name_date_idx on tbl_name (date_of_data). If you have many distinct dates this will drastically cut down the amount of data PostGIS has to process. Commented Dec 10, 2012 at 23:12
  • Thank you, but it didn't work for my query. Commented Dec 10, 2012 at 23:26
  • How did it not work? No noticeable performance gain, or other problems? If you have a table column which regularly appears in a WHERE clause, you should always consider indexing it. It will not only help in this case if you have many distinct dates (i.e. a large value domain) but also if you have a large number of records in the table. Commented Dec 11, 2012 at 7:57
  • Is the query using the index? Can you pastebin the output of explain analyze SELECT st_value(rast,(st_GeomFromText('POINT(30.424 -1.66)', 4326))) as val from my_table where date_of_data='2002年01月09日'? Commented Dec 11, 2012 at 11:29

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.