How to optimize STDistance execution?

Question 1

I am creating a temporary table during stored procedure execution with the following structure:

[ID] BIGINT
[Point] GEOGRAPHY

The ID is not unique: there are about 200 records for each ID.

I need to find the list of distinct IDs for which at least one point-to-point distance is larger than a constant value (for example, 200 meters).

So, I am using something like this:

SELECT DISTINCT DS1.[ID]
FROM DataSource DS1
INNER JOIN DataSource DS2
 ON DS1.[ID] = DS2.[ID]
WHERE DS1.Point.STDistance(DS2.Point) > 200

For 23 000 points, the query takes 4-5 seconds to execute. As I expect to have more values, I need to find a better solution.

I guess that if there is a faster way, I can always create a materialized table and implement additional logic that calculates this on a per-ID basis.

I have created a spatial index, but the query optimizer is not using it. If I use a hint like this WITH (INDEX(SPATIAL_idx_test)) I am getting the following error:

Msg 8635, Level 16, State 4, Line 78
The query processor could not produce a query plan for a query with a spatial index hint. Reason: Spatial indexes do not support the comparator supplied in the predicate. Try removing the index hints or removing SET FORCEPLAN.

Question 2

Regardless of any other improvements you make, be sure to test the impact on spatial execution times of enabling trace flags 6532, 6533, and 6534 (start-up only). These turn on native code spatial implementations. SQL Server 2012 Service Pack 3 or SQL Server 2014 Service Pack 2 required (Microsoft Support article). Native compilation is on by default from SQL Server 2016.
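Before testing the trace flags, it may help to confirm which version and service pack the instance is running, and whether the flags are already active. The following is a minimal sketch using only built-in metadata calls; it changes nothing on the server, and the column aliases are just illustrative names.

```sql
-- Report the build and service pack level, to compare against the
-- minimum requirements quoted above (2012 SP3 / 2014 SP2).
SELECT
    SERVERPROPERTY('ProductVersion') AS ProductVersion,
    SERVERPROPERTY('ProductLevel')   AS ProductLevel,
    SERVERPROPERTY('Edition')        AS Edition;

-- Show the current status of the three native spatial trace flags.
DBCC TRACESTATUS (6532, 6533, 6534);
```

On SQL Server 2016 and later the flags should not be needed, since the native implementations are on by default there.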

For STDistance the important trace flag is 6533. In a simple test, this improved execution time from 2100ms to 150ms without using a spatial index on my laptop's SQL Server 2012 instance.

Example (adapted from SQL 2016 – It Just Runs Faster: Native Spatial Implementation(s) by Bob Ward):

Test Data

CREATE TABLE dbo.SpatialTest
(
 ID integer NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
 Points geography NOT NULL
);
GO
-- Insert random sample points
SET NOCOUNT ON;
GO
DECLARE @Point float = 1.1;
INSERT dbo.SpatialTest (Points) 
VALUES ('POINT(' + CAST(@Point AS varchar(20)) + ' ' + CAST(@Point AS varchar(20)) + ')' );
WHILE(SCOPE_IDENTITY() < 100000)
BEGIN
 SET @Point = @Point + RAND(SCOPE_IDENTITY());
 IF (@Point > 90.0)
 BEGIN
 SET @Point = -89.0 + RAND(SCOPE_IDENTITY());
 END;
 INSERT dbo.SpatialTest (Points) 
 VALUES ( 'POINT(' + CAST(@Point AS varchar(20)) + ' ' + CAST(@Point AS varchar(20)) + ')' );
END;

Test Query

DBCC TRACEON (6533);
DBCC TRACESTATUS;
GO
DECLARE 
 @s datetime2 = SYSUTCDATETIME(),
 @g geography = 'POINT(1.0 80.5)';
SELECT [Matches] = COUNT_BIG(*) 
FROM dbo.SpatialTest AS ST
WHERE ST.Points.STDistance(@g) > 10000000.5
OPTION (MAXDOP 1, RECOMPILE);
SELECT [Elapsed STDistance Query (ms)] = DATEDIFF(MILLISECOND, @s, SYSUTCDATETIME());

Average execution times: 2100ms (trace flag off); 150ms (trace flag on).

Question 3

The query in the question wants to find pairs of points that are more than 200 meters apart. This kind of query is not supported by spatial indexes in SQL Server.

Microsoft docs Spatial Indexes Overview say:

Geography Methods Supported by Spatial Indexes

Under certain conditions, spatial indexes support the following set-oriented geography methods: STIntersects(), STEquals() and STDistance(). To be supported by a spatial index, these methods must be used within the WHERE clause of a query, and they must occur within a predicate of the following general form:
geography1.method_name(geography2) comparison_operator valid_number
To return a non-null result, geography1 and geography2 must have the same Spatial Reference Identifier (SRID). Otherwise, the method returns NULL.

Spatial indexes support the following predicate forms:
geography1.STIntersects(geography2) = 1
geography1.STEquals(geography2) = 1
geography1.STDistance(geography2) < number
geography1.STDistance(geography2) <= number

As you can see, the query in the question doesn't have one of the supported forms.

That's why you are getting this error message "Reason: Spatial indexes do not support the comparator supplied in the predicate" when you are trying to force use of the index.

You may try to rewrite the query to make a Cartesian product with all combinations of points and "subtract" from it a set of points that are within 200 meters, but I doubt that it would be more efficient, even if it used the index.
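As a rough illustration of that "subtract" idea, the sketch below counts all ordered point pairs per ID and compares that with the number of pairs within 200 meters, using only the index-supported `<=` comparison. It reuses the DataSource table and Point column from the question; the CTE names Total and Nearby are made up for this example. It assumes Point is never NULL and all points share one SRID, since a NULL distance would otherwise be miscounted as a far pair. Whether the optimizer actually picks the spatial index for this shape, and whether it beats the original query, is untested.

```sql
WITH Total AS
(
    -- n points per ID give n * n ordered pairs (self-pairs included)
    SELECT [ID], COUNT_BIG(*) * COUNT_BIG(*) AS PairCount
    FROM DataSource
    GROUP BY [ID]
),
Nearby AS
(
    -- Ordered pairs no more than 200 m apart (self-pairs have distance 0)
    SELECT DS1.[ID], COUNT_BIG(*) AS NearCount
    FROM DataSource AS DS1
    INNER JOIN DataSource AS DS2
        ON DS1.[ID] = DS2.[ID]
    WHERE DS1.Point.STDistance(DS2.Point) <= 200
    GROUP BY DS1.[ID]
)
-- An ID has at least one far pair when some pairs are missing from Nearby
SELECT T.[ID]
FROM Total AS T
LEFT JOIN Nearby AS N
    ON N.[ID] = T.[ID]
WHERE COALESCE(N.NearCount, 0) < T.PairCount;
```

Note that this still evaluates every pair within each ID in the worst case, so any gain would have to come from the index seek rather than from doing less work.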

I have a table with ~3000 rows of geography locations in my database, and I tried a simplified version of the query from the question on it. Adding AND DS1.Point.STDistance(DS2.Point) IS NOT NULL was not necessary; the index was used without this clause.

Index was used only when comparison was < 200.

It was not used when comparison was > 200.

SELECT
 DS1.[Building ID]
FROM 
 [dbo].[tblPhysicalBuildings] AS DS1
 CROSS JOIN [dbo].[tblPhysicalBuildings] AS DS2
WHERE 
 DS1.LocationPoint.STDistance(DS2.LocationPoint) < 200
OPTION(RECOMPILE)
;

less

SELECT
 DS1.[Building ID]
FROM 
 [dbo].[tblPhysicalBuildings] AS DS1
 CROSS JOIN [dbo].[tblPhysicalBuildings] AS DS2
WHERE 
 DS1.LocationPoint.STDistance(DS2.LocationPoint) > 200
OPTION(RECOMPILE)
;

more

Question 4

For 23 000 points, the query takes 4-5 seconds to execute. As I expect to have more values, I need to find a better solution. [...] I have created a spatial index, but the query optimizer is not using it. If I use a hint like this WITH (INDEX(SPATIAL_idx_test)) I am getting the following error

The only answer here makes no mention of an index. I just wanted to say that if you migrate to PostGIS, the spatial extension for PostgreSQL, which is practically the reference implementation for GIS and far more fully featured, you'll have ST_DWithin. This will make use of a spatial index.

CREATE INDEX ON datasource USING gist( point );
SELECT DISTINCT ds1.id
FROM datasource AS ds1
WHERE EXISTS (
 SELECT 1
 FROM datasource ds2
 WHERE ds1.id = ds2.id
 AND NOT ST_DWithin( ds1.point, ds2.point, 200)
);

That will use the index to resolve ST_DWithin, and then invert that set. You may be able to do even better:

SELECT DISTINCT ds1.id
FROM datasource AS ds1
CROSS JOIN LATERAL (
 SELECT *
 FROM datasource AS ds2
 WHERE ds1.id = ds2.id
 ORDER BY ds1.point <-> ds2.point DESC
 LIMIT 1
) AS t2
WHERE ds1.id = t2.id
AND NOT ST_DWithin( ds1.point, t2.point, 200);

That will only run the distance compare once, and it'll use KNN to sort on index.

You can even reduce it to one index lookup with btree_gist.

CREATE EXTENSION btree_gist;
CREATE INDEX ON datasource USING gist( id, point );

PostgreSQL and PostGIS are free and open source software.

I answered a question like this back in January, but I think the buffer with tolerance is probably better. If you need more accuracy, though, use .buffer()

Question 5

An alternative way that may be faster is to use the STIntersects and BufferWithTolerance methods to check if one point is within a certain distance of another.

SELECT DISTINCT DS1.[ID]
FROM DataSource DS1
INNER JOIN DataSource DS2
 ON DS1.[ID] = DS2.[ID]
WHERE DS2.Point.STIntersects(DS1.Point.BufferWithTolerance(200, 0.9, 0)) = 1

Do note though that BufferWithTolerance has a tolerance value (in my example 0.9) that is essentially a trade-off value between speed and accuracy. If you want exact results this probably isn't the method for you. I also seem to recall that STIntersects is an imprecise method but I can't find any reference to back that up at the moment so maybe I am mistaken about that.
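To get a feel for that trade-off, a small check like the one below compares the exact distance with the two buffer-based tests for a single pair of points. The coordinates are arbitrary values chosen for illustration (two points on the same meridian, 0.0015 degrees of latitude apart, which is somewhat under 200 meters), and the column aliases are made up; this is a sketch, not a benchmark.

```sql
-- geography::Point takes (latitude, longitude, SRID)
DECLARE @a geography = geography::Point(47.6500, -122.3500, 4326);
DECLARE @b geography = geography::Point(47.6515, -122.3500, 4326);

SELECT
    @a.STDistance(@b) AS DistanceMeters,
    -- Approximate 200 m buffer with an absolute tolerance of 0.9
    @b.STIntersects(@a.BufferWithTolerance(200, 0.9, 0)) AS InApproxBuffer,
    -- STBuffer for comparison with the approximate buffer
    @b.STIntersects(@a.STBuffer(200)) AS InStBuffer;
```

Moving @b closer to the 200 meter boundary is where the two buffer columns are most likely to disagree with a plain STDistance comparison.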

Accepted answer: Paul White, 2018-03-02 10:18:15Z (the trace-flag answer above).

