I am creating a temporary table during stored procedure execution with the following structure:
[ID] BIGINT
[Point] GEOGRAPHY
The ID is not unique: there are about 200 records for each ID.
I need to find a list of distinct IDs for which there is at least one Point-to-Point distance larger than a constant value (for example, 200 meters).
So, I am using something like this:
SELECT DISTINCT DS1.[ID]
FROM DataSource DS1
INNER JOIN DataSource DS2
ON DS1.[ID] = DS2.[ID]
WHERE DS1.Point.STDistance(DS2.Point) > 200
For 23 000 points, the query takes 4-5 seconds to execute. As I am expecting to have more values, I need to find a better solution.
I guess that if there is no faster way, I can always create a materialized table and implement additional logic that calculates this on a per-ID basis.
I have created a spatial index, but the query optimizer is not using it. If I use a hint like this:
WITH (INDEX(SPATIAL_idx_test))
I get the following error:
Msg 8635, Level 16, State 4, Line 78
The query processor could not produce a query plan for a query with a spatial index hint. Reason: Spatial indexes do not support the comparator supplied in the predicate. Try removing the index hints or removing SET FORCEPLAN.
4 Answers
Regardless of any other improvements you make, be sure to test the impact on spatial execution times of enabling trace flags 6532, 6533, and 6534 (start-up only). These turn on native code spatial implementations. SQL Server 2012 Service Pack 3 or SQL Server 2014 Service Pack 2 required (Microsoft Support article). Native compilation is on by default from SQL Server 2016.
For STDistance, the important trace flag is 6533. In a simple test, this improved execution time from 2100ms to 150ms without using a spatial index on my laptop's SQL Server 2012 instance.
Example (adapted from SQL 2016 – It Just Runs Faster: Native Spatial Implementation(s) by Bob Ward):
Test Data
CREATE TABLE dbo.SpatialTest
(
ID integer NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Points geography NOT NULL
);
GO
-- Insert random sample points
SET NOCOUNT ON;
GO
DECLARE @Point float = 1.1;
INSERT dbo.SpatialTest (Points)
VALUES ('POINT(' + CAST(@Point AS varchar(20)) + ' ' + CAST(@Point AS varchar(20)) + ')' );
WHILE(SCOPE_IDENTITY() < 100000)
BEGIN
SET @Point = @Point + RAND(SCOPE_IDENTITY());
IF (@Point > 90.0)
BEGIN
SET @Point = -89.0 + RAND(SCOPE_IDENTITY());
END;
INSERT dbo.SpatialTest (Points)
VALUES ( 'POINT(' + CAST(@Point AS varchar(20)) + ' ' + CAST(@Point AS varchar(20)) + ')' );
END;
Test Query
DBCC TRACEON (6533);
DBCC TRACESTATUS;
GO
DECLARE
@s datetime2 = SYSUTCDATETIME(),
@g geography = 'POINT(1.0 80.5)';
SELECT [Matches] = COUNT_BIG(*)
FROM dbo.SpatialTest AS ST
WHERE ST.Points.STDistance(@g) > 10000000.5
OPTION (MAXDOP 1, RECOMPILE);
SELECT [Elapsed STDistance Query (ms)] = DATEDIFF(MILLISECOND, @s, SYSUTCDATETIME());
Average execution times: 2100ms (trace flag off); 150ms (trace flag on).
The query in the question wants to find pairs of points that are more than 200 meters apart. This kind of query is not supported by spatial indexes in SQL Server.
The Microsoft docs Spatial Indexes Overview say:
Geography Methods Supported by Spatial Indexes
Under certain conditions, spatial indexes support the following set-oriented geography methods: STIntersects(), STEquals(), and STDistance(). To be supported by a spatial index, these methods must be used within the WHERE clause of a query, and they must occur within a predicate of the following general form:
geography1.method_name(geography2) comparison_operator valid_number
To return a non-null result, geography1 and geography2 must have the same Spatial Reference Identifier (SRID). Otherwise, the method returns NULL.
Spatial indexes support the following predicate forms:
geography1.STIntersects(geography2) = 1
geography1.STEquals(geography2) = 1
geography1.STDistance(geography2) < number
geography1.STDistance(geography2) <= number
As you can see, the query in the question doesn't have one of the supported forms.
That's why you get the error message "Reason: Spatial indexes do not support the comparator supplied in the predicate" when you try to force use of the index.
You may try to rewrite the query to make a Cartesian product with all combinations of points and "subtract" from it a set of points that are within 200 meters, but I doubt that it would be more efficient, even if it used the index.
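One way to sketch that "subtract" idea using only the supported <= form is to compare, per ID, the total number of point pairs with the number of pairs that lie within 200 meters. Any ID where the two counts differ has at least one pair that is farther apart. This is an untested sketch: it assumes the DataSource table and column names from the question, non-NULL points that all share one SRID, and it does not guarantee that the optimizer will choose the spatial index or that the result will be faster.

```sql
-- Sketch only: table and column names are taken from the question.
WITH AllPairs AS
(
    -- n rows for an ID give n * n ordered pairs (self-pairs included)
    SELECT DS.[ID], PairCount = COUNT_BIG(*) * COUNT_BIG(*)
    FROM DataSource AS DS
    GROUP BY DS.[ID]
),
NearPairs AS
(
    -- Ordered pairs within 200 m; self-pairs have distance 0, so they are counted here too
    SELECT DS1.[ID], NearCount = COUNT_BIG(*)
    FROM DataSource AS DS1
    INNER JOIN DataSource AS DS2
        ON DS1.[ID] = DS2.[ID]
    WHERE DS1.Point.STDistance(DS2.Point) <= 200
    GROUP BY DS1.[ID]
)
-- IDs with more pairs in total than near pairs have at least one pair over 200 m apart
SELECT A.[ID]
FROM AllPairs AS A
LEFT JOIN NearPairs AS N
    ON N.[ID] = A.[ID]
WHERE A.PairCount > ISNULL(N.NearCount, 0);
```

Counting pairs instead of using EXCEPT on the ID column matters here: every point is within 0 meters of itself, so a plain set difference on IDs would remove every ID.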
I have a table with ~3000 rows of geography locations in my database, and I tried a simplified version of the query from the question on it. Adding AND DS1.Point.STDistance(DS2.Point) IS NOT NULL was not necessary; the index was used without this clause.
The index was used only when the comparison was < 200. It was not used when the comparison was > 200.
SELECT
DS1.[Building ID]
FROM
[dbo].[tblPhysicalBuildings] AS DS1
CROSS JOIN [dbo].[tblPhysicalBuildings] AS DS2
WHERE
DS1.LocationPoint.STDistance(DS2.LocationPoint) < 200
OPTION(RECOMPILE)
;
SELECT
DS1.[Building ID]
FROM
[dbo].[tblPhysicalBuildings] AS DS1
CROSS JOIN [dbo].[tblPhysicalBuildings] AS DS2
WHERE
DS1.LocationPoint.STDistance(DS2.LocationPoint) > 200
OPTION(RECOMPILE)
;
For 23 000 points, the query is executed for 4-5 seconds. As I am expecting to have more values, I need to find better solution. [...] I have created a spatial index, but the query optimizer is not using it. If I use a hint like this WITH (INDEX(SPATIAL_idx_test)) I am getting the following error
The only answer here makes no mention of an index. I just wanted to say that if you migrate to PostGIS, the spatial extension for PostgreSQL, which is practically the reference implementation for GIS and far more full-featured, you'll have ST_DWithin. This will make use of a spatial index.
CREATE INDEX ON datasource USING gist( point );
SELECT DISTINCT ds1.id
FROM datasource AS ds1
WHERE EXISTS (
SELECT 1
FROM datasource ds2
WHERE ds1.id = ds2.id
AND NOT ST_DWithin( ds1.point, ds2.point, 200)
);
That will use the index to resolve ST_DWithin, and then invert that set. You may be able to do even better:
SELECT DISTINCT ds1.id
FROM datasource AS ds1
CROSS JOIN LATERAL (
SELECT *
FROM datasource AS ds2
WHERE ds1.id = ds2.id
ORDER BY ds1.point <-> ds2.point DESC
LIMIT 1
) AS t2
WHERE ds1.id = t2.id
AND NOT ST_DWithin( ds1.point, t2.point, 200);
That will only run the distance compare once, and it'll use KNN to sort on index.
You can even reduce it to one index lookup with btree_gist.
CREATE EXTENSION btree_gist;
CREATE INDEX ON datasource USING gist( id, point );
PostgreSQL and PostGIS are free and open source software.
I answered a question like this back in January, but I think the buffer with tolerance is probably better. If you need more accuracy, though, use STBuffer().
An alternative way that may be faster is to use the STIntersects and BufferWithTolerance methods to check if one point is within a certain distance of another.
SELECT DISTINCT DS1.[ID]
FROM DataSource DS1
INNER JOIN DataSource DS2
ON DS1.[ID] = DS2.[ID]
WHERE DS2.Point.STIntersects(DS1.Point.BufferWithTolerance(200, 0.9, 0)) = 1
Do note though that BufferWithTolerance has a tolerance value (in my example 0.9) that is essentially a trade-off value between speed and accuracy. If you want exact results this probably isn't the method for you. I also seem to recall that STIntersects is an imprecise method but I can't find any reference to back that up at the moment so maybe I am mistaken about that.
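To get a feel for that trade-off, one option is to compare the buffer test against the exact distance for a single pair of points near the 200 meter boundary. The snippet below is only a sketch: the coordinates are made-up values, SRID 4326 is an assumption, and the result near the boundary may differ depending on the tolerance you pick.

```sql
-- Sketch only: hypothetical coordinates, SRID 4326 assumed.
DECLARE @a geography = geography::Point(47.0000, -122.0000, 4326);
DECLARE @b geography = geography::Point(47.0018, -122.0000, 4326); -- roughly 200 m north of @a

SELECT
    ExactMeters  = @a.STDistance(@b),                                    -- exact distance in meters
    WithinBuffer = @b.STIntersects(@a.BufferWithTolerance(200, 0.9, 0)); -- 1 if @b falls inside the approximate buffer
```

Moving @b slightly closer to or farther from @a, or changing the tolerance argument, shows where the two tests start to disagree.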