I'm having some fun trying to pick a decent SQL Server 2008 spatial index setup for a data set I am dealing with.
The dataset is polygons, representing contours ove
I too have found it very difficult to "guess" an appropriate spatial index for a particular table of geometries. I tried making more educated guesses using the sp_help_spatial_geometry_index stored procedure, but all that did was tell me how poorly my spatial index was performing after each guess. Even if I limit my options to 2-8 CELLS_PER_OBJECT, that alone gives 567 permutations: 3 grid densities (LOW, MEDIUM, HIGH) at each of 4 levels gives 3^4 = 81, multiplied by 7 CELLS_PER_OBJECT options.
I decided to let SQL Server do the experimenting for me and give me some empirical evidence. I created a stored procedure that spins through the permutations and rebuilds the spatial index on a spatial table for each one. It then tests the query performance of each permutation using two supplied geometry instances. I selected one instance that covers the entire data set and another that covers a smaller portion of it. The proc runs STIntersects() 4 times on each instance and records the results in a table. You can then query the results table to find out which spatial index performed best on your particular data set.
Give it a try and let me know if you have any suggested improvements or observations.
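To make the 567 figure concrete, here is a small sketch (not taken from the linked proc) that enumerates every combination of grid densities and CELLS_PER_OBJECT values in that search space. The CTE and column names are made up for illustration.

```sql
-- Illustrative only: list every index configuration in the search space.
-- The CTE and column names (density, cells, level_1, ...) are hypothetical.
WITH density (d) AS (
    SELECT 'LOW' UNION ALL SELECT 'MEDIUM' UNION ALL SELECT 'HIGH'
),
cells (c) AS (
    SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
    UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
)
SELECT l1.d    AS level_1,
       l2.d    AS level_2,
       l3.d    AS level_3,
       l4.d    AS level_4,
       cells.c AS cells_per_object
FROM density AS l1
CROSS JOIN density AS l2
CROSS JOIN density AS l3
CROSS JOIN density AS l4
CROSS JOIN cells
ORDER BY level_1, level_2, level_3, level_4, cells_per_object;
-- Returns 3 * 3 * 3 * 3 * 7 = 567 rows.
```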
Create the proc using this https://gist.github.com/anonymous/5322650. Then set up an execution statement using this example:
/* set up some strings to be used to create geometry instances when our test spatial queries run */
DECLARE @ada VARCHAR(MAX)
SET @ada = 'GEOMETRY::STGeomFromText(''POLYGON ((2422068 527322, 2422068 781170, 2565405 781170, 2565405 527322, 2422068 527322))'', 0)'
DECLARE @mer VARCHAR(MAX)
SET @mer = 'GEOMETRY::STGeomFromText(''POLYGON ((2451235 696087, 2451235 721632, 2473697 721632, 2473697 696087, 2451235 696087))'', 0)'
DECLARE @mer1 VARCHAR(MAX)
SET @mer1 = 'GEOMETRY::STGeomFromText(''POLYGON ((2443866 712283, 2443866 717980, 2454872 717980, 2454872 712283, 2443866 712283))'', 0)'
DECLARE @mer2 VARCHAR(MAX)
SET @mer2 = 'GEOMETRY::STGeomFromText(''POLYGON ((2434259 687278, 2434259 701994, 2449657 701994, 2449657 687278, 2434259 687278))'', 0)'
EXEC gis.sp_tune_spatial_index 'PARCEL_ADA', 'S104_idx', 2, 8, @ada, @mer1
GO
NOTE: Obviously, rebuilding a spatial index 567 times will take a long time. Kick it off from the command line or just let it run while you do other things. If it is a dataset you are going to use often and the geometries remain similar, it will be worth the time it takes to run the proc. The results table shows performance in milliseconds.
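For a feel of what a single iteration involves, here is a rough sketch of rebuilding the index with one permutation and timing one test query. It is not the code from the gist. The geometry column name SHAPE is an assumption, the BOUNDING_BOX is simply the extent of the @ada polygon above, and the grid densities and CELLS_PER_OBJECT value are arbitrary picks.

```sql
-- Sketch of one tuning iteration. Assumes the index already exists
-- (DROP_EXISTING = ON fails otherwise) and that the geometry column is
-- called SHAPE, which is a guess.
CREATE SPATIAL INDEX S104_idx ON PARCEL_ADA (SHAPE)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (2422068, 527322, 2565405, 781170),  -- xmin, ymin, xmax, ymax
    GRIDS = (LEVEL_1 = LOW, LEVEL_2 = MEDIUM, LEVEL_3 = HIGH, LEVEL_4 = MEDIUM),
    CELLS_PER_OBJECT = 4,
    DROP_EXISTING = ON
);

-- Time one intersection query against the rebuilt index.
DECLARE @g geometry = geometry::STGeomFromText(
    'POLYGON ((2451235 696087, 2451235 721632, 2473697 721632, 2473697 696087, 2451235 696087))', 0);
DECLARE @start datetime2 = SYSDATETIME();
DECLARE @matched int;

SELECT @matched = COUNT(*)
FROM PARCEL_ADA WITH (INDEX(S104_idx))
WHERE SHAPE.STIntersects(@g) = 1;

SELECT @matched AS rows_matched,
       DATEDIFF(millisecond, @start, SYSDATETIME()) AS elapsed_ms;
```

Running each query several times, as the proc does, helps smooth out caching effects between the first and later runs.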
If the query is for displaying data then you could split up your large polygons using a grid. The resulting pieces would then be very quick to retrieve with an index. You could remove the outlines so the features would still look contiguous.
Most commercial GIS packages will have tools to split one polygon dataset by another. Search for tools that do intersections.
If you are using open source software then have a look at QGIS and http://www.ftools.ca which "perform geoprocessing operations including intersections, differencing, unions, dissolves, and clipping." I've not used the latter myself.
Have a look at: http://postgis.refractions.net/docs/ch04.html#id2790790 for why large features are bad.
There is more on the Filter clause here - http://blogs.msdn.com/b/isaac/archive/2010/03/04/filter-one-odd-duck.aspx
Something else to check is that the spatial index is actually being used in the query plan. You may have to force the query to use the index with a WITH (INDEX(...)) table hint:
http://blogs.msdn.com/b/isaac/archive/2008/08/29/is-my-spatial-index-being-used.aspx
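As a minimal sketch of the index hint, using the table, column and index names from the CREATE SPATIAL INDEX statement quoted further down: the query window and the SRID of 4326 are placeholders, so swap in a real search area and the SRID your column actually uses.

```sql
-- Placeholder search window. The SRID must match the column's SRID,
-- otherwise STIntersects returns NULL and no rows come back.
DECLARE @window geometry = geometry::STGeomFromText(
    'POLYGON ((-1 50, 1 50, 1 52, -1 52, -1 50))', 4326);

-- Force the optimizer to use the spatial index.
SELECT *
FROM dbo.ContASplit WITH (INDEX(contasplit_sidx))
WHERE geom.STIntersects(@window) = 1;

-- Index-only approximation: Filter() may return extra rows that merely
-- share index cells with the window, but it skips the exact test.
SELECT *
FROM dbo.ContASplit WITH (INDEX(contasplit_sidx))
WHERE geom.Filter(@window) = 1;
```

Comparing the actual execution plans with and without the hint shows whether the optimizer was already choosing the spatial index on its own.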
More details on indexes below:
http://blogs.msdn.com/b/isaac/archive/2009/05/28/sql-server-spatial-indexing.aspx
Also try running sp_help_spatial_geometry_index against your data to see what settings to use for your spatial index:
http://msdn.microsoft.com/en-us/library/cc627426.aspx
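A call might look like the sketch below. The table and index names are the ones from the CREATE SPATIAL INDEX statement quoted later in this thread, and the sample geometry is a placeholder that should be replaced with a query window typical of your workload.

```sql
-- Placeholder query sample; use a geometry representative of real queries
-- and the same SRID as the indexed column.
DECLARE @sample geometry = geometry::STGeomFromText(
    'POLYGON ((-1 50, 1 50, 1 52, -1 52, -1 50))', 4326);

EXEC sp_help_spatial_geometry_index
    @tabname       = 'dbo.ContASplit',
    @indexname     = 'contasplit_sidx',
    @verboseoutput = 1,        -- 1 returns the full list of properties
    @query_sample  = @sample;
```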
Running this SP with some test geometry produces all sorts of statistics to try and tailor your index to your data. A full list of properties is at http://msdn.microsoft.com/en-us/library/cc627425.aspx
These include values such as:
From the results of sp_help_spatial_geometry_index it looks like you may have issues with the geometry itself rather than the spatial index.
The Base_Table_Rows count looks to be a bug - http://connect.microsoft.com/SQLServer/feedback/details/475838/number-of-rows-in-base-table-incorrect-in-sp-help-spatial-geography-index-xml It may be worth recreating table / database and trying the index from scratch.
Total_Number_Of_ObjectCells_In_Level0_In_Index 60956 is a lot of features to return at level 0. It is likely they are either outside the spatial index extent or NULLs. It then runs the Intersect (Number_Of_Times_Secondary_Filter_Is_Called 60956) on all these features, which would explain why it is slow. The docs claim there is no performance hit for NULL features (quoted below), but I believe it still has to look up the records, even if no intersect is performed.
"NULL and empty instances are counted at level 0 but will not impact performance. Level 0 will have as many cells as NULL and empty instances at the base table."
I believe the Primary_Filter_Efficiency of 0.003281055 indicates well under 1% efficiency (about 0.33% if the value is a fraction, lower still if it is already a percentage)!
A few things to try:
The MakeValid statement:
UPDATE MyTable SET GeomFieldName = GeomFieldName.MakeValid()
Reset / double check SRID:
UPDATE MyTable SET GeomFieldName.STSrid = 4326
Add in some fields to show the extents of your features. This may highlight issues / NULL geometries.
ALTER TABLE MyTable ADD MinX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STX,0)) PERSISTED
ALTER TABLE MyTable ADD MinY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((1)).STY,0)) PERSISTED
ALTER TABLE MyTable ADD MaxX AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STX,0)) PERSISTED
ALTER TABLE MyTable ADD MaxY AS (CONVERT(int,GeomFieldName.STEnvelope().STPointN((3)).STY,0)) PERSISTED
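A quick way to see whether NULL, empty or invalid geometries are a problem is to count them directly. This is only a sketch using the same placeholder names (MyTable, GeomFieldName) as the statements above.

```sql
-- Count rows that are likely to end up in level 0 of the index or to
-- cause trouble. On NULL geometries the methods return NULL, so the CASE
-- expressions fall through to 0 for those rows.
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN GeomFieldName IS NULL THEN 1 ELSE 0 END)           AS null_rows,
       SUM(CASE WHEN GeomFieldName.STIsEmpty() = 1 THEN 1 ELSE 0 END)   AS empty_rows,
       SUM(CASE WHEN GeomFieldName.STIsValid() = 0 THEN 1 ELSE 0 END)   AS invalid_rows
FROM MyTable;
```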
In your index query you use:
CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit]
(
[geom]
)USING GEOMETRY_GRID
WITH (
BOUNDING_BOX =(-90, -180, 90, 180),
...
BOUNDING_BOX takes its values in the order (xmin, ymin, xmax, ymax), so the BOUNDING_BOX above maps to:
xmin = -90
ymin = -180
xmax = 90
ymax = 180
So to create the BOUNDING_BOX for the world you should use:
CREATE SPATIAL INDEX [contasplit_sidx] ON [dbo].[ContASplit]
(
[geom]
)USING GEOMETRY_GRID
WITH (
BOUNDING_BOX =(-180, -90, 180, 90),
...
This should create an index whose extent fits your data, so that all your features are covered by it.
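If you are not sure what extent your data really covers, you can measure it before choosing the BOUNDING_BOX. The sketch below assumes the same table and column names as the index statement above and relies on the envelope's first and third points being the lower-left and upper-right corners.

```sql
-- Overall extent of the stored geometries, in (xmin, ymin, xmax, ymax)
-- order, ready to compare against the BOUNDING_BOX of the index.
SELECT MIN(geom.STEnvelope().STPointN(1).STX) AS xmin,
       MIN(geom.STEnvelope().STPointN(1).STY) AS ymin,
       MAX(geom.STEnvelope().STPointN(3).STX) AS xmax,
       MAX(geom.STEnvelope().STPointN(3).STY) AS ymax
FROM dbo.ContASplit
WHERE geom IS NOT NULL;
```

If the result falls well inside (-180, -90, 180, 90), a tighter BOUNDING_BOX around the actual extent may use the grid cells more effectively.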