Question
What is the difference between geodist(sfield,x,y) and dist(2,x,y,a,b) in Apache Solr for geospatial searches?
dist(2,x,y,0,0) calculates the Euclidean distance between (0,0) and (x,y) for each document. In general, dist() returns the distance between two vectors (points) in an n-dimensional space.
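To make the arithmetic concrete, here is a minimal Python sketch of a p-norm distance that takes its arguments in the same flattened order as dist(power, ...): first point's coordinates, then the second point's. It only illustrates the math; it is not Solr's implementation, and it deliberately leaves out the special handling Solr gives to a power of 0.

```python
import math

def dist(power, *coords):
    """p-norm distance between two points passed as flattened coordinates.

    dist(2, x, y, 0, 0) treats (x, y) and (0, 0) as the two points.
    Illustration only: powers <= 0 are rejected in this sketch.
    """
    if len(coords) % 2 != 0:
        raise ValueError("both points need the same number of coordinates")
    if power <= 0:
        raise ValueError("this sketch only covers power > 0")
    n = len(coords) // 2
    first, second = coords[:n], coords[n:]
    diffs = [abs(p - q) for p, q in zip(first, second)]
    if math.isinf(power):
        return max(diffs)  # infinity norm: largest per-axis difference
    return sum(d ** power for d in diffs) ** (1.0 / power)
```

With power 2 this is the familiar Pythagorean formula: a square per axis and one root, with no trigonometry involved.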
I was previously using the geodist() distance function for geospatial searches on my website, but its response time was high. So I did a POC (proof of concept) comparing different distance functions and found that dist(2,x,y,0,0) takes roughly half the time. I would like to know the reason behind this, and which algorithms the two functions use to calculate the distance.
I have to put together a comparison matrix of the two so I can pass the findings on.
Answer 1:
The main difference is that geodist() is intended to work with spatial field types.
Most spatial implementations are based on Lucene's Points API, which is backed by a BKD index. Such a field is strictly limited to coordinates in lat/lon decimal degrees; behind the scenes, latitude and longitude are indexed as separate numbers. Four main field types are available for spatial search:
- LatLonPointSpatialField
- LatLonType (now deprecated) and its non-geodetic twin PointType
- SpatialRecursivePrefixTreeFieldType (RPT for short), including RptWithGeometrySpatialField, a derivative
- BBoxField (for areas, 4 instances of another field type referred to by numberType)
In geodist(sfield, x, y), sfield is a spatial field that holds a single point as a (lat,lon) pair, so the direct equivalent using dist() would be dist(2, sfieldX, sfieldY, x, y), with sfieldX and sfieldY being the lat and lon coordinates of sfield, respectively.
With dist(power, a, b, ...) you can't query a spatial field type. To perform the same spatial search, you would have to specify each dimension of every point separately. That requires 2 indexed fields (or at least 2 values per field) for 2 dimensions, 3 for 3D, and so on. That makes a huge difference, because you would have to index every coordinate of each point separately.
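As a rough illustration of how the two calls differ on the query side, the sketch below builds the sort parameter for each. The field names store, store_lat and store_lon are hypothetical: the first stands for a single spatial field, the other two for separately indexed numeric fields that dist() would need.

```python
from urllib.parse import urlencode

def geodist_sort(sfield, lat, lon):
    """Sort by geodist() on one spatial field (hypothetical field name)."""
    return {"q": "*:*", "sort": f"geodist({sfield},{lat},{lon}) asc"}

def dist_sort(lat_field, lon_field, lat, lon):
    """Sort by dist() on two separate numeric fields (hypothetical names)."""
    return {"q": "*:*", "sort": f"dist(2,{lat_field},{lon_field},{lat},{lon}) asc"}

# Print the URL-encoded query strings for a sample point.
print(urlencode(geodist_sort("store", 45.15, -93.85)))
print(urlencode(dist_sort("store_lat", "store_lon", 45.15, -93.85)))
```

The geodist() form needs only the one spatial field, while the dist() form has to name every coordinate field explicitly.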
Besides, you can also use geodist() as is with the BBoxField field type, which indexes a single rectangle per document field and supports searching via a bounding box. To do the same with dist(), you would have to compute the center point of the box and pass each of its coordinates as a function argument, which is a lot of hassle just to get the same result when you want to use an area as a parameter.
Lastly, LatLonPointSpatialField, for example, does its distance calculations based on the Haversine formula (great circle), while BBoxField does it a little faster because the rectangular shape is faster to compute. It's true that dist() may be even faster, but remember that it requires more fields to be indexed and a lot of preprocessing at query time to yield a comparable distance and, as mentioned by Mats, it wouldn't take the Earth's curvature into account.
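For reference, a textbook version of the Haversine formula looks roughly like the sketch below. It is not taken from Solr's source; the function name and the mean Earth radius constant are assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, assumed for this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Each call needs several trigonometric evaluations, whereas a Euclidean distance needs only squares and one square root. That is one plausible contributor to the timing gap seen in the question, though the field type and index structure matter as well.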
Answer 2:
A Euclidean distance doesn't account for the curvature of the Earth. If you're only sorting by the distance, the behavior can be OK, but only if your hits are within a small geographical area (the length of one unit, measured in meters, changes greatly as you get closer to the poles).
On the GIS Stack Exchange there's an extensive and good answer that explains the difference between a Euclidean distance and a proper geographical distance (usually calculated using Haversine):
Although at small scales any smooth surface looks like a plane, the accuracy of the Pythagorean formula depends on the coordinates used. When those coordinates are latitude and longitude on a sphere (or ellipsoid), we can expect that
- Distances along lines of longitude will be reasonably accurate.
- Distances along the Equator will be reasonably accurate.
- All other distances will be erroneous, in rough proportion to the differences in latitude and longitude.
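The reason behind those three points can be sketched numerically: on a spherical Earth, one degree of latitude always covers about the same ground distance, but one degree of longitude shrinks with the cosine of the latitude. The constant and function name below are assumptions of this example.

```python
import math

# Ground length of one degree of arc on a sphere of mean Earth radius (assumed).
KM_PER_DEGREE = 6371.0088 * math.pi / 180

def km_per_degree_of_longitude(latitude_deg):
    """Ground distance covered by one degree of longitude at a given latitude."""
    return KM_PER_DEGREE * math.cos(math.radians(latitude_deg))

for lat in (0, 30, 60, 85):
    km = km_per_degree_of_longitude(lat)
    print(f"{lat:>2} deg latitude: {km:6.1f} km per degree of longitude")
```

A Euclidean distance over raw degrees treats both axes as equal, so it overstates east-west distances more and more as the points move away from the Equator.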
Source: https://stackoverflow.com/questions/47690131/difference-between-geodist-and-dist-for-geo-spacial-search