Need help optimizing a lat/lon geo search for MySQL

走了就别回头了 2021-01-03 07:09

I have a MySQL (5.0.22) MyISAM table with roughly 300k records in it, and I want to do a lat/lon distance search within a five-mile radius.

I have an index that cov

5 answers
  • 2021-01-03 07:55

    I think you really should consider using PostgreSQL (combined with PostGIS).

    I have given up on MySQL for geospatial data (for now) because of the following reasons:

    • MySQL only supports spatial datatypes / spatial indexes on MyISAM tables, with the inherent disadvantages of MyISAM (concerning transactions, referential integrity...).
    • MySQL implements some of the OpenGIS specifications only on an MBR (minimum bounding rectangle) basis, which is pretty useless for most serious geospatial querying and processing (see this link in the MySQL manual). Chances are you will need some of this functionality sooner or later.

    PostgreSQL/PostGIS with proper (GiST) spatial indexes and proper queries can be extremely fast.

    Example: determining the overlapping polygons between a 'small' selection of polygons and a table with over 5 million (!) very complex polygons, calculating the amount of overlap between these results, and sorting. Average runtime: between 30 and 100 milliseconds. (This particular machine has a lot of RAM, of course. Don't forget to tune your PostgreSQL install; read the docs.)

  • 2021-01-03 07:55

    You really should avoid doing that much math in your select statement. That's probably the source of a lot of your slowdowns. Remember, SQL is a query language; it's really not optimized for trigonometric functions.

    The query will run faster, and you will get your overall results sooner, if you do a very naive distance search (which will return more results than you need) and then winnow those results.
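
    The naive-search-then-winnow idea could be sketched roughly as follows. This is only an illustration under assumptions: rows are taken to be `(id, lat, lon)` tuples, the function names are made up for the example, and the search is assumed to stay away from the poles and the 180° meridian. The cheap range test is what would go into the SQL `WHERE` clause; the exact distance is computed only for the rows that survive it.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle."""
    delta = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(delta)
    # Degrees of longitude shrink with latitude, so the box must be wider in longitude.
    dlon = math.degrees(math.asin(math.sin(delta) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def search(rows, lat, lon, radius_miles):
    """rows: iterable of (id, lat, lon). Returns the rows inside the radius."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_miles)
    # Step 1: naive range filter. In SQL this would be something like
    #   WHERE lat BETWEEN ... AND ... AND lon BETWEEN ... AND ...
    candidates = [
        r for r in rows
        if min_lat <= r[1] <= max_lat and min_lon <= r[2] <= max_lon
    ]
    # Step 2: winnow with the exact distance, only on the survivors.
    return [r for r in candidates if haversine_miles(lat, lon, r[1], r[2]) <= radius_miles]
```

    A point in a corner of the box passes step 1 but is dropped in step 2, which is the "returns more results" part of the approach.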

    If you do want to use distance in your query, at the very least use a squared-distance calculation; sqrt calculations are notoriously slow. A squared-distance calculation simply works with the square of the distance instead of the distance itself. In a Cartesian coordinate system, the sum of the squares of the two short sides of a right triangle equals the square of the hypotenuse, so the squared distance is just the sum of the two squares and needs no square root. All you have to do is make sure you also square the distance you are comparing against: instead of finding the precise distance and comparing it to your desired distance (let's say 5), you find the squared distance and compare it to the square of the desired distance (25).
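
    A minimal sketch of the squared-distance comparison, under assumptions: lat/lon is not a Cartesian system, so the example first flattens the coordinates onto a local plane (scaling longitude by the cosine of the latitude), which is only a reasonable approximation for small radii such as 5 miles. The constant and the function names are made up for the example.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # rough figure, assumed for illustration


def squared_distance_miles(lat1, lon1, lat2, lon2):
    """Approximate squared distance (in square miles) on a locally flat plane."""
    dy = (lat2 - lat1) * MILES_PER_DEGREE_LAT
    # One degree of longitude covers less ground away from the equator.
    mid_lat = math.radians((lat1 + lat2) / 2)
    dx = (lon2 - lon1) * MILES_PER_DEGREE_LAT * math.cos(mid_lat)
    return dx * dx + dy * dy


def within_radius(lat1, lon1, lat2, lon2, radius_miles):
    # Compare squared values on both sides, so no square root is taken.
    return squared_distance_miles(lat1, lon1, lat2, lon2) <= radius_miles ** 2
```

    For a 5-mile search the right-hand side is simply 25, and it can be computed once outside the query.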

  • 2021-01-03 08:00

    Depending on how many listings you have, could you create a view that contains:

    Listing1ID, Listing2ID, Distance

    Basically, just have all of the distances pre-calculated.

    Then you could do something like:

    SELECT Listing2ID FROM v_Distance d WHERE Distance < 5 AND Listing1ID = XXX
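
    One way the pre-calculated lookup might look, sketched with Python's built-in `sqlite3` standing in for MySQL. Assumptions: the sketch uses an ordinary table (a hypothetical `listing_distance`) rather than a view, so that the distances are actually stored instead of recomputed per query, and the three listings are made-up sample data. Note that the number of pairs grows with the square of the number of listings (300k listings would mean roughly 9 × 10^10 pairs), so in practice this only suits small sets or pairs kept below some cutoff distance.

```python
import math
import sqlite3
from itertools import permutations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Hypothetical sample listings: id -> (lat, lon)
LISTINGS = {1: (40.00, -75.00), 2: (40.01, -75.00), 3: (41.00, -75.00)}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def build_distance_table(conn, listings):
    conn.execute(
        "CREATE TABLE listing_distance ("
        " listing1_id INTEGER, listing2_id INTEGER, distance REAL,"
        " PRIMARY KEY (listing1_id, listing2_id))"
    )
    # Store both (a, b) and (b, a) so a lookup by listing1_id finds every neighbour.
    rows = [
        (a, b, haversine_miles(*listings[a], *listings[b]))
        for a, b in permutations(listings, 2)
    ]
    conn.executemany("INSERT INTO listing_distance VALUES (?, ?, ?)", rows)


def nearby(conn, listing_id, max_miles):
    """Return ids of listings closer than max_miles, nearest first."""
    cur = conn.execute(
        "SELECT listing2_id FROM listing_distance"
        " WHERE listing1_id = ? AND distance < ? ORDER BY distance",
        (listing_id, max_miles),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
build_distance_table(conn, LISTINGS)
print(nearby(conn, 1, 5))
```

    The table has to be refreshed whenever a listing is added or moved, which is the price paid for doing no distance math at query time.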

  • 2021-01-03 08:01

    When I implemented geo radius search, I just loaded all of the US ZIP codes into memory with their lat/long, used my starting point and radius to get a list of the ZIP codes within the radius, and then used that list in my DB query. Of course, I was using Solr to do my searching because the search space was in the 20-million-row range, but the same principles should apply. Apologies for the shallowness of this response, as I'm on my phone.
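
    The in-memory ZIP code lookup might be sketched like this. Everything specific here is an assumption for illustration: the ZIP codes and centroids are placeholders rather than real data, and `listings` / `zip` are hypothetical table and column names. The result is only as precise as the ZIP centroids, so it matches whole ZIP areas rather than an exact circle.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Placeholder ZIP codes with made-up centroids; a real table would be
# loaded once at startup from a ZIP code dataset.
ZIP_CENTROIDS = {
    "00001": (40.00, -75.00),
    "00002": (40.03, -75.02),
    "00003": (40.50, -75.50),
}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def zips_within(lat, lon, radius_miles, centroids=ZIP_CENTROIDS):
    """Return the ZIP codes whose centroid lies within the radius, sorted."""
    return sorted(
        code
        for code, (zlat, zlon) in centroids.items()
        if haversine_miles(lat, lon, zlat, zlon) <= radius_miles
    )


def listings_query(zips):
    """Build a parameterized query and its arguments for the matching ZIP codes."""
    if not zips:
        return None, []  # nothing in range; skip the database entirely
    placeholders = ", ".join(["%s"] * len(zips))
    return f"SELECT id FROM listings WHERE zip IN ({placeholders})", list(zips)


sql, args = listings_query(zips_within(40.0, -75.0, 5))
print(sql, args)
```

    The database then only has to do an indexed lookup on the ZIP column, with no distance math in the query itself.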

  • 2021-01-03 08:04

    You are probably using a 'covering index' in your lat/lon-only query. An index is covering when it contains all of the data that the query selects, so MySQL only needs to visit the index and never the data rows. See this for more info. That would explain why the lat/lon query is so fast.
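
    The effect can be seen in a small experiment, sketched here with Python's built-in `sqlite3` as a stand-in, since it runs without a server. This is an assumption-laden illustration, not MySQL itself: the table, index, and column names are made up, and SQLite's planner output differs from MySQL's. In MySQL, the equivalent check is to run `EXPLAIN` on the query and look for "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: an index on (lat, lon) and one column outside the index.
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, lat REAL, lon REAL, title TEXT)")
conn.execute("CREATE INDEX idx_lat_lon ON listings (lat, lon)")


def query_plan(sql, params=()):
    """Return SQLite's query plan for the statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the plan detail


# Selects only indexed columns: can be answered from the index alone.
covered = query_plan(
    "SELECT lat, lon FROM listings WHERE lat BETWEEN ? AND ?", (39.9, 40.1)
)
# Also selects title, which is not in the index: the table rows must be read.
not_covered = query_plan(
    "SELECT lat, lon, title FROM listings WHERE lat BETWEEN ? AND ?", (39.9, 40.1)
)
print(covered)
print(not_covered)
```

    Only the first plan mentions a covering index; adding a single column that is not in the index is enough to lose it.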

    I suspect that the calculations and the sheer number of rows returned slow down the longer query (plus any temp table that has to be created for the HAVING clause).
