Join operation in Haversine formula

与世无争的帅哥 submitted on 2019-12-04 13:33:23

Question


I am implementing the Haversine formula in PHP as follows:

$result=mysqli_query($mysqli,"SELECT *,( 6371 * acos( cos( radians({$lat}) ) * cos( radians( `latitude` ) ) * cos( radians( `longitude` ) -radians({$lon}) ) +sin( radians({$lat}) ) * sin( radians( `latitude` ) ) ) ) AS distance FROM `places` HAVING distance <= {$radius} ORDER BY distance ASC") or die(mysqli_error($mysqli));

Inside the fetch loop for the Haversine results, I run a second query that selects the records matching each ID returned by the Haversine query. The query is as follows:

while ($row = mysqli_fetch_assoc($result))
{
    $rest_time = $row['id'];

    $result1 = mysqli_query($mysqli, "SELECT * FROM my_friends  WHERE personal_id='".$personal_id."' AND id='".$rest_time."'") or die(mysqli_error($mysqli));

    // Some operations here
}

How can I perform a join operation to merge these queries into a single one? Would it be wise to do so from an optimisation point of view, if the second table has around 50k users and the first table has almost 1000 records?


Answer 1:


Any operation you do here which operates on all rows will be slow with that many records.

What you need to do is take advantage of indexes. To use an index, the condition must be a simple comparison on the column and NOT the result of a function (as it currently is).

A radius search draws a circle around a point. With some trigonometry, we can fit two squares to that circle before running the query: S1, the largest square inside the circle, and S2, the smallest square outside it.

Now we can work out the dimensions of these two squares. Anything OUTSIDE of S2 can be rejected using an index, and anything INSIDE of S1 can be accepted using an index, leaving only the small area between them to be checked using the slow method.

If you need the distance from the point, ignore the S1 part, as every row inside the circle then needs the haversine function anyway. Note that while every row inside S2 needs it, not every such row is within the distance, so both WHERE clauses are still needed.
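The three-zone idea can also be sketched in plain PHP. This is an illustration only: the names `distanceKm`, `inBox` and `withinRadius` and the array keys are assumptions made for this example, not part of the original answer, and the distance uses the same ACOS expression as the SQL in the question.

```php
<?php
// Sketch only: all names below are illustrative, not from the original answer.

const EARTH_RADIUS_KM = 6371;

// Great-circle distance in km, using the same ACOS expression as the SQL query.
function distanceKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $cosAngle = cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * cos(deg2rad($lon2) - deg2rad($lon1))
              + sin(deg2rad($lat1)) * sin(deg2rad($lat2));

    // Rounding can push the value slightly outside [-1, 1], which would make acos() return NAN.
    return EARTH_RADIUS_KM * acos(max(-1.0, min(1.0, $cosAngle)));
}

// True when the point lies inside the box given by minLat/maxLat/minLong/maxLong (degrees).
function inBox(float $lat, float $lon, array $box): bool
{
    return $lat >= $box['minLat'] && $lat <= $box['maxLat']
        && $lon >= $box['minLong'] && $lon <= $box['maxLong'];
}

// $outer plays the role of S2 and $inner the role of S1.
function withinRadius(float $lat, float $lon, float $centerLat, float $centerLon,
                      float $radiusKm, array $outer, array $inner): bool
{
    if (!inBox($lat, $lon, $outer)) {
        return false; // outside S2: rejected by cheap comparisons
    }
    if (inBox($lat, $lon, $inner)) {
        return true;  // inside S1: accepted by cheap comparisons
    }
    // Between S1 and S2: only here is the trigonometry needed.
    return distanceKm($centerLat, $centerLon, $lat, $lon) <= $radiusKm;
}
```

In the database, the two box checks are the part that an index on the latitude and longitude columns can serve, and only the last branch corresponds to the slow per-row calculation.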

So let's calculate these points using the unit circle:

function getS1S2($latitude, $longitude, $kilometer)
{
    $radiusOfEarthKM  = 6371;
    $latitudeRadians  = deg2rad($latitude);
    $longitudeRadians = deg2rad($longitude);
    $distance         = $kilometer / $radiusOfEarthKM;

    $deltaLongitude = asin(sin($distance) / cos($latitudeRadians));

    $bounds = new \stdClass();

    // these are the outer bounds of the circle (S2)
    $bounds->minLat  = rad2deg($latitudeRadians  - $distance);
    $bounds->maxLat  = rad2deg($latitudeRadians  + $distance);
    $bounds->minLong = rad2deg($longitudeRadians - $deltaLongitude);
    $bounds->maxLong = rad2deg($longitudeRadians + $deltaLongitude);

    // and these are the inner bounds (S1)
    $bounds->innerMinLat  = rad2deg($latitudeRadians  + $distance       * cos(5 * M_PI_4));
    $bounds->innerMaxLat  = rad2deg($latitudeRadians  + $distance       * sin(M_PI_4));
    $bounds->innerMinLong = rad2deg($longitudeRadians + $deltaLongitude * sin(5 * M_PI_4));
    $bounds->innerMaxLong = rad2deg($longitudeRadians + $deltaLongitude * cos(M_PI_4));

    return $bounds;
}

Now your query becomes

SELECT 
  *
FROM
  `places` p
WHERE p.latitude BETWEEN {$bounds->minLat} 
  AND {$bounds->maxLat} 
  AND p.longitude BETWEEN {$bounds->minLong} 
  AND {$bounds->maxLong} 
  AND (
    (
      p.latitude BETWEEN {$bounds->innerMinLat} 
      AND {$bounds->innerMaxLat} 
      AND p.longitude BETWEEN {$bounds->innerMinLong} 
      AND {$bounds->innerMaxLong}
    ) 
    OR (
      6371 * ACOS(
        COS(RADIANS({$lat})) * COS(RADIANS(p.latitude)) * COS(
          RADIANS(p.longitude) - RADIANS({$lon})
        ) + SIN(RADIANS({$lat})) * SIN(RADIANS(p.latitude))
      )
    ) <= {$radius}
  )
-- distance is not selected in this form, so there is no ORDER BY distance

IMPORTANT

The values above are interpolated as text for readability. Please ensure they are escaped correctly or, preferably, parameterized.

This can then take advantage of the index and lets the join run faster.

Adding the join, this becomes:

SELECT 
  *
FROM
  `places` p
  INNER JOIN my_friends f ON f.id = p.id
WHERE p.latitude BETWEEN {$bounds->minLat} 
  AND {$bounds->maxLat} 
  AND p.longitude BETWEEN {$bounds->minLong} 
  AND {$bounds->maxLong} 
  AND (
    (
      p.latitude BETWEEN {$bounds->innerMinLat} 
      AND {$bounds->innerMaxLat} 
      AND p.longitude BETWEEN {$bounds->innerMinLong} 
      AND {$bounds->innerMaxLong}
    ) 
    OR (
      6371 * ACOS(
        COS(RADIANS({$lat})) * COS(RADIANS(p.latitude)) * COS(
          RADIANS(p.longitude) - RADIANS({$lon})
        ) + SIN(RADIANS({$lat})) * SIN(RADIANS(p.latitude))
      )
    ) <= {$radius}
  )
  AND f.personal_id = '{$personal_id}'
-- distance is not selected in this form, so there is no ORDER BY distance

IMPORTANT

The values above are interpolated as text for readability. Please ensure they are escaped correctly or, preferably, parameterized.

Assuming you have the correct indexes this query should remain fast and allow you to do the join.

Looking at the code above, I'm not sure where personal_id comes from, so I have left it as it is.

If you need the distance from the query, you can remove the S1 square

    (
      p.latitude BETWEEN {$bounds->innerMinLat} 
      AND {$bounds->innerMaxLat} 
      AND p.longitude BETWEEN {$bounds->innerMinLong} 
      AND {$bounds->innerMaxLong}
    ) 

and move the second part of that OR

  6371 * ACOS(
    COS(RADIANS({$lat})) * COS(RADIANS(`latitude`)) * COS(
      RADIANS(`longitude`) - RADIANS({$lon})
    ) + SIN(RADIANS({$lat})) * SIN(RADIANS(`latitude`))
  )

back to the select, which still makes use of S2.

I would also make sure to remove the "magic number" in the query: 6371 is the radius of the Earth in kilometers.
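As a rough sketch of the parameterized form recommended above, the distance-returning variant (S2 box in the WHERE clause, distance in the SELECT, join on my_friends) could be built with placeholders. The function name `buildNearbyFriendsQuery` and its return shape are assumptions for illustration; the table and column names are taken from the queries in this thread, and `$bounds` is assumed to carry the `minLat`, `maxLat`, `minLong` and `maxLong` properties that getS1S2() returns.

```php
<?php
// Sketch only: buildNearbyFriendsQuery() and its return shape are illustrative.

const EARTH_RADIUS_KM = 6371; // named constant instead of a magic number

// Returns the SQL text, the mysqli type string and the values to bind, in placeholder order.
function buildNearbyFriendsQuery(object $bounds, float $lat, float $lon,
                                 float $radiusKm, string $personalId): array
{
    $sql = <<<'SQL'
SELECT
  p.*,
  f.*,
  ? * ACOS(
    COS(RADIANS(?)) * COS(RADIANS(p.latitude)) * COS(RADIANS(p.longitude) - RADIANS(?))
    + SIN(RADIANS(?)) * SIN(RADIANS(p.latitude))
  ) AS distance
FROM `places` p
  INNER JOIN my_friends f ON f.id = p.id
WHERE p.latitude BETWEEN ? AND ?
  AND p.longitude BETWEEN ? AND ?
  AND f.personal_id = ?
HAVING distance <= ?
ORDER BY distance ASC
SQL;

    return [
        'sql'    => $sql,
        // 'd' = double, 's' = string; personal_id is bound as a string,
        // matching the quoted value in the question.
        'types'  => 'ddddddddsd',
        'params' => [
            EARTH_RADIUS_KM, $lat, $lon, $lat,
            $bounds->minLat, $bounds->maxLat,
            $bounds->minLong, $bounds->maxLong,
            $personalId, $radiusKm,
        ],
    ];
}
```

With mysqli, the result would typically be passed to `mysqli_prepare()` and then `mysqli_stmt_bind_param($stmt, $q['types'], ...$q['params'])` before `mysqli_stmt_execute()`. Selecting explicit columns instead of `p.*, f.*` avoids the duplicate `id` column in the result.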




Answer 2:


In this case, put the first query as a derived subquery in the second:

SELECT  p.*, f.*    -- Select only the columns you need, not all
    FROM  
    (
        SELECT  *,
                ( 6371 * acos( cos( radians({$lat}) ) * cos( radians( `latitude` ) )
                  * cos( radians( `longitude` ) -radians({$lon}) )
                  +sin( radians({$lat}) ) * sin( radians( `latitude` ) ) )
                ) AS distance
            FROM  `places`
            HAVING  distance <= {$radius}
            ORDER BY  distance ASC
            LIMIT 10               -- Didn't you forget this??
    ) AS p
    JOIN  my_friends AS f  ON f.id = p.id    -- $rest_time in the question is the id from `places`
      AND  f.personal_id = '{$personal_id}'


Source: https://stackoverflow.com/questions/40199403/join-operation-in-haversine-formula
