问题
I am implementing Haversine formula in PHP as the follows
$result=mysqli_query($mysqli,"SELECT *,( 6371 * acos( cos( radians({$lat}) ) * cos( radians( `latitude` ) ) * cos( radians( `longitude` ) -radians({$lon}) ) +sin( radians({$lat}) ) * sin( radians( `latitude` ) ) ) ) AS distance FROM `places` HAVING distance <= {$radius} ORDER BY distance ASC") or die(mysqli_error($mysqli));
And Inside the Haversine fetch loop, I have the query that iterates through the results of haversine to select records that matches the IDs returned by the haversine formula. The Query is as follows.
while($row = mysqli_fetch_assoc($result))
{
$rest_time=$row['id'];
$result1=mysqli_query($mysqli,"SELECT * FROM my_friends WHERE personal_id='".$personal_id."' AND id='".$rest_time."'") or die(mysqli_error($mysqli));
//Some operations here
}
HOw can I perform Join operation to mingle these queries into a single one? WOuld it be wise to do so, from the optimisation point of view if the second table has, like 50k users and the first table has almost 1000 records?
回答1:
Any operation you do here which operates on all rows will be slow with that many records.
What you need to do is take advantage of index's. To use an index it must be a simple query and NOT the result of a function (as it is currently).
What you are doing by doing a radius search is making a circle around a point, By using some trig before the circle is made we can come up with the following
where S1 is the largest square inside, and S2 is the smallest square outside.
Now we can work out the dimensions of these two squares and anything OUTSIDE of S2 is hit by and index, and anything INSIDE of S1 is hit by an index, leaving only the small area inside which now needs to be looked up using the slow method.
If you need the distance from the point ignore the S1 sections (as everything inside of the circle needs the haversine function) as a note here, while everything inside of the circle needs it, not every point is within the distance, so both WHERE
clauses are still needed
So lets calculate these points using the unit circle
function getS1S2($latitude, $longitude, $kilometer)
{
$radiusOfEarthKM = 6371;
$latitudeRadians = deg2rad($latitude);
$longitudeRadians = deg2rad($longitude);
$distance = $kilometer / $radiusOfEarthKM;
$deltaLongitude = asin(sin($distance) / cos($latitudeRadians));
$bounds = new \stdClass();
// these are the outer bounds of the circle (S2)
$bounds->minLat = rad2deg($latitudeRadians - $distance);
$bounds->maxLat = rad2deg($latitudeRadians + $distance);
$bounds->minLong = rad2deg($longitudeRadians - $deltaLongitude);
$bounds->maxLong = rad2deg($longitudeRadians + $deltaLongitude);
// and these are the inner bounds (S1)
$bounds->innerMinLat = rad2deg($latitudeRadians + $distance * cos(5 * M_PI_4));
$bounds->innerMaxLat = rad2deg($latitudeRadians + $distance * sin(M_PI_4));
$bounds->innerMinLong = rad2deg($longitudeRadians + $deltaLongitude * sin(5 * M_PI_4));
$bounds->innerMaxLong = rad2deg($longitudeRadians + $deltaLongitude * cos(M_PI_4));
return $bounds;
}
Now your query becomes
SELECT
*
FROM
`places`
HAVING p.nlatitude BETWEEN {$bounds->minLat}
AND {$bounds->maxLat}
AND p.nlongitude BETWEEN {$bounds->minLong}
AND {$bounds->maxLong}
AND (
(
p.nlatitude BETWEEN {$bounds->innerMinLat}
AND {$bounds->innerMaxLat}
AND p.nlongitude BETWEEN {$bounds->innerMinLong}
AND {$bounds->innerMaxLong}
)
OR (
6371 * ACOS(
COS(RADIANS({ $lat })) * COS(RADIANS(`latitude`)) * COS(
RADIANS(`longitude`) - RADIANS({ $lon })
) + SIN(RADIANS({ $lat })) * SIN(RADIANS(`latitude`))
)
)
)) <= {$radius}
ORDER BY distance ASC
IMPORTANT
The above has text for readability, Please ensure these values are escaped correctly / preferably parameterized
This then can take advantage of the index, and allow the join to happen in a faster time
Adding the join this becomes
SELECT
*
FROM
`places` p
INNER JOIN my_friends f ON f.id = p.id
WHERE p.latitude BETWEEN {$bounds->minLat}
AND {$bounds->maxLat}
AND p.longitude BETWEEN {$bounds->minLong}
AND {$bounds->maxLong}
AND (
(
p.latitude BETWEEN {$bounds->innerMinLat}
AND {$bounds->innerMaxLat}
AND p.longitude BETWEEN {$bounds->innerMinLong}
AND {$bounds->innerMaxLong}
)
OR (
6371 * ACOS(
COS(RADIANS({ $lat })) * COS(RADIANS(`latitude`)) * COS(
RADIANS(`longitude`) - RADIANS({ $lon })
) + SIN(RADIANS({ $lat })) * SIN(RADIANS(`latitude`))
)
)
) <= {$radius}
AND f.personal_id = {$personal_id}
ORDER BY distance ASC
IMPORTANT
The above has text for readability, Please ensure these values are escaped correctly / preferably parameterized
Assuming you have the correct indexes this query should remain fast and allow you to do the join.
Looking at the code above im not sure where personal_id
comes from so have left as it is
if you need the distance from the query, you can remove the S1 square
(
p.latitude BETWEEN {$bounds->innerMinLat}
AND {$bounds->innerMaxLat}
AND p.longitude BETWEEN {$bounds->innerMinLong}
AND {$bounds->innerMaxLong}
)
and move the second part of that OR
6371 * ACOS(
COS(RADIANS({ $lat })) * COS(RADIANS(`latitude`)) * COS(
RADIANS(`longitude`) - RADIANS({ $lon })
) + SIN(RADIANS({ $lat })) * SIN(RADIANS(`latitude`))
)
back to the select, which still makes use of S2.
I would also make sure to remove the "magic number" in the query 6371 is the radius of the earth in Kilometer
回答2:
In this case, put the first query as a derived subquery in the second:
SELECT p.*, f.* -- Select only the columns you need, not all
FROM
(
SELECT *,
( 6371 * acos( cos( radians({$lat}) ) * cos( radians( `latitude` ) )
* cos( radians( `longitude` ) -radians({$lon}) )
+sin( radians({$lat}) ) * sin( radians( `latitude` ) ) )
) AS distance
FROM `places`
HAVING distance <= {$radius}
ORDER BY distance ASC"
LIMIT 10 -- Didn't you forget this??
) AS p
JOIN my_friends AS f ON f.personal_id p.personal_id
AND id='".$rest_time."'" -- Huh??
来源:https://stackoverflow.com/questions/40199403/join-operation-in-haversine-formula