Calculate distance to shore or coastline for a vessel

闹比i 2021-01-05 08:42

For a dataset of 200M GPS (lon, lat) coordinates of vessels I want to calculate an approximate distance to the nearest land or coastline, as a function called distance_to_sh

3 answers
  • 2021-01-05 09:17

    The key here is that you need to use the "great circle" (orthodromic) distance calculations, which are designed to find the distance between two points on the surface of a sphere. Although the earth is not a perfect sphere, such calculations will get you very close (to within 0.5%), and non-spherical adjustments can be applied if this isn't close enough.

    This formula is widely documented online. Look for a closed-form solution that works on X-Y-Z coordinates rather than polar coordinates, or else convert your GPS coordinates to polar form first.
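
    As a rough illustration of the X-Y-Z route, here is a minimal Python sketch that converts each (lon, lat) pair to a unit vector and recovers the great-circle distance from the straight-line chord between the two vectors. The function names and the mean Earth radius constant are my own assumptions, not part of any library, and the Earth is treated as a perfect sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)

def to_xyz(lon, lat):
    """Convert (lon, lat) in degrees to a unit vector (x, y, z)."""
    lam, phi = math.radians(lon), math.radians(lat)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))

def great_circle_km(p, q):
    """Great-circle distance in km between p and q, each a (lon, lat) pair."""
    a, b = to_xyz(*p), to_xyz(*q)
    # Straight-line distance between the two unit vectors.
    chord = math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    # A chord of length c on the unit sphere subtends an angle 2*asin(c/2).
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, chord / 2.0))

# One degree of longitude along the equator is roughly 111 km.
print(great_circle_km((0.0, 0.0), (1.0, 0.0)))
```

    The chord form stays well behaved for very short distances, which is the common case for vessels close to a coast.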

  • 2021-01-05 09:25

    You need a Great Circle distance formula. These go by names such as the Spherical Cosine Law, Haversine, or Vincenty formulas.

    You can then compute the distance from each vessel to the nearest point in your coastline corpus. It's often helpful to use a bounding box computation to rule out irrelevant points before running the whole Great Circle formula on them.
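
    A minimal Python sketch of that bounding box prefilter follows. It assumes a spherical Earth, a plain list of (lon, lat) coast points, and a search radius you choose yourself; `nearest_within` and its parameters are hypothetical names. If it returns None, no coast point lies within the radius and you would retry with a larger one.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)

def haversine_km(lon1, lat1, lon2, lat2):
    """Haversine great-circle distance in km; inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def nearest_within(vessel, coast, radius_km):
    """Return (distance_km, coast_point) for the nearest coast point within
    radius_km of the vessel, or None if there is none."""
    lon, lat = vessel
    r = radius_km / EARTH_RADIUS_KM          # search radius as an angle
    dlat = math.degrees(r)
    cos_lat = math.cos(math.radians(lat))
    if r < math.pi / 2 and math.sin(r) < cos_lat:
        # Widest longitude span reached by a circle of angular radius r.
        dlon = math.degrees(math.asin(math.sin(r) / cos_lat))
    else:
        dlon = 180.0                         # circle covers a pole: keep all longitudes
    best = None
    for clon, clat in coast:
        # Cheap rejection tests first; longitude difference wraps at +/-180.
        if abs(clat - lat) > dlat:
            continue
        if abs((clon - lon + 180.0) % 360.0 - 180.0) > dlon:
            continue
        d = haversine_km(lon, lat, clon, clat)
        if d <= radius_km and (best is None or d < best[0]):
            best = (d, (clon, clat))
    return best

coast = [(0.5, 0.0), (10.0, 0.0), (0.0, 0.2)]
print(nearest_within((0.0, 0.0), coast, 100.0))
```

    The two comparisons are much cheaper than the trigonometry in the full formula, so most coast points are discarded before any distance is computed.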

    When you construct your coastline corpus, you may need to interpolate extra coastline points if your raw coastline data contains long segments. That's because you're computing the distance to the nearest point, not to the nearest segment. Look up Great Circle interpolation.
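
    One way to do that interpolation is sketched below in Python, assuming a spherical Earth and segment endpoints that are neither identical nor antipodal. `densify` and `max_step_deg` are hypothetical names; the step is an angle in degrees of arc (one degree is roughly 111 km).

```python
import math

def to_xyz(lon, lat):
    """(lon, lat) in degrees to a unit vector."""
    lam, phi = math.radians(lon), math.radians(lat)
    return (math.cos(phi) * math.cos(lam),
            math.cos(phi) * math.sin(lam),
            math.sin(phi))

def to_lonlat(x, y, z):
    """Vector back to (lon, lat) in degrees."""
    return (math.degrees(math.atan2(y, x)),
            math.degrees(math.atan2(z, math.hypot(x, y))))

def densify(p1, p2, max_step_deg):
    """Points from p1 to p2 along the great circle, endpoints included,
    with consecutive points at most max_step_deg apart."""
    a, b = to_xyz(*p1), to_xyz(*p2)
    dot = sum(u * v for u, v in zip(a, b))
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    # Angle between the endpoints; atan2 stays accurate for short segments.
    omega = math.atan2(math.sqrt(sum(c * c for c in cross)), dot)
    if omega < 1e-12:
        return [p1, p2]
    # Number of sub-segments; the small offset guards against rounding up
    # when the length is an exact multiple of the step.
    n = max(1, math.ceil(math.degrees(omega) / max_step_deg - 1e-9))
    points = []
    for i in range(n + 1):
        f = i / n
        # Spherical linear interpolation weights for fraction f.
        wa = math.sin((1 - f) * omega) / math.sin(omega)
        wb = math.sin(f * omega) / math.sin(omega)
        points.append(to_lonlat(*(wa * u + wb * v for u, v in zip(a, b))))
    return points

print(densify((0.0, 0.0), (10.0, 0.0), 3.0))
```

    The nearest-point distance can overestimate the true distance to the segment by at most about half the step, so pick a step in line with the accuracy you need.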

    If your vessels are near either pole (well, near the North pole, seeing as how the South pole is on land) things are going to get funky with the standard Great Circle formulas and bounding rectangles. You probably should use the Vincenty formulation in that case.

    Here's a writeup on using a DBMS with indexing for this kind of purpose. https://www.plumislandmedia.net/mysql/haversine-mysql-nearest-loc/

    If you need NOAA-chart level accuracy, you probably need to learn about Universal Transverse Mercator projections. That's beyond the scope of a Stack Overflow answer.

  • The efficient way of solving this problem is to store all your coast points in a vantage point tree using the geodesic distance as your metric (it's important that the metric satisfies the triangle inequality). Then for each vessel you can query the VP tree to find the closest point.

    If there are M coast points and N vessels, then constructing the VP tree requires about M log M distance calculations, and each query requires about log M distance calculations. A distance calculation for the ellipsoid takes about 2.5 μs, so the total time is (M + N) log M × 2.5 μs.
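
    To make the estimate concrete, here is a small Python calculation of (M + N) log M × 2.5 μs. It uses base-2 logarithms, as in the measurements later in this answer; the function name is hypothetical, and pairing 30000 coast points with the question's 200M vessel positions is only an assumed scenario.

```python
import math

SECONDS_PER_DISTANCE = 2.5e-6  # per-call cost quoted above for the ellipsoid

def vp_tree_seconds(m_coast, n_vessels, per_call=SECONDS_PER_DISTANCE):
    """Estimated run time: (M + N) * log2(M) * cost per distance calculation."""
    return (m_coast + n_vessels) * math.log2(m_coast) * per_call

# The test case from this answer: 30000 coast points, 46217 vessels.
print("test case: %.1f s" % vp_tree_seconds(30_000, 46_217))

# Assumed scenario: the same coast with 200M vessel positions.
print("200M vessels: %.1f h" % (vp_tree_seconds(30_000, 200_000_000) / 3600))
```

    For the test case this gives a little under 3 seconds, in line with the C++ run time quoted in the code comments, and roughly two hours for the 200M scenario.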

    Here is code using my library GeographicLib (version 1.47 or later) to carry out this calculation. This is just a stripped-down version of the example given for the NearestNeighbor class.

    // Example of using the GeographicLib::NearestNeighbor class.  Read lon/lat
    // points for coast from coast.txt and lon/lat for vessels from vessels.txt.
    // For each vessel, print to standard output: the index for the closest point
    // on coast and the distance to it.
    
    // This requires GeographicLib version 1.47 or later.
    
    // Compile/link with, e.g.,
    // g++ -I/usr/local/include -o coast coast.cpp -L/usr/local/lib -Wl,-rpath=/usr/local/lib -lGeographic
    
    // Run time for 30000 coast points and 46217 vessels is 3 secs.
    
    #include <iostream>
    #include <exception>
    #include <vector>
    #include <fstream>
    
    #include <GeographicLib/NearestNeighbor.hpp>
    #include <GeographicLib/Geodesic.hpp>
    
    using namespace std;
    using namespace GeographicLib;
    
    // A structure to hold a geographic coordinate.
    struct pos {
      double _lat, _lon;
      pos(double lat = 0, double lon = 0) : _lat(lat), _lon(lon) {}
    };
    
    // A class to compute the distance between 2 positions.
    class DistanceCalculator {
    private:
      Geodesic _geod;
    public:
      explicit DistanceCalculator(const Geodesic& geod) : _geod(geod) {}
      double operator() (const pos& a, const pos& b) const {
        double d;
        _geod.Inverse(a._lat, a._lon, b._lat, b._lon, d);
        if ( !(d >= 0) )
          // Catch illegal positions which result in d = NaN
          throw GeographicErr("distance doesn't satisfy d >= 0");
        return d;
      }
    };
    
    int main() {
      try {
        // Read in coast
        vector<pos> coast;
        double lat, lon;
        {
          ifstream is("coast.txt");
          if (!is.good())
            throw GeographicErr("coast.txt not readable");
          while (is >> lon >> lat)
            coast.push_back(pos(lat, lon));
          if (coast.size() == 0)
            throw GeographicErr("need at least one location");
        }
    
        // Define a distance function object
        DistanceCalculator distance(Geodesic::WGS84());
    
        // Create NearestNeighbor object
        NearestNeighbor<double, pos, DistanceCalculator>
          coastset(coast, distance);
    
        ifstream is("vessels.txt");
        if (!is.good())
          throw GeographicErr("vessels.txt not readable");
        double d;
        vector<int> k;
        while (is >> lon >> lat) {
          // Search returns the distance and puts the index of the closest
          // coast point in k
          d = coastset.Search(coast, distance, pos(lat, lon), k);
          if (k.size() != 1)
            throw GeographicErr("unexpected number of results");
          cout << k[0] << " " << d << "\n";
        }
      }
      catch (const exception& e) {
        cerr << "Caught exception: " << e.what() << "\n";
        return 1;
      }
    }
    

    This example is in C++. To use Python, you'll need to find a Python implementation of VP trees; you can then use the Python version of GeographicLib for the distance calculations.

    P.S. GeographicLib uses an accurate algorithm for the geodesic distance that satisfies the triangle inequality. The Vincenty method fails to converge for nearly antipodal points and so does not satisfy the triangle inequality.

    ADDENDUM: Here's the Python implementation. Install vptree and geographiclib:

    pip install vptree geographiclib
    

    Coast points (lon, lat) are in coast.txt; vessel positions (lon, lat) are in vessels.txt. Run:

    import numpy
    import vptree
    from geographiclib.geodesic import Geodesic
    
    def geoddist(p1, p2):
      # p1 = [lon1, lat1] in degrees
      # p2 = [lon2, lat2] in degrees
      return Geodesic.WGS84.Inverse(p1[1], p1[0], p2[1], p2[0])['s12']
    
    coast = vptree.VPTree(numpy.loadtxt('coast.txt'), geoddist)
    print('vessel closest-coast dist')
    for v in numpy.loadtxt('vessels.txt'):
      c = coast.get_nearest_neighbor(v)
      print(list(v), list(c[1]), c[0])
    

    In my first test, for 30000 coast points and 46217 vessels, this took 18 min 3 secs. This is longer than I expected: the time to construct the tree is 1 min 16 secs, so the total time should be about 3 min.

    With version 1.1.1 of vptree, the same test takes 4 min. For comparison, the time using the GeographicLib C++ library is 3 secs.

    LATER: I looked into why the Python vptree is slow. The number of distance calculations to set up the tree is the same for GeographicLib's C++ implementation and the Python vptree package: 387248, which is about M log M for M = 30000. (Here logs are base 2, and I set the bucket size to 1 for both implementations to ease comparisons.) The mean number of distance calculations per vessel lookup for the C++ implementation is 14.7, which is close to the expected value, log M = 14.9. However, the equivalent statistic for the Python implementation is 108.9, a factor of 7.4 larger.

    Various factors influence the efficiency of the VP tree: the choice of vantage points, how the search is ordered, etc. A discussion of these considerations for the GeographicLib implementation is given here. I will ping the author of the python package about this.
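
    To show where those distance calculations come from, here is a minimal VP tree sketch in Python that counts calls to the metric. It is not the code of the vptree package or of GeographicLib::NearestNeighbor: it picks vantage points at random, uses a bucket size of 1, and substitutes the spherical haversine distance for the geodesic one. All names are hypothetical.

```python
import math
import random

def haversine_km(p, q):
    """Spherical distance in km between (lon, lat) pairs p and q."""
    phi1, phi2 = math.radians(p[1]), math.radians(q[1])
    dlam = math.radians(q[0] - p[0])
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(min(1.0, a)))

class CountingMetric:
    """Wraps a distance function and counts how often it is called."""
    def __init__(self, fn):
        self.fn, self.calls = fn, 0
    def __call__(self, a, b):
        self.calls += 1
        return self.fn(a, b)

def build(points, dist, rng):
    """Node = (vantage point, radius, inside subtree, outside subtree)."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return (vp, 0.0, None, None)
    ranked = sorted(((dist(vp, p), p) for p in points), key=lambda t: t[0])
    mid = len(ranked) // 2
    radius = ranked[mid][0]                      # median distance from vp
    inside = [p for _, p in ranked[:mid]]        # distance <= radius
    outside = [p for _, p in ranked[mid:]]       # distance >= radius
    return (vp, radius, build(inside, dist, rng), build(outside, dist, rng))

def nearest(node, query, dist, best=(math.inf, None)):
    """Return (distance, point) of the nearest stored point to query."""
    if node is None:
        return best
    vp, radius, inside, outside = node
    d = dist(query, vp)
    if d < best[0]:
        best = (d, vp)
    # Visit the likelier side first; the triangle inequality decides
    # whether the other side can still hold a closer point.
    if d < radius:
        best = nearest(inside, query, dist, best)
        if d + best[0] >= radius:
            best = nearest(outside, query, dist, best)
    else:
        best = nearest(outside, query, dist, best)
        if d - best[0] <= radius:
            best = nearest(inside, query, dist, best)
    return best

rng = random.Random(0)
coast = [(rng.uniform(-180, 180), rng.uniform(-60, 60)) for _ in range(2000)]
queries = [(rng.uniform(-180, 180), rng.uniform(-60, 60)) for _ in range(100)]
metric = CountingMetric(haversine_km)
tree = build(coast, metric, rng)
build_calls, metric.calls = metric.calls, 0
results = [nearest(tree, q, metric) for q in queries]
print("build calls:", build_calls)
print("mean calls per query:", metric.calls / len(queries))
```

    Changing how the vantage point is chosen, or which side is searched first, changes the per-query count without changing the answers, which is the kind of difference described above.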

    STILL LATER: I've submitted a pull request which cures the major efficiency problems in the Python package vptree. The CPU time for my test is now about 4 min. The number of distance calculations per query is 16.7 (close to the figure for GeographicLib::NearestNeighbor, 14.7).
