K-Nearest Neighbor Query in PostGIS

后端未结

关注

 5  510

I am using the following Nearest Neighbor Query in PostGIS :

SELECT g1.gid g2.gid FROM points as g1, polygons g2   
WHERE g1.gid <> g2.gid
ORDER BY g1.


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  刺人心        
                
              
                            
                2020-12-15 08:27
              
            
            
                                                                       
You can do it with KNN index and lateral join.

SELECT v.gid, v2.gid,st_distance(v.the_geom, v2.the_geom)
  FROM geonames v, 
       lateral(select * 
                 from geonames v2
                where v2.id<>v.id
                ORDER BY v.the_geom <-> v2.the_geom LIMIT 10) v2
where v.gid in (...) - or other filtering condition

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  滥情空心        
                
              
                            
                2020-12-15 08:31
              
            
            
                                                                       
Assuming you have p point and g polygons, your original query:

SELECT g1.gid, g2.gid FROM points as g1, polygons g2   
WHERE g1.gid <> g2.gid
ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom)
LIMIT k;


Is returning the k nearest neighbours in the p x g set. The query may be using indexes, but it still has to order the entire p x g set to find the k rows with the smallest distance. What you instead want is the following:

SELECT g1.gid, 
      (SELECT g2.gid FROM polygons g2   
       --prevents you from finding every nearest neighbour twice
       WHERE g1.gid < g2.gid 
       --ORDER BY gid is erroneous if you want to limit by the distance
       ORDER BY ST_Distance(g1.the_geom,g2.the_geom)
       LIMIT k)
FROM points as g1;

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  情歌与酒        
                
              
                            
                2020-12-15 08:33
              
            
            
                                                                       
Just a few thoughts on your problem:

st_distance as well as st_area are not able to use indices. This is because both functions can not be reduced to questions like "Is a within b?" or "Do a and b overlap?". Even more concrete: GIST-indices can only operate on the bounding boxes of two objects. 

For more information on this you just could look in the postgis manual, which states an example with st_distance and how the query could be improved to perform better. 

However, this does not solve your k-nearest-neighbour-problem. For that, right now I do not have a good idea how to improve the performance of the query. The only chance I see would be assuming that the k nearest neighbors are always in a distance of below x meters. Then you could use a similar approach as done in the postgis manual. 

Your second query could be speeded up a bit. Currently, you compute the area for each object in table 1 as often as table has rows - the strategy is first to join the data and then select based on that function. You could reduce the count of area computations significantly be precomputing the area:

WITH polygonareas AS (
    SELECT gid, the_geom, st_area(the_geom) AS area
    FROM polygons
)
SELECT g1.gid, g2.gid
FROM polygonareas as g1 , polygonareas as g2 
WHERE g1.area > g2.area;


Your third query can be significantly optimized using bounding boxes: When the bounding boxes of two objects do not overlap, there is no way the objects do. This allows the usage of a given index and thus a huge performance gain. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  感情败类        
                
              
                            
                2020-12-15 08:39
              
            
            
                                                                       
Since late September 2011, PostGIS has supported indexed nearest neighbor queries via a special operator(s) usable in the ORDER BY clause:

SELECT name, gid
FROM geonames
ORDER BY geom <-> st_setsrid(st_makepoint(-90,40),4326)
LIMIT 10;


...will return the 10 objects whose geom is nearest -90,40 in a scalable way. A few more details (options and caveats) are in that announcement post and use of the <-> and the <#> operators is also now documented in the official PostGIS 2.0 reference. (The main difference between the two is that <-> compares the shape centroids and <#> compares their boundaries — no difference for points, other shapes choose what is appropriate for your queries.)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清酒与你        
                
              
                            
                2020-12-15 08:43
              
            
            
                                                                       
What you may need is the KNN index which is hopefully available soon in PostGIS 2.x and PostgreSQL 9.1: See http://blog.opengeo.org/tag/knn/
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复