How to merge similar items in a list

花落未央 2021-01-19 02:50

I haven't found anything relevant on Google, so I'm hoping to find some help here :)

I've got a Python list as follows:

[['hoose', 200], ["Ba


      
      
        
6 answers

野的像风 (original poster) 2021-01-19 02:57

To bring home the point from my comment, I just grabbed an implementation of that distance from here, and calculated some distances:

d('House', 'hoose') = 2
d('House', 'trousers') = 4
d('trousers', 'hoose') = 5


Now, suppose your threshold is 4. You would have to merge House and hoose, as well as House and trousers, but not trousers and hoose. Are you sure something like this can never happen with your data?
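
The three distances above are consistent with a case-sensitive Levenshtein (edit) distance, but the implementation I linked is not reproduced here, so treat that as an assumption. Here is a minimal sketch that recomputes them and lists which pairs fall within the threshold. The `levenshtein` helper and the "distance <= threshold" merge rule are illustrative choices, not something taken from your code.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Case-sensitive edit distance; insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free on a match)
            ))
        previous = current
    return previous[-1]


words = ["House", "hoose", "trousers"]
threshold = 4  # assumption: a pair is mergeable when its distance is <= threshold

# Every unordered pair whose distance is within the threshold.
mergeable = [
    (a, b)
    for a, b in combinations(words, 2)
    if levenshtein(a, b) <= threshold
]
print(mergeable)
```

House appears in both mergeable pairs while hoose and trousers are not a pair themselves, so "merge everything within the threshold" does not define consistent groups on its own.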

In the end, I think this is more of a clustering problem, so you will probably have to look into clustering algorithms. SciPy offers an implementation of hierarchical clustering that works with custom distance functions (be aware that this can be very slow for larger data sets, and it also consumes a lot of memory).
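
As a rough sketch of what that could look like, the snippet below feeds precomputed pairwise distances into `scipy.cluster.hierarchy.linkage` and cuts the tree with `fcluster`. The `levenshtein` and `cluster` helpers, the three sample words and the cut height of 4 are assumptions for illustration only; swap in your own distance and data.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def levenshtein(a: str, b: str) -> int:
    """Case-sensitive edit distance (assumed stand-in for your distance)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,
                current[j - 1] + 1,
                previous[j - 1] + (char_a != char_b),
            ))
        previous = current
    return previous[-1]


words = ["House", "hoose", "trousers"]

# Condensed distance vector in the pair order SciPy expects:
# (0, 1), (0, 2), (1, 2), which is also the order combinations() yields.
condensed = np.array(
    [levenshtein(a, b) for a, b in combinations(words, 2)],
    dtype=float,
)


def cluster(method: str, threshold: float = 4) -> list:
    """Group words by cutting the linkage tree at the given distance."""
    tree = linkage(condensed, method=method)
    labels = fcluster(tree, t=threshold, criterion="distance")
    groups = {}
    for word, label in zip(words, labels):
        groups.setdefault(int(label), []).append(word)
    return sorted(groups.values())


print("single:  ", cluster("single"))
print("complete:", cluster("complete"))
```

With these three words and a cut at 4, single linkage puts everything into one group, while complete linkage keeps trousers apart from House and hoose. The pairwise distance vector grows quadratically with the number of items, which is where the speed and memory cost comes from.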

The main problem is deciding on a measure of cluster quality, because there is no single correct solution to your problem. This paper (pdf) gives you a starting point for understanding that problem.
    
             
                                                        
            
            
              
                