Sort a list of strings based on regular expression match

后端未结

关注

 4  497

I have a text file that looks a bit like:

random text random text, can be anything blabla %A blabla
random text random text, can be anything blabla %D blabla


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  南旧        
                
              
                            
                2021-01-14 03:15
              
            
            
                                                                       
You could use a custom key function to compare the strings. Using the lambda syntax you can write that inline, like so:

strings.sort(key=lambda str: re.sub(".*%", "", str));


The re.sub(".*%", "", str) call effectively removes anything before the first percent sign so if the string has a percent sign it'll compare what comes after it, otherwise it'll compare the entire string.

Pedantically speaking, this doesn't just use the letter following the percent sign, it also uses everything after. If you want to use the letter and only the letter try this slightly longer line:

strings.sort(key=lambda str: re.sub(".*%(.).*", "\\1", str));

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  无人共我        
                
              
                            
                2021-01-14 03:22
              
            
            
                                                                       
what about this? hope this helps.

def k(line):
    v = line.partition("%")[2]
    v = v[0] if v else 'z' # here z stands for the max value
    return v
print ''.join(sorted(open('data.txt', 'rb'), key = k))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  北海茫月        
                
              
                            
                2021-01-14 03:26
              
            
            
                                                                       
Here is a quick-and-dirty approach. Without knowing more about the requirements of your sort, I can't know if this satisfies your need. 

Assume that your list is held in 'listoflines':

listoflines.sort( key=lambda x: x[x.find('%'):] )


Note that this will sort all lines without a '%' character by their final character.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  余生分开走        
                
              
                            
                2021-01-14 03:35
              
            
            
                                                                       
In [1]: def grp(pat, txt): 
   ...:     r = re.search(pat, txt)
   ...:     return r.group(0) if r else '&'

In [2]: y
Out[2]: 
['random text random text, can be anything blabla %A blabla',
 'random text random text, can be anything blabla %D blabla',
 'random text random text, can be anything blabla blabla %F',
 'random text random text, can be anything blabla blabla',
 'random text random text, %C can be anything blabla blabla']

In [3]: y.sort(key=lambda l: grp("%\w", l))

In [4]: y
Out[4]: 
['random text random text, can be anything blabla %A blabla',
 'random text random text, %C can be anything blabla blabla',
 'random text random text, can be anything blabla %D blabla',
 'random text random text, can be anything blabla blabla %F',
 'random text random text, can be anything blabla blabla']

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复