Using regular expressions to do mass replace in Notepad++ and Vim


So I've got a big text file which looks like the following:


                      
16 answers

Answer by 北海茫月, 2020-12-08 04:56

A little after the fact, but in case it's useful to anyone: I was able to follow one of the examples here (by sdgfsdg) and quickly pick up regular expressions for Notepad++.

I had to similarly pull out some redundant data from a list of HTML select dropdown options, of the form:

<select>
  <option value="AC">saint_helena">Ascension Island</option>
  <option value="AD">andorra">Andorra</option>
  <option value="AE">united_arab_emirates">United Arab Emirates</option>
  <option value="AF">afghanistan">Afghanistan</option>
  ...
</select>


And what I really wanted was:

<select>
  <option value="AC">Ascension Island</option>
  <option value="AD">Andorra</option>
  <option value="AE">United Arab Emirates</option>
  <option value="AF">Afghanistan</option>
  ...
</select>


After some hair-pulling, I realized that as of version 5.8.5 (Sep. 2010), Notepad++'s regular expressions still don't seem to allow repeating a parenthesized group with * (a "loop"), unless there is another syntax for it. For example, in a standard regex engine the following would match even ">united_arab_emirated_emirates">, despite its additional separating underscores:

(">)([a-z]+([_]*[a-z]*)*)(">)


This pattern works in most general-purpose regex tools, but in Notepad++ I had to account by hand for the maximum number of underscores (which unfortunately was 8), using the much uglier:

(">)([a-z]+[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*)[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*(">)


If someone knows a way to simulate a Regex loop in Notepad++'s replace feature, please let me know.



Find what: (">)([a-z]+[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*)[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*[_]*[a-z]*(">)



Replace with: ">



Result: 255 occurrences were replaced.
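For comparison, here is a minimal Python sketch (Python's re module stands in for a "generic" regex tool; it is not Notepad++'s engine) showing that the looped pattern from this answer is accepted by a standard backtracking engine. It also shows a pattern without any repeated group, ">[a-z][a-z_]*">, which matches exactly the same slugs, because [a-z]+([_]*[a-z]*)* simply means "a letter followed by any mix of letters and underscores". Whether the flat form runs in Notepad++ 5.8.5 is untested here; it only repeats a single character class, so it may sidestep the limitation. The helper name strip_slug is hypothetical.

```python
import re

# The answer's looped pattern: group 3 is repeated with *, which a standard
# backtracking engine such as Python's re accepts.
LOOPED = re.compile(r'(">)([a-z]+([_]*[a-z]*)*)(">)')

# The same set of slugs without a repeated group: one lowercase letter,
# then any run of lowercase letters and underscores.
FLAT = re.compile(r'">[a-z][a-z_]*">')


def strip_slug(line, pattern=FLAT):
    """Hypothetical helper: replace '">slug_name">' with '">' (the answer's Replace All)."""
    return pattern.sub('">', line)


lines = [
    '<option value="AC">saint_helena">Ascension Island</option>',
    '<option value="AD">andorra">Andorra</option>',
    '<option value="AE">united_arab_emirates">United Arab Emirates</option>',
    '<option value="AF">afghanistan">Afghanistan</option>',
]

for line in lines:
    # Both patterns give identical output on these lines.
    assert strip_slug(line, LOOPED) == strip_slug(line, FLAT)
    print(strip_slug(line))  # e.g. <option value="AC">Ascension Island</option>
```

Lines that are already clean, such as <option value="AF">Afghanistan</option>, are left alone because the slug must start with a lowercase letter.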

Answer by 日久生厌, 2020-12-08 04:58

In Vim:

:%s/<option value='.\{1,}' >//


or

:%s/<option value='.\+' >//


In Vim regular expressions (with the default 'magic' setting) you have to escape the one-or-more quantifier +, capturing parentheses, the curly braces of a bounded repeat count, and a few others.

Run :help /magic to see which special characters need to be escaped (and how to change that).
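The escaping rule is easiest to see next to an engine where those characters are special by default. A minimal Python sketch, assuming lines shaped like <option value='1' >A (the question's own sample is not preserved in this copy): the same two substitutions need no backslashes in Python's re, and pasting Vim's \+ verbatim turns it into a literal plus sign.

```python
import re

line = "<option value='1' >A"  # assumed sample line

# Vim  :%s/<option value='.\+' >//     ->  '+' is already a quantifier in Python
one_or_more = re.compile(r"<option value='.+' >")
# Vim  :%s/<option value='.\{1,}' >//  ->  '{1,}' is already a quantifier too
bounded = re.compile(r"<option value='.{1,}' >")

print(one_or_more.sub("", line))  # A
print(bounded.sub("", line))      # A

# Copying the Vim spelling into Python: '\+' now means a literal '+',
# so the pattern no longer matches and the line is left unchanged.
vim_spelling = re.compile(r"<option value='.\+' >")
print(vim_spelling.sub("", line))  # <option value='1' >A
```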

Answer by 醉梦人生, 2020-12-08 04:58

Vim:

:%s/.* >//
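A rough Python equivalent, run over the whole file as one string. The four sample lines are copied from the Notepad++ answer in this thread and are an assumption about the question's data. Python's . does not match a newline by default, so .* stays inside one line, which mirrors how Vim's :%s applies the pattern line by line.

```python
import re

# Sample lines taken from the Notepad++ answer (assumed to match the question's data).
text = """<option value value='1' >A
<option value value='2' >B
<option value value='3' >C
<option value value='4' >D"""

# Vim: :%s/.* >//
# '.' does not match '\n' by default, so '.*' can never run past the end of a line.
# re.sub replaces every match, while Vim's :s (no g flag) replaces the first per line;
# here they agree because the greedy '.*' already reaches the last " >" on each line.
result = re.sub(r".* >", "", text)
print(result)  # A, B, C, D on separate lines
```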

Answer by 眼角桃花, 2020-12-08 04:59

In Notepad++, starting from:

<option value value='1' >A
<option value value='2' >B
<option value value='3' >C
<option value value='4' >D


Find what: (.*)(>)(.)
Replace with: \3

Click Replace All to get:


A
B
C
D
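The same Find/Replace expressed with Python's re.sub, as a sketch of how the three groups cooperate: (.*) greedily takes everything up to the last > on the line, (>) is that bracket, and (.) is the first character after it, which \3 puts back. Only one character is captured, but the rest of the line is never part of the match, so longer labels survive as well. The fifth "Ascension Island" line is an assumed extra sample, not part of the original data.

```python
import re

text = """<option value value='1' >A
<option value value='2' >B
<option value value='3' >C
<option value value='4' >D
<option value value='5' >Ascension Island"""

# Find what: (.*)(>)(.)    Replace with: \3
# '.' excludes '\n', so each match is confined to a single line.
result = re.sub(r"(.*)(>)(.)", r"\3", text)
print(result)
```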


Answer by 陌清茗, 2020-12-08 05:00

It may help if you're less specific. Your expression there is "greedy", and different programs may handle it in different ways. Try this in Vim:

:%s/^<[^>]\+>//
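To illustrate the "less specific" idea outside Vim, here is a small Python sketch (Python's re is a stand-in here, not Vim's engine; in Python the + needs no backslash). The negated class [^>]+ cannot run past the first >, so on lines that also carry a closing tag only the opening tag is removed. re.M makes ^ match at the start of every line, similar to how Vim's :%s anchors ^ per line. The HTML sample lines are borrowed from another answer in this thread.

```python
import re

question_lines = "<option value value='1' >A\n<option value value='2' >B"
html_lines = '<option value="AC">Ascension Island</option>\n<option value="AD">Andorra</option>'

# ^ anchors at each line start (re.M); [^>]+ cannot cross the first '>'.
leading_tag = re.compile(r"^<[^>]+>", flags=re.M)

print(leading_tag.sub("", question_lines))  # A and B on separate lines
print(leading_tag.sub("", html_lines))      # closing </option> tags are kept
```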


Answer by 悲哀的现实, 2020-12-08 05:03

"Everything before the A, B, C, etc."

That seems so simple I must be misinterpreting you. It's just:

:%s/<.*>//
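This works on the question's lines because each has exactly one tag before the label. A short Python sketch (used only for illustration) of where the greedy <.*> stops being safe: on a line that also contains a closing tag, it spans from the first < to the last > and removes everything. The non-greedy <.*?> (spelled <.\{-}> in Vim) stops at the first >. The HTML line is borrowed from another answer in this thread; count=1 is used to mirror Vim's :s without the g flag.

```python
import re

question_line = "<option value value='1' >A"
html_line = '<option value="AC">Ascension Island</option>'

greedy = re.compile(r"<.*>")  # Vim: :%s/<.*>//
lazy = re.compile(r"<.*?>")   # non-greedy variant (Vim: <.\{-}>)

# count=1 replaces only the first match, like Vim's :s without the g flag.
print(greedy.sub("", question_line, count=1))  # A
print(greedy.sub("", html_line, count=1))      # empty: the whole line is one match
print(lazy.sub("", html_line, count=1))        # Ascension Island</option>
```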
