Remove all text between two brackets

后端未结

关注

 5  1370

Suppose I have some text like this,

text<-c(\"[McCain]: We need tax policies that respect the wage earners and job creators. [Obama]: It\'s harder to save


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  野趣味        
                
              
                            
                2020-11-27 21:29
              
            
            
                                                                       
I think this technically answers what you've asked, but you probably want to add a \\: to the end of the regex for prettier text (removing the colon and space).

library(stringr)
str_replace_all(text, "\\[.+?\\]", "")

#> [1] ": We need tax policies that respect the wage earners..."


vs...

str_replace_all(text, "\\[.+?\\]\\: ", "")
#> [1] "We need tax policies that respect the wage earners..." 


Created on 2018-08-16 by the reprex package (v0.2.0).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  囚心锁ツ        
                
              
                            
                2020-11-27 21:31
              
            
            
                                                                       
No need to use a PCRE regex with a negated character class / bracket expression, a "classic" TRE regex will work, too:

subject <- "Some [string] here and [there]"
gsub("\\[[^][]*]", "", subject)
## => [1] "Some  here and "


See the online R demo

Details:


\\[ - a literal [ (must be escaped or used inside  a bracket expression like [[] to be parsed as a literal [)
[^][]* - a negated bracket expression that matches 0+ chars other than [ and ] (note that the ] at the start of the bracket expression is treated as a literal ])
] - a literal ] (this character is not special in both PCRE and TRE regexps and does not have to be escaped).


If you want to only replace the square brackets with some other delimiters, use a capturing group with a backreference in the replacement pattern:

gsub("\\[([^][]*)\\]", "{\\1}", subject)
## => [1] "Some {string} here and {there}"


See another demo

The (...) parenthetical construct forms a capturing group, and its contents can be accessed with a backreference \1 (as the group is the first one in the pattern, its ID is set to 1).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一向        
                
              
                            
                2020-11-27 21:44
              
            
            
                                                                       
With this:

gsub("\\[[^\\]]*\\]", "", subject, perl=TRUE);


What the regex means:

  \[                       # '['
  [^\]]*                   # any character except: '\]' (0 or more
                           # times (matching the most amount possible))
  \]                       # ']'

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  被撕碎了的回忆        
                
              
                            
                2020-11-27 21:53
              
            
            
                                                                       
The following should do the trick. The ? forces a lazy match, which matches as few . as possible before the subsequent ].

gsub('\\[.*?\\]', '', text)

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  广开言路        
                
              
                            
                2020-11-27 21:54
              
            
            
                                                                       
Here'a another approach:

library(qdap)
bracketX(text, "square")

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复