How can I use the REGEXEXTRACT function in Google Docs spreadsheets to get “all” occurrences of a string?


遥遥无期 2021-02-07 23:18

My text string is in cell D2:

Decision, ERC Case No. 2009-094 MC, In the Matter of the Application for Authority to Secure Loan from the National Electrification


      
      
        
2 answers

执笔经年 (original poster) 2021-02-08 00:09

Here are two solutions: one uses the specific terms from the author's example, and the other expands on the author's sample regex pattern, which appears to match all ALLCAPS terms. I'm not sure which is wanted, so I give both.

(Put the block of text in A1)

Generic solution for all words in ALLCAPS

=regexreplace(regexreplace(REGEXREPLACE(A1,"\b\w[^A-Z]*\b","|"),"\W+","|"),"^\||\|$","")


Result:

ERC|MC|NEA|DIELCO


NB: The inner, uppercase REGEXREPLACE does the bulk of the work; the lowercase regexreplace calls are only cleanup.
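
To see what each nested call contributes, here is a minimal Python sketch that runs the same three patterns one stage at a time. It is an illustration only: Python's `re` module is assumed to behave like the spreadsheet's regex engine for these ASCII patterns, the function name `extract_allcaps` is made up for this example, and the second sample string is hypothetical.

```python
import re

def extract_allcaps(text: str) -> str:
    """Mirror the three nested regexreplace calls, one stage per line."""
    # Stage 1 (the inner REGEXREPLACE): a run that starts with a word character
    # at a word boundary, continues through non-uppercase characters, and ends
    # at a word boundary becomes "|". A word of two or more capitals cannot
    # match: after its first letter the next character is uppercase, so the
    # run would have to stop where there is no word boundary.
    stage1 = re.sub(r"\b\w[^A-Z]*\b", "|", text)
    # Stage 2: collapse each run of non-word characters (spaces, commas, and
    # the "|" just inserted) into a single "|".
    stage2 = re.sub(r"\W+", "|", stage1)
    # Stage 3: drop a leading or trailing "|".
    return re.sub(r"^\||\|$", "", stage2)

# The visible part of the question's text (it is cut off in the post).
sample = ("Decision, ERC Case No. 2009-094 MC, In the Matter of the Application "
          "for Authority to Secure Loan from the National Electrification")
print(extract_allcaps(sample))  # ERC|MC

# Hypothetical text, invented for this example.
print(extract_allcaps("Loan from NEA to DIELCO, Applicant"))  # NEA|DIELCO
```

One thing this sketch suggests checking on real data: a word with an internal capital, such as `McDonald`, is also left in place by this pattern in Python, so the output is not strictly limited to ALLCAPS words.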

If you want space separation, the formula is a little simpler:

=trim(regexreplace(REGEXREPLACE(A1,"\b\w[^A-Z]*\b"," "),"\W+"," "))


Result:

ERC MC NEA DIELCO


(One way I like to experiment with regex in Google spreadsheets is to read the pattern from another cell, so I can change it without having to edit or re-paste it into every cell that uses it. It looks like this:

Cell A1:

Block of text


Cell B1 (no quote marks):

\b\w[^A-Z]*\b


Formula, in any cell:

=trim(regexreplace(REGEXREPLACE(A1,B$1," "),"\W+"," "))


By anchoring it to B$1, I can fill all my rows at once and the reference won't increment.)
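
The same idea, one shared pattern applied to many rows, can be sketched in Python for testing patterns outside the spreadsheet. This is only an analogy under stated assumptions: `PATTERN` stands in for cell B$1, `rows` stands in for column A, and `extract_with_pattern` is a hypothetical helper that mirrors the space-separated formula.

```python
import re

def extract_with_pattern(text: str, pattern: str) -> str:
    """Mirror =trim(regexreplace(REGEXREPLACE(A1,B$1," "),"\\W+"," "))."""
    # Inner call: blank out everything the shared pattern matches.
    blanked = re.sub(pattern, " ", text)
    # Outer call: collapse runs of non-word characters to one space, then
    # strip the ends (the role trim plays in the formula).
    return re.sub(r"\W+", " ", blanked).strip()

# Plays the role of cell B$1: change it here and every row picks it up.
PATTERN = r"\b\w[^A-Z]*\b"

# Plays the role of column A. The second row is hypothetical text.
rows = [
    "Decision, ERC Case No. 2009-094 MC, In the Matter of the Application",
    "Loan from NEA to DIELCO, Applicant",
]

results = [extract_with_pattern(row, PATTERN) for row in rows]
print(results)  # ['ERC MC', 'NEA DIELCO']
```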



Previous answer:

Specific solution for selected terms (ERC, DIELCO)

=regexreplace(join("|",IF(REGEXMATCH(A1,"ERC"),"ERC",""),IF(REGEXMATCH(A1,"DIELCO"),"DIELCO","")),"(^\||\|$)","")


Result:

ERC|DIELCO


As before, the uppercase functions do the bulk of the work; the lowercase functions are only cleanup.

This formula finds ERC, DIELCO, or both in the block of text. Their order in the text doesn't matter, but the output always lists ERC before DIELCO (the order of appearance is lost). This fixes a shortcoming of the previous answer, which used "(bra).*(bra)": an isolated ERC or DIELCO can now still be matched.

This also has a simpler form with space separation:

=trim(join(" ",IF(REGEXMATCH(A1,"ERC"),"ERC",""),IF(REGEXMATCH(A1,"DIELCO"),"DIELCO","")))


Result:

ERC DIELCO
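
The logic of the specific-terms formulas can be sketched in Python as follows. This is a rough equivalent, not a translation: the helper name `extract_terms` and the sample strings are invented for illustration, and the sketch filters out missing terms before joining, so it needs no cleanup step for stray separators.

```python
import re

def extract_terms(text: str, terms=("ERC", "DIELCO"), sep: str = "|") -> str:
    """Return the listed terms found in text, joined in list order."""
    # One presence test per term, like each IF(REGEXMATCH(...)) in the formula.
    # Each term is treated as a regex, as REGEXMATCH does.
    found = [term for term in terms if re.search(term, text)]
    # Output order follows the terms list, not the order of appearance.
    return sep.join(found)

# Hypothetical inputs, invented for this example.
print(extract_terms("DIELCO filed with ERC"))        # ERC|DIELCO
print(extract_terms("Only DIELCO appears here"))     # DIELCO
print(extract_terms("DIELCO and ERC", sep=" "))      # ERC DIELCO
```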

    
             
                                                        
            
            
              
                