Why does string.split with a regular expression that contains a capturing group return an array that ends with an empty string?

I'd like to split an input string on the first colon that still has characters after it on the same line.

For this, I am using the regular expression /:(.+)/.

4 Answers

你的背包
2020-12-10 11:15

If we change the regex to /:.+/ (no capture group) and split 'aaa:bbb:ccc' on it, we get:

["aaa", ""]


This makes sense, as the regex matches :bbb:ccc.
It gives the same output as splitting that string manually on the literal separator:

>>> 'aaa:bbb:ccc'.split(':bbb:ccc')
['aaa', '']


Adding the capture group just inserts the captured bbb:ccc into the result as well; it shouldn't change the original split behaviour.
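A quick way to check this in Node.js or a browser console (a minimal sketch; `input` is just an illustrative variable name). The only difference the capture group makes is the extra captured element before the trailing piece:

```javascript
const input = "aaa:bbb:ccc";

// Without a capture group: the separator ":bbb:ccc" is simply removed.
console.log(input.split(/:.+/)); // → ["aaa", ""]

// Splitting on the literal substring gives the same result.
console.log(input.split(":bbb:ccc")); // → ["aaa", ""]

// With a capture group: the captured "bbb:ccc" is spliced in before the trailing "".
console.log(input.split(/:(.+)/)); // → ["aaa", "bbb:ccc", ""]
```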

一生所求
2020-12-10 11:17

From the ECMAScript 2015 spec (String.prototype.split):


  If separator is a regular expression that contains capturing
  parentheses, then each time separator is matched the results
  (including any undefined results) of the capturing parentheses are
  spliced into the output array. For example,

  "A<B>bold</B>and<CODE>coded</CODE>".split(/<(\/)?([^<>]+)>/)

  
  evaluates to the array:

  ["A", undefined, "B", "bold", "/", "B", "and", undefined,
  "CODE", "coded", "/", "CODE", ""]



Like in your example, the output array here ends with an empty string: it is the remainder of the input after the last separator match, </CODE>. That match's captures ("/" and "CODE") are spliced in just before it, and since nothing follows </CODE>, the remainder is empty.

It's not obvious, but it makes sense: otherwise the separator's captures would end up at the very end of the split array, where they would not actually separate anything.
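One way to see why the trailing "" belongs there is to regroup the flat array: every separator match contributes its captured values followed by the piece of input that comes after it, so the last record's piece is the (empty) text after </CODE>. The `regroup` helper below is hypothetical, written only to illustrate this layout, and it assumes the separator has a fixed number of capture groups (`groups`).

```javascript
const parts = "A<B>bold</B>and<CODE>coded</CODE>".split(/<(\/)?([^<>]+)>/);
// parts: ["A", undefined, "B", "bold", "/", "B", "and", undefined, "CODE", "coded", "/", "CODE", ""]

// Hypothetical helper (not from the spec): turn the flat array into records.
// The first record is the piece before any match; every later record holds
// one match's captures plus the piece of input that follows that match.
function regroup(parts, groups) {
  const records = [{ captures: [], piece: parts[0] }];
  for (let i = 1; i < parts.length; i += groups + 1) {
    records.push({ captures: parts.slice(i, i + groups), piece: parts[i + groups] });
  }
  return records;
}

const records = regroup(parts, 2);
const last = records[records.length - 1];
// last.captures → ["/", "CODE"], last.piece → "" (nothing follows </CODE>)
console.log(last.captures, JSON.stringify(last.piece));
```

Without that final "", the last record would have captures but no piece, which is exactly the "captures that don't separate anything" situation described above.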

梦如初夏
2020-12-10 11:28

Interesting. Learnt a lot from this question. Let me share what I learnt.

The dot doesn't match a newline

If we think about it, the intention is to split the string on a : followed by one or more characters. If that is the case, the output should have been

['aaa', '\nbbb:ccc', '']


right? Because .+ matches greedily, it should have split at :\nbbb:ccc, with : matching the first colon and .+ matching \nbbb:ccc. But the actual output you got was

[ 'aaa:\nbbb', 'ccc', '' ]


This is because . does not match line terminators. Quoting MDN:


  (The dot, the decimal point) matches any single character except line terminators: \n, \r, \u2028 or \u2029.


So :\n doesn't match :(.+), which is why the split doesn't happen there. If you actually want to match newlines as well, use either [^] or [\s\S].

For example,

const data = 'aaa:\nbbb:ccc';
console.log(data.split(/:(.+)/));
// [ 'aaa:\nbbb', 'ccc', '' ]
console.log(data.split(/:([\s\S]+)/));
// [ 'aaa', '\nbbb:ccc', '' ]
console.log(data.split(/:([^]+)/));
// [ 'aaa', '\nbbb:ccc', '' ]
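A small sketch to confirm the MDN quote and, as an extra alternative to [^] and [\s\S], the s (dotAll) flag added in ES2018. This assumes an engine that supports that flag (current Node.js and browsers do).

```javascript
const terminators = ["\n", "\r", "\u2028", "\u2029"];

// Without flags, "." refuses every one of these line terminators.
console.log(terminators.map((ch) => /./.test(ch))); // → [false, false, false, false]

// With the s (dotAll) flag, "." matches them as well.
console.log(terminators.map((ch) => /./s.test(ch))); // → [true, true, true, true]

// So the s flag is a third way to let (.+) run across the newline.
const data = "aaa:\nbbb:ccc";
console.log(data.split(/:(.+)/s)); // → ["aaa", "\nbbb:ccc", ""]
```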




Now to answer your actual question: why is there an empty string at the end of the split? When you cut a line once, how many pieces do you get? Two. So every cut should produce two pieces, one on each side. In your case, aaa:\nbbb is the piece before the cut, the cut itself happens at :ccc (whose captured ccc is spliced in), and since the string ends right there, an empty string is included as the piece after the cut, marking the end of the string.
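The "one cut, two pieces" idea can be checked directly. This sketch assumes a separator that never matches the empty string (empty matches follow special rules in split), and `countCuts` is a hypothetical helper written only for this illustration.

```javascript
// Hypothetical helper: count the separator matches (the "cuts"); `re` must have the g flag.
function countCuts(str, re) {
  return [...str.matchAll(re)].length;
}

const data = "aaa:\nbbb:ccc";

// One cut (at ":ccc") gives two pieces; the cut reaches the end, so the second piece is "".
console.log(countCuts(data, /:.+/g), data.split(/:.+/)); // → 1, ["aaa:\nbbb", ""]

// A capture group only splices the captured text in between those two pieces.
console.log(data.split(/:(.+)/)); // → ["aaa:\nbbb", "ccc", ""]

// Same rule with a plain string separator.
console.log("a,b,".split(",")); // → ["a", "b", ""]  (two cuts, three pieces)
console.log("a,b".split(","));  // → ["a", "b"]      (one cut, not at the end, so no "")
```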

攒了一身酷
2020-12-10 11:28

Because my input always ends with a closing parenthesis, my regexp always produces an extra empty element at the end of the array returned by String.prototype.split(). So I simply truncate the array every time, which seems better than Array.prototype.filter when it's always the last element that has to go. I'm parsing CSS/SVG transforms, splitting on both left and right parentheses. Either of these works: /\(|\)/ or /[\(\)]/.

For example (trimming each piece, since the text between ")" and the next function name starts with a space):

let arr = "rotate(90 46 88) scale(1.2 1.2)".split(/\(|\)/).map(s => s.trim());
arr.length--;


Or if you want to get fancy and cram it into one line:

let arr; (arr = "rotate(90 46 88) scale(1.2 1.2)".split(/\(|\)/).map(s => s.trim())).length--;


The result is: ["rotate","90 46 88","scale","1.2 1.2"]
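If the input might not end with a parenthesis, truncating unconditionally would drop real data. Below is a sketch of a guarded version; `splitNoTrailingEmpty` is a hypothetical helper, not a built-in. For comparison, the end of the block uses a different technique (matching each function call with matchAll instead of splitting), which never produces the trailing "" in the first place.

```javascript
// Hypothetical helper: drop the last element only if split() actually produced a trailing "".
function splitNoTrailingEmpty(str, separator) {
  const parts = str.split(separator);
  if (parts.length > 1 && parts[parts.length - 1] === "") {
    parts.pop();
  }
  return parts;
}

const transform = "rotate(90 46 88) scale(1.2 1.2)";

// No trimming here, so " scale" keeps its leading space.
console.log(splitNoTrailingEmpty(transform, /\(|\)/)); // → ["rotate", "90 46 88", " scale", "1.2 1.2"]

// The input doesn't end with a separator, so nothing is removed.
console.log(splitNoTrailingEmpty("a(b", /\(|\)/)); // → ["a", "b"]

// Different technique: match each "name(args)" pair directly instead of splitting.
const pairs = [...transform.matchAll(/(\w+)\(([^)]*)\)/g)].map((m) => [m[1], m[2]]);
console.log(pairs); // → [["rotate", "90 46 88"], ["scale", "1.2 1.2"]]
```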