Chop a string in Ruby into fixed-length substrings regardless of newline or space characters


I have a string containing many newlines and spaces. I need to split it into fixed-length substrings, regardless of those newlines and spaces. E.g.:

a = "This is some\nText\nThis is some text"

5 Answers

Answer by 面向向阳花, 2021-01-02 21:32

Yet another way:

(0..(a.length / 17)).map{|i| a[i * 17,17] }
#=> ["This is some\nText", "\nThis is some tex", "t"]


Update

And benchmarking:

require 'benchmark'
a = "This is some\nText\nThis is some text" * 1000
n = 100

Benchmark.bm do |x|
  x.report("slice") { n.times do ; (0..(a.length / 17)).map{|i| a[i * 17,17] } ; end}
  x.report("regex") { n.times do ; a.scan(/.{1,17}/m) ; end}
  x.report("eachc") { n.times do ; a.each_char.each_slice(17).map(&:join) ; end }
end


Result:

           user     system      total        real
slice  0.090000   0.000000   0.090000 (  0.091065)
regex  0.230000   0.000000   0.230000 (  0.233831)
eachc  1.420000   0.010000   1.430000 (  1.442033)

Answer by 眼角桃花, 2021-01-02 21:33

A solution with Enumerable: split the string into single characters with each_char, then use each_slice to do the partitioning, and join the results:

"This is some\nText\nThis is some text"
  .each_char # => ["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t", "\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x", "t"]
  .each_slice(17) # => [["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t"], ["\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x"], ["t"]]
  .map(&:join) # => ["This is some\nText", "\nThis is some tex", "t"]
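
If the string is large and only some chunks are needed, the same chain can be made lazy, because each_char and each_slice return Enumerators when called without a block. A small sketch; the .lazy step is an addition of mine, not part of the answer above:

```ruby
text = "This is some\nText\nThis is some text"

# each_char and each_slice(17) return Enumerators when given no block;
# .lazy defers the joins so chunks are built only as they are requested.
chunks = text.each_char.each_slice(17).lazy.map(&:join)

p chunks.first(2) # => ["This is some\nText", "\nThis is some tex"]
p chunks.to_a     # => ["This is some\nText", "\nThis is some tex", "t"]
```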

Answer by 闹比i, 2021-01-02 21:38

I noted an issue with @yevgeniy's answer above (I would comment directly but I lack the reputation).

If the string's length divides evenly (a.length % divisor == 0), you end up with an extra empty-string element ("") at the end of the array.

a = "123456789"
(0..(a.length / 3)).map{|i| a[i * 3,3] }
# => ["123", "456", "789", ""]


I have resolved this issue and generalized the solution to a function (the function uses keyword arguments with a required keyword, requires Ruby 2.1+):

def string_prettifier(a_string: , split_char_count: 3)
  splits = (0...(a_string.length / split_char_count.to_f).ceil).map{|i| a_string[i * split_char_count, split_char_count] }
  return splits
end

s = "123456789"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "789"]

s = "12345678"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "78"]

s = "1234567890"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "789", "0"]
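
A variant of the same fix, offered as a sketch rather than as part of the answer above: integer ceiling division, (length + size - 1) / size, gives the chunk count without converting to Float. The method name chop_string is hypothetical.

```ruby
# Hypothetical helper: split str into chunks of `size` characters.
# Integer ceiling division means no Float conversion and no trailing ""
# when the length is an exact multiple of size.
def chop_string(str, size)
  raise ArgumentError, "size must be positive" unless size.positive?

  chunk_count = (str.length + size - 1) / size
  Array.new(chunk_count) { |i| str[i * size, size] }
end

p chop_string("123456789", 3) # => ["123", "456", "789"]
p chop_string("12345678", 3)  # => ["123", "456", "78"]
p chop_string("", 3)          # => []
```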


Answer by 抹茶落季, 2021-01-02 21:45

"This is some\nText\nThis is some text".scan(/.{1,17}/m)
# => ["This is some\nText", "\nThis is some tex", "t"]
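
The m flag is what makes this work: in Ruby, . does not match a newline unless the regexp has the /m (multiline) option, so without it scan skips every newline and restarts the chunk after it. A small comparison:

```ruby
text = "This is some\nText\nThis is some text"

# Without /m, "." stops at "\n": newlines are dropped and chunks break at line ends.
without_m = text.scan(/.{1,17}/)
# With /m, "." also matches "\n", so every chunk is 17 characters (except the last).
with_m = text.scan(/.{1,17}/m)

p without_m # => ["This is some", "Text", "This is some text"]
p with_m    # => ["This is some\nText", "\nThis is some tex", "t"]
```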

Answer by 不知归路, 2021-01-02 21:51

Yet another solution: unpack.

You need to construct a template string for it, like a17a17a17a17a8 (the last chunk needs to be shorter if the string's length is not an exact multiple of 17).

a = "This is some\nText\nThis is some text\nThis is some more text"
a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}"))
# => ["This is some\nText", "\nThis is some tex", "t\nThis is some mo", "re text"]


This appears to be by far the fastest of the suggested approaches. Of course, if the input string is huge, the unpack template will be huge as well. In that case you will want a buffered reader: read the input in chunks of x * 17 characters and do something like the above for each chunk.
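
A minimal sketch of that buffered approach, under a few assumptions: the input is single-byte (ASCII) text, StringIO stands in for a real file or socket, and each_unpacked_chunk and its chunks_per_read parameter are hypothetical names. Because every full read is a multiple of the chunk size, only the final, shorter read needs a tail directive, which is computed here with divmod.

```ruby
require "stringio"

# Hypothetical helper: yields `size`-byte chunks from an IO by reading
# size * chunks_per_read bytes at a time and splitting each read with unpack.
# Assumes single-byte (ASCII) data, because unpack's "a" directive counts bytes.
def each_unpacked_chunk(io, size, chunks_per_read = 1000)
  unless size.positive? && chunks_per_read.positive?
    raise ArgumentError, "size and chunks_per_read must be positive"
  end
  return enum_for(__method__, io, size, chunks_per_read) unless block_given?

  # IO#read(n) returns nil at end of input, which ends the loop.
  while (buffer = io.read(size * chunks_per_read))
    full, rest = buffer.bytesize.divmod(size)
    template = "a#{size}" * full
    template += "a#{rest}" if rest > 0 # shorter tail, only on the last read
    buffer.unpack(template).each { |chunk| yield chunk }
  end
end

io = StringIO.new("This is some\nText\nThis is some text")
p each_unpacked_chunk(io, 17, 2).to_a
# => ["This is some\nText", "\nThis is some tex", "t"]
```

The chunks_per_read default of 1000 is an arbitrary choice; it only trades memory per read against the number of unpack calls.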

require 'benchmark'
a = "This is some\nText\nThis is some text" * 1000
n = 100

Benchmark.bm do |x|
  x.report("slice ") { n.times do ; (0..(a.length / 17)).map{|i| a[i * 17,17] } ; end}
  x.report("regex ") { n.times do ; a.scan(/.{1,17}/m) ; end}
  x.report("eachc ") { n.times do ; a.each_char.each_slice(17).map(&:join) ; end }
  x.report("unpack") { n.times do ; a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}")) ; end }
end


Results:

            user     system      total        real
slice   0.120000   0.000000   0.120000 (  0.130709)
regex   0.190000   0.000000   0.190000 (  0.186407)
eachc   1.430000   0.000000   1.430000 (  1.427662)
unpack  0.030000   0.000000   0.030000 (  0.032807)
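
One caveat before choosing unpack on speed alone: its a directive counts bytes, while String#length, String#[], scan and each_char count characters. For ASCII input the two agree, but in UTF-8 text a multibyte character such as é can be cut in half. A small check:

```ruby
s = "héllo" # "é" is 2 bytes in UTF-8

# Character-based: String#[] counts characters, so "é" stays whole.
p s[0, 2] # => "hé"

# Byte-based: unpack's "a2" takes 2 bytes, i.e. "h" plus half of "é".
half = s.unpack("a2").first
p half.bytesize                                # => 2
p half.force_encoding("UTF-8").valid_encoding? # => false
```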