How do I remove repeated spaces in a string?

悲哀的现实 2020-12-29 20:05

I have a string:

"foo (2 spaces) bar (3 spaces) baaar (6 spaces) fooo"

How do I remove repetitious spaces in it so there shou

7 answers
  • 2020-12-29 20:31

    Which method performs better?

    $ ruby -v
    ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]
    
    $ cat squeeze.rb 
    require 'benchmark'
    include Benchmark
    
    string = "foo  bar   bar      baaar"
    n = 1_000_000
    bm(6) do |x|
      x.report("gsub      ") { n.times { string.gsub(/\s+/, " ") } }
      x.report("squeeze   ") { n.times { string.squeeze } }
      x.report("split/join") { n.times { string.split.join(" ") } }
    end
    
    $ ruby squeeze.rb 
                user     system      total        real
    gsub        4.970000   0.020000   4.990000 (  5.624229)
    squeeze     0.600000   0.000000   0.600000 (  0.677733)
    split/join  2.950000   0.020000   2.970000 (  3.243022)
    
  • 2020-12-29 20:38

    Updated benchmark from @zetetic's answer:

    require 'benchmark'
    include Benchmark
    
    string = "foo  bar   bar      baaar"
    n = 1_000_000
    bm(12) do |x|
      x.report("gsub      ")   { n.times { string.gsub(/\s+/, " ") } }
      x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
      x.report("split/join")   { n.times { string.split.join(" ") } }
    end
    

    Running it twice on my desktop gives these values:

    ruby test.rb; ruby test.rb
                      user     system      total        real
    gsub          6.060000   0.000000   6.060000 (  6.061435)
    squeeze(' ')  4.200000   0.010000   4.210000 (  4.201619)
    split/join    3.620000   0.000000   3.620000 (  3.614499)
                      user     system      total        real
    gsub          6.020000   0.000000   6.020000 (  6.023391)
    squeeze(' ')  4.150000   0.010000   4.160000 (  4.153204)
    split/join    3.590000   0.000000   3.590000 (  3.587590)
    

    The issue is that squeeze with no argument collapses any repeated character, which changes the output string and doesn't meet the OP's need. squeeze(' ') does meet the need, but it is slower.

    string.squeeze
     => "fo bar bar bar"
    

    I wondered why split.join came out faster. It didn't seem likely to hold up on large strings, so I adjusted the benchmark to see what effect long strings would have:

    require 'benchmark'
    include Benchmark
    
    string = (["foo  bar   bar      baaar"] * 10_000).join
    puts "String length: #{ string.length } characters"
    n = 100
    bm(12) do |x|
      x.report("gsub      ")   { n.times { string.gsub(/\s+/, " ") } }
      x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
      x.report("split/join")   { n.times { string.split.join(" ") } }
    end
    
    ruby test.rb ; ruby test.rb
    
    String length: 250000 characters
                      user     system      total        real
    gsub          2.570000   0.010000   2.580000 (  2.576149)
    squeeze(' ')  0.140000   0.000000   0.140000 (  0.150298)
    split/join    1.400000   0.010000   1.410000 (  1.396078)
    
    String length: 250000 characters
                      user     system      total        real
    gsub          2.570000   0.010000   2.580000 (  2.573802)
    squeeze(' ')  0.140000   0.000000   0.140000 (  0.150384)
    split/join    1.400000   0.010000   1.410000 (  1.397748)
    

    So, long lines do make a big difference.


    "If you do use gsub then gsub(/\s{2,}/, ' ') is slightly faster."

    Not really. Here's a version of the benchmark to test just that assertion:

    require 'benchmark'
    include Benchmark
    
    string = "foo  bar   bar      baaar"
    puts string.gsub(/\s+/, " ")
    puts string.gsub(/\s{2,}/, ' ')
    puts string.gsub(/\s\s+/, " ")
    
    string = (["foo  bar   bar      baaar"] * 10_000).join
    puts "String length: #{ string.length } characters"
    n = 100
    bm(18) do |x|
      x.report("gsub")               { n.times { string.gsub(/\s+/, " ") } }
      x.report('gsub/\s{2,}/, "")')  { n.times { string.gsub(/\s{2,}/, ' ') } }
      x.report("gsub2")              { n.times { string.gsub(/\s\s+/, " ") } }
    end
    # >> foo bar bar baaar
    # >> foo bar bar baaar
    # >> foo bar bar baaar
    # >> String length: 250000 characters
    # >>                          user     system      total        real
    # >> gsub                 1.380000   0.010000   1.390000 (  1.381276)
    # >> gsub/\s{2,}/, "")    1.590000   0.000000   1.590000 (  1.609292)
    # >> gsub2                1.050000   0.010000   1.060000 (  1.051005)
    

    If you want speed from gsub, use the gsub2 pattern (/\s\s+/). squeeze(' ') will still run circles around any gsub implementation, though.
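
    Timings only compare fairly when the candidates return the same string. The following is a minimal sketch of a sanity check on the benchmark input; the CANDIDATES table and its labels are hypothetical names chosen for this illustration, not part of the benchmarks above.

    ```ruby
    # Hypothetical table of the candidates benchmarked above, keyed by label.
    CANDIDATES = {
      "gsub"         => ->(s) { s.gsub(/\s+/, " ") },
      "gsub2"        => ->(s) { s.gsub(/\s\s+/, " ") },
      "squeeze(' ')" => ->(s) { s.squeeze(" ") },
      "split/join"   => ->(s) { s.split.join(" ") },
      "squeeze"      => ->(s) { s.squeeze }   # no argument: squeezes every character
    }

    string = "foo  bar   bar      baaar"

    # Print each candidate's result so differences are visible at a glance.
    CANDIDATES.each do |label, fn|
      puts format("%-13s %s", label, fn.call(string).inspect)
    end
    ```

    On this input only the bare squeeze call should differ from the others, which is why it was dropped from the later benchmarks.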

  • 2020-12-29 20:39

    String#squeeze has an optional parameter to specify characters to squeeze.

    irb> "asd  asd asd   asd".squeeze(" ")
    => "asd asd asd asd"
    

    Warning: calling it without a parameter will squeeze ALL repeated characters, not only spaces:

    irb> 'aaa     bbbb     cccc 0000123'.squeeze
    => "a b c 0123"
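
    The parameter can also name more than one character. A small sketch, assuming you want to collapse runs of spaces and runs of tabs but leave letters and digits alone; squeeze_blanks is a hypothetical helper name used only here.

    ```ruby
    # Hypothetical helper name; squeeze accepts the same character-set syntax as String#count.
    def squeeze_blanks(text)
      text.squeeze(" \t")   # collapse runs of spaces and runs of tabs, nothing else
    end

    p squeeze_blanks("aaa   bbb\t\t\tccc")   # "aaa bbb\tccc"
    ```

    Each run is reduced to one copy of its own character, so a tab run stays a tab rather than becoming a space.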
    
  • 2020-12-29 20:40

    Just use gsub and regexp. For example:

    str = "foo  bar   bar      baaar"
    str.gsub(/\s+/, " ")
    

    This returns a new string. To modify str in place, use gsub! instead.
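
    A minimal sketch of the difference between the two, assuming an ordinary mutable string. collapse_whitespace and the variable names are hypothetical and used only for illustration.

    ```ruby
    # Hypothetical helper name, used only for this illustration.
    def collapse_whitespace(text)
      text.gsub(/\s+/, " ")   # returns a new string; text is left untouched
    end

    original = "foo  bar   bar      baaar"
    puts collapse_whitespace(original)   # foo bar bar baaar
    puts original                        # foo  bar   bar      baaar

    buffer = original.dup                # dup so the original stays unchanged
    buffer.gsub!(/\s+/, " ")             # modifies buffer in place
    puts buffer                          # foo bar bar baaar

    # gsub! returns nil when nothing was replaced, so avoid chaining on its result.
    p "nospaces".dup.gsub!(/\s+/, " ")   # nil
    ```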

    By the way, regular expressions are very useful and there are plenty of resources on the internet. To test your own regexps, try rubular.com, for example.

  • 2020-12-29 20:45
    >> str = "foo  bar   bar      baaar"
    => "foo  bar   bar      baaar"
    >> str.split.join(" ")
    => "foo bar bar baaar"
    >>
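
    One behavioral difference worth knowing: split.join also drops leading and trailing whitespace and treats tabs and newlines as separators, while squeeze(" ") only touches spaces. A short sketch comparing the approaches from this thread on a padded input; collapse_variants is a hypothetical helper name.

    ```ruby
    # Hypothetical helper: runs the three approaches from this thread on one input.
    def collapse_variants(text)
      {
        split_join: text.split.join(" "),
        squeeze:    text.squeeze(" "),
        gsub:       text.gsub(/\s+/, " ")
      }
    end

    padded = "  foo  bar\t\tbaz  "   # leading/trailing spaces plus a run of tabs
    collapse_variants(padded).each { |name, result| puts "#{name}: #{result.inspect}" }
    # split_join: "foo bar baz"
    # squeeze: " foo bar\t\tbaz "
    # gsub: " foo bar baz "
    ```

    Which result is right depends on whether the surrounding whitespace and the tabs matter for your data.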
    
  • 2020-12-29 20:45

    Use a regular expression to match repeated whitespace (\s+) and replace it with a single space.

    "foo    bar  foobar".gsub(/\s+/, ' ')
    => "foo bar foobar"
    

    This matches every whitespace character, including tabs and newlines. If you only want to replace spaces, use / +/ instead of /\s+/.

    "foo    bar  \nfoobar".gsub(/ +/, ' ')
    => "foo bar \nfoobar"
    