Removing spaces between single letters

一向 2021-01-07 06:01

I have a string that may contain an arbitrary number of single letters separated by spaces. I am looking for a regex (in Perl) that will remove spaces between all (unknown n

8 answers
  • 2021-01-07 06:13

    This will do the job.

    (?<=\b\w)\s(?=\w\b)
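
The pattern above matches only the space itself, so in Perl it would be used in a substitution with an empty replacement. A minimal sketch follows; the sub name `join_single_letters` is made up for illustration and is not part of the answer. Note that `\w` also matches digits and underscores, so those count as "letters" here.

```perl
use strict;
use warnings;

# Hypothetical helper: deletes one whitespace character whenever it sits
# between two single-character "words" (as defined by \w and \b).
sub join_single_letters {
    my ($text) = @_;
    $text =~ s/(?<=\b\w)\s(?=\w\b)//g;
    return $text;
}

for my $s ('a b c', 'ab c d', 'a bcd e f gh', 'abc d') {
    printf "%s --> %s\n", $s, join_single_letters($s);
}
```

Because both lookarounds are zero-width, neighbouring matches can share a letter, which is what lets `a b c` collapse all the way to `abc`.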
    
  • 2021-01-07 06:14

    This piece of code

    #!/usr/bin/perl
    
    use strict;
    
    my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
    
    foreach my $string (@strings) {
       print "$string --> ";
       $string =~ s/\b(\w)\s+(?=\w\b)/$1/g; # the only line that actually matters
       print "$string\n";
    }
    

    prints this:

    a b c --> abc
    ab c d --> ab cd
    a bcd e f gh --> a bcd ef gh
    abc d --> abc d
    

    I think/hope this is what you're looking for.

  • 2021-01-07 06:14

    Hi, I have written a simple JavaScript function to do this. It is short, and it should be easy to port to any other language.

    function compressSingleSpace(source){

        let words = source.split(" ");
        let finalWords = [];
        let tempWord = "";

        for(let i=0;i<words.length;i++){

            if(tempWord!='' && words[i].length>1){
                finalWords.push(tempWord);
                tempWord = '';
            }

            if(words[i].length>1){
                finalWords.push(words[i]);
            }else{
                tempWord += words[i];
            }

        }

        if(tempWord!=''){
            finalWords.push(tempWord);
        }

        source = finalWords.join(" ");

        return source;

    }


    function convertInput(){
        let str = document.getElementById("inputWords").value;
        document.getElementById("firstInput").innerHTML = str;

        let compressed = compressSingleSpace(str);
        document.getElementById("finalOutput").innerHTML = compressed;
    }

    CSS:

    label{
        font-size:20px;
        margin:10px;
    }
    input{
        margin:10px;
        font-size:15px;
        padding:10px;
    }
    
    input[type="button"]{
      cursor:pointer;
      background: #ccc;
    }
    
    #firstInput{
      color:red;
      font-size:20px;
      margin:10px;
    }
    
    #finalOutput{
      color:green;
      font-size:20px;
      margin:10px;
    }

    HTML:

    <label for="inputWords">Enter your input and press Convert</label><br>
    <input id="inputWords" value="check this site p e t z l o v e r . c o m thanks">
    <input type="button" onclick="convertInput(this.value)" value="Convert" >
    <div id="firstInput">check this site p e t z l o v e r . c o m thanks</div>
    <div id="finalOutput">check this site petzlover.com thanks</div>

  • 2021-01-07 06:17

    It's not a regex, but since I am lazy by nature I would do it this way.

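    #!/usr/bin/env perl
    use warnings;
    use 5.012;

    my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
    for my $string ( @strings ) {
        my @s;        # output words
        my $t = '';   # run of consecutive single letters collected so far
        for my $el ( split /\s+/, $string ) {
            if ( length $el > 1 ) {
                # a longer word ends the current run of single letters;
                # test length, not truth, so a lone '0' is not dropped
                push @s, $t if length $t;
                $t = '';
                push @s, $el;
            } else {
                $t .= $el;
            }
        }
        push @s, $t if length $t;
        say "@s";
    }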
    

    OK, my way is the slowest:

                   Rate   no_regex Alan_Moore Eric_Storm  canavanin
    no_regex   130619/s         --       -60%       -61%       -63%
    Alan_Moore 323328/s       148%         --        -4%        -8%
    Eric_Storm 336748/s       158%         4%         --        -5%
    canavanin  352654/s       170%         9%         5%         --
    

    I didn't include Ether's code because (as he has tested) it returns different results.

  • 2021-01-07 06:21

    Your description doesn't really match your examples. It looks to me like you want to remove any space that is (1) preceded by a letter which is not itself preceded by a letter, and (2) followed by a letter which is not itself followed by a letter. Those conditions can be expressed precisely as nested lookarounds:

    /(?<=(?<!\pL)\pL) (?=\pL(?!\pL))/
    

    tested:

    use strict;
    use warnings;
    
    use Test::Simple tests => 4;
    
    sub clean {
      (my $x = shift) =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g;
      $x;
    }
    
    ok(clean('ab c d')        eq 'ab cd');
    ok(clean('a bcd e f gh')  eq 'a bcd ef gh');
    ok(clean('a b c')         eq 'abc');
    ok(clean('abc d')         eq 'abc d');
    

    output:

    1..4
    ok 1
    ok 2
    ok 3
    ok 4
    

    I'm assuming you really meant one space character (U+0020); if you want to match any whitespace, you might want to replace the space with \s+.
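
As a sketch of that last suggestion, the same nested lookarounds can be written with `/x` so each condition carries its own comment, with `\s+` in place of the literal space. The names `$gap` and `clean_ws` are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the answer, spelled out with /x. Assumption: any run of
# whitespace should be removed, so \s+ replaces the single literal space.
my $gap = qr/
    (?<= (?<!\pL) \pL )   # preceded by a letter that has no letter before it
    \s+                   # the whitespace to delete
    (?=  \pL (?!\pL) )    # followed by a letter that has no letter after it
/x;

sub clean_ws {            # hypothetical helper name
    (my $x = shift) =~ s/$gap//g;
    return $x;
}

print clean_ws("a \t b  c"), "\n";   # tabs and repeated spaces are removed too
print clean_ws('1 2 a b'),   "\n";   # digits are not \pL, so '1 2' is left alone
```

Unlike the `\w`-based patterns elsewhere in this thread, `\pL` matches letters only, so digits and underscores are never joined.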

  • 2021-01-07 06:26

    Now I have the slowest and the fastest.

    #!/usr/bin/perl
    use 5.012;
    use warnings;
    use Benchmark qw(cmpthese);
    my @strings = ('a b c', 'ab c d', 'a bcd e f gh', 'abc d');
    
    cmpthese( 0, {
        Eric_Storm  => sub{ for my $string (@strings) { $string =~ s{\b(\w) ((?: \s+ (\w)\b)+)}{$1 . join '', split m|\s+|, $2}gex; } },
        canavanin   => sub{ for my $string (@strings) { $string =~ s/\b(\w)\s+(?=\w\b)/$1/g; } },
        Alan_Moore  => sub{ for my $string (@strings) { $string =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g; } },
        keep_uni    => sub{ for my $string (@strings) { $string =~ s/\PL\pL\K (?=\pL(?!\pL))//g; } },
        keep_asc    => sub{ for my $string (@strings) { $string =~ s/[^a-zA-Z][a-zA-Z]\K (?=[a-zA-Z](?![a-zA-Z]))//g; } },
        no_regex    => sub{ for my $string (@strings) {
            my @s; my $t = '';
            for my $el (split /\s+/, $string) {
                if (length $el > 1) { push @s, $t if $t; $t = ''; push @s, $el; }
                else { $t .= $el; }
            }
            push @s, $t if $t;
            #say "@s";
        } },
    });
    

    output:

               Rate  no_regex Alan_Moore Eric_Storm canavanin  keep_uni keep_asc
    no_regex    98682/s        --       -64%       -65%      -66%      -81%     -87%
    Alan_Moore 274019/s      178%         --        -3%       -6%      -48%     -63%
    Eric_Storm 282855/s      187%         3%         --       -3%      -46%     -62%
    canavanin  291585/s      195%         6%         3%        --      -45%     -60%
    keep_uni   528014/s      435%        93%        87%       81%        --     -28%
    keep_asc   735254/s      645%       168%       160%      152%       39%       --
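
Before reading too much into these rates, it may be worth checking that the `\K` variants return the same strings as the lookbehind version. Since `\PL\pL` consumes a non-letter character before the letter, it cannot match at the very start of the string or reuse a letter that ended the previous match. The sketch below compares outputs on fresh copies of the inputs. The `keep_lookbehind` entry is a hypothetical variant that is not from the thread; it keeps `\K` but moves the "no letter before" condition into a lookbehind. The benchmark subs above also modify `@strings` in place, so every pass after the first one runs on strings that have already been collapsed.

```perl
use strict;
use warnings;

# Each sub works on a copy of its argument and returns the cleaned string.
my %variant = (
    Alan_Moore => sub { (my $x = shift) =~ s/(?<=(?<!\pL)\pL) (?=\pL(?!\pL))//g; $x },
    keep_uni   => sub { (my $x = shift) =~ s/\PL\pL\K (?=\pL(?!\pL))//g; $x },
    # Hypothetical variant (an assumption, not benchmarked in the thread):
    keep_lookbehind => sub { (my $x = shift) =~ s/(?<!\pL)\pL\K (?=\pL(?!\pL))//g; $x },
);

for my $input ('a b c', 'ab c d', 'a bcd e f gh', 'abc d') {
    for my $name (sort keys %variant) {
        printf "%-16s %-16s --> %s\n", $name, "'$input'", $variant{$name}->($input);
    }
}
```

On these four inputs the variants differ only for `'a b c'`: `keep_uni` returns `a bc`, while the other two return `abc`.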
    