Perl Regex To Condense Multiple Line Breaks

醉话见心 2021-01-14 10:24

I can't seem to figure out the right syntax. I want a Perl regular expression that finds two or more line breaks in a row and condenses them into just two line breaks.

4 Answers
  • 2021-01-14 10:38

    If you're using Perl 5.10 or later, try this:

    $string =~ s/(\R)(?:\h*\R)+/$1$1/g;
    

    \R is the generic line-separator escape sequence, and \h matches any horizontal whitespace character (e.g. space and TAB); both are described in perlrebackslash. So this converts any run of one or more blank lines, including lines that contain only spaces or tabs, into a single empty line.
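    To see why the \h* part matters, here is a small sketch comparing the pattern above with a plain \n\n+ substitution. The sample string and the show() helper are made up for illustration. The "blank" lines in the sample contain spaces and a TAB, so the plain pattern leaves them alone, while the \R/\h version collapses them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: make control characters visible for comparison.
sub show {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    $s =~ s/\t/\\t/g;
    return $s;
}

# Assumed sample input: the "blank" lines contain stray spaces and a TAB.
my $text = "a\n\n  \n\t\nb\n";

# Plain pattern: only back-to-back newlines are collapsed.
(my $naive = $text) =~ s/\n\n+/\n\n/g;

# Pattern from the answer: whitespace-only lines count as blank too.
(my $generic = $text) =~ s/(\R)(?:\h*\R)+/$1$1/g;

print "naive:   ", show($naive), "\n";      # a\n\n  \n\t\nb\n (unchanged)
print "generic: ", show($generic), "\n";    # a\n\nb\n
```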

    Most applications these days are liberal in what they'll recognize as a line separator; they'll even accept a mix of two or more styles of separator in the same document. On the other hand, some apps actively convert all line separators to one preferred style. But sometimes you do have to stick to one particular style; that's why I captured the first \R match and used it as the replacement, instead of arbitrarily using \n.
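    A minimal sketch of the point about reusing the captured separator, assuming a CRLF-terminated sample string: substituting $1$1 keeps the collapsed run as CRLF, while hard-coding \n\n leaves the text with a mix of CRLF and bare LF endings.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed sample input: CRLF-terminated text with a run of blank lines.
my $crlf = "line 1\r\n\r\n\r\n\r\nline 2\r\n";

# Reusing the captured separator keeps the collapsed run as CRLF.
(my $kept = $crlf) =~ s/(\R)(?:\h*\R)+/$1$1/g;

# Hard-coding "\n\n" turns the collapsed run into bare LFs.
(my $mixed = $crlf) =~ s/\R(?:\h*\R)+/\n\n/g;

for my $pair ([kept => $kept], [mixed => $mixed]) {
    my ($name, $s) = @$pair;
    my $crlf_count = () = $s =~ /\r\n/g;        # CRLF line endings
    my $bare_lf    = () = $s =~ /(?<!\r)\n/g;   # LFs not preceded by CR
    printf "%-5s %d CRLF, %d bare LF\n", $name, $crlf_count, $bare_lf;
}
# kept  3 CRLF, 0 bare LF
# mixed 1 CRLF, 2 bare LF
```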

    Be aware that these special escape sequences aren't widely supported in other regex flavors. They work in recent versions of PHP, and \R seems to work in Ruby 2.0, though I can't find any doc that mentions it. Ruby 1.9.2 and 2.0 support a \h escape sequence, but it matches a hexadecimal digit ([0-9a-fA-F]), not horizontal whitespace. In most other flavors, \R and \h will either throw an exception or match a literal R and h respectively.

  • 2021-01-14 10:46

    Show a full example. What is $string?

    $ perl -E'my $s = qq{a\n\n\nb}; say "[$s]"; $s =~ s/\n\n+/\n\n/g; say "[$s]"'
    [a
    
    
    b]
    [a
    
    b]
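    One likely reason a substitution like this appears to do nothing (an assumption, since the question doesn't show how $string is built) is that the text is processed one line at a time. Each line read with <> contains at most one newline, so \n\n+ can never match. This sketch uses a made-up input and an in-memory filehandle in place of a real file to contrast line-by-line processing with slurping the whole text first.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for a real file.
my $input = "a\n\n\nb\n";

# Line by line: each record holds at most one "\n", so \n\n+ never matches.
my $line_by_line = '';
{
    open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";
    while (my $line = <$fh>) {
        $line =~ s/\n\n+/\n\n/g;    # always a no-op on a single line
        $line_by_line .= $line;
    }
}

# Slurped: the whole text is one string, so the runs of newlines are visible.
my $slurped = do {
    open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";
    local $/;                       # undef $/ = read everything at once
    <$fh>;
};
$slurped =~ s/\n\n+/\n\n/g;

print "line by line changed the text: ", ($line_by_line eq $input ? "no" : "yes"), "\n";
print "slurped changed the text:      ", ($slurped eq $input ? "no" : "yes"), "\n";
```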
    
  • 2021-01-14 11:01

    This does it:

    #!/usr/bin/env perl
    use strict;
    use warnings;
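    # Slurp everything after __DATA__ into one string
    # (with $/ undefined, <DATA> returns the rest of the handle at once).
    my $string = do { local $/; <DATA> };
    print "Before:\n$string\n============";
    
    # Collapse every run of two or more newlines to exactly two.
    $string =~ s/\n{2,}/\n\n/g;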
    print "After:\n$string\n\nBye Bye!";
    
    __DATA__
    Line 1
    Line 2
    
    
    
    
    
    
    Line 9
    Line 10
    
    Line 12
    
    
    
    Line 16
    
    
    Line 19
    

    Output:

    Before:
    Line 1
    Line 2
    
    
    
    
    
    
    Line 9
    Line 10
    
    Line 12
    
    
    
    Line 16
    
    
    Line 19
    ============After:
    Line 1
    Line 2
    
    Line 9
    Line 10
    
    Line 12
    
    Line 16
    
    Line 19
    
    Bye Bye!
    

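    Perl 5.10 and later also support the \R escape, which matches any line-break sequence (including \r\n), for platform independence; see perlrebackslash. Your regex would then be s/\R{2,}/\n\n/g; note that this writes a plain \n\n in place of every collapsed run, even in CRLF text.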

  • 2021-01-14 11:01

    @btilly hit the nail on the head. I did a quick test case:

    in:

    a
    
    b
    
    
    
    
    c
    

    with this code:

    my $line = join '', <>;
    $line =~ s{\n\n+}{\n\n}g;
    print $line;
    

    and it returned the expected result:

    a
    
    b
    
    c
    

    You can get the same result by changing the record separator (and avoiding the regex):

    {
        # change the Record Separator from "\n" to ""
        # "paragraph mode": a run of two or more empty lines
        # is treated as a single empty line (perldoc perlvar)
        # local limits the change to the global $/ to this block
        local $/ = "";
        print <>;
    }
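    To check that the two approaches agree, here is a sketch with two hypothetical helpers, collapse_by_regex and collapse_by_paragraphs, that reads strings through an in-memory filehandle instead of <>. For the sample input above they produce the same text. Paragraph mode, however, also discards any blank lines at the very start of the input, whereas the regex keeps them (collapsed to two newlines).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the regex approach from the answer.
sub collapse_by_regex {
    my ($text) = @_;
    $text =~ s{\n\n+}{\n\n}g;
    return $text;
}

# Hypothetical helper: the record-separator approach.
# An in-memory filehandle stands in for <> (an assumption for the demo).
sub collapse_by_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode, restored when the sub returns
    return join '', <$fh>;
}

my $sample  = "a\n\nb\n\n\n\n\nc\n";   # the test input from the answer
my $leading = "\n\n\na\n";             # starts with blank lines

for my $input ($sample, $leading) {
    my $same = collapse_by_regex($input) eq collapse_by_paragraphs($input);
    print $same ? "same\n" : "different\n";
}
```

    In other words, `local $/ = ""` is a convenient shortcut when you are just copying input to output, but the regex gives you finer control over edge cases such as leading blank lines.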
    