Replacing multiple blank lines with one blank line using RegEx search and replace

旧巷少年郎 2020-12-13 09:31

I have a file that I need to reformat to remove "extra" blank lines.

I am using the Perl syntax regular expression search and replace functionality of UltraEdit a

10 answers
  • 2020-12-13 10:07

    It depends on what the line endings are. Assuming \n, replace this:

    ([ \t]*\n){3,}
    

    with \n\n.
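    As a rough illustration outside the editor, the same search and replace can be tried with Python's re module. The function name collapse_blank_lines and the sample text are made up for this sketch, and it assumes LF-only line endings just as the answer does.

```python
import re

# Pattern and replacement copied from the answer; assumes LF-only line endings.
EXTRA_BLANK_LINES = re.compile(r"([ \t]*\n){3,}")


def collapse_blank_lines(text: str) -> str:
    """Reduce runs of two or more blank lines to a single empty line."""
    return EXTRA_BLANK_LINES.sub("\n\n", text)


# Hypothetical sample: three blank lines after the first paragraph, one after the second.
sample = "first paragraph  \n\n \t\n\nsecond paragraph\n\nthird paragraph\n"
print(repr(collapse_blank_lines(sample)))
```

    The quantifier {3,} counts the line break that ends the paragraph itself, which is why a single existing blank line (two line breaks in a row) is left alone.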

  • 2020-12-13 10:07

    For completeness, I want to reference the long post "Remove / delete blank and empty lines" in the UltraEdit user forums. After all the explanations for beginners, it ends with a solution for reducing two or more empty lines (nothing on the line) or blank lines (only whitespace) to a single empty line, independent of the line terminator type.

    And some words on what Alan Moore wrote in his answer:

    UltraEdit's Perl regular expression support is not crippled by its line-based architecture. Perl regular expression engines have a flag which determines whether a dot matches all characters except the newline characters carriage return (CR) and line feed (LF), or really all characters including CR and LF. This decides whether a Perl regular expression find/replace treats a text file as one large byte stream or as a sequence of lines. By default, UltraEdit sets the flag so that a dot in the search string does not match \r (CR) or \n (LF). This behavior is easy to change: start the regular expression with (?s), which changes the value of the flag match_not_dot_newline, as posted in the UltraEdit user forums in the topic "." in Perl regular expressions doesn't include CRLFs?
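    The effect of the (?s) modifier can be seen in any engine that supports it. Here is a small sketch using Python's re module rather than UltraEdit's engine. Note one difference: Python's dot excludes only \n by default, so the sample text uses LF-only line endings.

```python
import re

text = "first line\nsecond line"

# By default a dot does not match a line feed, so .* stops at the end of the first line.
default_match = re.match(r".*", text).group(0)

# With the inline (?s) modifier the dot also matches line terminators,
# so .* runs across the line break to the end of the text.
dotall_match = re.match(r"(?s).*", text).group(0)

print(repr(default_match))
print(repr(dotall_match))
```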

    A Perl regular expression replace can handle files that use any of these line endings:

    • carriage return + line feed (DOS/Windows),
    • line feed only (Unix, Mac OS 10.0 and later versions), or
    • carriage return only (Mac OS 9 and previous versions).

    It removes optional trailing spaces and tabs at the end of a paragraph (one or more lines) and reduces two or more empty lines (nothing on the line) or blank lines (only whitespace) below the paragraph to one empty line. The search string is \h*(\r?\n|\r)(?:\h*\1){2,} and the replace string is \1\1.

    Explanation:

    \h* matches zero or more horizontal whitespace characters as defined by Unicode. This first part of the search expression matches horizontal whitespace at the end of a line, such as horizontal tabs, normal spaces, no-break spaces and some other rarely used spaces.

    Using \s instead is not a good idea, because that character class matches any whitespace character, including the vertical whitespace characters carriage return and line feed.

    (\r?\n|\r) ... is an OR expression with two arguments in a marking (capturing) group. The first argument matches a line feed, optionally preceded by a carriage return, while the second argument matches just a carriage return. So this expression correctly matches all three common types of line termination. It is important for the rest of the search and for the replace that it always matches either CR+LF (both together) or just LF or just CR.

    (?:\h*\1) ... is a non-marking (non-capturing) group which matches zero or more horizontal whitespace characters followed by the same newline as found before, back-referenced with \1, i.e. CR+LF or just LF or just CR. So this part of the expression finds one empty or blank line.

    {2,} ... is a multiplier for the preceding non-marking group and means at least two times. So the end of a paragraph must be followed by two or more empty or blank lines. A single empty or blank line below a paragraph is not enough for the search expression to match.

    The replace string \1\1 inserts the first matched line break twice.
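    To see the expression handle all three line-ending types, here is a minimal sketch in Python. Python's re module has no \h escape, so [ \t] is swapped in for it, which covers only spaces and tabs rather than every Unicode horizontal space. The function name collapse and the sample text are invented for the illustration.

```python
import re

# [ \t] stands in for \h, which Python's re module does not support
# (an assumption of this sketch: only spaces and tabs are treated as horizontal whitespace).
SEARCH = re.compile(r"[ \t]*(\r?\n|\r)(?:[ \t]*\1){2,}")
REPLACE = r"\1\1"


def collapse(text: str) -> str:
    """Trim the paragraph end and reduce 2+ empty/blank lines to one empty line."""
    return SEARCH.sub(REPLACE, text)


for eol in ("\r\n", "\n", "\r"):
    # Three empty/blank lines after "paragraph", a single empty line after "next".
    sample = f"paragraph  {eol}{eol} \t{eol}{eol}next{eol}{eol}last{eol}"
    print(repr(collapse(sample)))
```

    Because the replacement is built from the back-reference, each result keeps whatever line ending the input used.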

    The advantage of this regular expression over the others posted here is that the line ending type does not need to be known. The search expression finds it out, and the line ending found is referenced in the replace string. In addition, any trailing whitespace at the end of a paragraph and any whitespace on the following lines is removed by this replace, provided there are two or more empty or blank lines below the paragraph.

    {2,} can be replaced by + in the search string if this Perl regular expression replace should also trim whitespace at the end of a paragraph and on a single following empty or blank line. But please note that in this case some replaces do not change anything at all, namely when there is no trailing whitespace at the end of a paragraph and the next line is already an empty line.
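    The no-op replaces described for the + variant can be observed with a small Python sketch. Python's re module lacks \h, so [ \t] is used in its place here, which is an assumption limited to spaces and tabs. re.subn returns the number of replacements made, so a replace that leaves the text unchanged still shows up in the count.

```python
import re

# [ \t] stands in for \h, which Python's re module does not support.
SEARCH_PLUS = re.compile(r"[ \t]*(\r?\n|\r)(?:[ \t]*\1)+")

# Trailing whitespace on the paragraph and on the following blank line gets trimmed.
trimmed, count_trimmed = SEARCH_PLUS.subn(r"\1\1", "paragraph \t\n  \nnext\n")

# Nothing to trim: the replace still fires once but produces identical text.
unchanged, count_unchanged = SEARCH_PLUS.subn(r"\1\1", "paragraph\n\nnext\n")

print(repr(trimmed), count_trimmed)
print(repr(unchanged), count_unchanged)
```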

  • 2020-12-13 10:12

    In Vim, use:

    :%!cat -s
    

    So far I find this the easiest way to delete extra empty lines.
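    The command filters the whole buffer through cat -s, which squeezes repeated empty lines into one. For readers without cat at hand, a rough Python sketch of that squeezing follows. The function name squeeze_blank is made up, and the sketch assumes that only truly empty lines count, so a line containing just spaces or tabs is left alone.

```python
def squeeze_blank(lines):
    """Yield lines, dropping an empty line that directly follows another empty line."""
    previous_empty = False
    for line in lines:
        # Assumption: only a bare line feed counts as an empty line.
        is_empty = line == "\n"
        if is_empty and previous_empty:
            continue
        previous_empty = is_empty
        yield line


# Hypothetical sample: three empty lines after "a", then a whitespace-only line after "b".
sample = ["a\n", "\n", "\n", "\n", "b\n", " \n", "\n", "c\n"]
result = list(squeeze_blank(sample))
print(result)
```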

  • 2020-12-13 10:15

    Replacing

    ^(\s*\r\n){2,}

    with

    \r\n

    is what I ended up with.

    This matches only runs of two or more blank lines and replaces each run with a single blank line.
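    A quick way to check this behavior outside the editor is a Python sketch like the following. It assumes that ^ matches at the start of every line, which in Python requires the re.MULTILINE flag, and it uses a made-up sample with CR+LF line endings.

```python
import re

# re.MULTILINE lets ^ match at the start of each line (an assumption about
# how the editor treats ^ in this search).
BLANK_RUN = re.compile(r"^(\s*\r\n){2,}", re.MULTILINE)

# Three blank lines after "first", a single blank line after "second".
sample = "first\r\n\r\n \t\r\n\r\nsecond\r\n\r\nthird\r\n"
result = BLANK_RUN.sub("\r\n", sample)
print(repr(result))
```

    Because the match starts at the beginning of a line, it covers only the blank lines themselves and not the line break that ends the preceding paragraph, so replacing the run with one \r\n leaves exactly one blank line.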
