Removing trailing / starting newlines with sed, awk, tr, and friends

一个人的身影 2021-01-30 21:04

I would like to remove all of the empty lines from a file, but only when they are at the end/start of a file (that is, if there are no non-empty lines before them, at the start; and if there are no non-empty lines after them, at the end).

  • 2021-01-30 21:23

    As mentioned in another answer, tac is part of coreutils and reverses the order of the lines in a file. Combining the idea of doing it twice with the fact that command substitution strips trailing newlines, we get

    echo "$(echo "$(tac "$filename")" | tac)"
    

    which doesn't depend on sed. Use echo -n for the outer echo if you also want to avoid the one trailing newline it adds.
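    A minimal sketch of the same double-tac trick wrapped in a function, assuming bash and GNU coreutils' tac; the name trim_blank_edges_tac is hypothetical. It swaps echo for printf '%s\n' so that content which happens to look like an echo option (for example a file containing only -n) is printed literally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILE without leading or trailing empty lines.
# Requires tac from GNU coreutils.
trim_blank_edges_tac() {
  local file=$1
  # Inner step: tac reverses the line order, so the file's leading empty
  # lines become trailing ones, and $(...) strips them.
  # Outer step: the second tac restores the original order, and the outer
  # $(...) strips the file's original trailing empty lines.
  # printf '%s\n' re-adds exactly one final newline at each stage.
  printf '%s\n' "$(printf '%s\n' "$(tac -- "$file")" | tac)"
}
```

    Empty lines in the middle of the file are kept; only the runs at the very start and end disappear.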

  • 2021-01-30 21:25

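    Using bash: command substitution strips the trailing newlines, and two parameter expansions strip the whole run of leading ones.

    $ filecontent=$(<file)
    $ leading=${filecontent%%[^$'\n']*}
    $ echo "${filecontent#"$leading"}"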
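    Parameter expansion can also strip newlines from both ends of a string that never went through $(...), by isolating the leading run with one expansion and the trailing run with another. A minimal sketch in pure bash (no subshells or external commands); the function name trim_newlines_str is hypothetical, and whitespace inside lines is left untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip leading and trailing newline characters from
# the string passed as $1, using only parameter expansion.
trim_newlines_str() {
  local s=$1 lead trail
  # Remove the longest suffix that starts at a non-newline character;
  # what is left is the run of leading newlines (possibly empty).
  lead=${s%%[^$'\n']*}
  s=${s#"$lead"}          # quoted, so $lead is matched literally
  # Remove the longest prefix that ends at a non-newline character;
  # what is left is the run of trailing newlines (possibly empty).
  trail=${s##*[^$'\n']}
  s=${s%"$trail"}
  printf '%s\n' "$s"
}
```

    For example, trim_newlines_str $'\n\nfoo\n\nbar\n\n' prints foo, an empty line, and bar. The bracket uses ^ rather than ! for negation so the same pattern can be typed at an interactive prompt without triggering history expansion.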
    
  • 2021-01-30 21:25

    For an efficient, non-recursive way to strip trailing newlines (including lines that contain only whitespace), I've developed this sed script.

    sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
    

    It uses the hold buffer to store runs of blank lines and prints them only once it finds a non-blank line, so blank lines still in the buffer at EOF are never printed. If you want to strip only truly empty lines, just remove the two [[:space:]]* parts:

    sed -n '/^$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^$/H'
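    To make the hold-buffer logic easier to follow, here is the first script above spread over several lines with comments and wrapped in a function. This is a sketch assuming GNU sed (it relies on comment lines inside the script and on . matching the newlines that H adds to the buffer); the name strip_trailing_blank_lines is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the hold-buffer script; assumes GNU sed.
# Reads the files given as arguments, or stdin if there are none.
strip_trailing_blank_lines() {
  sed -n '
    # Non-blank line: flush any saved blank lines, then print the line.
    /^[[:space:]]*$/ !{
      # Swap: pattern space = saved blank lines, hold space = this line.
      x
      # Saved lines exist only if H has added at least one newline.
      /\n/ {
        # Drop the newline H put in front, print the saved lines, clear them.
        s/^\n//
        p
        s/.*//
      }
      # Swap back: pattern space = this line, hold space = empty.
      x
      p
    }
    # Blank line: append it to the hold space instead of printing it.
    /^[[:space:]]*$/ H
  ' "$@"
}
```

    Leading blank lines pass through unchanged, because they are flushed as soon as the first non-blank line arrives; only the run at EOF is dropped.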
    

    I ran a simple performance comparison against the well-known recursive script

    sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba'
    

    on a 3 MB file made of 1 MB of random whitespace (spaces, tabs and newlines) on each side of 1 MB of random base64 text:

    shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M > bigfile
    base64 </dev/urandom | dd bs=1 count=1M >> bigfile
    shuf -re 1 2 3 | tr -d "\n" | tr 123 " \t\n" | dd bs=1 count=1M >> bigfile
    

    The streaming script finished in roughly 0.5 seconds; the recursive one still hadn't finished after 15 minutes. Win :)
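    A sketch of helpers that build a test file like the one above with a configurable size, so both scripts can be tried on small inputs first. It assumes GNU coreutils (shuf -r, base64, head -c), and the names random_blanks and make_padded_file are hypothetical. It swaps dd bs=1 for head -c, since dd bs=1 issues a separate read and write for every byte.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for building a benchmark file; assume GNU coreutils.

# Print $1 bytes of random spaces, tabs and newlines.
random_blanks() {
  # shuf -r repeats the inputs 1, 2, 3 forever in random order; tr drops
  # shuf's own newlines, then maps 1/2/3 to space/tab/newline.
  shuf -r -e 1 2 3 | tr -d '\n' | tr 123 ' \t\n' | head -c "$1"
}

# Write to $1: $2 bytes of random blanks, $2 bytes of random base64 text,
# then another $2 bytes of random blanks.
make_padded_file() {
  local file=$1 size=$2
  {
    random_blanks "$size"
    base64 < /dev/urandom | head -c "$size"
    random_blanks "$size"
  } > "$file"
}
```

    With GNU head's size suffixes, make_padded_file bigfile 1M gives a file of the same shape and size as the one above. Timing each script on it (for instance with bash's time keyword) reproduces this kind of comparison; absolute numbers will depend on the machine and the sed implementation.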

    For completeness: the sed scripts that strip leading blank lines already stream fine. Use whichever suits you (the first also treats lines of spaces and tabs as blank):

    sed '/[^[:blank:]]/,$!d'
    sed '/./,$!d'
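    Since both the leading-line script and the hold-buffer script stream, they can be chained to strip blank lines at both ends while holding at most one run of blank lines in memory. A minimal sketch assuming GNU sed; the function name strip_blank_edges_stream is hypothetical, and it uses [[:space:]] in the leading-line filter as well so that both stages agree on what counts as blank.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip blank (empty or whitespace-only) lines at both
# ends of the input while streaming; assumes GNU sed.
strip_blank_edges_stream() {
  # Stage 1: delete every line before the first one that contains a
  # non-whitespace character; from there to the end ($), keep everything.
  # Stage 2: the hold-buffer script, which drops blank lines left at EOF.
  sed '/[^[:space:]]/,$!d' "$@" |
    sed -n '/^[[:space:]]*$/ !{x;/\n/{s/^\n//;p;s/.*//;};x;p;}; /^[[:space:]]*$/H'
}
```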
    
  • 2021-01-30 21:25

    Here's an awk version that removes trailing blank lines (both empty lines and lines consisting of nothing but whitespace).

    It is memory efficient: it does not read the entire file into memory, only the current run of blank lines.

    awk '/^[[:space:]]*$/ {b=b $0 "\n"; next;} {printf "%s",b; b=""; print;}'
    

    The b variable buffers up the blank lines; they get printed when a non-blank line is encountered. When EOF is encountered, they don't get printed. That's how it works.

    If you use GNU awk, [[:space:]] can be replaced with \s (see the full list of gawk-specific regexp operators in the gawk manual).

    If you want to remove only those trailing lines that are empty, see @AndyMortimer's answer.
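    The awk approach extends to leading blank lines in the same single pass: a flag records whether a non-blank line has been seen, and blank lines before that point are dropped instead of buffered. A minimal sketch; the function name strip_blank_edges_awk is hypothetical, and it swaps the regex for NF == 0, which under awk's default field splitting is true exactly when a line is empty or contains only spaces and tabs (unlike [[:space:]], it does not count \r, \v or \f as blank). This avoids depending on POSIX character-class support in older awks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip leading and trailing blank lines in one pass.
strip_blank_edges_awk() {
  awk '
    # NF == 0: the line is empty or only spaces/tabs (default field splitting).
    NF == 0 {
      if (seen) b = b $0 "\n"   # after the first non-blank line: buffer it
      next                      # before it: drop it outright
    }
    # Non-blank line: flush any buffered blank lines, then print this one.
    { printf "%s", b; b = ""; seen = 1; print }
  ' "$@"
}
```

    As in the original, blank lines still sitting in b at EOF are simply never printed.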

  • 2021-01-30 21:32

    From Useful one-line scripts for sed:

    # Delete all leading blank lines at top of file (only).
    sed '/./,$!d' file
    
    # Delete all trailing blank lines at end of file (only).
    sed -e :a -e '/^\n*$/{$d;N;};/\n$/ba' file
    

    Therefore, to remove both leading and trailing blank lines from a file, you can combine the above commands into:

    sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba' file
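    The combined one-liner is dense, so here is the same script split over several lines with comments, wrapped in a function whose name (strip_blank_edges_sed1line) is hypothetical. It assumes GNU sed, which accepts the {$d;N;} grouping and comment lines inside the script. Note that "blank" here means empty: a line of spaces counts as content.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: the combined one-liner above, one command per line.
# Assumes GNU sed. "Blank" here means empty; lines of spaces count as content.
strip_blank_edges_sed1line() {
  sed -e :a -e '
    # Until the first line that contains any character, delete each line.
    /./,$!d
    # Pattern space is empty or holds only newlines (a run of empty lines):
    # at the last line drop it, otherwise append the next input line.
    /^\n*$/{$d;N;}
    # If the appended line was also empty, go back to :a and repeat.
    /\n$/ba
  ' "$@"
}
```

    Interior runs of empty lines are accumulated with N and printed together once a non-empty line joins them; a run that reaches the last line is deleted by $d.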
    
  • 2021-01-30 21:34

    A bash solution.

    Note: Only useful if the file is small enough to be read into memory at once.

    [[ $(<file) =~ ^$'\n'*(.*)$ ]] && echo "${BASH_REMATCH[1]}"
    
    • $(<file) reads the entire file and trims trailing newlines, because command substitution ($(...)) implicitly does that.
    • =~ is bash's regular-expression matching operator, and =~ ^$'\n'*(.*)$ optionally matches any leading newlines (greedily) and captures whatever comes after, including embedded newlines, since bash does not make . stop at a newline. Note the potentially confusing $'\n', which inserts a literal newline using ANSI C quoting, because bash's regex dialect (POSIX ERE) does not support the \n escape sequence.
    • Note that this particular regex always matches, so the command after && is always executed.
    • The special array variable BASH_REMATCH contains the results of the most recent regex match, and array element [1] contains what the (first and only) parenthesized subexpression (capture group) captured, which is the input string with any leading newlines stripped. The net effect is that ${BASH_REMATCH[1]} contains the input file content with both leading and trailing newlines stripped.
    • Note that printing with echo adds a single trailing newline. If you want to avoid that, use echo -n instead (or use the more portable printf '%s').
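    A common variation keeps the regex in a variable: an unquoted variable on the right-hand side of =~ is treated as a regex, and assigning it with $'...' turns \n into a real newline before bash ever sees the pattern, which sidesteps the quoting subtleties inside [[ ]]. A minimal sketch; the function name strip_blank_edges_regex is hypothetical, and it prints with printf so the output ends with exactly one newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILE with leading and trailing newlines removed.
strip_blank_edges_regex() {
  local content re
  content=$(<"$1")        # command substitution drops trailing newlines
  re=$'^\n*(.*)$'         # $'...' turns \n into a literal newline character
  [[ $content =~ $re ]]   # always matches; leading newlines fall outside (...)
  printf '%s\n' "${BASH_REMATCH[1]}"
}
```

    Only newline characters are stripped; a line of spaces at the start of the file is kept.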