Remove newline depending on the format of the next line

抹茶落季 2021-01-14 08:34

I have a special file with this kind of format:

title1
_1 texthere
title2
_2 texthere

I would like all newlines starting with "_" to be

4 answers
  • 2021-01-14 09:09

    This might work for you (GNU sed):

    sed ':a;N;s/\n_/ /;ta;P;D' file
    

    This avoids slurping the file into memory.

    Or, with each command passed as a separate -e expression:

    sed -e ':a' -e 'N' -e 's/\n_/ /' -e 'ta' -e 'P' -e 'D' file
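
    As a quick check, the one-liner can be run against the four-line sample from the question. This is only a sketch: sample.txt is a made-up file name, and GNU sed is assumed, as stated above. Note that the substitution replaces the newline and the leading _ together, so the underscore is dropped; the second command is a small variant that puts it back.

    ```shell
    # Recreate the sample from the question in a scratch file
    # (sample.txt is a hypothetical name used only for this sketch).
    printf '%s\n' 'title1' '_1 texthere' 'title2' '_2 texthere' > sample.txt

    # Same one-liner as above: "\n_" becomes a single space,
    # so the continuation line loses its leading underscore.
    joined=$(sed ':a;N;s/\n_/ /;ta;P;D' sample.txt)
    printf '%s\n' "$joined"

    # Variant that keeps the underscore by re-adding it in the replacement.
    kept=$(sed ':a;N;s/\n_/ _/;ta;P;D' sample.txt)
    printf '%s\n' "$kept"
    ```

    Which of the two forms is wanted depends on whether the _ marker should survive in the joined line.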
    
  • 2021-01-14 09:16

    Try the following solution:

    In sed, a loop is built with a label (:a): while the current line is not the last one ($!), append the next line to the pattern space (N) and branch back to label a (b a):

    :a
    $! {
      N
      b a
    }
    

    After the loop, the whole file is held in memory in the pattern space, so do a global substitution that replaces every newline followed by _ with a space followed by _, then print the result:

    s/\n_/ _/g
    p
    

    All together:

    sed -ne ':a ; $! { N ; ba }; s/\n_/ _/g ; p' infile
    

    That yields:

    title1 _1 texthere
    title2 _2 texthere
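
    If GNU sed is available, the same whole-file substitution can also be sketched with its -z option, which uses NUL instead of newline as the line separator, so a text file without NUL bytes is read as one record. This is an alternative to the explicit loop, not part of the answer above, and sample.txt is a made-up file name:

    ```shell
    # Recreate the sample from the question (sample.txt is a hypothetical name).
    printf '%s\n' 'title1' '_1 texthere' 'title2' '_2 texthere' > sample.txt

    # -z (GNU sed only): records are NUL-separated, so the whole text file
    # lands in the pattern space at once and \n can be matched directly.
    slurped=$(sed -z 's/\n_/ _/g' sample.txt)
    printf '%s\n' "$slurped"
    ```

    Like the loop version, this holds the entire file in memory, so it is best suited to small inputs.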
    
  • 2021-01-14 09:19

    If your whole file is like your sample (pairs of lines), then the simplest answer is

    paste - - < file
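
    One detail worth knowing: paste joins the lines of each pair with a tab by default. If a space is preferred, the -d option sets the delimiter. A small sketch on the sample from the question (sample.txt is a made-up file name):

    ```shell
    # Recreate the sample from the question (sample.txt is a hypothetical name).
    printf '%s\n' 'title1' '_1 texthere' 'title2' '_2 texthere' > sample.txt

    # Default behaviour: the two lines of each pair are separated by a tab.
    tabbed=$(paste - - < sample.txt)

    # -d ' ' uses a single space as the delimiter instead.
    spaced=$(paste -d ' ' - - < sample.txt)
    printf '%s\n' "$spaced"
    ```

    Both forms rely on the file being strictly made of title/continuation pairs.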
    

    Otherwise

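    awk '
        # a line starting with _ continues the previous line: emit a separator
        NR > 1 &&  /^_/ {printf "%s", OFS}
        # any other line starts a new output line: terminate the previous one
        NR > 1 && !/^_/ {print ""}
        # print the current line without a trailing newline
        {printf "%s", $0}
        # terminate the last output line
        END {print ""}
    ' file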
    
  • 2021-01-14 09:23

    A Perl approach:

    perl -00pe 's/\n_/ /g' file 
    

    Here, -00 makes perl read the file in paragraph mode, where a record (a "line" as far as perl is concerned) ends at one or more blank lines. Your example has no blank lines, so the entire file is read into memory as a single record, and a simple global substitution of \n_ with a space works.
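
    To make the paragraph-mode behaviour concrete, here is a small sketch that counts how many records perl sees with and without a blank line in the input. The file names sample.txt and sample_blank.txt are made up for the illustration:

    ```shell
    # The sample from the question, and a copy with one blank line in the middle
    # (both file names are hypothetical).
    printf '%s\n' 'title1' '_1 texthere' 'title2' '_2 texthere' > sample.txt
    printf '%s\n' 'title1' '_1 texthere' '' 'title2' '_2 texthere' > sample_blank.txt

    # $. is the number of records read so far; printing it in END gives the total.
    records_plain=$(perl -00ne 'END { print "$.\n" }' sample.txt)
    records_blank=$(perl -00ne 'END { print "$.\n" }' sample_blank.txt)
    printf 'no blank line:  %s record(s)\n' "$records_plain"
    printf 'one blank line: %s record(s)\n' "$records_blank"

    # The substitution is applied per paragraph, so lines are still joined
    # inside each one and the blank line between paragraphs is kept.
    joined_blank=$(perl -00pe 's/\n_/ /g' sample_blank.txt)
    printf '%s\n' "$joined_blank"
    ```

    So -00 only reads the whole file at once when there are no blank lines; with blank lines it works paragraph by paragraph, which is still fine here because a \n_ pair never spans a blank line.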

    Reading everything into memory is not efficient for very large files, though. If your data is too large to fit in memory, use this:

    perl -ne 'chomp; 
              s/^_// ? print "$l " : print "$l\n" if $. > 1; 
              $l=$_; 
              END{print "$l\n"}' file 
    

    Here, the file is read line by line (-n) and the trailing newline is removed from each line (chomp). At the end of each iteration, the current line is saved as $l ($l=$_). For every line after the first, if the substitution succeeds, meaning a _ was removed from the beginning of the line (s/^_//), the previous line is printed followed by a space instead of a newline (print "$l "). If the substitution fails, the previous line is printed followed by a newline. The END{} block prints the final line of the file.
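
    The same "hold the previous line until the next one is seen" logic can be sketched in plain POSIX shell, which may help when reading the perl version. The function name join_continuations and the file name sample.txt are made up for this illustration, and a shell loop like this will be much slower than perl on large files:

    ```shell
    # join_continuations (hypothetical helper): reads lines on stdin and joins
    # every line starting with "_" onto the previous one, dropping the "_".
    join_continuations() {
        prev=
        seen=0
        while IFS= read -r line || [ -n "$line" ]; do
            if [ "$seen" -eq 1 ]; then
                case $line in
                    # continuation: print the held line plus a space, strip the "_"
                    _*) printf '%s ' "$prev"; line=${line#_} ;;
                    # new entry: finish the held line with a newline
                    *)  printf '%s\n' "$prev" ;;
                esac
            fi
            prev=$line
            seen=1
        done
        # flush the last held line, mirroring the END{} block
        if [ "$seen" -eq 1 ]; then printf '%s\n' "$prev"; fi
    }

    printf '%s\n' 'title1' '_1 texthere' 'title2' '_2 texthere' > sample.txt
    result=$(join_continuations < sample.txt)
    printf '%s\n' "$result"
    ```

    Only one line is held at a time, which is what keeps memory use constant regardless of file size.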
