I have a file in this format:
title1
_1 texthere
title2
_2 texthere
I would like every line starting with "_" to be joined to the previous line.
This might work for you (GNU sed):
sed ':a;N;s/\n_/ /;ta;P;D' file
This avoids slurping the file into memory.
Or, with each command passed as a separate expression:
sed -e ':a' -e 'N' -e 's/\n_/ /' -e 'ta' -e 'P' -e 'D' file
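To see what each command in that one-liner does, here is a minimal annotated sketch. The function name join_with_sed and the inline sample data are only illustrative, and GNU sed is assumed, as in the answer itself.

```shell
# Hypothetical wrapper around the one-liner above; reads standard input.
join_with_sed() {
    # :a        define the label "a"
    # N         append the next input line to the pattern space
    # s/\n_/ /  if that line starts with "_", turn newline + "_" into a space
    # ta        if the substitution succeeded, jump back to "a" and try again
    # P;D       otherwise print and delete the first line, then restart
    #           the cycle with whatever is left in the pattern space
    sed ':a;N;s/\n_/ /;ta;P;D'
}

# Sample data taken from the question.
printf '%s\n' 'title1' '_1 texthere' 'title2' '_2 texthere' | join_with_sed
```

Note that this variant drops the leading underscore. GNU sed prints the pattern space when N runs out of input; POSIX sed is allowed to quit without printing, so on other implementations writing $!N instead of N is probably the safer choice.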
Try the following solution:
In sed, a loop is built by creating a label (:a). While the current line is not the last one ($!), append the next line (N) and branch back to label a (b a):
:a
$! {
N
b a
}
After this the whole file is in the pattern space, so do a global substitution on every _ preceded by a newline, then print the result:
s/\n_/ _/g
p
All together:
sed -ne ':a ; $! { N ; ba }; s/\n_/ _/g ; p' infile
That yields:
title1 _1 texthere
title2 _2 texthere
If your whole file is like your sample (pairs of lines), then the simplest answer is
paste - - < file
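By default paste separates the joined lines with a tab. If a single space is wanted instead, a sketch along these lines should work (pair_lines is just an illustrative name):

```shell
# Hypothetical helper: join standard input two lines at a time.
pair_lines() {
    # Each "-" reads one line from standard input, so two of them
    # consume the input in pairs; -d ' ' replaces the default tab
    # delimiter with a space.
    paste -d ' ' - -
}

# Sample data taken from the question.
printf '%s\n' 'title1' '_1 texthere' 'title2' '_2 texthere' | pair_lines
```

This keeps the underscore and relies entirely on the input being strict pairs of lines.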
Otherwise
awk '
NR > 1 && /^_/ {printf "%s", OFS}
NR > 1 && !/^_/ {print ""}
{printf "%s", $0}
END {print ""}
' file
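The awk version does not depend on strict pairs. The sketch below is the same program with comments added, run on made-up input where one title has two continuation lines and another has none (join_with_awk and the sample lines are illustrative only):

```shell
# Hypothetical wrapper around the awk program above; reads standard input.
join_with_awk() {
    awk '
        NR > 1 && /^_/  { printf "%s", OFS }  # continuation: separate with a space
        NR > 1 && !/^_/ { print "" }          # new title: end the previous output line
                        { printf "%s", $0 }   # emit the current line without a newline
        END             { print "" }          # terminate the last output line
    '
}

printf '%s\n' 'title1' '_1 a' '_1 b' 'title2' 'title3' '_3 c' | join_with_awk
```

Here the underscores are kept, and every line starting with _ is appended to whichever title came before it.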
A Perl approach:
perl -00pe 's/\n_/ /g' file
Here, -00 makes perl read the file in paragraph mode, where records are separated by blank lines (two or more consecutive newlines) instead of a single newline. Your example contains no blank lines, so the entire file is read into memory as one record, and a simple global substitution of \n_ with a space works.
That is not very efficient for very large files though. If your data is too large to fit in memory, use this:
perl -ne 'chomp;
s/^_// ? print "$l " : print "$l\n" if $. > 1;
$l=$_;
END{print "$l\n"}' file
Here, the file is read line by line (-n) and the trailing newline is removed from each line (chomp). At the end of each iteration, the current line is saved as $l ($l=$_). From the second line on, if the substitution succeeds and a _ was removed from the beginning of the line (s/^_//), the previous line is printed with a space in place of its newline (print "$l "). If the substitution fails, the previous line is printed with a newline. The END{} block just prints the final line of the file.