How do I join pairs of consecutive lines in a large file (1 million lines) using vim, sed, or another similar tool?

Posted by 橙三吉。 on 2019-12-22 03:18:05

Question


I need to move the contents of every second line up to the line above such that line2's data is alongside line1's, either comma or space separated works.

Input:

line1
line2
line3
line4

Output:

line1 line2
line3 line4

I've been doing it in vim with a simple recording, but vim seems to crash when I tell it to repeat the recording 100 000 times... I'm thinking sed might be a good alternative, but I'm not sure how to do what I want with it. Or maybe there's a better option?

Each line only contains 1 numerical value, I just have a million lines...


Answer 1:


If I understand correctly, you have:

line1 
line2
line3
line4
...

and you want:

line1<SEP>line2
line3<SEP>line4

then you can do it easily with (g)awk like this:

awk 'NR % 2 == 1 { o=$0 ; next } { print o "<sep>" $0 }' INPUTFILE

Update: if the number of lines is odd, the above will omit the last line (as Martin Stettner pointed out) so this will not:

awk 'NR % 2 == 1 { o=$0 ; next } { print o "<sep>" $0 } END { if ( NR % 2 == 1 ) { print o } }' INPUTFILE
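To make the odd-line case concrete, here is a minimal sketch of the same awk logic wrapped in a shell function. The names `join_pairs`, `sep`, and `held` are illustrative and not part of the original answer; the separator is passed in with `-v`, so a comma or a space can be chosen at call time.

```shell
# join_pairs SEP: read lines on stdin and join each odd/even pair with SEP.
# (join_pairs, sep and held are hypothetical names used for illustration.)
join_pairs() {
    awk -v sep="$1" '
        NR % 2 == 1 { held = $0; next }               # odd line: remember it
                    { print held sep $0 }             # even line: emit the pair
        END         { if (NR % 2 == 1) print held }   # unpaired last line
    '
}

seq 5 | join_pairs ','
# 1,2
# 3,4
# 5
```

Since awk processes the input as a stream and only holds one line at a time, a million lines should not be a problem.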

HTH




Answer 2:


The paste command can do this. Its "-s" option will join consecutive lines; and the "-d" option specifies a list of characters to use as delimiters, repeating them cyclically. Join first with a space, then with a newline, and repeat:

seq 10 | paste -sd" \n" -



Answer 3:


Try this:

sed -rn 'N;s/\n/ /;p' yourFile

test with seq:

kent$  seq 10
1
2
3
4
5
6
7
8
9
10

kent$  seq 10|sed -rn 'N;s/\n/ /;p'
1 2
3 4
5 6
7 8
9 10
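One caveat with the sed command above: when the file has an odd number of lines, `N` finds no next line on the last one, and because of `-n` nothing is printed for it, so the final line is lost. A commonly used variant is sketched below; `$!N` only appends the next line when the current line is not the last. The function name `join_pairs_sed` is just an illustrative wrapper.

```shell
# $!N : append the next input line unless we are already on the last line.
# s/\n/ / : replace the embedded newline (if any) with a space.
# (join_pairs_sed is a hypothetical name used for illustration.)
join_pairs_sed() {
    sed '$!N;s/\n/ /'
}

printf '1\n2\n3\n' | join_pairs_sed
# 1 2
# 3
```

Without `-n`, sed prints the pattern space automatically at the end of each cycle, so no explicit `p` is needed.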

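awk works too (the line is passed to printf as an argument rather than as the format string, so a % in the data is harmless):

awk 'NR%2{printf "%s ", $0;next;}1' yourFile

test

kent$  seq 10|awk 'NR%2{printf "%s ", $0;next;}1'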
1 2
3 4
5 6
7 8
9 10



Answer 4:


Well, your example is this in Vim:

:g/^/+t.|-j

But then what about the last line?

Or did you mean this?

:g/^/j

You might also be interested in this Vim script, which makes dealing with large files easier.

http://www.vim.org/scripts/script.php?script_id=1506




Answer 5:


This might work for you:

sed 'N;s/\n/ /' file

Or

cat file | paste -d' ' - - 

Or another couple of ways for the above:

paste -d\  - - <file

paste -sd' \n' file
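The two paste forms differ slightly when the number of lines is odd. A small sketch, using a comma so the difference is visible (the function names `pairs_cols` and `pairs_serial` are illustrative only):

```shell
# Column form: stdin is read twice per output row; a missing second
# field still gets its delimiter.
# (pairs_cols and pairs_serial are hypothetical names for illustration.)
pairs_cols() {
    paste -d"$1" - -
}

# Serial form: delimiters are cycled between consecutive lines.
pairs_serial() {
    paste -sd"$1\n" -
}

printf '1\n2\n3\n' | pairs_cols ,
# 1,2
# 3,

printf '1\n2\n3\n' | pairs_serial ,
# 1,2
# 3
```

With an even number of lines both forms give the same result.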



Answer 6:


$ seq 10 | sed '2~2G' | awk -v RS='' '{$1=$1; print}'
1 2
3 4
5 6
7 8
9 10

$ paste -d' ' <(sed -n 'p;n' num.txt) <(sed -n 'n;p' num.txt)
1 2
3 4
5 6
7 8
9 10

$ echo -e 'g/^/,+1j\n%p' | ex num.txt
1 2
3 4
5 6
7 8
9 10

$ seq 10 | awk 'NR%2{printf("%s ", $0); next}1'
1 2
3 4
5 6
7 8
9 10

$ seq 10 | sed 'N;s/\n/ /'
1 2
3 4
5 6
7 8
9 10

Note: num.txt was created with: seq 10 >num.txt




Answer 7:


seq 10 | awk 'ORS=NR%2?FS:RS'

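This solution uses the ternary operator to set ORS:

ORS= ....... output record separator (receives the result of the ternary)
NR%2 ....... remainder of the current record number (NR) divided by 2; nonzero on odd lines
?FS:RS ..... FS = " " (space) on odd lines, RS = "\n" (newline) on even lines, using awk's defaults

The assignment is used as a pattern; its value is a non-empty string, hence true, so awk runs the default action and prints each line followed by the new ORS.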
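For readers who find the one-liner too terse, the same idea can be spelled out with an explicit action. This is only a sketch; `join_pairs_ors` and `sep` are illustrative names, and the separator is passed in so it does not depend on FS.

```shell
# Set ORS before each print: the separator after odd lines,
# a newline after even lines.
# (join_pairs_ors and sep are hypothetical names used for illustration.)
join_pairs_ors() {
    awk -v sep="$1" '{ ORS = (NR % 2 ? sep : "\n"); print }'
}

seq 4 | join_pairs_ors ', '
# 1, 2
# 3, 4
```

With an odd number of lines, the last line is followed by the separator instead of a newline, so the output ends without a final newline.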



Answer 8:


You can use xargs for this. By default, xargs reads as many input items as possible and runs a command (echo, if none is given) with those items as arguments. E.g.

cat file | xargs

would echo

line1 line2 line3 line4

But you can limit the number of arguments passed to each invocation with the -n option

cat file | xargs -n 2

will have the desired effect of joining every two lines:

line1 line2
line3 line4

If the lines may contain white space, you must specify the input delimiter (newline) explicitly; the -d option is available in GNU xargs

cat file | xargs -n 2 -d '\n'

And finally, don't use cat this way; write instead

xargs -n 2 -d '\n' <file

or even

xargs -n 2 -d '\n' -a file
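Where `-d` is not available, a workaround that is often used is to turn newlines into NUL bytes and read them with `-0`, which GNU, BSD, and BusyBox xargs all accept. A sketch, with `join_pairs_xargs` as an illustrative name:

```shell
# Convert newline-terminated lines to NUL-terminated items, then let
# xargs pass them two at a time to its default command (echo).
# (join_pairs_xargs is a hypothetical name used for illustration.)
join_pairs_xargs() {
    tr '\n' '\0' | xargs -0 -n 2
}

# The first line contains a space and stays intact as one item:
printf 'x y\nz\nw\nv\n' | join_pairs_xargs
# x y z
# w v

# For comparison, plain xargs splits on any white space:
printf 'x y\nz\nw\nv\n' | xargs -n 2
# x y
# z w
# v
```

Keep in mind that xargs -n 2 starts one echo process per pair, so on a million lines it is likely to be noticeably slower than the sed, awk, or paste approaches.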


Source: https://stackoverflow.com/questions/8545538/how-do-i-join-pairs-of-consecutive-lines-in-a-large-file-1-million-lines-using
