问题
I have repeating data as follows
....
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
5 4 3 16 22 247 0 40168 40911 40944 40205 40000 40562
6 4 4 17 154 93 309 0 40930 40919 40903 40917 40852 40000 40419
7 3 2 233 311 0 40936 40932 40874 40000 40807
....
This data is made up of 115 data blocks, and each data block have 4000 lines like that format. Here, I hope to put two new lines (number of line per data block = 4000 and empty line) at the begining of each data blocks, so it looks
4000
1 4 4 244 263 704 952 0 40936 40930 40934 40921 40820 40000 40570
2 4 4 215 172 305 33 0 40945 40942 40937 40580 40687 40000 40410
3 4 4 344 279 377 1945 0 40933 40915 40907 40921 40839 40000 40437
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
...
3999 2 2 4079 4081 0 40873 40873 40746 40000 40634
4000 1 1 4080 0 40873 40923 40000 40345
4000
1 4 4 244 263 704 952 0 40936 40930 40934 40921 40820 40000 40570
2 4 4 215 172 305 33 0 40945 40942 40937 40580 40687 40000 40410
3 4 4 344 279 377 1945 0 40933 40915 40907 40921 40839 40000 40437
4 4 4 66 79 169 150 0 40928 40938 40923 40921 40789 40000 40498
...
Can I do this with awk or any other unix command?
回答1:
A simple one liner using awk
can do the purpose.
awk 'NR%4000==1{print "4000\n"} {print$0}' file
what it does.
print $0
prints every line.
NR%4000==1
selects the 4000
th line. When it occures it prints a 4000
and a newline \n
, that is two new lines.
NR
Number of records, which is effectivly number of lines reads so far.
simple test.
inserts 4000 at 5th line
awk 'NR%5==1{print "4000\n"} {print$0}'
output:
4000
1
2
3
4
5
4000
6
7
8
9
10
4000
11
12
13
14
15
4000
16
17
18
19
20
4000
回答2:
My solution is more general, since the blocks can be of non-equal lenght as long as you restart the 1st field counter to denote the beginning of a new block
% cat mark_blocks
$1<count { print count; print "";
for(i=1;i<=count;i++) print l[i]; }
# executed for each line
{ l[$1] = $0; count=$1}
END { print count; print "";
for(i=1;i<=count;i++) print l[i]; }
% awk -f mark_blocks your_data > marked_data
%
The working is simple, awk accumulates lines in memory and it prints the header lines and the accumulated data when it reaches a new block or EOF.
The (modest) trick is that the output action must take place before we do the usual stuff we do for each line.
回答3:
You can do it all in bash :
cat $FILE | ( let countmax=4000; let count=countmax; while read lin ; do if [ $count == $countmax ]; then let count=0; echo -e "$countmax\n" ; fi ; echo $lin ; let count=count+1 ; done )
Here we assume you are reading this data from $FILE . Then all we are doing is reading from the file and piping it into our little bash script.
The bash script reads lines one by one (with the while read lin
) , and increments the counter count
for each line. When starting or when the counter count
reaches the value countmax
(set to 4000) , then it prints out the 2 lines you asked for.
来源:https://stackoverflow.com/questions/26320609/how-to-insert-two-lines-for-every-data-frame-using-awk