Using awk (or sed) to remove newlines based on first character of next line

前端未结

Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull all the relevant information based on regexps, but each "piece"

5 answers

终归单人心

2021-02-15 14:38
```
$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755
```
Translation: read the input in bulk, without splitting it into lines, and replace each newline that is followed by a comma with just the comma.

Shortest code here!

暗喜

2021-02-15 14:42

sed -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' filename
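
Tracing the one-liner above by hand, it seems to lose output when a cycle starts on the very last input line: the `n` command then finds no more input, and sed exits without printing what is held in the hold space. With the four lines `A`, `,b`, `C`, `,d`, for example, the trace prints only `A,b`. The sketch below is a different, simpler hold-space approach (not the answerer's script), assuming GNU sed and the same hypothetical comma-prefixed layout: the hold space accumulates the current record, and each new head line flushes it. The function name `join_pieces` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: the last record has a single comma line.
sample_input() {
    printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' ',5417178'
}

# Alternative hold-space approach (GNU sed assumed); not the script above.
join_pieces() {
    sed -n '
# Comma line: append it to the record in the hold space.
/^,/{
    H
    # Unless this is the last line, go on to the next line.
    $!d
    # Last line: fetch the finished record, join it and print it.
    x
    s/\n//g
    p
    d
}
# Head line: swap it into the hold space; the previous record comes out.
x
# On line 1 the hold space was empty, so there is no previous record yet.
1!{
    s/\n//g
    p
}
# If the head line is also the last line, print it on its own.
${
    x
    p
}
' "$@"
}

sample_input | join_pieces
```

This prints `92831,499,000,0644321` and `79217,999,000,5417178`.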

盖世英雄少女心

2021-02-15 14:42
This might work for you:
```
$ sed ':a;N;s/\n,/,/;ta;P;D' test.dat | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
92831,499,000,0644321
79217,999,000,5417178,PK91622
79217,999,000,5417178,PK90755
```
Explanation:

This comes in two parts:

Append the next line; if the appended line begins with a `,`, delete the embedded newline (`\n`) and start again. If not, print up to the newline, then delete up to the newline. Repeat.

Replace the 5th `,` with a newline. Then insert the first four fields in between the embedded newline and the sixth field.
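
To see the two parts separately, the sketch below runs the first stage alone and then the full pipeline on the same hypothetical comma-prefixed sample. It assumes GNU sed, which accepts `\n` in the replacement of `s/,/\n/5` and prints the pattern space when `N` runs out of input on the last line (other seds may quit without printing, in which case `$!N` is the usual workaround). The function names are made up for illustration. Note that the second stage handles exactly one extra field: a line with seven or more fields is still split only once, leaving the sixth and later fields together on the second line.

```shell
#!/bin/sh
# Hypothetical sample input, reconstructed from the outputs in this thread.
sample_input() {
    printf '%s\n' '92831,499,000' ',0644321' \
        '79217,999,000' ',5417178' ',PK91622' ',PK90755'
}

# Part 1: join every comma line onto the line before it.
join_lines() {
    sed ':a;N;s/\n,/,/;ta;P;D'
}

# Part 2: turn the 5th comma into a newline, then copy the first four
# fields (everything up to the last comma before that newline) after it.
split_sixth() {
    sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
}

echo '--- part 1 ---'
sample_input | join_lines
echo '--- parts 1 and 2 ---'
sample_input | join_lines | split_sixth
```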

暗喜

2021-02-15 14:51

Without special-casing field 3, it's easy:

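awk '
    BEGIN   { OFS = "" }    # join pieces with no separator (OFS defaults to a space)
    !/^,/   { if (NR > 1) print x ; x = $0 }    # head line: print the previous record, start a new one
    /^,/    { x = x OFS $0 }    # comma line: append it to the current record
    END     { if (NR) print x }    # print the last record
'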

With special-casing, it's more complex but still not too hard:

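awk '
    BEGIN   { OFS = "" }    # join pieces with no separator (OFS defaults to a space)
    !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }    # head line: flush a short previous record, start a new one
    /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }    # 2nd piece is appended; each later piece prints head + 2nd piece + itself
    END     { if (n && n < 3) print x }    # flush a final record that had fewer than 3 pieces
'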
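
A quick way to check the special-casing script against the sed answer's output, using the same hypothetical comma-prefixed sample. `OFS` is set to the empty string here as an assumption: both scripts join pieces with `OFS`, which defaults to a single space, whereas the other answers glue the pieces together directly. The function name `split_records` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input, reconstructed from the outputs in this thread.
sample_input() {
    printf '%s\n' '92831,499,000' ',0644321' \
        '79217,999,000' ',5417178' ',PK91622' ',PK90755'
}

# The special-casing script, with OFS emptied so pieces are joined directly.
split_records() {
    awk '
        BEGIN   { OFS = "" }
        !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
        /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
        END     { if (n && n < 3) print x }
    '
}

sample_input | split_records
```

With this input it prints the same three lines as the two-stage sed answer. Unlike that pipeline, it keeps producing one line per additional comma line when a record has more than three of them.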

一向

2021-02-15 15:04

Well, I guess I should have taken a closer look at records in awk when I was trying to figure this out last night... 10 minutes after looking at them I had it working. For anyone interested, here's how I did it: in my original sed script I put an extra newline in front of the beginning of each record, so there is now a blank line separating each one. I then use the following awk command:

awk 'BEGIN { RS = ""; FS = "\n" }
{
    if (NF >= 3)
        for (i = 3; i <= NF; i++)
            print $1, $2, $i
}'

and it works like a charm, producing exactly the output I wanted!
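
A self-contained sketch of the same paragraph-mode approach, assuming a hypothetical blank-line-separated version of the sample data. Setting `RS` to the empty string makes awk treat each run of one or more blank lines as the record separator, and with `FS = "\n"` every line of a record becomes one field. `OFS` is emptied here as an assumption (the original prints the fields separated by spaces), and the function name `paragraph_split` is made up for illustration. Note that, because the loop only starts at field 3, a record with just two lines produces no output at all.

```shell
#!/bin/sh
# Hypothetical blank-line-separated input, as the modified sed step would produce.
sample_input() {
    printf '%s\n' '92831,499,000' ',0644321' '' \
        '79217,999,000' ',5417178' ',PK91622' ',PK90755'
}

# Paragraph mode: RS = "" splits records on blank lines, FS = "\n" makes each
# line a field; print fields 1 and 2 together with each later field in turn.
paragraph_split() {
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "" }
    {
        if (NF >= 3)
            for (i = 3; i <= NF; i++)
                print $1, $2, $i
    }'
}

sample_input | paragraph_split
```

Here the first record (two lines) is skipped, and the second yields `79217,999,000,5417178,PK91622` and `79217,999,000,5417178,PK90755`.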
