Using awk (or sed) to remove newlines based on first character of next line

陌清茗 2021-02-15 14:23

Here's my situation: I had a big text file that I wanted to pull certain information from. I used sed to pull all the relevant information based on regexps, but each "piece"

5 Answers
  • 2021-02-15 14:38
    $ perl -0pe 's/\n,/,/g' < test.dat
    92831,499,000,0644321
    79217,999,000,5417178,PK91622,PK90755
    

    Translation: read the whole file in at once rather than line by line, then replace every newline that is followed by a comma with just the comma.

    Shortest code here!
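
    As a minimal sketch of the same idea on a reproducible input: the sample lines below are an assumption, reconstructed from the output shown above, and the temporary file stands in for test.dat. Note that -0 on its own sets the record separator to the NUL byte, which only amounts to reading the whole file when the data contains no NUL bytes; -0777 is the conventional way to ask perl for slurp mode explicitly.

    ```shell
    # Assumed sample input (reconstructed from the output above); stands in for test.dat.
    input=$(mktemp)
    printf '%s\n' '92831,499,000,0644321' '79217,999,000,5417178' ',PK91622' ',PK90755' > "$input"

    # -0777 slurps the whole file as one record, so the pattern can match across line ends.
    joined=$(perl -0777 -pe 's/\n,/,/g' "$input")
    printf '%s\n' "$joined"
    rm -f "$input"
    ```

    The result should be the same two lines as in the output above.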

  • 2021-02-15 14:42
    sedsed -d -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' filename
    
  • 2021-02-15 14:42

    This might work for you:

    # sed ':a;N;s/\n,/,/;ta;P;D' test.dat | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
    92831,499,000,0644321
    79217,999,000,5417178,PK91622
    79217,999,000,5417178,PK90755
    

    Explanation:

    This comes in two parts:

    Append the next line. If the appended line begins with a comma, delete the embedded newline (\n) and start again. If not, print up to the newline, then delete up to the newline. Repeat.

    Replace the 5th comma with a newline. Then insert the first four fields between the embedded newline and the sixth field.
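
    To see what each part contributes, the two sed commands can be run separately. This is only a sketch: it assumes GNU sed (for \n in the replacement and for ; after a label), and the input lines are an assumption reconstructed from the output above.

    ```shell
    # Assumed sample input (reconstructed from the output above); stands in for test.dat.
    input=$(mktemp)
    printf '%s\n' '92831,499,000,0644321' '79217,999,000,5417178' ',PK91622' ',PK90755' > "$input"

    # Part 1: join every line that starts with a comma onto the line before it.
    part1=$(sed ':a;N;s/\n,/,/;ta;P;D' "$input")
    printf 'after part 1:\n%s\n' "$part1"

    # Part 2: split at the 5th comma, then repeat the leading fields on the new line.
    part2=$(printf '%s\n' "$part1" | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/')
    printf 'after part 2:\n%s\n' "$part2"
    rm -f "$input"
    ```

    After part 1 there are two lines, the second ending in ,PK91622,PK90755; part 2 turns that second line into the two lines shown in the output above.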

  • 2021-02-15 14:51

    Without special-casing field 3, this is easy:

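    awk '
        # A line that does not start with a comma begins a new record:
        # print the previous record (if any) and start collecting again.
        !/^,/   { if (NR > 1) print x ; x = $0 }
        # A line that starts with a comma continues the current record.
        /^,/    { x = x OFS $0 }
        # Flush the last record.
        END     { if (NR) print x }
    '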
    

    With special-casing, it is more complex but still not too hard:

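    awk '
        # n counts the lines seen so far in the current record.
        # New record: flush the previous one only if it had fewer than 3 lines
        # (longer records were already printed line by line below).
        !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
        # The 2nd line joins the prefix; each later line prints prefix plus itself.
        /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
        END     { if (n && n < 3) print x }
    '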
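
    To make the difference between the two scripts concrete, here is a sketch that runs both on a small made-up record (rec1 and the f2, f3a, f3b values are hypothetical, not taken from the question). Because OFS defaults to a single space, the joined pieces are separated by spaces; passing -v OFS= to awk would join them with nothing in between.

    ```shell
    # Hypothetical input: one record whose continuation lines start with a comma.
    sample=$(printf '%s\n' 'rec1' ',f2' ',f3a' ',f3b')

    # First script: every continuation line is folded into one output line.
    folded=$(printf '%s\n' "$sample" | awk '
        !/^,/   { if (NR > 1) print x ; x = $0 }
        /^,/    { x = x OFS $0 }
        END     { if (NR) print x }
    ')

    # Second script: from the third line on, each line gets its own copy of the prefix.
    expanded=$(printf '%s\n' "$sample" | awk '
        !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
        /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
        END     { if (n && n < 3) print x }
    ')

    printf 'folded:\n%s\nexpanded:\n%s\n' "$folded" "$expanded"
    ```

    On this input the first script prints the single line rec1 ,f2 ,f3a ,f3b, while the second prints rec1 ,f2 ,f3a and rec1 ,f2 ,f3b on separate lines.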
    
  • 2021-02-15 15:04

    Well, I guess I should have taken a closer look at using records in awk when I was trying to figure this out last night... 10 minutes after looking at them, I got it working. For anyone interested, here's how I did this: in my original sed script I put an extra newline in front of the beginning of each record, so there's now a blank line separating each one. I then use the following awk command:

    awk 'BEGIN {RS = ""; FS = "\n"}
    {
        if (NF >= 3)
            for (i = 3; i <= NF; i++)
                print $1, $2, $i
    }'

    and it works like a charm outputting exactly the way I wanted!
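
    For readers who want to try the record-based approach, here is a small sketch. The input is hypothetical (the real data from the question is not shown): records are separated by blank lines, and only records with at least three lines produce output. Setting RS to the empty string puts awk in paragraph mode, where each blank-line-separated block is one record, and FS = "\n" makes each line of that block a field.

    ```shell
    # Hypothetical input: a leading blank line, then two blank-line-separated records.
    sample=$(printf '\nid1\n,name\n,PK1\n,PK2\n\nid2\n,other\n')

    # One output line per field from the third onward, each prefixed by fields 1 and 2.
    result=$(printf '%s\n' "$sample" | awk 'BEGIN {RS = ""; FS = "\n"}
    {
        if (NF >= 3)
            for (i = 3; i <= NF; i++)
                print $1, $2, $i
    }')
    printf '%s\n' "$result"
    ```

    Here the first record yields id1 ,name ,PK1 and id1 ,name ,PK2, and the two-line second record is skipped by the NF >= 3 check.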
