Filling space/tab separated, empty columns with 0

天命终不由人 2021-01-07 02:06

I have a huge file, and in the output some columns don't have a value. I need to fill these columns with 0 for further analysis. I can separate the columns with space or ta

5 answers
  • 2021-01-07 02:25

    Here's a sed solution. Note that some versions of sed don't support the \t escape for a tab.

    sed 's/^\t/0\t/;:a;s/\t\t/\t0\t/g;ta;s/\t$/\t0/' inputfile
    

    or

    sed -e 's/^\t/0\t/' -e ':a' -e 's/\t\t/\t0\t/g' -e 'ta' -e 's/\t$/\t0/' inputfile
    

    Explanation:

    s/^\t/0\t/    # insert a zero before a tab that begins a line
    :a            # top of the loop
        s/\t\t/\t0\t/g    # insert a zero between a pair of tabs
    ta            # if a substitution was made, branch to the top of the loop
    s/\t$/\t0/    # insert a zero after a tab that ends a line
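
To see why the loop (`:a` ... `ta`) is needed, here is a minimal Python sketch of the same three steps. Python is used only for illustration, and `fill_with_loop` is a made-up name. A global substitution does not rescan text it has just replaced, so a run of three tabs needs a second pass.

```python
import re

def fill_with_loop(line: str) -> str:
    """Mirror the three sed steps on one line (without its trailing newline)."""
    # s/^\t/0\t/ : zero before a tab that begins the line
    line = re.sub(r"^\t", "0\t", line)
    # :a ... ta : repeat until no pair of adjacent tabs is left
    while "\t\t" in line:
        line = line.replace("\t\t", "\t0\t")
    # s/\t$/\t0/ : zero after a tab that ends the line
    line = re.sub(r"\t$", "\t0", line)
    return line

# A single pass over three tabs leaves one pair behind:
print(repr("a\t\t\tb".replace("\t\t", "\t0\t")))   # 'a\t0\t\tb'
print(repr(fill_with_loop("a\t\t\tb")))            # 'a\t0\t0\tb'
```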
    
  • 2021-01-07 02:28

    This is really a job for a CSV parser, but if it has to be a regex, and you never have tabs within quoted CSV entries, you could search for

    (^|\t)(?=\t|$)
    

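    and replace with the first group followed by a literal 0 (write it as ${1}0 in engines that would read $10 as group 10)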

    $10
    

    So, in Perl:

    (my $result = $subject) =~
    s/(    # Match either...
       ^   # the start of the line (preferably)
       |   # or
       \t  # a tab character
      )    # remember the match in backreference no. 1
      (?=  # Then assert that the next character is either
       \t  # a(nother) tab character
       |   # or
       $   # the end of the line
      )    # End of lookahead assertion
    /${1}0/xmg;  # m: let ^ and $ match at every line of a multi-line string
    

    This will transform

    1   2       4           7   8
        2   3       5   6   7   
    

    into

    1   2   0   4   0   0   7   8   
    0   2   3   0   5   6   7   0
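
The same search-and-replace can be tried in Python, shown here only as a sketch. It assumes Python 3.7 or later (where a non-empty match may start right after an empty one) and lines whose newline has already been removed; `fill_with_lookahead` is an illustrative name.

```python
import re

# Same pattern as above: start of line or a tab, followed by a tab or end of line.
EMPTY_FIELD = re.compile(r"(^|\t)(?=\t|$)")

def fill_with_lookahead(line: str) -> str:
    """Put a 0 into every empty tab-separated field of one line."""
    # \g<1>0 means "group 1, then a literal 0"; a bare \10 would be read as group 10.
    return EMPTY_FIELD.sub(r"\g<1>0", line)

rows = ["1\t2\t\t4\t\t\t7\t8", "\t2\t3\t\t5\t6\t7\t"]
for row in rows:
    print(fill_with_lookahead(row))
```

The `\g<1>0` replacement plays the same role as `${1}0` in the Perl version.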
    
  • 2021-01-07 02:34

    Deleting my answer after re-reading the original post. There are no tabs in the data, just delimiters. If a field has no data, a double delimiter appears so that the columns stay aligned.
    It can't be any other way. So a single delimiter on its own separates two empty fields: "" = 1 empty field, "\t" = 2 empty fields. I get it now.

    Tim Pietzcker had the correct answer all along. +1 for him.
    It could alternatively be written as s/ (?:^|(?<=\t)) (?=\t|$) /0/xg;, but it's the same thing.
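
The field-counting argument can be checked directly. The sketch below, in Python purely for illustration, splits on the delimiter and shows that n tabs always yield n + 1 fields, so empty fields can also be filled without a regex. `fill_by_split` is a hypothetical helper name.

```python
def fill_by_split(line: str, sep: str = "\t") -> str:
    """Replace each empty field with "0"; a line with n separators has n + 1 fields."""
    return sep.join(field if field else "0" for field in line.split(sep))

print("".split("\t"))      # [''] -> one empty field
print("\t".split("\t"))    # ['', ''] -> two empty fields
print(repr(fill_by_split("\t2\t\t5\t")))  # '0\t2\t0\t5\t0'
```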

  • 2021-01-07 02:34

    If and only if your data contains only numbers and you have a clearly defined field separator FS, you can use the following trick:

    awk 'BEGIN{FS=OFS="\t"}{for(i=1;i<=NF;++i) $i+=0}1' file
    

    Adding zero converts each string to a number, and an empty string becomes the number zero. You can set the field separator to anything you like.

    This, however, might be a bit slow, since awk has to rebuild $0 every time a field $i is reassigned.

    A faster way is the solution of Dennis Williamson.

  • 2021-01-07 02:45

    For a tab-separated file, this AWK snippet does the trick:

    BEGIN { FS = "\t"; OFS = "\t" }
    {
        for (i = 1; i <= NF; i++) {
            # Compare against "" so that fields such as 0.0 are left untouched
            if ($i == "") { $i = 0 }
        }
        print $0
    }
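
For a huge file, the same per-field check can be applied one line at a time, so the whole file never has to sit in memory. A minimal sketch in Python, assuming tab-separated input with `\n` line endings; `fill_stream` is an illustrative name.

```python
import io

def fill_stream(src, dst, sep="\t"):
    """Copy src to dst line by line, writing 0 into every empty field."""
    for line in src:
        body = line.rstrip("\n")
        newline = line[len(body):]   # keep the original line ending, if any
        fields = [f if f != "" else "0" for f in body.split(sep)]
        dst.write(sep.join(fields) + newline)

# Demo with in-memory streams; for real data, pass open file handles
# or sys.stdin / sys.stdout instead.
src = io.StringIO("1\t\t3\n\t2\t\n")
dst = io.StringIO()
fill_stream(src, dst)
print(repr(dst.getvalue()))   # '1\t0\t3\n0\t2\t0\n'
```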
    