How to use awk for a compressed file

谎友^ 2020-12-29 22:11

How can I change the following command for a compressed file?

awk 'FNR==NR { array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' inpu


        
3 Answers
  • 2020-12-29 22:23
    zcat FILE | awk '{ ...}'
    

    I can't tell which of these methods works best, but zcat is at least quicker to type ;)
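One caveat for the two-file command in the question: everything that arrives through a single pipe looks like one file to awk, so FNR==NR would be true for every line. The following is a minimal sketch of one way to keep the pipe anyway: the first file is loaded inside awk with getline from a command, and only the second file is piped in. The function name join_gz is hypothetical, gzip -dc stands in for zcat, and the sketch assumes file names without single quotes or backslashes.

```shell
# Hypothetical helper: join two gzip-compressed files on columns 1 and 2.
#   $1 = lookup file (read inside awk), $2 = main file (piped in)
join_gz() {
    gzip -dc "$2" | awk -v lookup="gzip -dc '$1'" '
        BEGIN {
            # Plain getline from a command sets $0 and the fields for each line.
            while ((lookup | getline) > 0)
                array[$1,$2] = $8
            close(lookup)
        }
        # Main input: the second file, arriving on standard input.
        ($1,$2) in array { print $0 ";" array[$1,$2] }
    '
}
```

It would be called as join_gz input1.vcf.gz input2.vcf.gz, with the result printed to standard output.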

  • 2020-12-29 22:47

    You need to read the compressed files like this:

    awk '{ ... }' <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz)
    

    Try this:

    awk 'FNR==NR { sub(/AA=\.;/,""); array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz) | gzip > output.vcf.gz
    
  • 2020-12-29 22:47
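    Each file needs its own input stream. If both are decompressed into a single pipe, awk sees one file and FNR==NR is true for every line, so nothing is printed. In bash, process substitution keeps them separate:

    awk 'FNR==NR { array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' <(bzip2 -dc input1.vcf.bz2) <(bzip2 -dc input2.vcf.bz2)
    

    or

    awk 'FNR==NR { array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz)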
    

    EDIT:

    To write compressed output just append

    | bzip2 >output.vcf.bz2
    

    or

    | gzip >output.vcf.gz
    

    This will work with any program that prints results to standard output.

    BTW: Editing such large command lines gets tedious very quickly. You should consider writing a small shell script to do the job. This has the additional benefit that you don't have to remember the entire thing and can easily repeat the command or modify it if necessary.
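As a rough sketch of what such a script could look like for the two-file join from the question: the script name merge_vcf.sh, the function name merge_vcf, and the file names are hypothetical. To stay in plain sh, it assumes the first file can be decompressed to a temporary file, while the second one is streamed through a pipe.

```shell
#!/bin/sh
# merge_vcf.sh (hypothetical name): join two gzip-compressed files on
# columns 1 and 2 and write gzip-compressed output.
# Usage: sh merge_vcf.sh input1.vcf.gz input2.vcf.gz output.vcf.gz

merge_vcf() {
    in1=$1; in2=$2; out=$3

    # Assumption: the first file fits on disk uncompressed. A temporary copy
    # lets awk see two separate inputs, which the FNR==NR test relies on.
    tmp=$(mktemp) || return 1
    gzip -dc "$in1" > "$tmp" || { rm -f "$tmp"; return 1; }

    # "-" tells awk to read the second input from standard input.
    gzip -dc "$in2" |
        awk 'FNR==NR { array[$1,$2]=$8; next }
             ($1,$2) in array { print $0 ";" array[$1,$2] }' "$tmp" - |
        gzip > "$out"

    rm -f "$tmp"
}

# Run only when all three file names are given on the command line.
if [ "$#" -eq 3 ]; then
    merge_vcf "$@"
fi
```

In bash, the temporary file can be avoided by passing <(gzip -dc "$in1") as the first awk argument instead.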

    A good starting point for Linux shell programming is the Bash Programming Introduction by Mike G.
