Sampling without replacement using awk

孤街浪徒 2020-12-11 04:35

I have a lot of text files that look like this:

>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
>HLGKAHOLAGGATACCATAGATGGCACGCCC

4 answers
  •  有刺的猬
    2020-12-11 05:20

    Maybe it's better to sample the file on a fixed schedule, for example by taking one record every 10 lines. You can do that with this awk one-liner:

    awk '0==NR%10' filename
    

    If you want to sample a percentage of the total instead, you can compute the step the awk one-liner should use from that percentage, so that the number of records printed matches the quantity you are after.

    I hope this helps!
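
The percentage idea above could be sketched roughly as follows. This is only one possible reading of the answer: the helper name `sample_pct` and the rounding rule (step = 100 / pct, rounded to the nearest integer) are assumptions, not something the answer specifies. Note that this keeps every step-th line, so it is systematic sampling rather than a random draw, and percentages that do not divide 100 evenly are only approximated.

```shell
#!/bin/sh
# sample_pct is a hypothetical helper (not part of awk): it prints about
# pct percent of the lines of a file by keeping every step-th line.
# Usage: sample_pct PCT FILE
sample_pct() {
    pct=$1
    file=$2
    awk -v pct="$pct" '
        BEGIN {
            # Reject percentages outside (0, 100].
            if (pct <= 0 || pct > 100) {
                print "pct must be in (0, 100]" > "/dev/stderr"
                exit 1
            }
            # Assumed rounding rule: nearest integer to 100 / pct.
            step = int(100 / pct + 0.5)
            if (step < 1) step = 1
        }
        # Same test as the one-liner above, with a computed step.
        NR % step == 0
    ' "$file"
}
```

For example, `sample_pct 10 filename` behaves like `awk '0==NR%10' filename`, while `sample_pct 25 filename` keeps every 4th line.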
