I have a lot of text files that look like this:
>ALGKAHOLAGGATACCATAGATGGCACGCCCT
>BLGKAHOLAGGATACCATAGATGGCACGCCCT
>HLGKAHOLAGGATACCATAGATGGCACGCCC
Maybe it's better to sample the file using a fixed schema, like keeping one record every 10 lines. You can do that with this awk one-liner:
awk '0==NR%10' filename
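To make it easier to see which lines the one-liner keeps, here is a minimal sketch that wraps it in a small shell function and passes the step in with awk's -v option instead of hard-coding 10. The function name every_nth is hypothetical, and seq is only used to generate numbered test lines.

```shell
#!/bin/sh
# every_nth STEP [FILE...]: print lines STEP, 2*STEP, 3*STEP, ...
# (hypothetical helper name; reads stdin when no FILE is given)
every_nth() {
  step=$1
  shift
  # NR is awk's running line number; keep a line when NR is a multiple of step
  awk -v step="$step" 'NR % step == 0' "$@"
}

# Usage: 35 numbered lines in, lines 10, 20 and 30 out
seq 1 35 | every_nth 10
```

Calling every_nth 10 filename behaves like the original one-liner. Note that it samples lines, which matches your files because each record sits on a single line.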
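If you want to sample a percentage of the total instead, count the lines first and derive the step from that count: keeping p percent of the lines means printing one line every 100/p lines (every 10th line for 10%, every 20th for 5%). Pass that step to the awk one-liner so the number of records printed matches the percentage you want.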
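Here is a minimal sketch of that approach, assuming a POSIX shell and an integer percentage. It counts the lines, works out how many to keep and the spacing between them, then reuses the same NR % step test with a cap so it prints exactly that many. The function name sample_percent is hypothetical, not an existing tool.

```shell
#!/bin/sh
# sample_percent FILE PERCENT (hypothetical helper, not a standard tool):
# prints about PERCENT% of FILE's lines, evenly spaced. PERCENT must be an integer.
sample_percent() {
  file=$1
  percent=$2
  # Count lines with awk (also counts a last line that has no trailing newline)
  total=$(awk 'END { print NR }' "$file")
  [ "$total" -gt 0 ] || return 0
  # Number of lines to keep: at least 1, at most every line
  want=$(( total * percent / 100 ))
  [ "$want" -ge 1 ] || want=1
  [ "$want" -le "$total" ] || want=$total
  # Spacing between kept lines; step <= total/want, so at least want lines match
  step=$(( total / want ))
  # Keep every step-th line, stopping once want lines have been printed
  awk -v step="$step" -v want="$want" \
    'NR % step == 0 && kept < want { print; kept++ }' "$file"
}
```

For example, on a 1,000-line file, sample_percent numbers.txt 5 keeps 50 lines: 20, 40, ..., 1000. Shell arithmetic only handles integers, which is why the step is rounded down and the cap is needed to avoid printing a few extra lines.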
I hope this helps!