I have very large genotype files that are basically impossible to open in R, so I am trying to extract the rows and columns of interest using linux command line. Rows are st
You can use cut with a delimiter like this:
with space delim:
cut -d " " -f1-100,1000-1005 infile.csv > outfile.csv
with tab delim:
cut -d$'\t' -f1-100,1000-1005 infile.csv > outfile.csv
I gave you the version of cut in which you can extract a list of intervals...
Hope it helps!
If the command should work with both tabs and spaces as the delimiter I would use awk
:
awk '{print $100,$101,$102,$103,$104,$105}' myfile > outfile
As long as you just need to specify 5 fields it is imo ok to just type them, for longer ranges you can use a for
loop:
awk '{for(i=100;i<=105;i++)print $i}' myfile > outfile
If you want to use cut
, you need to use the -f
option:
cut -f100-105 myfile > outfile
If the field delimiter is different from TAB
you need to specify it using -d
:
cut -d' ' -f100-105 myfile > outfile
Check the man page for more info on the cut command.