Extracting columns from text file with different delimiters in Linux

礼貌的吻别 2020-12-05 02:28

I have very large genotype files that are basically impossible to open in R, so I am trying to extract the rows and columns of interest using the Linux command line. Rows are st

2 answers
  • 2020-12-05 02:55

    You can use cut and specify the delimiter with -d, like this:

    With a space as the delimiter:

    cut -d " " -f1-100,1000-1005 infile.csv > outfile.csv
    

    With a tab as the delimiter:

    cut -d$'\t' -f1-100,1000-1005 infile.csv > outfile.csv
    

    The -f option accepts a comma-separated list of ranges, so the commands above extract fields 1-100 and 1000-1005 in a single pass.
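
To see the range-list syntax on something small, here is a minimal sketch on a made-up two-row sample (the data and the variable names `tab`, `sample` and `cut_result` are illustrative assumptions, not from the question). It uses printf to produce the tab so it also runs in shells that lack the `$'\t'` syntax.

```shell
# Hypothetical sample: two rows of eight tab-separated fields.
tab=$(printf '\t')   # portable literal tab; $'\t' needs bash, ksh or zsh
sample=$(printf 'a1\ta2\ta3\ta4\ta5\ta6\ta7\ta8\nb1\tb2\tb3\tb4\tb5\tb6\tb7\tb8\n')

# Keep fields 1-2 and 6-7 in one pass, using the same range-list syntax.
cut_result=$(printf '%s\n' "$sample" | cut -d "$tab" -f1-2,6-7)
printf '%s\n' "$cut_result"
```

Each output row keeps only fields 1, 2, 6 and 7. Note that cut always writes the selected fields in their original file order, whatever order the ranges are listed in.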

    Hope it helps!

  • 2020-12-05 02:59

    If the command should work with both tabs and spaces as the delimiter, I would use awk, which by default splits fields on any run of spaces and tabs:

    awk '{print $100,$101,$102,$103,$104,$105}' myfile > outfile
    

    As long as you only need a handful of fields (six here), it is IMO fine to just type them out. For longer ranges you can use a for loop:

    awk '{for(i=100;i<=105;i++) printf "%s%s", $i, (i<105 ? OFS : ORS)}' myfile > outfile
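
The same loop can take its bounds from shell variables, so the range is not hard-coded. This is only a sketch: `first`, `last` and the one-line sample input are hypothetical, and the fields are joined with awk's output separator (a space by default) so each input row stays on one output line.

```shell
# Hypothetical bounds; with the real file these would be 100 and 105.
first=3
last=5

# Sample row mixing single spaces, a tab and a run of spaces as separators.
awk_result=$(printf 'c1 c2\tc3   c4 c5 c6\n' |
    awk -v first="$first" -v last="$last" '{
        # Print fields first..last on one line, separated by OFS.
        for (i = first; i <= last; i++)
            printf "%s%s", $i, (i < last ? OFS : ORS)
    }')
printf '%s\n' "$awk_result"
```

On the sample row this prints `c3 c4 c5`. To get tab-separated output instead, one option is to add `-v OFS='\t'` to the awk call.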
    

    If you want to use cut, select the fields with the -f option (cut splits on TAB by default):

    cut -f100-105 myfile > outfile
    

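    If the field delimiter is not TAB, you need to specify it with -d. Keep in mind that cut treats every single occurrence of the delimiter as a separator, so runs of spaces produce empty fields: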

    cut -d' ' -f100-105 myfile > outfile
    

    Check the man page for more info on the cut command.
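
The practical difference between the two tools shows up when columns are padded with more than one space. A small sketch on a made-up line (the variable names are illustrative only):

```shell
# Hypothetical line with two spaces between "a" and "b".
line='a  b c'

# cut counts the empty string between the two spaces as field 2.
cut_field=$(printf '%s\n' "$line" | cut -d ' ' -f2)

# awk collapses the run of spaces, so field 2 is "b".
awk_field=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut: [%s]  awk: [%s]\n' "$cut_field" "$awk_field"
```

This prints `cut: []  awk: [b]`. If the genotype files use exactly one delimiter character between columns, cut is fine; if the spacing is irregular, awk is the safer choice.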
