How to keep only certain numbers in a giant number in bash?

本小妞迷上赌 submitted on 2019-12-24 03:23:09

Question


I have a huge file that contains genotypes. Each line is a locus (a SNP), and each character position (column) is one individual: the single-digit genotypes are concatenated to form one giant number. In this example there are 96 digits per row, so 96 individuals. Here is an example:

921212922222222212292222229222221222211222222222222219929222292222922229919922222222222222292292
929111221111111221191211222912222221111210229921222129929222291221921219929992122122222211292299
292222922212222122292222222222921122222222921219222222912222299199922222912222222222221222292229
222222221122122922122222112212212221222122221922999229222229222212992221222222221222222222222212
222222222292212221291112192222122121922122222122229212222221212212922221222122122912222922222111
222222921222222922292222122222922222229222122291299122922222229222922229229222219222292222122222

I want to keep only certain "columns". Since each row is one number, I assume I would need to split it into separate columns, select the ones I need, and concatenate them again so the output has the same format but contains only the 2 columns that I need.

For example, if I select columns 1 and 3, the end result should be:

91
99
22
22
22
22

I've tried this (the data above is in output.geno):

cat ~/Desktop/output.geno| awk '{ print $1 $3}'
echo ~/Desktop/output.geno | grep -o ""

If you want to play with that here is a toy dataset:

echo "921212922222222212292222229222221222211222222222222219929222292222922229919922222222222222292292
929111221111111221191211222912222221111210229921222129929222291221921219929992122122222211292299
292222922212222122292222222222921122222222921219222222912222299199922222912222222222221222292229
222222221122122922122222112212212221222122221922999229222229222212992221222222221222222222222212
222222222292212221291112192222122121922122222122229212222221212212922221222122122912222922222111
222222921222222922292222122222922222229222122291299122922222229222922229229222219222292222122222" > ~/Desktop/output.geno

Answer 1:


You can use cut:

cut -c 1,3 output.geno

Gives:

91
99
22
22
22
22
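The character list given to cut can also mix single positions and ranges. Here is a minimal sketch on a toy input (the first two rows of the example, truncated to 12 characters for brevity, which is an assumption of this illustration). Note that cut always emits characters in their input order, so the order of the list does not matter:

```shell
# Toy input: first two rows of the example, truncated to 12 characters.
geno=$(printf '%s\n' 921212922222 929111221111)

# Keep individuals 1, 3 and 5 through 7.
subset=$(printf '%s\n' "$geno" | cut -c 1,3,5-7)
printf '%s\n' "$subset"

# Listing the positions as 3,1 still prints position 1 first.
reordered=$(printf '%s\n' "$geno" | cut -c 3,1)
printf '%s\n' "$reordered"
```

If the selected columns have to appear in a different order than in the file, cut alone is not enough and an awk-based approach is needed.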



Answer 2:


Try:

awk '{print $1$3}' FS= output.geno

Equivalent to:

awk 'BEGIN{FS=""}{print $1$3}' output.geno

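You need to set FS (the field separator) to the empty string so that awk treats every character as its own field. Splitting on an empty FS is not specified by POSIX; it works in gawk and mawk, but may not work in every awk.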




Answer 3:


@M. Beausoleil, try this (I haven't tested it, though):

awk '{print substr($0,1,1) substr($0,3,1)}'   Input_file

It simply extracts the 1st and 3rd characters of each line with substr and prints them side by side.
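The substr approach can be generalized to an arbitrary list of positions. The following is a sketch, not a tested solution for the full dataset: cols is a hypothetical variable name introduced for this illustration, and the toy input is the first two rows of the example truncated to 12 characters. Unlike cut, this prints the characters in the order in which the positions are listed:

```shell
# Toy input: first two rows of the example, truncated to 12 characters.
geno=$(printf '%s\n' 921212922222 929111221111)

# cols (hypothetical name) holds 1-based positions in the desired output order.
picked=$(printf '%s\n' "$geno" | awk -v cols="3,1,7" '
  BEGIN { n = split(cols, c, ",") }   # parse the position list once
  {
    out = ""
    for (i = 1; i <= n; i++)          # append one character per requested position
      out = out substr($0, c[i], 1)
    print out
  }')
printf '%s\n' "$picked"
```

Because it relies only on split and substr, this version does not depend on how a given awk handles an empty FS.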



Source: https://stackoverflow.com/questions/42350243/how-to-keep-only-certain-numbers-in-a-giant-number-in-bash
