Delete columns in text files with specific string

两盒软妹~` posted on 2019-12-18 07:08:48

Question


I would like to delete the columns whose names contain a specific string, "Gtype.", from a tab-delimited .txt file. I have already tried this command in R: df <- df[, -grep("Gtype.", colnames(df))]. However, my matrix is too big (more than 13 GB), and R cannot deal with it. (Error: cannot allocate vector of size....)

My input file:

Log.NE122  Gtype.NE122  Log.NE144    Gtype.NE144
-0.33          AA          1.0           AB

My expected output:

   Log.NE122  Log.NE144  
    -0.33       1.0      

I am wondering whether this can be done in bash. If anyone has other options, they are welcome too.


Answer 1:


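Using awk (the two lines after the command are its output; fields are split on whitespace and written back separated by single spaces):

awk 'NR==1 {                      # header: remember Gtype columns, print the rest
       for (i=1; i<=NF; i++) if ($i ~ /Gtype/) a[i]; else printf "%s%s", $i, OFS
       print ""; next }
     {                            # data rows: print only columns not recorded in a
       for (i=1; i<=NF; i++) if (!(i in a)) printf "%s%s", $i, OFS
       print "" }' file
Log.NE122 Log.NE144
-0.33 1.0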
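
Since the input is tab-delimited, a variant that splits and joins on tabs may be preferable. The following is only a minimal sketch and not part of the original answer: the function name drop_gtype_cols and the file name sample.txt are hypothetical, chosen for illustration. It records the indices of the columns to keep while reading the header, then prints only those fields for every line, without a trailing separator.

```shell
# Hypothetical helper: drop every column whose header contains "Gtype."
# from a tab-delimited file, keeping tabs as the output delimiter.
drop_gtype_cols() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Header line: record the indices of the columns to keep, in order.
      for (i = 1; i <= NF; i++)
        if ($i !~ /Gtype\./) keep[++n] = i
    }
    {
      # Every line (header included): print only the kept fields,
      # ending the line with ORS instead of a trailing separator.
      for (j = 1; j <= n; j++)
        printf "%s%s", $(keep[j]), (j < n ? OFS : ORS)
    }
  ' "$1"
}

# Made-up sample file that mirrors the example in the question.
printf 'Log.NE122\tGtype.NE122\tLog.NE144\tGtype.NE144\n-0.33\tAA\t1.0\tAB\n' > sample.txt
drop_gtype_cols sample.txt
```

For a large file, the output would typically be redirected to a new file, for example drop_gtype_cols big.txt > big_filtered.txt (both file names are placeholders). awk reads the input one line at a time, so memory use does not grow with the size of the file.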



Answer 2:


You can also try the data.table package and delete columns by assigning NULL to them:

library(data.table)
dt <- data.table(df)
dt[, colToDelete := NULL]  # colToDelete stands for the name of the column to drop

data.table tries to do most of its operations by reference, without making copies. The way you are doing it with a data.frame requires a copy to be made.



Source: https://stackoverflow.com/questions/23130502/delete-columns-in-text-files-with-specific-string
