Read and cbind second column of multiple files in directory

牧云@^-^@ submitted on 2019-12-23 16:17:08

Question


I have 94 tab-delimited files, with no header, in a single directory '/path/', with gene names in the first column and counts in the second column. Each file has 23000 rows.

I would like to read all 94 files in /path/ into R and combine them into a single data frame, 'counts.table', whose first column contains the gene names (identical and in the same order in column 1 of all 94 files) and whose columns 2 through 95 contain the counts from each individual file (i.e. column 2 of each of the 94 files, which are unique numbers). The final counts.table data frame will have 23000 rows and 95 columns.
Ideally like this:

 Column1 Column2 Column3 Column4... to column 95 
 gene a      0      4      3 
 gene b      4      9      9 
 gene c      3      0      8 
 ...
 to row 23000

Column2 contains counts from sample X, Column3 counts from sample Y, Column4 counts from sample Z, etc.

Do I have to read each file into R individually and then merge them all, adding the second column of each file with cbind to create 'counts.table'? Thanks in advance.


Answer 1:


Too long for a comment.

Something like this should work.

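# not tested
# full.names=TRUE returns full paths, so read.table() finds the files whatever the working directory is
files <- list.files(path="./path", full.names=TRUE)
genes <- read.table(files[1], header=FALSE, sep="\t", stringsAsFactors=FALSE)[,1]     # gene names
df    <- do.call(cbind,lapply(files,function(fn)read.table(fn,header=FALSE, sep="\t")[,2]))
# data.frame() rather than cbind(): cbind() of a character vector and a numeric matrix gives a character matrix
df    <- data.frame(genes, df, stringsAsFactors=FALSE)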

list.files(...) grabs the names of all the files in the specified path into a vector. We then extract the gene names: column 1 of the first file (could be any of the files). We then use lapply(files, function(fn)...) to build a list of vectors, one per file, each holding the second column of that file, and bind them together column-wise using do.call(cbind, ...), which gives a matrix of counts. Finally, we bind the gene names to the result.
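To see these steps end to end without touching real data, here is a minimal, self-contained sketch. It writes three small made-up tab-delimited files (hypothetical names s1.txt, s2.txt, s3.txt with invented gene names and counts, not from the question) to a temporary directory and runs the same list.files / lapply / do.call(cbind, ...) pipeline on them. It assumes, as the answer does, no header and the same gene order in every file; naming the count columns after the files is an extra step added here for illustration.

```r
# Hypothetical toy data: three "samples" with the same four genes in the same order
dir <- file.path(tempdir(), "counts_demo")
dir.create(dir, showWarnings = FALSE)
genes_in <- c("geneA", "geneB", "geneC", "geneD")
toy <- list(s1 = c(0, 4, 3, 7), s2 = c(4, 9, 0, 1), s3 = c(3, 9, 8, 2))
for (s in names(toy)) {
  # Tab-delimited, no header, no quotes: the layout described in the question
  write.table(data.frame(genes_in, toy[[s]]),
              file = file.path(dir, paste0(s, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Same pipeline as the answer, using full paths so the working directory does not matter
files  <- list.files(path = dir, full.names = TRUE)
genes  <- read.table(files[1], header = FALSE, sep = "\t", stringsAsFactors = FALSE)[, 1]
counts <- do.call(cbind, lapply(files, function(fn)
  read.table(fn, header = FALSE, sep = "\t")[, 2]))

# Combine into a data frame and name each count column after its file
counts.table <- data.frame(gene = genes, counts, stringsAsFactors = FALSE)
colnames(counts.table)[-1] <- tools::file_path_sans_ext(basename(files))
print(counts.table)
```

With the real data, the same code pointed at the directory of 94 files should give 23000 rows and 95 columns, provided the assumptions listed below hold.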

Assumptions:

  1. The gene names are in the same order in all the files.
  2. All the files have exactly the same number of rows.
  3. The directory contains only your gene count files.
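These assumptions can be checked instead of trusted. The sketch below is one hedged way to do it: a hypothetical helper, read_counts(), that only reads files matching a pattern (assumed here to be "\\.txt$", since the question does not give the file extension), stops with an error if any file's gene column differs from the first file's in length or order, and otherwise binds the second columns as in the answer. The demo files at the end are made up.

```r
# Hypothetical helper (not from the answer): read count files, checking the assumptions first.
# Assumes the count files end in ".txt"; change `pattern` to match your own file names.
read_counts <- function(dir, pattern = "\\.txt$") {
  # Assumption 3: only files matching `pattern` are read, so stray files are ignored
  files <- list.files(path = dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) stop("no files matching '", pattern, "' in ", dir)

  tables <- lapply(files, read.table, header = FALSE, sep = "\t",
                   stringsAsFactors = FALSE)
  genes <- tables[[1]][, 1]

  # Assumptions 1 and 2: every file has the same gene names, in the same order
  for (i in seq_along(tables)) {
    if (!identical(tables[[i]][, 1], genes)) {
      stop("gene column of ", basename(files[i]),
           " does not match ", basename(files[1]))
    }
  }

  counts <- do.call(cbind, lapply(tables, function(t) t[, 2]))
  out <- data.frame(gene = genes, counts, stringsAsFactors = FALSE)
  # Name each count column after its file, minus the extension
  colnames(out)[-1] <- tools::file_path_sans_ext(basename(files))
  out
}

# Usage on made-up files: two samples plus a non-.txt file that is skipped
demo <- file.path(tempdir(), "counts_check_demo")
dir.create(demo, showWarnings = FALSE)
writeLines(c("geneA\t1", "geneB\t2"), file.path(demo, "x.txt"))
writeLines(c("geneA\t5", "geneB\t6"), file.path(demo, "y.txt"))
writeLines("not a count file", file.path(demo, "notes.md"))
res <- read_counts(demo)
print(res)
```

Failing loudly is deliberate: cbind() silently recycles shorter columns, so a file with a missing row would otherwise produce misaligned counts. If the gene order genuinely differs between files, matching rows by gene name (for example with merge()) is the usual alternative, but that is a different approach from the one in this answer.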


Source: https://stackoverflow.com/questions/33072993/read-and-cbind-second-column-of-multiple-files-in-directory
