Question
I have 94 tab-delimited files, with no header, in a single directory '/path/', with gene names in the first column and counts in the second column. Each file has 23000 rows.
I would like to read all 94 files found in /path/ into R and merge them into a single data frame 'counts.table', where the first column contains the gene names (identical and in the same order in Column 1 of all 94 files) and the second to ninety-fifth columns contain the counts from each individual file (i.e. Column 2 of each of the 94 files, which are unique numbers). The final counts.table data frame will have 23000 rows and 95 columns.
Ideally like this:
Column1 Column2 Column3 Column4... to column 95
gene a 0 4 3
gene b 4 9 9
gene c 3 0 8
...
to row 23000
Column2 contains counts from sample X, Column3 counts from sample Y, column 4 from sample Z, etc.
Do I have to read each file into R individually and then merge them all by adding the second column of each file with cbind to create 'counts.table'? Thanks in advance.
Answer 1:
Too long for a comment.
Something like this should work.
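# not tested
# full.names=TRUE returns full paths, so read.table() works from any working directory
files <- list.files(path="/path/", full.names=TRUE)
genes <- read.table(files[1], header=FALSE, sep="\t")[,1] # gene names
# column 2 of every file, bound column-wise into a matrix
counts <- do.call(cbind,lapply(files,function(fn)read.table(fn,header=FALSE, sep="\t")[,2]))
# data.frame() returns a data frame; cbind() on a vector and a matrix would return a matrix
counts.table <- data.frame(genes, counts)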
list.files(...) grabs the names of all the files in the specified path into a vector. We then extract the gene names from column 1 of the first file (any of the files would do). Next, lapply(files, function(fn) ...) builds a list holding the second column of each file as a vector, and do.call(cbind, ...) binds these together column-wise. Finally, we bind the gene names to the result.
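The steps described here can be tried end to end on a small scale. The following is only a sketch under stated assumptions: the temporary directory, the file names sampleX.txt, sampleY.txt and sampleZ.txt, and the helper write_counts() are all hypothetical stand-ins for the real 94 files. It also names each count column after its file, which is an addition not present in the answer.

```r
# Hypothetical demo data: three small tab-delimited files in a temporary directory.
demo_dir <- file.path(tempdir(), "counts_demo")
dir.create(demo_dir, showWarnings = FALSE)
genes <- c("gene a", "gene b", "gene c")
write_counts <- function(name, counts) {
  write.table(data.frame(genes, counts), file.path(demo_dir, name),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}
write_counts("sampleX.txt", c(0, 4, 3))
write_counts("sampleY.txt", c(4, 9, 0))
write_counts("sampleZ.txt", c(3, 9, 8))

# full.names = TRUE returns paths that read.table() can open from any working directory.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Column 2 of every file, as a list of numeric vectors, bound column-wise into a matrix.
counts <- do.call(cbind, lapply(files, function(fn) {
  read.table(fn, header = FALSE, sep = "\t", stringsAsFactors = FALSE)[, 2]
}))

# Label each column with its file name minus the extension (an assumed naming scheme).
colnames(counts) <- tools::file_path_sans_ext(basename(files))

# Gene names come from the first file; data.frame() keeps the count columns numeric.
first <- read.table(files[1], header = FALSE, sep = "\t", stringsAsFactors = FALSE)
counts.table <- data.frame(gene = first[, 1], counts, stringsAsFactors = FALSE)
```

With the real data, replacing demo_dir with the actual directory should give a data frame with one gene column plus one count column per file.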
Assumptions:
- The gene names are in the same order in all the files.
- All the files have exactly the same number of rows.
- The path directory has your gene files only.
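The first two assumptions can be checked before binding anything. Below is a minimal sketch, assuming a hypothetical helper check_gene_files() (not part of the answer) that compares column 1 of every file against the first file; the demo files a.txt, b.txt and c.txt are made up for illustration. The third assumption can be relaxed separately by passing a pattern argument to list.files().

```r
# Hypothetical helper: stop with an error if any file disagrees with the first one.
check_gene_files <- function(files) {
  read_genes <- function(fn) {
    read.table(fn, header = FALSE, sep = "\t", stringsAsFactors = FALSE)[, 1]
  }
  reference <- read_genes(files[1])
  for (fn in files[-1]) {
    # identical() is FALSE for a different row count or a different order.
    if (!identical(read_genes(fn), reference)) {
      stop("Gene names differ from the first file in: ", fn)
    }
  }
  invisible(TRUE)
}

# Demo on hypothetical files: a.txt and b.txt agree, c.txt has its rows swapped.
demo_dir <- file.path(tempdir(), "check_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_demo <- function(name, genes) {
  write.table(data.frame(genes, seq_along(genes)), file.path(demo_dir, name),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}
write_demo("a.txt", c("gene a", "gene b"))
write_demo("b.txt", c("gene a", "gene b"))
write_demo("c.txt", c("gene b", "gene a"))

good <- file.path(demo_dir, c("a.txt", "b.txt"))
bad <- file.path(demo_dir, c("a.txt", "c.txt"))

ok <- check_gene_files(good)
failed <- inherits(try(check_gene_files(bad), silent = TRUE), "try-error")
```

Reading every file twice (once to check, once to bind) is wasteful for large inputs, but at 94 files of 23000 rows the extra pass should be cheap.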
Source: https://stackoverflow.com/questions/33072993/read-and-cbind-second-column-of-multiple-files-in-directory