Question
To be more specific, let's say I have a character vector "Names" with the following elements:
Names[1] <- "aaron, matt, patrick"
Names[2] <- "jiah, ron, melissa, john, patrick"
and so on. I have 22,956 elements like this. I want to split out all the names and put each one in a separate column in Excel. How do I do this? It seems to require text mining, but I am not sure how to go about it.
Thank you.
Answer 1:
I assume you have a vector of strings, each containing elements separated by commas, with a different number of elements in each string.
Names <- c("aaron, matt, patrick",
"jiah, ron, melissa, john, patrick")
## split each string on commas, dropping the spaces that follow them
parts <- strsplit(Names, ",\\s*")
## get the max number of elements
mm <- max(sapply(parts, length))
## pad every row to the same length (missing positions become NA)
padded <- lapply(parts, function(x) {length(x) <- mm; x})
## bind the rows into a character matrix, one column per name position
res <- do.call(rbind, padded)
## save the file; na = "" leaves the padded cells blank in Excel
write.csv(res, 'output.csv', row.names = FALSE, na = "")
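The same split-and-pad approach can be packaged as a small function and checked on the two example strings. This is a minimal base-R sketch; `split_to_columns()` is a hypothetical helper, not part of any package, and it assumes names are separated by a comma plus optional spaces. It writes to a temporary file so it can be run without touching your working directory.

```r
## Hypothetical helper (not from any package): split comma-separated strings
## into a character matrix, padding shorter rows with NA
split_to_columns <- function(x, sep = ",\\s*") {
  parts <- strsplit(x, sep)                          # one character vector per input string
  width <- max(vapply(parts, length, integer(1)))    # number of names in the longest string
  padded <- lapply(parts, function(p) { length(p) <- width; p })  # pad with NA
  do.call(rbind, padded)                             # rows = strings, columns = name positions
}

Names <- c("aaron, matt, patrick",
           "jiah, ron, melissa, john, patrick")
res <- split_to_columns(Names)

## na = "" writes blank cells instead of "NA"; row.names = FALSE drops the index column
out <- tempfile(fileext = ".csv")
write.csv(res, out, row.names = FALSE, na = "")
```

For the full 22,956-element vector you would call `split_to_columns()` on that vector instead; the number of columns is set by the string with the most names.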
I think you can also use rbind.fill from the plyr package, but you have to coerce each row to a data.frame first, which has some performance cost.
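The rbind.fill route could look roughly like the sketch below, assuming the plyr package is installed. Each split row becomes a one-row data frame whose columns are named `V1`, `V2`, ... by `as.data.frame()`, and `rbind.fill()` aligns columns by name and fills the missing ones with `NA`. The per-row `as.data.frame()` call is the coercion cost mentioned above.

```r
library(plyr)  # provides rbind.fill()

Names <- c("aaron, matt, patrick",
           "jiah, ron, melissa, john, patrick")

## One single-row data frame per string: t() turns the vector into a 1 x n matrix,
## and as.data.frame() names its columns V1, V2, ...
rows <- lapply(strsplit(Names, ",\\s*"), function(x)
  as.data.frame(t(x), stringsAsFactors = FALSE))

## rbind.fill() matches columns by name and fills the gaps with NA
res_df <- rbind.fill(rows)
```

The result is a data frame rather than a matrix, so it can be passed straight to `write.csv()`.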
Answer 2:
Assuming the TDM (TermDocumentMatrix) does what you need, you should be able to coerce it into a matrix with the as.matrix function and then export it to CSV as usual.
library(tm)  # TermDocumentMatrix support
## myTDM is your existing TermDocumentMatrix
tdmMatrix <- as.matrix(myTDM)
write.csv(tdmMatrix, 'myfile.csv')
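If you do not have a TDM yet, here is a minimal sketch of building one from the example strings with the tm package (assumed installed) and exporting it. Commas are replaced with spaces first so that the default whitespace tokenizer yields clean terms. Note that a TDM holds counts: one row per distinct name and one column per original string, which is a different layout from one name per column.

```r
library(tm)  # VCorpus(), VectorSource(), TermDocumentMatrix()

Names <- c("aaron, matt, patrick",
           "jiah, ron, melissa, john, patrick")

## Replace commas with spaces so the default tokenizer splits on whitespace only
docs <- gsub(",", " ", Names)

## One document per string, then the term-document matrix
corpus <- VCorpus(VectorSource(docs))
myTDM <- TermDocumentMatrix(corpus)

## Rows are terms (names), columns are documents, cells are counts
tdmMatrix <- as.matrix(myTDM)
write.csv(tdmMatrix, file.path(tempdir(), "myfile.csv"))
```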
Source: https://stackoverflow.com/questions/16981599/how-to-convert-a-termdocumentmatrix-which-i-have-got-from-text-mining-in-r-into