Combine frequency tables into a single data frame

前端未结

关注

 3  1812

I have a list in which each list item is a word frequency table derived from using \"table()\" on a different sample text. Each table is, therefore, a different length. I want

相关标签:

3条回答

傲寒

2021-02-04 19:41

freqs.list <- mapply(data.frame,Words=seq_along(myList),myList,SIMPLIFY=FALSE,MoreArgs=list(stringsAsFactors=FALSE))
freqs.df <- do.call(rbind,freqs.list)
res <- reshape(freqs.df,timevar="Words",idvar="Var1",direction="wide")
head(res)

0 讨论(0)

南方客

2021-02-04 20:01

Here is an inelegant way that gets the job done. I'm sure there's a 1-liner out there just for this, but I dunno where either:

    myList <- list(t1=t1, t2=t2, t3=t3)
    myList <- lapply(myList,as.data.frame,stringsAsFactors = FALSE)
    Words <- unique(unlist(lapply(myList,function(x) x[,1])))
    DFmerge <- data.frame(Words=Words)
    for (i in 1:3){
        DFmerge <- merge(DFmerge,myList[[i]],by.x="Words",by.y="Var1",all.x=TRUE)
    }
    colnames(DFmerge) <- c("Words","t1","t2","t3")

And looking around a bit more, here's another way that gives output more similar to that in the linked blog post: [Edit: works now]

    myList <- list(t1=t1, t2=t2, t3=t3)
    myList <- lapply(myList,function(x) {
        A <- as.data.frame(matrix(unlist(x),nrow=1))
        colnames(A) <- names(x)
        A[,colnames(A) != ""]
        }
    )   
    do.call(rbind.fill,myList)

Also ugly, so maybe a better answer will still come along.

0 讨论(0)

一生所求

2021-02-04 20:07

1. zoo. The zoo package has a multiway merge function which can do this compactly. The lapply converts each component of myList to a zoo object and then we simply merge them all:

# optionally add nice names to the list
names(myList) <- paste("t", seq_along(myList), sep = "")

library(zoo)
fz <- function(x)with(as.data.frame(x, stringsAsFactors=FALSE), zoo(Freq, Var1)))
out <- do.call(merge, lapply(myList, fz))

The above returns a multivariate zoo series in which the "times" are "a", "ago", etc. but if a data frame result were desired then its just a matter of as.data.frame(out).

2. Reduce. Here is a second solution. It uses Reduce in the core of R.

merge1 <- function(x, y) merge(x, y, by = 1, all = TRUE)
out <- Reduce(merge1, lapply(myList, as.data.frame, stringsAsFactors = FALSE))

# optionally add nice names
colnames(out)[-1] <- paste("t", seq_along(myList), sep = "")

3. xtabs. This one adds names to the list and then extracts the frequencies, names and groups as one long vector each putting them back together using xtabs:

names(myList) <- paste("t", seq_along(myList))

xtabs(Freq ~ Names + Group, data.frame(
    Freq = unlist(lapply(myList, unname)),
    Names = unlist(lapply(myList, names)),
    Group = rep(names(myList), sapply(myList, length))
))

Benchmark

Benchmarking some of the solutions using the rbenchmark package we get the following which indicates that the zoo solution is the fastest on the sample data and is arguably the simplest as well.

> t1<-table(strsplit(tolower("this is a test in the event of a real word file you would see many more words here"), "\\W"))
> t2<-table(strsplit(tolower("Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal"), "\\W"))
> t3<-table(strsplit(tolower("Ask not what your country can do for you - ask what you can do for your country"), "\\W"))
> myList <- list(t1, t2, t3)
> 
> library(rbenchmark)
> library(zoo)
> names(myList) <- paste("t", seq_along(myList), sep = "")
> 
> benchmark(xtabs = {
+ names(myList) <- paste("t", seq_along(myList))
+ xtabs(Freq ~ Names + Group, data.frame(
+ Freq = unlist(lapply(myList, unname)),
+ Names = unlist(lapply(myList, names)),
+ Group = rep(names(myList), sapply(myList, length))
+ ))
+ },
+ zoo = {
+ fz <- function(x) with(as.data.frame(x, stringsAsFactors=FALSE), zoo(Freq, Var1))
+ do.call(merge, lapply(myList, fz))
+ },
+ Reduce = {
+ merge1 <- function(x, y) merge(x, y, by = 1, all = TRUE)
+ Reduce(merge1, lapply(myList, as.data.frame, stringsAsFactors = FALSE))
+ },
+ reshape = {
+ freqs.list <- mapply(data.frame,Words=seq_along(myList),myList,SIMPLIFY=FALSE,MoreArgs=list(stringsAsFactors=FALSE))
+ freqs.df <- do.call(rbind,freqs.list)
+ reshape(freqs.df,timevar="Words",idvar="Var1",direction="wide")
+ }, replications = 10, order = "relative", columns = c("test", "replications", "relative"))
     test replications relative
2     zoo           10 1.000000
4 reshape           10 1.090909
1   xtabs           10 1.272727
3  Reduce           10 1.272727

ADDED: second solution.

ADDED: third solution.

ADDED: benchmark.

0 讨论(0)