Best way to store variable-length data in an R data.frame?

前端未结

关注

 5  1194

日久生厌 2021-02-06 02:59

I have some mixed-type data that I would like to store in an R data structure of some sort. Each data point has a set of fixed attributes which may be 1-d numeric, factors, or

5条回答

走了就别回头了 (楼主)

2021-02-06 03:38

I would just use the data in the "long" format.

E.g.

> d1 <- data.frame(id=1:3, num_words=c(2,1,4), phrase=c("hello world", "greetings", "take me to your leader"))
> d2 <- data.frame(id=c(rep(1,2), rep(2,1), rep(3,5)), token_length=c(5,5,9,4,2,2,4,6))
> d2$tokenid <- with(d2, ave(token_length, id, FUN=seq_along))
> d <- merge(d1,d2)
> subset(d, nchar(phrase) > 10)
  id num_words                 phrase token_length tokenid
1  1         2            hello world            5       1
2  1         2            hello world            5       2
4  3         4 take me to your leader            4       1
5  3         4 take me to your leader            2       2
6  3         4 take me to your leader            2       3
7  3         4 take me to your leader            4       4
8  3         4 take me to your leader            6       5
> with(d, tapply(token_length, id, mean))
  1   2   3 
5.0 9.0 3.6

Once the data is in the long format, you can use sqldf or plyr to extract what you want from it.

0 讨论(0)

查看其它5个回答