Best way to store variable-length data in an R data.frame?

日久生厌 2021-02-06 02:59

I have some mixed-type data that I would like to store in an R data structure of some sort. Each data point has a set of fixed attributes which may be 1-d numeric, factors, or

5 Answers
  •  走了就别回头了
    2021-02-06 03:38

    I would just use the data in the "long" format.

    E.g.

    > # d1: one row per phrase (the fixed attributes)
    > d1 <- data.frame(id=1:3, num_words=c(2,1,5), phrase=c("hello world", "greetings", "take me to your leader"))
    > # d2: one row per token (the variable-length part), keyed by id
    > d2 <- data.frame(id=c(rep(1,2), rep(2,1), rep(3,5)), token_length=c(5,5,9,4,2,2,4,6))
    > # number the tokens within each id
    > d2$tokenid <- with(d2, ave(token_length, id, FUN=seq_along))
    > d <- merge(d1,d2)
    > # as.character() keeps nchar() working if phrase was read in as a factor
    > subset(d, nchar(as.character(phrase)) > 10)
      id num_words                 phrase token_length tokenid
    1  1         2            hello world            5       1
    2  1         2            hello world            5       2
    4  3         5 take me to your leader            4       1
    5  3         5 take me to your leader            2       2
    6  3         5 take me to your leader            2       3
    7  3         5 take me to your leader            4       4
    8  3         5 take me to your leader            6       5
    > with(d, tapply(token_length, id, mean))
      1   2   3 
    5.0 9.0 3.6 
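
The token table in the example above is typed in by hand. As a minimal sketch, assuming the tokens are simply the space-separated words of each phrase (an assumption, since the question does not say how tokens are defined), the same long table could be derived from the phrases with base R:

```r
# Phrase-level table, matching the ids and phrases used in the example.
d1 <- data.frame(
  id = 1:3,
  phrase = c("hello world", "greetings", "take me to your leader"),
  stringsAsFactors = FALSE
)

# Split each phrase on single spaces (assumed tokenisation); the result
# is a list holding one character vector of tokens per phrase.
tokens <- strsplit(d1$phrase, " ", fixed = TRUE)

# One row per token: repeat each id once per token, number the tokens
# within each phrase, and record each token's character length.
n_tokens <- lengths(tokens)
d2 <- data.frame(
  id = rep(d1$id, times = n_tokens),
  tokenid = sequence(n_tokens),
  token_length = nchar(unlist(tokens))
)
```

The resulting `d2` can be passed to `merge(d1, d2)` just like the hand-built version.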
    

    Once the data is in the long format, you can use sqldf or plyr to extract what you want from it.
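
As a rough illustration of that last step, here is a sketch of a per-id summary on the long table. The column name `mean_length` is made up for this example, and the plyr call is guarded so the snippet still runs with base R alone when plyr is not installed:

```r
# Long-format token table, with the same values as in the example.
d <- data.frame(
  id = c(1, 1, 2, 3, 3, 3, 3, 3),
  token_length = c(5, 5, 9, 4, 2, 2, 4, 6)
)

# Base R: one row per id holding the mean token length.
by_id <- aggregate(token_length ~ id, data = d, FUN = mean)

# plyr: the same grouping with ddply(), run only if plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  by_id_plyr <- plyr::ddply(d, "id", plyr::summarise,
                            mean_length = mean(token_length))
}
```

With sqldf, the same grouping would be written as a SQL query with `GROUP BY id` against the data frame `d`.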
