Best way to store variable-length data in an R data.frame?

日久生厌 2021-02-06 02:59

I have some mixed-type data that I would like to store in an R data structure of some sort. Each data point has a set of fixed attributes which may be 1-d numeric, factors, or

5 Answers
  •  我在风中等你
    2021-02-06 03:45

    Trying to shoehorn the data into a data frame seems hackish to me. It is far better to treat each row as an individual object and the dataset as a list of these objects.

    This function converts your data strings to an appropriate format. (This is S3 style code; you may prefer to use one of the 'proper' object oriented systems.)

    as.mydata <- function(x)
    {
       UseMethod("as.mydata")
    }
    
    as.mydata.character <- function(x)
    {
       convert <- function(x)
       {
          md <- list()
          md$phrase <- x
          spl <- strsplit(x, " ")[[1]]
          md$num_words <- length(spl)
          md$token_lengths <- nchar(spl)
          class(md) <- "mydata"
          md
       }
       lapply(x, convert)
    }
    

    Now your whole dataset looks like

    mydataset <- as.mydata(c("hello world", "greetings", "take me to your leader"))
    
    mydataset
    [[1]]
    $phrase
    [1] "hello world"
    
    $num_words
    [1] 2
    
    $token_lengths
    [1] 5 5
    
    attr(,"class")
    [1] "mydata"
    
    [[2]]
    $phrase
    [1] "greetings"
    
    $num_words
    [1] 1
    
    $token_lengths
    [1] 9
    
    attr(,"class")
    [1] "mydata"
    
    [[3]]
    $phrase
    [1] "take me to your leader"
    
    $num_words
    [1] 5
    
    $token_lengths
    [1] 4 2 2 4 6
    
    attr(,"class")
    [1] "mydata"
    

    You can define a print method to make this look prettier.

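    print.mydata <- function(x, ...)
    {
       # Accept ... to match the print() generic; return x invisibly, as print methods should.
       cat(x$phrase, "consists of", x$num_words, "words, with", paste(x$token_lengths, collapse=", "), "letters.")
       invisible(x)
    }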
    mydataset
    [[1]]
    hello world consists of 2 words, with 5, 5 letters.
    [[2]]
    greetings consists of 1 words, with 9 letters.
    [[3]]
    take me to your leader consists of 5 words, with 4, 2, 2, 4, 6 letters.
    

    The sample operations you wanted to do are fairly straightforward with data in this format.

    sapply(mydataset, function(x) nchar(x$phrase) > 10)
    [1]  TRUE FALSE  TRUE
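
    A few more operations in the same spirit, as a minimal sketch rather than a definitive recipe. It rebuilds the dataset so it runs on its own. The names long_phrase, mean_token_length and multi_word are made up for illustration, and vapply() is swapped in for sapply() only because it checks the type of each result.

```r
# Self-contained sketch: rebuild the dataset from the answer, then query it.
as.mydata <- function(x) UseMethod("as.mydata")

as.mydata.character <- function(x)
{
   convert <- function(phrase)
   {
      spl <- strsplit(phrase, " ")[[1]]
      structure(
         list(phrase = phrase, num_words = length(spl), token_lengths = nchar(spl)),
         class = "mydata"
      )
   }
   lapply(x, convert)
}

mydataset <- as.mydata(c("hello world", "greetings", "take me to your leader"))

# Same test as the sapply() call, but vapply() insists on one logical per record.
long_phrase <- vapply(mydataset, function(x) nchar(x$phrase) > 10, logical(1))

# Summarise the variable-length field: one number per record.
mean_token_length <- vapply(mydataset, function(x) mean(x$token_lengths), numeric(1))

# Keep only the records that have more than one word.
multi_word <- Filter(function(x) x$num_words > 1, mydataset)
```

    Because each record carries its own token_lengths vector, summaries and filters reduce to a function applied per element, with no need to pad the vectors to a common length.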
    
