Creating an R dataframe row-by-row

Asked by 天涯浪人 on 2020-11-29 15:52

I would like to construct a dataframe row-by-row in R. I've done some searching, and all I came up with is the suggestion to create an empty list, keep a list index scalar,

8 answers
  • 2020-11-29 16:02

    Depending on the format of your new row, you might use tibble::add_row if your new row is simple and can be specified in "value-pairs". Or you could use dplyr::bind_rows, "an efficient implementation of the common pattern of do.call(rbind, dfs)".
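
    For example, a minimal sketch of both (assuming the tibble and dplyr packages are installed; the column names x and y are just for illustration):

    library(tibble)
    library(dplyr)

    df <- tibble(x = 1:2, y = c("a", "b"))

    # add_row: append a single row given as value-pairs
    df <- add_row(df, x = 3, y = "c")

    # bind_rows: row-bind a list of data frames in one call,
    # instead of do.call(rbind, dfs)
    pieces <- lapply(1:3, function(i) tibble(x = i, y = letters[i]))
    df2 <- bind_rows(pieces)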

  • 2020-11-29 16:08

    The reason I like Rcpp so much is that I don't always get how R Core thinks, and with Rcpp, more often than not, I don't have to.

    Speaking philosophically, you're in a state of sin with regards to the functional paradigm, which tries to ensure that every value appears independent of every other value; changing one value should never cause a visible change in another value, the way you get with pointers sharing representation in C.

    The problems arise when functional programming signals the small craft to move out of the way, and the small craft replies "I'm a lighthouse". Making a long series of small changes to a large object which you want to process on in the meantime puts you square into lighthouse territory.

    In the C++ STL, push_back() is a way of life. It doesn't try to be functional, but it does try to accommodate common programming idioms efficiently.

    With some cleverness behind the scenes, you can sometimes arrange to have one foot in each world. Snapshot based file systems are a good example (which evolved from concepts such as union mounts, which also ply both sides).

    If R Core wanted to do this, underlying vector storage could function like a union mount. One reference to the vector storage might be valid for subscripts 1:N, while another reference to the same storage is valid for subscripts 1:(N+1). There could be reserved storage not yet validly referenced by anything but convenient for a quick push_back(). You don't violate the functional concept when appending outside the range that any existing reference considers valid.

    Eventually, as you append rows incrementally, you run out of reserved storage. You'll need to create new copies of everything, with the storage multiplied by some increment. The STL implementations I've used tend to multiply storage by 2 when extending allocation. I thought I read in R Internals that there is a memory structure where the storage increments by 20%. Either way, growth operations occur with logarithmic frequency relative to the total number of elements appended. On an amortized basis, this is usually acceptable.

    As tricks behind the scenes go, I've seen worse. Every time you push_back() a new row onto the dataframe, a top-level index structure would need to be copied. The new row could append onto shared representation without impacting any old functional values. I don't even think it would complicate the garbage collector much; since I'm not proposing push_front(), all references are prefix references to the front of the allocated vector storage.
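
    As an illustration of that amortized-growth idea (a sketch only, not how R's vector storage actually works), one can reserve slots in a list and double the reservation whenever it fills up, so reallocations happen with logarithmic frequency:

    rows <- vector("list", 8)   # reserved storage
    n <- 0                      # number of slots actually used

    push_row <- function(rows, n, row) {
      # double the reserved storage when it is full
      if (n == length(rows)) {
        length(rows) <- 2 * length(rows)
      }
      rows[[n + 1]] <- row
      list(rows = rows, n = n + 1)
    }

    for (i in 1:100) {
      st <- push_row(rows, n, data.frame(i = i, sq = i^2))
      rows <- st$rows
      n <- st$n
    }
    df <- do.call(rbind, rows[seq_len(n)])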

  • 2020-11-29 16:09

    One can add rows to NULL:

    df<-NULL;
    while(...){
      #Some code that generates new row
      rbind(df,row)->df
    }
    

    For instance:

    df<-NULL
    for(e in 1:10) rbind(df,data.frame(x=e,square=e^2,even=factor(e%%2==0)))->df
    print(df)
    
  • 2020-11-29 16:09

    This is a silly example of how to use do.call(rbind, ...) on the output of Map() [which is similar to lapply()]:

    > DF <- do.call(rbind,Map(function(x) data.frame(a=x,b=x+1),x=1:3))
    > DF
      a b
    1 1 2
    2 2 3
    3 3 4
    > class(DF)
    [1] "data.frame"
    

    I use this construct quite often.
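
    For comparison, a small sketch of the lapply() equivalent (DF2 is just an illustrative name):

    DF2 <- do.call(rbind, lapply(1:3, function(x) data.frame(a = x, b = x + 1)))
    DF2   # same result as DF above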

  • 2020-11-29 16:12

    Dirk Eddelbuettel's answer is the best; here I just note that you can get away with not pre-specifying the dataframe dimensions or data types, which is sometimes useful if you have multiple data types and lots of columns:

    row1 <- list("a", 1, FALSE)   # use 'list', not 'c' or 'cbind'!
    row2 <- list("b", 2, TRUE)

    df <- data.frame(row1, stringsAsFactors = FALSE)   # first row
    df <- rbind(df, row2)                              # now this works as you'd expect
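
    A small follow-up sketch: naming the list elements (the names id, value, and flag here are invented for illustration) gives the data frame readable column names instead of auto-generated ones:

    row1 <- list(id = "a", value = 1, flag = FALSE)
    row2 <- list(id = "b", value = 2, flag = TRUE)
    df <- data.frame(row1, stringsAsFactors = FALSE)
    df <- rbind(df, row2)
    df
    #   id value  flag
    # 1  a     1 FALSE
    # 2  b     2  TRUE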
    
  • 2020-11-29 16:19

    If you have vectors destined to become rows, concatenate them with c(), fill a matrix with them row by row, and convert that matrix to a data frame.

    For example, rows

    dummydata1=c(2002,10,1,12.00,101,426340.0,4411238.0,3598.0,0.92,57.77,4.80,238.29,-9.9)
    dummydata2=c(2002,10,2,12.00,101,426340.0,4411238.0,3598.0,-3.02,78.77,-9999.00,-99.0,-9.9)
    dummydata3=c(2002,10,8,12.00,101,426340.0,4411238.0,3598.0,-5.02,88.77,-9999.00,-99.0,-9.9)
    

    can be converted to a data frame thus:

    dummyset=c(dummydata1,dummydata2,dummydata3)
    col.len=length(dummydata1)
    dummytable=data.frame(matrix(data=dummyset,ncol=col.len,byrow=TRUE))
    

    Admittedly, I see two major limitations: (1) this only works with single-mode data (a sketch of this is below), and (2) you must know your final number of columns for this to work (i.e., I'm assuming that you're not working with a ragged array whose greatest row length is unknown a priori).
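
    One hedged sketch of limitation (1): because the intermediate matrix holds a single mode, mixing in a character value coerces every column, and the types then have to be restored after data.frame(), for example with utils::type.convert():

    mixed1 <- c("a", 1, TRUE)    # c() already coerces everything to character
    mixed2 <- c("b", 2, FALSE)
    m   <- matrix(c(mixed1, mixed2), ncol = 3, byrow = TRUE)
    dfm <- data.frame(m, stringsAsFactors = FALSE)
    str(dfm)                                   # every column is character
    dfm[] <- lapply(dfm, type.convert, as.is = TRUE)
    str(dfm)                                   # numeric/logical columns restored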

    This solution seems simple, but from my experience with type conversions in R, I'm sure it creates new challenges down-the-line. Can anyone comment on this?
