Extend an irregular sequence and add zeros to missing values

深忆病人 2020-12-19 08:36

I have a data frame with a sequence in 'col1' and values in 'col2':

col1 col2
2     0.02
5     0.12
9     0.91
13    1.13

I want to exp

9 answers
  • 2020-12-19 09:11

    Just for completeness, here is a binary join using data.table (you will get NAs instead of zeroes, but that can easily be changed if needed):

    library(data.table)
    setDT(df)[.(seq(max(col1))), on = .(col1)]
    #     col1 col2
    #  1:    1   NA
    #  2:    2 0.02
    #  3:    3   NA
    #  4:    4   NA
    #  5:    5 0.12
    #  6:    6   NA
    #  7:    7   NA
    #  8:    8   NA
    #  9:    9 0.91
    # 10:   10   NA
    # 11:   11   NA
    # 12:   12   NA
    # 13:   13 1.13
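
    If you do want zeroes rather than NAs, one possible follow-up is to fill them by reference after the join. This is only a sketch: the names df and full are placeholders of mine, and the sample data is rebuilt from the question so the snippet runs on its own.

    ```r
    library(data.table)

    # Sample data from the question (rebuilt here so the snippet is self-contained)
    df <- data.table(col1 = c(2L, 5L, 9L, 13L),
                     col2 = c(0.02, 0.12, 0.91, 1.13))

    # Same join as above: one row for every value from 1 to max(col1)
    full <- df[.(seq_len(max(df$col1))), on = .(col1)]

    # Replace the NAs produced by unmatched rows with 0, updating by reference
    full[is.na(col2), col2 := 0]
    ```

    The `:=` update modifies full in place, so no reassignment is needed.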
    
  • 2020-12-19 09:11

    Another way would be the following. Your data is called mydf here. First, create a data frame, foo, with a column col1 running from 1 to the max value of col1 in mydf. Then assign the values of col2 in mydf to a new column called col2 in foo, using the numbers in col1 of mydf as row indices. At this point, every row of foo without a match has NA in col2. The final step is to find those positions with is.na() and assign zeros to them.

    foo <- data.frame(col1 = 1:max(mydf$col1))
    foo$col2[mydf$col1] <- mydf$col2
    foo$col2[is.na(foo$col2)] <- 0
    

    Taking lmo's idea into account, you can create the data frame with col2 already set to 0 and skip the third step.

    foo <- data.frame(col1 = 1:max(mydf$col1), col2 = 0)
    foo$col2[mydf$col1] <- mydf$col2
    
    
    #   col1 col2
    #1     1 0.00
    #2     2 0.02
    #3     3 0.00
    #4     4 0.00
    #5     5 0.12
    #6     6 0.00
    #7     7 0.00
    #8     8 0.00
    #9     9 0.91
    #10   10 0.00
    #11   11 0.00
    #12   12 0.00
    #13   13 1.13
    

    DATA

    mydf <- structure(list(col1 = c(2L, 5L, 9L, 13L), col2 = c(0.02, 0.12, 
    0.91, 1.13)), .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, 
    -4L))
    
  • 2020-12-19 09:19

    Another way would be:

    # Append a zero row for each value of col1 that is missing, then sort by col1
    for (i in seq_len(max(test$col1))) {
      if (!(i %in% test$col1)) test <- rbind(test, c(i, 0))
    }
    test <- test[order(test$col1), ]
    

    Axeman's answer is really sweet, though.

    Edit: Data used --

    test <- structure(list(col1 = c(2, 5, 9, 13), col2 = c(0.02, 0.12, 0.91, 
    1.13)), .Names = c("col1", "col2"), row.names = c(NA, -4L), class = "data.frame")
    

    DISCLAIMER: This should really not be used for big datasets. I tried it with 1k rows and it was done in a heartbeat, but my second test with 100k rows has been running for minutes now, which really underlines the concerns Axeman raised in his comment.
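
    If you like the rbind idea but want to avoid growing the data frame one row at a time, a possible variation is to collect all missing values first and bind them in a single call. This is a sketch I have not benchmarked; the names missing_vals and filled are just illustrative.

    ```r
    # Same sample data as above
    test <- data.frame(col1 = c(2, 5, 9, 13),
                       col2 = c(0.02, 0.12, 0.91, 1.13))

    # All values from 1 to max(col1) that do not occur in col1 yet
    missing_vals <- setdiff(seq_len(max(test$col1)), test$col1)

    # Bind all zero rows at once instead of one rbind() per missing value
    filled <- rbind(test,
                    data.frame(col1 = missing_vals,
                               col2 = rep(0, length(missing_vals))))

    # Restore the sequence order and reset the row names
    filled <- filled[order(filled$col1), ]
    rownames(filled) <- NULL
    ```

    The result is the same as the loop's, but rbind() is called only once however many values are missing.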
