Using the lapply function and a list in R

别跟我提以往 2021-01-25 23:33
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), co         


        
2 Answers
  • 2021-01-26 00:07

    You need to iterate over data and counts simultaneously. In the tidyverse I would recommend using purrr::map2(), but in base R you can simply do:

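    # Named "queries" rather than "table" to avoid masking base::table()
    queries <- mapply(function(data, count) {
        # Build one query per (data.frame, value) pair.
        # Wrap the paste0() call in sqldf() to actually run the query.
        sql <- paste0(
          "select *,count(col_one) from data where col_one = ",
          count, " group by col_one"
        )
        print(sql)
      }, my.list, 1:3
    )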
    [1] "select *,count(col_one) from data where col_one = 1 group by col_one"
    [1] "select *,count(col_one) from data where col_one = 2 group by col_one"
    [1] "select *,count(col_one) from data where col_one = 3 group by col_one"
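
    The strings above only show how data and count are paired up. As a minimal sketch of what that pairing computes, the same mapply() call can count the matching rows directly in base R, without sqldf. The definition of my.list and the col_two values of d3 are assumptions: the question's code is cut off, so they are taken from the output printed in the second answer.

    ```r
    # Sample data; d3$col_two and my.list are assumed (the question is truncated)
    d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
    d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
    d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
    my.list <- list(d1, d2, d3)

    # Walk over my.list and 1:3 in parallel:
    # d1 is checked against 1, d2 against 2, d3 against 3
    n_matches <- mapply(
      function(data, count) sum(data$col_one == count),
      my.list, 1:3
    )
    print(n_matches)
    ```

    Note that each data.frame is compared with exactly one value here, which is what iterating over both arguments simultaneously means. With the assumed data only d1 contains its paired value, so the counts for d2 and d3 are zero.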
    
  • 2021-01-26 00:20

    If I understood correctly, the OP wants to create contingency tables for col_one for each of the data.frames in my.list, i.e., they want to know how many times each of the values 1, 2, or 3 appears in col_one in each data.frame.

    As explained in my answer to another question of the OP and as suggested by G. Grothendieck, it is almost always better to combine data.frames with identical structure into one large data.table than to keep them separate in a list. BTW, there is also a third question ("how to loop the dataframe using sqldf?") by the OP asking for help with a list of data.frames.

    To combine the data.frames in a large data.table, the rbindlist() function is used. Note that the added id column df identifies the originating data.frame of each row.

    library(data.table)
    rbindlist(my.list, idcol = "df")
    
       df col_one col_two
    1:  1       1       4
    2:  1       2       5
    3:  1       3       6
    4:  2       1       6
    5:  2       1       5
    6:  2       1       4
    7:  3       7       8
    8:  3       1       5
    9:  3       1       4
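
    For readers without data.table, a rough base R equivalent of the rbindlist() call is sketched below. It is not part of the original answer. The definition of my.list and the col_two values of d3 are assumptions taken from the printed output above, because the question's code is truncated; the name combined is made up for illustration.

    ```r
    # Sample data; d3$col_two and my.list are assumed (the question is truncated)
    d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
    d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
    d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
    my.list <- list(d1, d2, d3)

    # Prepend an id column "df" to each data.frame, then stack them row-wise
    with_id <- Map(function(d, i) cbind(df = i, d), my.list, seq_along(my.list))
    combined <- do.call(rbind, with_id)
    print(combined)
    ```

    The result is a plain data.frame rather than a data.table, but it has the same three columns and nine rows as the output above.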
    

    Now we can easily compute the aggregates:

    rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][]
    
       df col_one col_two count_col_one
    1:  1       1       4             1
    2:  1       2       5             1
    3:  1       3       6             1
    4:  2       1       6             3
    5:  2       1       5             3
    6:  2       1       4             3
    7:  3       7       8             1
    8:  3       1       5             2
    9:  3       1       4             2
    

    This data.table statement counts the occurrences of each individual value in col_one for each df by using the special symbol .N and by grouping by df and col_one.
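
    The same grouped count can be sketched in base R with ave(), which plays the role of .N with by = here. This is an illustrative alternative, not the answer's method. As before, my.list and the col_two values of d3 are assumptions reconstructed from the printed output, and combined is a hypothetical name.

    ```r
    # Sample data; d3$col_two and my.list are assumed (the question is truncated)
    d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
    d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
    d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
    my.list <- list(d1, d2, d3)

    # Stack the data.frames with an id column identifying the source
    combined <- do.call(
      rbind,
      Map(function(d, i) cbind(df = i, d), my.list, seq_along(my.list))
    )

    # For every row, count how many rows share its (df, col_one) combination
    combined$count_col_one <- ave(
      combined$col_one, combined$df, combined$col_one,
      FUN = length
    )
    print(combined)
    ```

    Like the := assignment above, ave() keeps one row per original row and repeats the group size on each of them.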

    In the question, the OP has only asked to count occurrences of 1, 2, or 3 in col_one. If this really is intended, the value of 7 needs to be removed. This can be accomplished by filtering the result:

    rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][
      col_one %in% 1:3]
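
    If the goal really is a contingency table restricted to the values 1, 2, and 3, base R's table() with an explicit set of factor levels is one way to get it, including zero counts for values that do not occur. This is a sketch under the same assumptions as before: my.list and the col_two values of d3 are reconstructed from the printed output, and combined and tab are hypothetical names.

    ```r
    # Sample data; d3$col_two and my.list are assumed (the question is truncated)
    d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
    d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
    d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
    my.list <- list(d1, d2, d3)

    combined <- do.call(
      rbind,
      Map(function(d, i) cbind(df = i, d), my.list, seq_along(my.list))
    )

    # Values outside levels 1:3 (here: 7) become NA and are dropped by table()
    tab <- table(
      df = combined$df,
      col_one = factor(combined$col_one, levels = 1:3)
    )
    print(tab)
    ```

    Each row of tab corresponds to one data.frame and each column to one of the values 1, 2, 3.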
    