d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
You need to iterate over data and counts simultaneously. In the tidyverse I would recommend using purrr::map2(), but in base R you can simply do:
table <- mapply(function(data, count) {
  sql <-
    # sqldf(
    paste0(
      "select *,count(col_one) from data where col_one = ",
      count, " group by col_one"
    )
    # )
  print(sql)
}, my.list, 1:3
)
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
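The same parallel iteration can be tried without sqldf. The following is only a minimal base R sketch: it assumes my.list is list(d1, d2, d3), takes d3's col_two values from the rbindlist() output shown later in this thread, and uses the hypothetical name pair_counts for the result.

```r
# Assumed input: the three data.frames from the question, collected in a list.
# d3's col_two values are taken from the rbindlist() output in this thread.
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# mapply() walks my.list and 1:3 in parallel: the i-th data.frame is paired
# with the i-th value, just like the SQL strings printed above.
pair_counts <- mapply(
  function(data, count) sum(data$col_one == count),
  my.list, 1:3
)
print(pair_counts)
```

Note that this pairs each data.frame with exactly one value (d1 with 1, d2 with 2, d3 with 3); it does not count every value in every data.frame.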
If I understood correctly, the OP wants to create contingency tables for col_one for each of the data.frames in my.list, i.e., they want to know how many times each of the values 1, 2, or 3 appears in col_one in each data.frame.
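For illustration, this reading of the question can be sketched in base R with table(). The sketch assumes my.list is list(d1, d2, d3) with the values shown in this thread; restricting the levels to 1:3 is my assumption about what the OP wants, and tabs is a hypothetical name.

```r
# Assumed input: the three data.frames from the question, collected in a list.
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# One contingency table per data.frame. factor() with fixed levels keeps
# zero counts for absent values and drops anything outside 1:3 (the 7 in d3).
tabs <- lapply(my.list, function(d) table(factor(d$col_one, levels = 1:3)))
print(tabs)
```

Dropping the levels argument and calling table(d$col_one) directly would instead tabulate whatever values occur, including the 7.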
As explained in my answer to another question of the OP and as suggested by G. Grothendieck, it is almost always better to combine data.frames with identical structure in a large data.table than to keep them separate in a list. BTW, there is also a third question ("how to loop the dataframe using sqldf?") by the OP asking for help with a list of data.frames.
To combine the data.frames in a large data.table, the rbindlist() function is used. Note that the added id column df identifies the originating data.frame of each row.
library(data.table)
rbindlist(my.list, idcol = "df")
   df col_one col_two
1:  1       1       4
2:  1       2       5
3:  1       3       6
4:  2       1       6
5:  2       1       5
6:  2       1       4
7:  3       7       8
8:  3       1       5
9:  3       1       4
Now we can easily compute the aggregates:
rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][]
   df col_one col_two count_col_one
1:  1       1       4             1
2:  1       2       5             1
3:  1       3       6             1
4:  2       1       6             3
5:  2       1       5             3
6:  2       1       4             3
7:  3       7       8             1
8:  3       1       5             2
9:  3       1       4             2
This data.table statement counts the occurrences of each individual value in col_one for each df by using the special symbol .N and by grouping by df and col_one.
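To make the grouping logic concrete, here is a rough base R equivalent of the statement above. It is not the data.table approach itself but a swapped-in sketch using do.call(rbind, ...) and ave(); it assumes my.list is list(d1, d2, d3) with the values shown in this thread, and combined is a hypothetical name.

```r
# Assumed input: the three data.frames from the question, collected in a list.
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Stack the data.frames and add an id column df, mimicking
# rbindlist(my.list, idcol = "df").
combined <- do.call(
  rbind,
  Map(function(d, i) data.frame(df = i, d), my.list, seq_along(my.list))
)

# ave() with FUN = length returns, for every row, the size of its
# (df, col_one) group, which is what .N with by = .(df, col_one) computes.
combined$count_col_one <- ave(
  combined$col_one, combined$df, combined$col_one,
  FUN = length
)
print(combined)
```

The data.table version is more concise and usually faster on large inputs; the sketch only shows that each row receives the size of its own group.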
In the question, the OP has only asked to count occurrences of 1, 2, or 3 in col_one. If this really is intended, the value of 7 needs to be removed. This can be accomplished by filtering the result:
rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][
col_one %in% 1:3]