How to loop over a list of data frames using sqldf?

Posted by 不问归期 on 2019-12-20 04:38:19

Question


First code:

sample data:

vector1 <- data.frame("name"="a","age"=10,"gender"="m")
vector2 <-  data.frame("name"="b","age"=33,"gender"="m")
vector3 <-  data.frame("name"="b","age"=58,"gender"="f")
list <- list(vector1,vector2,vector3)

sql <- list()
for(i in 1:length(list)){
   print(list[[1]]) # access dataframe
   sql[[i]]<-
    sqldf(paste0("select name,gender,count(name) from ",list[[i]]," group by gender "))
}

How do I loop over the data frames correctly with the sqldf function? I have tried passing list[[1]] or list[1] into the sqldf call as a test, but it returns either "no such table" or a syntax error. Inside the loop itself I can access each data frame without problems. Is it possible to make this form work?

print(str(list))
List of 3
 $ :'data.frame':   1 obs. of  3 variables:
  ..$ name  : Factor w/ 1 level "a": 1
  ..$ age   : num 10
  ..$ gender: Factor w/ 1 level "m": 1
 $ :'data.frame':   1 obs. of  3 variables:
  ..$ name  : Factor w/ 1 level "b": 1
  ..$ age   : num 33
  ..$ gender: Factor w/ 1 level "m": 1
 $ :'data.frame':   1 obs. of  3 variables:
  ..$ name  : Factor w/ 1 level "b": 1
  ..$ age   : num 58
  ..$ gender: Factor w/ 1 level "f": 1
NULL

Second:

This code produces the result I expect.

f <- lapply(list, function(dataframe) {
  sql <-
    sqldf("select name,gender,count(name) from dataframe group by gender ")
})
print(f)

This is the output.

> print(f)
[[1]]
  name gender count(name)
1    a      m           1

[[2]]
  name gender count(name)
1    b      m           1

[[3]]
  name gender count(name)
1    b      f           1

Is it possible to use the first version to access the list? How can I fix it if I want to use the paste function to refer to each data frame in the list?


Answer 1:


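sqldf::sqldf resolves the table names in the query by looking up objects of that name in the calling environment. Pasting list[[i]] into the query string inserts the contents of the data frame, not a name, which is why the first version fails. So just create DF <- list[[i]] and use that name in the query.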

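library(sqldf)

sql <- list()
for (i in seq_along(list)) {
  print(list[[i]])  # show the data frame being processed
  DF <- list[[i]]   # bind it to a name that the query can refer to
  sql[[i]] <- sqldf("select name,gender,count(name) from DF group by gender ")
}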
print(sql)
# [[1]]
#   name gender count(name)
# 1    a      m           1
# 
# [[2]]
#   name gender count(name)
# 1    b      m           1
# 
# [[3]]
#   name gender count(name)
# 1    b      f           1
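
If the paste-based form from the question is still wanted, one possible sketch is to paste the name of each data frame rather than the data frame itself. The variable names df1, df2 and df3 below are hypothetical, chosen only for this illustration, and the sketch assumes the data frames are visible from the place where sqldf is called. The first part shows what paste0 produces when it is handed a data frame.

```r
library(sqldf)

# Same sample data as in the question, bound to ordinary variables.
# The names df1, df2 and df3 are hypothetical, chosen for this sketch.
df1 <- data.frame(name = "a", age = 10, gender = "m", stringsAsFactors = FALSE)
df2 <- data.frame(name = "b", age = 33, gender = "m", stringsAsFactors = FALSE)
df3 <- data.frame(name = "b", age = 58, gender = "f", stringsAsFactors = FALSE)

# Pasting a data frame into a string pastes its contents:
# one query string per column, none of which names a real table.
bad_queries <- paste0("select * from ", df1, " group by gender")
print(bad_queries)

# Pasting the *name* of the data frame gives sqldf a table name it can look up.
results <- list()
for (nm in c("df1", "df2", "df3")) {
  query <- paste0("select name, gender, count(name) from ", nm, " group by gender")
  results[[nm]] <- sqldf(query)
}
print(results)
```

The loop stores each result under the name of the data frame it came from, so results[["df3"]] holds the counts for the third data frame.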



Answer 2:


You asked about the use of lapply, which will do away with the need to use a for-loop to process your list of dataframes. Here is a solution using a simple stand-alone function to apply the sqldf statement to a given dataframe, and lapply to apply it to your list of dataframes without explicit looping:

namecount <- function(df){
  sqldf("select name, gender, count(name) from df group by gender")
}

sql = lapply(list, namecount)

Output:

> sql
[[1]]
  name gender count(name)
1    a      m           1

[[2]]
  name gender count(name)
1    b      m           1

[[3]]
  name gender count(name)
1    b      f           1
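
As a small extension of the same idea, and only as a sketch: if the list elements are named, lapply carries those names over to the results, and the per-data-frame results can be stacked into one table. The element names first, second and third and the column alias n are assumptions made for this example, not part of the original answer.

```r
library(sqldf)

# Sample data from the question, stored in a named list.
# The element names are hypothetical.
dfs <- list(
  first  = data.frame(name = "a", age = 10, gender = "m", stringsAsFactors = FALSE),
  second = data.frame(name = "b", age = 33, gender = "m", stringsAsFactors = FALSE),
  third  = data.frame(name = "b", age = 58, gender = "f", stringsAsFactors = FALSE)
)

# Same query as above, with the count column aliased to `n` for easier access.
count_by_gender <- function(df) {
  sqldf("select name, gender, count(name) as n from df group by gender")
}

results <- lapply(dfs, count_by_gender)  # names of dfs are kept
combined <- do.call(rbind, results)      # one row per data frame and gender
print(combined)
```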



Answer 3:


The OP has asked for help in using sqldf() to aggregate data.frames which are stored in a list. If I understood correctly, the OP wants to count the number of male and female individuals within each data.frame.

The OP has asked two related questions ("using lapply function and list in r" and "add missed value based on the value of the column in r") where he is also seeking help in handling a list of data.frames.

As explained in my answers to the above questions, it is almost always better to combine data.frames with identical structure into one large data.table:

library(data.table)
rbindlist(list, idcol = "df")
   df name age gender
1:  1    a  10      m
2:  2    b  33      m
3:  3    b  58      f

Note that the additional df column identifies the origin of each row.

Now, we can easily count the number of rows by gender for each df by

rbindlist(list, idcol = "df")[, .N, by = .(df, gender)]
   df gender N
1:  1      m 1
2:  2      m 1
3:  3      f 1

.N is a special symbol in data.table syntax which holds the number of rows within each group. The name column is irrelevant when aggregating like this and has therefore been omitted.
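
The same combine-first idea can also be expressed with sqldf, for readers who prefer to stay in SQL. The following is a minimal sketch in base R plus sqldf, not the approach of the answer itself; the column name src and the alias n are hypothetical, chosen for this example.

```r
library(sqldf)

# Sample data from the question.
dfs <- list(
  data.frame(name = "a", age = 10, gender = "m", stringsAsFactors = FALSE),
  data.frame(name = "b", age = 33, gender = "m", stringsAsFactors = FALSE),
  data.frame(name = "b", age = 58, gender = "f", stringsAsFactors = FALSE)
)

# Tag each data frame with the position it had in the list, then stack them.
tagged <- Map(function(d, i) cbind(src = i, d), dfs, seq_along(dfs))
combined <- do.call(rbind, tagged)

# A single query counts rows per source data frame and gender.
counts <- sqldf("select src, gender, count(*) as n from combined group by src, gender")
print(counts)
```

With one combined table, the loop over the list disappears and the grouping is handled entirely by the query.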



Source: https://stackoverflow.com/questions/48022642/how-to-loop-the-dataframe-using-sqldf
