Question
First code:
Sample data:
vector1 <- data.frame("name"="a","age"=10,"gender"="m")
vector2 <- data.frame("name"="b","age"=33,"gender"="m")
vector3 <- data.frame("name"="b","age"=58,"gender"="f")
list <- list(vector1,vector2,vector3)
sql <- list()
for(i in 1:length(list)){
print(list[[1]]) # access dataframe
sql[[i]]<-
sqldf(paste0("select name,gender,count(name) from ",list[[i]]," group by gender "))
}
How do I loop over the data frames correctly with the sqldf function? As a test I have tried list[[1]] and list[1] inside the sqldf call, but it returns either a "no such table" error or a syntax error. Inside the loop I can access each data frame. Is it possible to use this format?
print(str(list))
List of 3
$ :'data.frame': 1 obs. of 3 variables:
..$ name : Factor w/ 1 level "a": 1
..$ age : num 10
..$ gender: Factor w/ 1 level "m": 1
$ :'data.frame': 1 obs. of 3 variables:
..$ name : Factor w/ 1 level "b": 1
..$ age : num 33
..$ gender: Factor w/ 1 level "m": 1
$ :'data.frame': 1 obs. of 3 variables:
..$ name : Factor w/ 1 level "b": 1
..$ age : num 58
..$ gender: Factor w/ 1 level "f": 1
NULL
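A minimal base-R sketch of why the first attempt fails, using only vector1 from the sample data (stringsAsFactors = FALSE is an assumption matching the R >= 4.0 default): paste0() does not insert a table name. It converts the data frame column by column and returns one query string per column.

```r
# vector1 from the sample data, kept with character columns
vector1 <- data.frame(name = "a", age = 10, gender = "m",
                      stringsAsFactors = FALSE)

# paste0() coerces the data frame with as.character(), which yields one
# string per column, so the result is a vector of three "queries"
q <- paste0("select name,gender,count(name) from ", vector1, " group by gender ")
print(q)
length(q)  # 3
```

None of these strings names a data frame that sqldf can find ("from a", "from 10", "from m"), which fits the "no such table" and syntax errors described above.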
Second:
This code gives the result I expect.
f<- lapply(list, function(dataframe) {
sql <-
sqldf("select name,gender,count(name) from dataframe group by gender ")
})
print(f)
This is the output.
> print(f)
[[1]]
name gender count(name)
1 a m 1
[[2]]
name gender count(name)
1 b m 1
[[3]]
name gender count(name)
1 b f 1
Is it possible to use the first code to access the list? How can I fix it if I want to use the paste function to access each data frame in the list?
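Answer 1:
sqldf::sqldf looks up the tables named in the query among the objects that exist in its calling environment, so the query must contain the name of an existing data frame, not the data frame itself. Therefore, just create DF <- list[[i]] inside the loop and use that name in the query.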
library(sqldf)
sql <- list()
for(i in seq_along(list)){
  print(list[[i]]) # access the i-th data frame
  DF <- list[[i]]  # bind it to a name that sqldf can look up
  sql[[i]] <- sqldf("select name,gender,count(name) from DF group by gender ")
}
print(sql)
# [[1]]
# name gender count(name)
# 1 a m 1
#
# [[2]]
# name gender count(name)
# 1 b m 1
#
# [[3]]
# name gender count(name)
# 1 b f 1
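If you want to keep building the query with paste0(), as asked in the question, one option is to paste in a table *name* and bind the data frame to that name first. This is a sketch assuming the sqldf package; the environment `tables`, the names `df1`, `df2`, `df3`, the alias `n`, and the list name `dfs` (used instead of `list`, which masks base::list) are hypothetical choices. It relies on sqldf's `envir` argument to say where the table names should be looked up.

```r
library(sqldf)

# Sample data from the question, stored under a name that does not mask list()
dfs <- list(
  data.frame(name = "a", age = 10, gender = "m"),
  data.frame(name = "b", age = 33, gender = "m"),
  data.frame(name = "b", age = 58, gender = "f")
)

tables  <- new.env()               # hypothetical environment holding the tables
results <- vector("list", length(dfs))

for (i in seq_along(dfs)) {
  tbl_name <- paste0("df", i)      # "df1", "df2", "df3"
  assign(tbl_name, dfs[[i]], envir = tables)
  query <- paste0("select name, gender, count(name) as n from ",
                  tbl_name, " group by gender")
  results[[i]] <- sqldf(query, envir = tables)  # df1, df2, ... are found in tables
}
print(results)
```

Compared with reusing DF, this keeps every data frame under its own name, at the cost of managing an extra environment.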
Answer 2:
You asked about the use of lapply, which will do away with the need to use a for-loop to process your list of dataframes. Here is a solution using a simple stand-alone function to apply the sqldf statement to a given dataframe, and lapply to apply it to your list of dataframes without explicit looping:
namecount <- function(df){
sqldf("select name, gender, count(name) from df group by gender")
}
sql = lapply(list, namecount)
Output:
> sql
[[1]]
name gender count(name)
1 a m 1
[[2]]
name gender count(name)
1 b m 1
[[3]]
name gender count(name)
1 b f 1
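lapply() keeps the names of the list it is given, so naming the data frames makes it easier to tell the results apart. A small sketch assuming the sqldf package; the element names `first`, `second`, `third` and the list name `dfs` are illustrative only.

```r
library(sqldf)

namecount <- function(df){
  sqldf("select name, gender, count(name) from df group by gender")
}

# Named version of the sample list (names are hypothetical)
dfs <- list(
  first  = data.frame(name = "a", age = 10, gender = "m"),
  second = data.frame(name = "b", age = 33, gender = "m"),
  third  = data.frame(name = "b", age = 58, gender = "f")
)

sql <- lapply(dfs, namecount)
names(sql)   # "first" "second" "third"
sql$third    # result for the third data frame
```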
Answer 3:
The OP has asked for help in using sqldf() to aggregate data.frames which are stored in a list. If I understood correctly, the OP wants to count the number of male and female individuals within each data.frame.
The OP has asked two related questions ("using lapply function and list in r" and "add missed value based on the value of the column in r") in which he is also seeking help with handling a list of data.frames.
As explained in my answers to the questions above, it is almost always better to combine data.frames with identical structure into one large data.table:
library(data.table)
rbindlist(list, idcol = "df")
   df name age gender
1:  1    a  10      m
2:  2    b  33      m
3:  3    b  58      f
Note that the additional df column identifies the origin of each row.
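If data.table is not available, a base-R sketch of the same stacking step (a substitute for rbindlist(), not the answer's method, and returning a plain data.frame rather than a data.table) tags each data frame with its position and binds the rows together. `dfs` stands in for the question's `list`.

```r
dfs <- list(
  data.frame(name = "a", age = 10, gender = "m"),
  data.frame(name = "b", age = 33, gender = "m"),
  data.frame(name = "b", age = 58, gender = "f")
)

# Prepend a df column holding each data frame's position in the list,
# then stack the pieces; this mirrors rbindlist(dfs, idcol = "df")
combined <- do.call(rbind, Map(function(d, i) cbind(df = i, d), dfs, seq_along(dfs)))
print(combined)
```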
Now we can easily count the number of rows by gender for each df:
rbindlist(list, idcol = "df")[, .N, by = .(df, gender)]
   df gender N
1:  1      m 1
2:  2      m 1
3:  3      f 1
.N is a special symbol in data.table syntax that counts the number of rows within each group. The name column is irrelevant for this kind of aggregation and has therefore been omitted.
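Since the question is about sqldf, the same one-table idea can also be written as a single sqldf query over the stacked data, with count(*) playing the role of .N. This is a sketch assuming the sqldf package, with the stacking done in base R as a stand-in for rbindlist(); `dfs`, `combined` and `counts` are illustrative names.

```r
library(sqldf)

dfs <- list(
  data.frame(name = "a", age = 10, gender = "m"),
  data.frame(name = "b", age = 33, gender = "m"),
  data.frame(name = "b", age = 58, gender = "f")
)

# Stack the data frames and record which one each row came from
combined <- do.call(rbind, Map(function(d, i) cbind(df = i, d), dfs, seq_along(dfs)))

# One grouped query replaces the per-data-frame loop
counts <- sqldf("select df, gender, count(*) as N from combined group by df, gender")
print(counts)
```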
Source: https://stackoverflow.com/questions/48022642/how-to-loop-the-dataframe-using-sqldf