sqldf

Find nearest matches for each row and sum based on a condition

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-20 23:59:08
Question: Consider the following data.table of events:

    library(data.table)
    breaks <- data.table(id = 1:8,
                         Channel = c("NP1", "NP1", "NP2", "NP2", "NP3", "NP3", "AT4", "AT4"),
                         Time = c(1000, 1100, 975, 1075, 1010, 1080, 1000, 1050),
                         Day = c(1, 1, 1, 1, 1, 1, 1, 1),
                         ZA = c(15, 12, 4, 2, 1, 2, 23, 18),
                         stringsAsFactors = F)
    breaks
       id Channel Time Day ZA
    1:  1     NP1 1000   1 15
    2:  2     NP1 1100   1 12
    3:  3     NP2  975   1  4
    4:  4     NP2 1075   1  2
    5:  5     NP3 1010   1  1
    6:  6     NP3 1080   1  2
    7:  7     AT4 1000   1 23
    8:  8     AT4 1050   1 18

For
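The question text is cut off before the matching condition is stated, so only the general technique can be sketched here: nearest-in-time matching in data.table is usually done with a rolling join using roll = "nearest". A minimal sketch, assuming a hypothetical second table events (its columns and the value > 5 condition are made up for illustration):

    library(data.table)

    # hypothetical table of events to match against `breaks`
    events <- data.table(Channel = c("NP1", "NP2", "AT4"),
                         Time    = c(1090, 1000, 1040),
                         value   = c(5, 7, 9))

    # rolling join: for each row of `events`, pick the `breaks` row on the same
    # Channel whose Time is nearest
    nearest <- breaks[events, on = .(Channel, Time), roll = "nearest"]
    nearest

    # a conditional sum over the matched rows, e.g. total ZA where value > 5
    nearest[value > 5, sum(ZA)]

The roll argument applies to the last column listed in on =, so Time must come last in the join specification.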

how to loop the dataframe using sqldf?

Submitted by 不问归期 on 2019-12-20 04:38:19
Question: First code, with sample data:

    vector1 <- data.frame("name" = "a", "age" = 10, "gender" = "m")
    vector2 <- data.frame("name" = "b", "age" = 33, "gender" = "m")
    vector3 <- data.frame("name" = "b", "age" = 58, "gender" = "f")
    list <- list(vector1, vector2, vector3)

    sql <- list()
    for (i in 1:length(list)) {
      print(list[[1]])  # access dataframe
      sql[[i]] <- sqldf(paste0("select name,gender,count(name) from ", list[[i]], " group by gender "))
    }

How do I loop over the data frames correctly using the sqldf function? I have tried list[[1]] or list[1]
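sqldf looks tables up by name in the calling environment, so pasting a data frame object into the SQL string cannot work. A minimal sketch of one common fix: assign each list element to a fixed name and query that name (the variable names here are illustrative, not from the original post):

    library(sqldf)

    dfs <- list(vector1, vector2, vector3)
    sql <- vector("list", length(dfs))

    for (i in seq_along(dfs)) {
      current <- dfs[[i]]   # sqldf will find this data frame by its name
      sql[[i]] <- sqldf("select name, gender, count(name) as n
                         from current
                         group by gender")
    }
    sql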

Notation issues with read.csv.sql in r

Submitted by 为君一笑 on 2019-12-20 04:37:42
Question: I am using read.csv.sql to read data in conditionally (my data set is extremely large, so this is the approach I chose to filter it and reduce its size before reading it in). Reading in the full data and then filtering it ran me into memory issues, which is why the conditional read matters: only the subset is read in rather than the full data set. Here is a small data set so my problem can be reproduced:

    write.csv(iris, "iris.csv", row.names = F)

I am
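The specific notation problem is cut off, but a common pitfall with read.csv.sql on data like iris is that column names containing dots must be double-quoted in the SQL, otherwise SQLite parses them as table.column. A minimal sketch under that assumption (the 5.0 threshold is arbitrary):

    library(sqldf)

    write.csv(iris, "iris.csv", row.names = FALSE)

    # the file is referred to as `file` inside the sql argument; quote dotted names
    sub_iris <- read.csv.sql("iris.csv",
                             sql = 'select * from file where "Sepal.Length" > 5.0')
    head(sub_iris)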

NA values using sqldf

Submitted by 旧时模样 on 2019-12-20 02:02:07
Question: If I take the average of c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10) using SQL's AVG, I get 5.2 instead of the expected 6.5.

    # prepare data and write to file
    write.table(data.frame(col1 = c(NA, NA, 3:10)), "my.na.txt", row.names = FALSE)

    mean(c(NA, NA, 3:10), na.rm = TRUE)  # 6.5

    my.na <- read.csv.sql("my.na.txt", sep = " ", sql = "SELECT AVG(col1) FROM file")  # 5.2
    # this is identical to sum(3:10)/10

    unlink("my.na.txt")  # remove file

Which leads me to believe that sqldf treats NA
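The 5.2 = sum(3:10)/10 result suggests the literal text "NA" in the file is being read as a string that AVG counts but coerces to 0, rather than as SQL NULL (which AVG would ignore). A minimal sketch of one possible workaround, excluding those rows in the SQL itself:

    library(sqldf)

    write.table(data.frame(col1 = c(NA, NA, 3:10)), "my.na.txt", row.names = FALSE)

    # keep only the rows that are not the literal string 'NA' before averaging
    res <- read.csv.sql("my.na.txt", sep = " ",
                        sql = "SELECT AVG(col1) FROM file WHERE col1 != 'NA'")
    res  # 6.5

    unlink("my.na.txt")

An equivalent variant is AVG(nullif(col1, 'NA')), which maps the 'NA' strings to NULL so AVG skips them.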

Pass R variable to a sql statement

Submitted by 雨燕双飞 on 2019-12-19 11:36:39
Question: Is there any way to pass a variable defined in R into the SQL statement used with the sqldf package? I have to run the code below, and I passed the 'v' variable into the SQL select statement as '$v':

    for (i in 1:50) {
      v <- i + 450
      temp <- sqldf("select count(V1) from file_new where V1 = '$v'")
    }

Although it runs, it returns the wrong result (it should be 1000, but this code returns 0), so I think the variable value is not being passed.

Answer 1: If v is an integer then you don't want to enclose the $v with
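The answer is cut off, but it points at the fn$ string-interpolation prefix from gsubfn (which sqldf loads): $v is substituted with the value of v, and for a numeric column it should not be wrapped in quotes. A minimal sketch, reusing the question's file_new data frame:

    library(sqldf)  # also attaches gsubfn, which provides fn$

    results <- integer(50)
    for (i in 1:50) {
      v <- i + 450
      # fn$ substitutes v into the string; no quotes around $v since V1 is numeric
      results[i] <- fn$sqldf("select count(V1) from file_new where V1 = $v")[[1]]
    }

Without the fn$ prefix, sqldf passes the string to SQLite verbatim, so '$v' is compared as a literal string and never matches.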

Cumulative sum by group in sqldf?

Submitted by 烈酒焚心 on 2019-12-18 07:02:35
Question: I have a data frame with 3 variables: place, time, and value (P, T, X). I want to create a fourth variable that is the cumulative sum of X. Normally I like to do grouped calculations with sqldf, but I can't seem to find an equivalent of cumsum. That is,

    sqldf("select P, T, X, cumsum(X) as X_CUM from df group by P, T")

doesn't work. Is this even possible with sqldf? I tried doBy, but that doesn't handle cumsum either.

Answer 1: Set up some test data:

    DF <- data.frame(t = 1:4, p = rep(1:3, each
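The answer's test data is truncated above. A minimal sketch of the usual SQL workaround for a grouped cumulative sum, a self-join that sums every earlier-or-equal row within the same group (the example data here is made up; column names follow the question's P, T, X):

    library(sqldf)

    df <- data.frame(P = rep(c("a", "b"), each = 3),
                     T = rep(1:3, 2),
                     X = c(2, 4, 6, 1, 3, 5))

    # for each (P, T) row, sum X over all rows of the same P with T <= current T
    sqldf("select a.P, a.T, a.X, sum(b.X) as X_CUM
           from df a
           join df b on b.P = a.P and b.T <= a.T
           group by a.P, a.T, a.X
           order by a.P, a.T")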

Summarize with conditions in dplyr

Submitted by 老子叫甜甜 on 2019-12-17 03:59:17
Question: I'll illustrate my question with an example. Sample data:

    df <- data.frame(ID = c(1, 1, 2, 2, 3, 5),
                     A = c("foo", "bar", "foo", "foo", "bar", "bar"),
                     B = c(1, 5, 7, 23, 54, 202))
    df
      ID   A   B
    1  1 foo   1
    2  1 bar   5
    3  2 foo   7
    4  2 foo  23
    5  3 bar  54
    6  5 bar 202

What I want to do is summarize, by ID, the sum of B and the sum of B when A is "foo". I can do this in a couple of steps like:

    require(magrittr)
    require(dplyr)
    df1 <- df %>% group_by(ID) %>% summarize(sumB = sum(B))
    df2 <- df %>% filter(A ==
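The multi-step code is cut off above. A minimal sketch of the usual single-pass dplyr answer, computing the conditional sum inside the same summarize call:

    library(dplyr)

    df <- data.frame(ID = c(1, 1, 2, 2, 3, 5),
                     A = c("foo", "bar", "foo", "foo", "bar", "bar"),
                     B = c(1, 5, 7, 23, 54, 202))

    # one pass: total of B, and total of B restricted to rows where A == "foo"
    df %>%
      group_by(ID) %>%
      summarize(sumB    = sum(B),
                sumBfoo = sum(B[A == "foo"]))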

r column values in sql where statement

Submitted by ぃ、小莉子 on 2019-12-13 18:47:23
Question: I have a dataset and I am trying to pass the contents of a specific column into the WHERE clause of a SQL query. For example, assuming iris is my dataset:

    data(iris)
    head(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             5.1         3.5          1.4         0.2  setosa
             4.9         3.0          1.4         0.2  setosa
             4.7         3.2          1.3         0.2  setosa
             4.6         3.1          1.5         0.2  setosa
             5.0         3.6          1.4         0.2  setosa
             5.4         3.9          1.7         0.4  setosa

I want to pass the contents of the Species column {setosa, setosa, setosa, ....., setosa} into the WHERE clause of my SQL query: sqlQuery(abcd,
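The query itself is cut off at sqlQuery(abcd, ...). A minimal sketch of one common approach: collapse the column's distinct values into a quoted IN (...) list and paste it into the query string. It assumes abcd is the existing RODBC connection from the question; the remote table name flower_table is hypothetical:

    library(RODBC)

    data(iris)

    # build a quoted, comma-separated list of the distinct Species values
    vals <- paste0("'", unique(iris$Species), "'", collapse = ", ")

    # flower_table is a placeholder for the real remote table
    query  <- paste0("select * from flower_table where Species in (", vals, ")")
    result <- sqlQuery(abcd, query)

Using unique() keeps the IN list short; passing all 150 repeated values would work but is wasteful.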

sqldf package in R, querying a data frame

Submitted by 馋奶兔 on 2019-12-13 12:29:39
Question: I'm trying to rewrite some code using the sqldf library in R, which should let me run SQL queries on data frames. The problem is that whenever I run a query, R appears to query the actual MySQL connection I use and to look for a table named after the data frame I am trying to query. When I run this:

    sqldf("SELECT COUNT(*) from work.class_scores")

I get:

    Error in mysqlNewConnection(drv, ...) :
      RS-DBI driver: (Failed to connect to database:
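sqldf chooses its backend based on which DBI drivers are loaded, so with RMySQL attached it tries to talk to the real MySQL server instead of its embedded SQLite database. A minimal sketch of the commonly reported fix, forcing the SQLite backend and avoiding the dot in the table name (renaming the data frame is an assumption here, since SQL reads work.class_scores as schema.table):

    library(sqldf)

    # force sqldf onto its built-in SQLite backend even though RMySQL is loaded
    options(sqldf.driver = "SQLite")

    # copy to a dot-free name so the identifier is not parsed as schema.table
    class_scores <- work.class_scores
    sqldf("SELECT COUNT(*) FROM class_scores")

The same can be done per call with sqldf(..., drv = "SQLite") instead of setting the global option.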