sqldf

Find nearest matches for each row and sum based on a condition

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-20 23:59:08
Question: Consider the following data.table of events:

    library(data.table)
    breaks <- data.table(id = 1:8,
                         Channel = c("NP1", "NP1", "NP2", "NP2", "NP3", "NP3", "AT4", "AT4"),
                         Time = c(1000, 1100, 975, 1075, 1010, 1080, 1000, 1050),
                         Day = c(1, 1, 1, 1, 1, 1, 1, 1),
                         ZA = c(15, 12, 4, 2, 1, 2, 23, 18),
                         stringsAsFactors = F)
    breaks
       id Channel Time Day ZA
    1:  1     NP1 1000   1 15
    2:  2     NP1 1100   1 12
    3:  3     NP2  975   1  4
    4:  4     NP2 1075   1  2
    5:  5     NP3 1010   1  1
    6:  6     NP3 1080   1  2
    7:  7     AT4 1000   1 23
    8:  8     AT4 1050   1 18

For
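The question text is cut off before the matching condition is stated, so only the general technique can be sketched here: nearest-in-time matching in data.table is usually done with a rolling join using roll = "nearest". A minimal sketch, assuming a hypothetical second table events (its columns and the value > 5 condition are made up for illustration):

    library(data.table)

    # hypothetical table of events to match against `breaks`
    events <- data.table(Channel = c("NP1", "NP2", "AT4"),
                         Time    = c(1090, 1000, 1040),
                         value   = c(5, 7, 9))

    # rolling join: for each row of `events`, pick the `breaks` row on the same
    # Channel whose Time is nearest
    nearest <- breaks[events, on = .(Channel, Time), roll = "nearest"]
    nearest

    # a conditional sum over the matched rows, e.g. total ZA where value > 5
    nearest[value > 5, sum(ZA)]

The roll argument applies to the last column listed in on =, so Time must come last in the join specification.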

how to loop the dataframe using sqldf?

Submitted by 不问归期 on 2019-12-20 04:38:19
Question: First code, with sample data:

    vector1 <- data.frame("name" = "a", "age" = 10, "gender" = "m")
    vector2 <- data.frame("name" = "b", "age" = 33, "gender" = "m")
    vector3 <- data.frame("name" = "b", "age" = 58, "gender" = "f")
    list <- list(vector1, vector2, vector3)

    sql <- list()
    for (i in 1:length(list)) {
      print(list[[1]])  # access dataframe
      sql[[i]] <- sqldf(paste0("select name,gender,count(name) from ", list[[i]], " group by gender "))
    }

How do I loop over the data frames correctly using the sqldf function? I have tried list[[1]] or list[1]
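sqldf looks tables up by name in the calling environment, so pasting a data frame object into the SQL string cannot work. A minimal sketch of one common fix: assign each list element to a fixed name and query that name (the variable names here are illustrative, not from the original post):

    library(sqldf)

    dfs <- list(vector1, vector2, vector3)
    sql <- vector("list", length(dfs))

    for (i in seq_along(dfs)) {
      current <- dfs[[i]]   # sqldf will find this data frame by its name
      sql[[i]] <- sqldf("select name, gender, count(name) as n
                         from current
                         group by gender")
    }
    sql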

Notation issues with read.csv.sql in r

Submitted by 为君一笑 on 2019-12-20 04:37:42
Question: I am using read.csv.sql to read data in conditionally (my data set is extremely large, so this is the approach I chose to filter it and reduce its size before reading it in). Reading in the full data and then filtering it ran me into memory issues, which is why the conditional read matters: only the subset is read in rather than the full data set. Here is a small data set so my problem can be reproduced:

    write.csv(iris, "iris.csv", row.names = F)

I am
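The specific notation problem is cut off, but a common pitfall with read.csv.sql on data like iris is that column names containing dots must be double-quoted in the SQL, otherwise SQLite parses them as table.column. A minimal sketch under that assumption (the 5.0 threshold is arbitrary):

    library(sqldf)

    write.csv(iris, "iris.csv", row.names = FALSE)

    # the file is referred to as `file` inside the sql argument; quote dotted names
    sub_iris <- read.csv.sql("iris.csv",
                             sql = 'select * from file where "Sepal.Length" > 5.0')
    head(sub_iris)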

NA values using sqldf

Submitted by 旧时模样 on 2019-12-20 02:02:07
Question: If I take the average of c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10) using SQL's AVG, I get 5.2 instead of the expected 6.5.

    # prepare data and write to file
    write.table(data.frame(col1 = c(NA, NA, 3:10)), "my.na.txt", row.names = FALSE)

    mean(c(NA, NA, 3:10), na.rm = TRUE)  # 6.5

    my.na <- read.csv.sql("my.na.txt", sep = " ", sql = "SELECT AVG(col1) FROM file")  # 5.2
    # this is identical to sum(3:10)/10

    unlink("my.na.txt")  # remove file

Which leads me to believe that sqldf treats NA
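The 5.2 = sum(3:10)/10 result suggests the literal text "NA" in the file is being read as a string that AVG counts but coerces to 0, rather than as SQL NULL (which AVG would ignore). A minimal sketch of one possible workaround, excluding those rows in the SQL itself:

    library(sqldf)

    write.table(data.frame(col1 = c(NA, NA, 3:10)), "my.na.txt", row.names = FALSE)

    # keep only the rows that are not the literal string 'NA' before averaging
    res <- read.csv.sql("my.na.txt", sep = " ",
                        sql = "SELECT AVG(col1) FROM file WHERE col1 != 'NA'")
    res  # 6.5

    unlink("my.na.txt")

An equivalent variant is AVG(nullif(col1, 'NA')), which maps the 'NA' strings to NULL so AVG skips them.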

Pass R variable to a sql statement

Submitted by 雨燕双飞 on 2019-12-19 11:36:39
Question: Is there any way to pass a variable defined in R into the SQL statement used with the sqldf package? I have to run the code below, and I passed the 'v' variable into the SQL select statement as '$v':

    for (i in 1:50) {
      v <- i + 450
      temp <- sqldf("select count(V1) from file_new where V1 = '$v'")
    }

Although it runs, it returns the wrong result (it should be 1000, but this code returns 0), so I think the variable value is not being passed.

Answer 1: If v is an integer then you don't want to enclose the $v with
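The answer is cut off, but it points at the fn$ string-interpolation prefix from gsubfn (which sqldf loads): $v is substituted with the value of v, and for a numeric column it should not be wrapped in quotes. A minimal sketch, reusing the question's file_new data frame:

    library(sqldf)  # also attaches gsubfn, which provides fn$

    results <- integer(50)
    for (i in 1:50) {
      v <- i + 450
      # fn$ substitutes v into the string; no quotes around $v since V1 is numeric
      results[i] <- fn$sqldf("select count(V1) from file_new where V1 = $v")[[1]]
    }

Without the fn$ prefix, sqldf passes the string to SQLite verbatim, so '$v' is compared as a literal string and never matches.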

Cumulative sum by group in sqldf?

Submitted by 烈酒焚心 on 2019-12-18 07:02:35
Question: I have a data frame with 3 variables: place, time, and value (P, T, X). I want to create a fourth variable that is the cumulative sum of X. Normally I like to do grouped calculations with sqldf, but I can't seem to find an equivalent of cumsum. That is,

    sqldf("select P, T, X, cumsum(X) as X_CUM from df group by P, T")

doesn't work. Is this even possible with sqldf? I tried doBy, but that doesn't handle cumsum either.

Answer 1: Set up some test data:

    DF <- data.frame(t = 1:4, p = rep(1:3, each
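The answer's test data is truncated above. A minimal sketch of the usual SQL workaround for a grouped cumulative sum, a self-join that sums every earlier-or-equal row within the same group (the example data here is made up; column names follow the question's P, T, X):

    library(sqldf)

    df <- data.frame(P = rep(c("a", "b"), each = 3),
                     T = rep(1:3, 2),
                     X = c(2, 4, 6, 1, 3, 5))

    # for each (P, T) row, sum X over all rows of the same P with T <= current T
    sqldf("select a.P, a.T, a.X, sum(b.X) as X_CUM
           from df a
           join df b on b.P = a.P and b.T <= a.T
           group by a.P, a.T, a.X
           order by a.P, a.T")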

Summarize with conditions in dplyr

Submitted by 老子叫甜甜 on 2019-12-17 03:59:17
Question: I'll illustrate my question with an example. Sample data:

    df <- data.frame(ID = c(1, 1, 2, 2, 3, 5),
                     A = c("foo", "bar", "foo", "foo", "bar", "bar"),
                     B = c(1, 5, 7, 23, 54, 202))
    df
      ID   A   B
    1  1 foo   1
    2  1 bar   5
    3  2 foo   7
    4  2 foo  23
    5  3 bar  54
    6  5 bar 202

What I want to do is summarize, by ID, the sum of B and the sum of B when A is "foo". I can do this in a couple of steps like:

    require(magrittr)
    require(dplyr)
    df1 <- df %>% group_by(ID) %>% summarize(sumB = sum(B))
    df2 <- df %>% filter(A ==
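The multi-step code is cut off above. A minimal sketch of the usual single-pass dplyr answer, computing the conditional sum inside the same summarize call:

    library(dplyr)

    df <- data.frame(ID = c(1, 1, 2, 2, 3, 5),
                     A = c("foo", "bar", "foo", "foo", "bar", "bar"),
                     B = c(1, 5, 7, 23, 54, 202))

    # one pass: total of B, and total of B restricted to rows where A == "foo"
    df %>%
      group_by(ID) %>%
      summarize(sumB    = sum(B),
                sumBfoo = sum(B[A == "foo"]))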

r column values in sql where statement

Submitted by ぃ、小莉子 on 2019-12-13 18:47:23
Question: I have a dataset and I am trying to pass the contents of a specific column into the WHERE clause of a SQL query. For example, assuming iris is my dataset:

    data(iris)
    head(iris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             5.1         3.5          1.4         0.2  setosa
             4.9         3.0          1.4         0.2  setosa
             4.7         3.2          1.3         0.2  setosa
             4.6         3.1          1.5         0.2  setosa
             5.0         3.6          1.4         0.2  setosa
             5.4         3.9          1.7         0.4  setosa

I want to pass the contents of the Species column {setosa, setosa, setosa, ....., setosa} into the WHERE clause of my SQL query: sqlQuery(abcd,
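The query itself is cut off at sqlQuery(abcd, ...). A minimal sketch of one common approach: collapse the column's distinct values into a quoted IN (...) list and paste it into the query string. It assumes abcd is the existing RODBC connection from the question; the remote table name flower_table is hypothetical:

    library(RODBC)

    data(iris)

    # build a quoted, comma-separated list of the distinct Species values
    vals <- paste0("'", unique(iris$Species), "'", collapse = ", ")

    # flower_table is a placeholder for the real remote table
    query  <- paste0("select * from flower_table where Species in (", vals, ")")
    result <- sqlQuery(abcd, query)

Using unique() keeps the IN list short; passing all 150 repeated values would work but is wasteful.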

sqldf package in R, querying a data frame

Submitted by 馋奶兔 on 2019-12-13 12:29:39
Question: I'm trying to rewrite some code using the sqldf library in R, which should let me run SQL queries on data frames. The problem is that whenever I run a query, R appears to query the actual MySQL connection I use and to look for a table named after the data frame I am trying to query. When I run this:

    sqldf("SELECT COUNT(*) from work.class_scores")

I get:

    Error in mysqlNewConnection(drv, ...) :
      RS-DBI driver: (Failed to connect to database:
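sqldf chooses its backend based on which DBI drivers are loaded, so with RMySQL attached it tries to talk to the real MySQL server instead of its embedded SQLite database. A minimal sketch of the commonly reported fix, forcing the SQLite backend and avoiding the dot in the table name (renaming the data frame is an assumption here, since SQL reads work.class_scores as schema.table):

    library(sqldf)

    # force sqldf onto its built-in SQLite backend even though RMySQL is loaded
    options(sqldf.driver = "SQLite")

    # copy to a dot-free name so the identifier is not parsed as schema.table
    class_scores <- work.class_scores
    sqldf("SELECT COUNT(*) FROM class_scores")

The same can be done per call with sqldf(..., drv = "SQLite") instead of setting the global option.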