sqldf | 易学教程

Select max amount spent in single order

Question: I am very new to R and sqldf and can't seem to solve a basic problem. I have a file of transactions in which each row represents a product purchased. The file looks like this:

    customer_id,order_number,order_date,amount,product_name
    1, 202, 21/04/2015, 58, "xlfd"
    1, 275, 16/08/2015, 74, "ghb"
    1, 275, 16/08/2015, 36, "fjk"
    2, 987, 12/03/2015, 27, "xlgm"
    3, 376, 16/05/2015, 98, "fgt"
    3, 368, 30/07/2015, 46, "ade"

I need to find the maximum amount spent in a single transaction (same order
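A minimal sqldf sketch for this kind of question, assuming that "single transaction" means all rows sharing an order_number and that one overall maximum is wanted (the teaser is cut off, so both points are assumptions). The data frame name `transactions` is hypothetical, and only the columns needed for the calculation are reproduced:

```r
library(sqldf)

# Sample rows from the question (order_date and product_name omitted)
transactions <- data.frame(
  customer_id  = c(1, 1, 1, 2, 3, 3),
  order_number = c(202, 275, 275, 987, 376, 368),
  amount       = c(58, 74, 36, 27, 98, 46)
)

# Inner query: total amount per order. Outer query: largest order total.
max_order <- sqldf("
  SELECT MAX(order_total) AS max_order_total
  FROM (
    SELECT order_number, SUM(amount) AS order_total
    FROM transactions
    GROUP BY order_number
  )
")
print(max_order)
```

If the maximum is needed per customer instead, the inner query can also select customer_id and the outer query can group by it.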

Select specified rows when importing CSV

Question: I have a large CSV file and I only want to import certain rows of it. First I create the indices of the rows to import, then I want to pass these row ids to sqldf and return the full records for the specified rows.

    # create the random row ids that will be sampled
    library(dplyr)
    # range for the values
    index <- c(1:20)
    index <- as.data.frame(as.matrix(index))
    # number of values to be returned
    number <- 5
    ids <- sample_n(index, number)
    # sample the data
    library(sqldf)
    # filepath
    f<

Error in R Using SQLDF: too many SQL variables

Question: I have a large dataset with nearly 2000 variables in R. I use sqldf to write a few CASE statements that create new columns on the original dataset. However, I get the following error:

    Error in rsqlite_send_query(conn@ptr, statement) : too many SQL variables

I rebooted my laptop today; before that, this error never occurred. Any help is appreciated.

Answer 1: I hit the same problem. I just limited the number of columns:

    # here creating data with a lot of columns
    a <- mtcars
    for( i in 1:1000 ){ b <-

Handling quotation marks in sqldf

Question: I want to use sqldf and be able to write SQL statements exactly as they would be written in the SQL command terminal. For instance, here is a query from the manual:

    Gavg <- sqldf("select g, avg(v) as avg_v from DF group by g")

If I were working with a separate SQL file, the query would be written:

    select g, avg(v) as avg_v from "DF" group by g

However, if I were to write this as:

    Gavg <- sqldf(" select g, avg(v) as avg_v from "DF" group by g ")

I would like to be able to copy/paste snippets
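One way to paste SQL that contains double quotes is to delimit the R string with single quotes, so the inner double quotes need no escaping. This is a sketch under assumptions, not the answer from the original thread: the contents of `DF` are invented here, and the approach stops working if the SQL itself contains single-quoted string literals.

```r
library(sqldf)

# Hypothetical data frame standing in for DF from the question
DF <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 5))

# Single quotes delimit the R string; the SQL identifier keeps its double quotes
Gavg <- sqldf('select g, avg(v) as avg_v from "DF" group by g')
print(Gavg)
```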

Make new feature using 2 tables

Question:

    table1 <- data.frame(
      user_id     = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2),
      product_id  = c(14, 24, 38, 40, 66, 2, 19, 30, 71, 98, 7, 16),
      first_order = c(1, 2, 1, 4, 5, 3, 2, 4, 2, 4, 2, 3),
      last_order  = c(4, 7, 5, 8, 8, 3, 4, 7, 5, 9, 4, 5))

    table2 <- data.frame(
      user_id      = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
      order_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6),
      days_cumsum  = c(0, 7, 15, 26, 34, 43, 53, 59, 66, 74, 82, 91, 5, 11, 17, 24, 29, 35))

I want to add new

Convert an integer value to datetime in sqldf

Question: I am using the sqldf library to return a data frame with distinct values and only the max of the date column. The data frame looks like this:

    +------+----------+--------+-----------------+
    | NAME | val1     | val2   | DATE            |
    +------+----------+--------+-----------------+
    | A    | 23.7228  | 0.5829 | 11/19/2014 8:17 |
    | A    | 23.7228  | 0.5829 | 11/12/2014 8:16 |
    +------+----------+--------+-----------------+

When I try to run the code below to get the distinct values with the max date

    df <- sqldf("SELECT

sqldf: query data by range of dates

Question: I am reading from a huge text file that uses the '%d/%m/%Y' date format. I want to use read.csv.sql from sqldf to read the data and filter it by date at the same time. This saves memory and run time by skipping the many dates I am not interested in. I know how to do this with dplyr and lubridate, but I want to try sqldf for the reason above. Even though I am quite familiar with SQL syntax, it still trips me up most of the time, and sqldf is no exception. Running

Failed to connect the database when using sqldf in r

Question: I loaded a CSV file into R, and whenever I try to use sqldf to select a column, I get this error:

    Error in .local(drv, ...) : Failed to connect to database: Error: Access denied for user 'User'@'localhost' (using password: NO)
    Error in !dbPreExists : invalid argument type

I don't know how to fix it. Here is my script:

    library("RMySQL")
    library(sqldf)
    acs <- read.csv("getdata_data_ss06pid.csv", head = T)
    sqldf("select pwgtp1 from acs where AGEP < 50")

Answer 1: It doesn't seem like you need to load
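One likely cause, stated here as an assumption rather than taken from the truncated answer: when RMySQL is attached, sqldf picks the MySQL driver by default and tries to connect to a MySQL server. A minimal sketch that requests the SQLite backend explicitly through the `drv` argument, using a small hypothetical data frame in place of the CSV file:

```r
library(sqldf)

# Hypothetical stand-in for the data read from getdata_data_ss06pid.csv
acs <- data.frame(pwgtp1 = c(10, 20, 30), AGEP = c(25, 60, 40))

# Ask for SQLite explicitly so no MySQL connection is attempted
young <- sqldf("select pwgtp1 from acs where AGEP < 50", drv = "SQLite")
print(young)
```

Setting `options(sqldf.driver = "SQLite")` once per session should have the same effect for every later call.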

Multiple cumulative sums [duplicate]

Question: This question already has answers here: How to get the cumulative sum by group in R? (2 answers). Closed 3 years ago.

Hopefully the title is explicit enough. I have a table that looks like this:

    classes id value
    a       1  10
    a       2  15
    a       3  12
    b       1  5
    b       2  9
    b       3  7
    c       1  6
    c       2  14
    c       3  6

and here is what I would like:

    classes id value cumsum
    a       1  10    10
    a       2  15    25
    a       3  12    37
    b       1  5     5
    b       2  9     14
    b       3  7     21
    c       1  6     6
    c       2  14    20
    c       3  6     26

I've seen this solution, and I've already applied it successfully to cases where I don't
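For the table shown in the question, a per-group running total can be sketched in sqldf with a self-join: each row is matched with all rows of the same class that have an equal or smaller id, and the matched values are summed. This covers only the example as shown, not whatever the cut-off sentence goes on to ask; the data frame name `scores` is hypothetical.

```r
library(sqldf)

# The table from the question
scores <- data.frame(
  classes = rep(c("a", "b", "c"), each = 3),
  id      = rep(1:3, times = 3),
  value   = c(10, 15, 12, 5, 9, 7, 6, 14, 6)
)

# Self-join: for each row x, sum the values of rows y in the same class with y.id <= x.id
running <- sqldf("
  SELECT x.classes, x.id, x.value, SUM(y.value) AS cumsum
  FROM scores x
  JOIN scores y
    ON y.classes = x.classes AND y.id <= x.id
  GROUP BY x.classes, x.id
  ORDER BY x.classes, x.id
")
print(running)
```

Outside SQL, base R gives the same column with `ave(scores$value, scores$classes, FUN = cumsum)`.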

Join two datasets based on an inequality condition

Question: I have used the call below to "join" my datasets based on an inequality condition:

    library(sqldf)
    sqldf("select * from dataset1 a, dataset2 b where a.col1 <= b.col2")

However, is there a way to do this without sqldf? So far, I can only find merge functions that perform simple joins on a common column. Thanks!

Answer 1: Non-equi (or conditional) joins were recently implemented in data.table and are available in the current development version, v1.9.7. See installation instructions here.
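A minimal sketch of such a non-equi join, assuming a data.table version that includes the feature. The contents of `dataset1` and `dataset2` are invented for illustration; only the column names come from the question.

```r
library(data.table)

# Hypothetical data standing in for dataset1 and dataset2
dataset1 <- data.table(col1 = c(1, 5, 10))
dataset2 <- data.table(col2 = c(4, 7))

# For each row of dataset2, keep the rows of dataset1 where col1 <= col2.
# The x. and i. prefixes pick the original columns from each side of the join.
joined <- dataset1[dataset2,
                   on = .(col1 <= col2),
                   .(col1 = x.col1, col2 = i.col2),
                   nomatch = 0L]
print(joined)
```

Selecting `x.col1` explicitly matters because a plain non-equi join result labels the join column `col1` but fills it with the values from `col2`.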