sqldf

Select max amount spent in single order

做~自己de王妃 submitted on 2019-12-10 19:49:38
Question: I am very new to R and sqldf and can't seem to solve a basic problem. I have a file of transactions where each row represents a product purchased. The file looks like this:

    customer_id, order_number, order_date, amount, product_name
    1, 202, 21/04/2015, 58, "xlfd"
    1, 275, 16/08/2015, 74, "ghb"
    1, 275, 16/08/2015, 36, "fjk"
    2, 987, 12/03/2015, 27, "xlgm"
    3, 376, 16/05/2015, 98, "fgt"
    3, 368, 30/07/2015, 46, "ade"

I need to find the maximum amount spent in a single transaction (same order
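A minimal sketch of one way to approach this with sqldf, assuming the goal is the largest order total per customer (the question text is cut off, so that reading is an assumption). The data frame name `transactions` and the result column names are made up for illustration, and the date and product columns are left out for brevity.

```r
library(sqldf)

# Toy data modelled on the rows in the question (hypothetical data frame name)
transactions <- data.frame(
  customer_id  = c(1, 1, 1, 2, 3, 3),
  order_number = c(202, 275, 275, 987, 376, 368),
  amount       = c(58, 74, 36, 27, 98, 46)
)

# Inner query: total amount per order.
# Outer query: largest order total for each customer.
max_order <- sqldf("
  select customer_id, max(order_total) as max_order_total
  from (
    select customer_id, order_number, sum(amount) as order_total
    from transactions
    group by customer_id, order_number
  )
  group by customer_id
  order by customer_id
")

print(max_order)
```

Summing within each order first matters here because order 275 is spread over two rows; taking `max(amount)` directly would only find the most expensive single product.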

Select specified rows when importing CSV

▼魔方 西西 submitted on 2019-12-10 16:57:10
Question: I have a large CSV file and I only want to import certain rows of it. First I create the indices of the rows that will be imported, then I wish to pass these row indices to sqldf and return the full records for the specified rows.

    # create the random row ids that will be sampled
    library(dplyr)
    # range for the values
    index <- c(1:20)
    index <- as.data.frame(as.matrix(index))
    # number of values to be returned
    number <- 5
    ids <- sample_n(index, number)
    # sample the data
    library(sqldf)
    # filepath
    f<

Error in R Using SQLDF: too many SQL variables

五迷三道 submitted on 2019-12-10 16:48:59
Question: I have a large dataset with nearly 2000 variables in R. I then use sqldf to write a few case statements to create new columns on the original dataset. However, I get the following error:

    Error in rsqlite_send_query(conn@ptr, statement) : too many SQL variables

I rebooted my laptop today; before that, this error never occurred. Any help is appreciated.

Answer 1: I hit the same problem. I just limited the number of columns.

    # here creating data with a lot of columns
    a <- mtcars
    for( i in 1:1000 ){
    b <-

Handling quotation marks in sqldf

半城伤御伤魂 submitted on 2019-12-10 11:59:30
Question: I want to use sqldf and be able to write SQL statements exactly as they would be written in the SQL command terminal. For instance, here is a query from the manual:

    Gavg <- sqldf("select g, avg(v) as avg_v from DF group by g")

If I were working with a separate SQL file, the query would be written:

    select g, avg(v) as avg_v from "DF" group by g

However, if I were to write this as:

    Gavg <- sqldf(" select g, avg(v) as avg_v from "DF" group by g ")

I would like to be able to copy/paste snippets
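The last form above fails at the R level, because the inner double quotes end the R string early. A small sketch of one workaround, assuming a toy data frame `DF` (the values are invented for illustration): delimit the R string with single quotes so the SQL double quotes can stay as they are.

```r
library(sqldf)

# Hypothetical toy data with the column names used in the question
DF <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 5))

# Single quotes delimit the R string, so the SQL identifier quotes
# around "DF" need no escaping
Gavg <- sqldf('select g, avg(v) as avg_v from "DF" group by g')

print(Gavg)
```

If the SQL itself also contains single-quoted string literals, R 4.0.0 and later offer raw strings such as `r"(select ... from "DF" where g = 'a')"`, which accept both kinds of quote unescaped.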

Make new feature using 2 tables

隐身守侯 submitted on 2019-12-10 11:17:45
Question:

    table1 <- data.frame(user_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2),
                         product_id = c(14, 24, 38, 40, 66, 2, 19, 30, 71, 98, 7, 16),
                         first_order = c(1, 2, 1, 4, 5, 3, 2, 4, 2, 4, 2, 3),
                         last_order = c(4, 7, 5, 8, 8, 3, 4, 7, 5, 9, 4, 5))
    table2 <- data.frame(user_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                         order_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6),
                         days_cumsum = c(0, 7, 15, 26, 34, 43, 53, 59, 66, 74, 82, 91, 5, 11, 17, 24, 29, 35))

I want to add new

Convert an integer value to datetime in sqldf

余生长醉 submitted on 2019-12-10 09:56:38
Question: I am using the sqldf library to return a data frame with distinct values and only the max of the date column. The data frame looks like this:

    +------+---------+--------+-----------------+
    | NAME | val1    | val2   | DATE            |
    +------+---------+--------+-----------------+
    | A    | 23.7228 | 0.5829 | 11/19/2014 8:17 |
    | A    | 23.7228 | 0.5829 | 11/12/2014 8:16 |
    +------+---------+--------+-----------------+

When I try to run the code below to get the distinct values with the max date

    df <- sqldf("SELECT

sqldf: query data by range of dates

六月ゝ 毕业季﹏ submitted on 2019-12-09 03:32:20
Question: I am reading from a huge text file that uses the '%d/%m/%Y' date format. I want to use read.csv.sql from sqldf to read the data and filter it by date at the same time. This is to save memory and run time by skipping the many dates that I am not interested in. I know how to do this with the help of dplyr and lubridate, but I want to try sqldf for the aforementioned reason. Even though I am quite familiar with SQL syntax, it still trips me up most of the time, and sqldf is no exception. Running

Failed to connect the database when using sqldf in r

醉酒当歌 submitted on 2019-12-08 16:14:16
Question: I loaded a CSV file into R, and when I tried to use sqldf to select a column, it always failed with:

    Error in .local(drv, ...) : Failed to connect to database: Error: Access denied for user 'User'@'localhost' (using password: NO)
    Error in !dbPreExists : invalid argument type

I don't know how to fix it. Here is my script:

    library("RMySQL")
    library(sqldf)
    acs <- read.csv("getdata_data_ss06pid.csv", header = TRUE)
    sqldf("select pwgtp1 from acs where AGEP < 50")

Answer 1: It doesn't seem like you need to load
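For context, when RMySQL is attached and no driver has been chosen, sqldf tries to use MySQL as its backend instead of the default SQLite, which is consistent with the "Access denied" message above. A hedged sketch of forcing the SQLite backend through the `sqldf.driver` option follows; the small `acs` data frame is a hypothetical stand-in for the CSV in the question.

```r
library(sqldf)

# Tell sqldf to use SQLite regardless of which other database
# packages happen to be loaded
options(sqldf.driver = "SQLite")

# Hypothetical stand-in for the data read from the CSV file
acs <- data.frame(pwgtp1 = c(10, 20, 30), AGEP = c(25, 60, 40))

res <- sqldf("select pwgtp1 from acs where AGEP < 50")
print(res)
```

The same choice can be made per call with `sqldf(..., drv = "SQLite")`, or the problem can be avoided by not loading RMySQL when it is not needed.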

Multiple cumulative sums [duplicate]

て烟熏妆下的殇ゞ submitted on 2019-12-08 10:10:18
Question: This question already has answers here: How to get the cumulative sum by group in R? (2 answers) Closed 3 years ago.

Hopefully the title is explicit enough. I have a table that looks like this:

    classes id value
    a       1  10
    a       2  15
    a       3  12
    b       1  5
    b       2  9
    b       3  7
    c       1  6
    c       2  14
    c       3  6

and here is what I would like:

    classes id value cumsum
    a       1  10    10
    a       2  15    25
    a       3  12    37
    b       1  5     5
    b       2  9     14
    b       3  7     21
    c       1  6     6
    c       2  14    20
    c       3  6     26

I've seen this solution, and I've already applied it successfully to cases where I don't
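As a point of reference, the desired `cumsum` column in the table above can be reproduced in base R with `ave()`. This is only a sketch of the single-grouping case shown in the example; since the question is cut off, it may not cover whatever extra requirement the asker had in mind. The data frame name `df` is an assumption.

```r
# Rebuild the example table from the question (hypothetical name df)
df <- data.frame(
  classes = rep(c("a", "b", "c"), each = 3),
  id      = rep(1:3, times = 3),
  value   = c(10, 15, 12, 5, 9, 7, 6, 14, 6)
)

# ave() applies cumsum() separately within each level of classes
# and returns the results in the original row order
df$cumsum <- ave(df$value, df$classes, FUN = cumsum)

print(df)
```

Grouping by more than one column only needs extra grouping vectors, for example `ave(df$value, df$classes, df$other, FUN = cumsum)`, where `other` is a hypothetical second grouping column.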

Join two datasets based on an inequality condition

无人久伴 submitted on 2019-12-08 09:00:28
Question: I have used the call below to "join" my datasets based on an inequality condition:

    library(sqldf)
    sqldf("select * from dataset1 a, dataset2 b where a.col1 <= b.col2")

However, is there a way I can do this without sqldf? So far, I can only see merge functions that are based on simple joins on a particular common column. Thanks!

Answer 1: Non-equi (or conditional) joins were recently implemented in data.table and are available in the current development version, v1.9.7. See installation instructions here.
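For small inputs, the same inequality join can also be sketched in base R alone. Note that this is a different technique from the data.table non-equi join mentioned in the answer: it builds the full Cartesian product first and then filters, so it is only reasonable when the product of the two row counts fits in memory. The toy values in `dataset1` and `dataset2` are invented for illustration.

```r
# Hypothetical toy inputs using the column names from the question
dataset1 <- data.frame(col1 = c(1, 5, 10))
dataset2 <- data.frame(col2 = c(3, 7))

# With by = NULL, merge() returns the Cartesian product of the two inputs
crossed <- merge(dataset1, dataset2, by = NULL)

# Keep only the row pairs that satisfy the inequality condition
joined <- crossed[crossed$col1 <= crossed$col2, ]

print(joined)
```

The result keeps the pairs (1, 3), (1, 7) and (5, 7), matching what the sqldf query would return for the same inputs.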