问题
I'm trying to filter a dataframe based on a date range using sqldf, like the example code below. I have data like the example data below. The datedf dataframe that gets returned by sqldf has no records. There are records in the SHV dataframe in that date range, can anyone see what I'm doing wrong and let me know how to filter by date range in sqldf. Dates are always tricky in r for me.
Code:
datedf<-sqldf("select field1
,fieldDate
from SHV
where fieldDate between '2004-01-01' and '2005-01-01'
")
Data:
dput(SHV[1:50,c("field1","fieldDate")])
structure(list(field1 = c(1378L, 1653L, 1882L, 2400L,
2305L, 2051L, 2051L, 2051L, 1796L, 2054L, 2568L, 1290L, 1804L,
1804L, 3855L, 1297L, 2321L, 2321L, 2321L, 2071L, 2071L, 2074L,
2588L, 1567L, 1317L, 1317L, 808L, 808L, 1321L, 2350L, 1586L,
2613L, 1590L, 2614L, 2107L, 1340L, 1085L, 1085L, 2365L, 1344L,
1601L, 1858L, 1603L, 1603L, 1860L, 2376L, 1355L, 1867L, 2382L,
1872L), fieldDate = structure(c(12551, NA, NA, 14057, 15337,
12919, 13336, 10325, 14984, 15643, 12864, 11242, 10749, 11207,
10602, NA, 12646, 15649, NA, NA, NA, NA, NA, 17015, 13938, NA,
16693, NA, NA, 12634, 12614, 10689, 12755, 10844, 11375, 4899,
17298, 10905, 11450, NA, 10330, 15429, 12634, 10504, 12625, 11081,
10939, NA, 12934, 11176), class = "Date")), .Names = c("field1",
"fieldDate"), row.names = c(NA, 50L), class = "data.frame")
回答1:
In this sample of data you have no records in that date range:
SHV[SHV$fieldDate >= "2010-01-01" & SHV$fieldDate < "2011-01-01",]
field1 fieldDate
NA NA <NA>
NA.1 NA <NA>
NA.2 NA <NA>
NA.3 NA <NA>
NA.4 NA <NA>
NA.5 NA <NA>
NA.6 NA <NA>
NA.7 NA <NA>
NA.8 NA <NA>
NA.9 NA <NA>
NA.10 NA <NA>
NA.11 NA <NA>
NA.12 NA <NA>
回答2:
Per the sqldf()
documentation, one needs to format the dates into their numeric values in order to have them processed as dates. This can be accomplished with sprintf()
when generating the SQL query.
SHV <- structure(list(field1 = c(1378L, 1653L, 1882L, 2400L,
2305L, 2051L, 2051L, 2051L, 1796L, 2054L, 2568L, 1290L, 1804L,
1804L, 3855L, 1297L, 2321L, 2321L, 2321L, 2071L, 2071L, 2074L,
2588L, 1567L, 1317L, 1317L, 808L, 808L, 1321L, 2350L, 1586L,
2613L, 1590L, 2614L, 2107L, 1340L, 1085L, 1085L, 2365L, 1344L,
1601L, 1858L, 1603L, 1603L, 1860L, 2376L, 1355L, 1867L, 2382L,
1872L), fieldDate = structure(c(12551, NA, NA, 14057, 15337,
12919, 13336, 10325, 14984, 15643, 12864, 11242, 10749, 11207,
10602, NA, 12646, 15649, NA, NA, NA, NA, NA, 17015, 13938, NA,
16693, NA, NA, 12634, 12614, 10689, 12755, 10844, 11375, 4899,
17298, 10905, 11450, NA, 10330, 15429, 12634, 10504, 12625, 11081,
10939, NA, 12934, 11176), class = "Date")), .Names = c("field1",
"fieldDate"), row.names = c(NA, 50L), class = "data.frame")
library(sqldf)
sqlStmt <- paste("select field1, fieldDate from SHV",
"where fieldDate between ",
sprintf("%d and %d",as.Date('2004-01-01','%Y-%m-%d'),
as.Date('2005-01-01','%Y-%m-%d')))
datedf<-sqldf(sqlStmt)
datedf
> datedf
field1 fieldDate
1 1378 2004-05-13
2 2321 2004-08-16
3 2350 2004-08-04
4 1586 2004-07-15
5 1590 2004-12-03
6 1603 2004-08-04
7 1860 2004-07-26
>
The sprintf()
statement converts the dates to a numeric value, which ensures that the between
operator in SQL works correctly.
> sqlStmt
[1] "select field1, fieldDate from SHV where fieldDate between 12418 and 12784"
>
回答3:
According to this article, the date field should be converted to a character before executing sqldf.
Before passing any dates to SQLdf we need to first convert them to strings. Otherwise, SQLdf will try to treat them as numbers–which will cause a lot of heartache.
...
Instead, we should convert the DateCreated column to a string instead of a date. Then, SQL will actually convert it from a string to a date.
Confused? Imagine me when I was trying to figure this out on my own.
So your code could be:
SHV$fieldDate <- as.character(SHV$fieldDate)
datedf <- sqldf("
SELECT
field1,
fieldDate
FROM SHV
WHERE fieldDate between '2004-01-01' and '2005-01-01'
--WHERE '2004-01-01' <= fieldDate --and fieldDate <= '2005-01-01'
ORDER BY fieldDate
")
# Both should equal 7. Verify that null rows are handled as desired.
nrow(datedf)
sum(as.Date('2004-01-01') <= SHV$fieldDate & SHV$fieldDate <= as.Date('2005-01-01'), na.rm=T)
I wish it had some more explanation about when it casts a date-looking variable into an actual date. If you're looking around more, this SO response by @g-grothendieck takes a different approach and equate the data types inside the sqldf query.
来源:https://stackoverflow.com/questions/47933142/filter-by-date-range-in-sqldf