Question
I'm trying to pull data from a PostgreSQL database, and the results for a timestamp field are inconsistent. I'm not sure whether I'm handling POSIXct results properly; otherwise, I think I've found a bug in the RPostgreSQL package. Here is how to replicate the issue:
Suppose there is a table in a postgres database with one field (run this in PostgreSQL):
CREATE DATABASE mydb;
CREATE TABLE test_table
(
  "DateTime" timestamp without time zone NOT NULL,
  CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table
OWNER TO postgres;
And let’s say there are a few hundred records. I will populate them in R. Here is the code:
library(RPostgreSQL)
library(chron)  # provides as.chron(), years(), months(), days(), hours(), minutes()

# Let's feed the table with some sequence of date/time values
date_values <- as.chron(seq(10000, 10500, 1/24))

format.chron <- function(z) {
  sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00",
          as.numeric(as.character(years(z))),
          months(z),
          as.numeric(as.character(days(z))),
          as.numeric(as.character(hours(z))),
          as.numeric(as.character(minutes(z))))
}
# Make sQuote()/dQuote() emit plain ASCII quotes; in UTF-8 locales the
# default "fancy" directional quotes would produce invalid SQL
options(useFancyQuotes = FALSE)

.generateInsertQuery <- function(date_values, field_name, table_name) {
  insert_val <- paste(paste0("(", sQuote(format(date_values)), ")"), collapse = ',')
  qry <- paste("INSERT INTO", dQuote(table_name),
               paste0("(", dQuote(field_name), ")"), "VALUES", insert_val)
  qry
}
drv <- dbDriver('PostgreSQL')
con <- dbConnect(drv, user='postgres', dbname='mydb')
qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
dbSendQuery(con, qry)
If I try to get the values, the time component is stripped out of the result:
res <- dbGetQuery(con, "SELECT * FROM test_table")
res[1:20,1]
The class of the result, however, is POSIXct:
class(res[,1])
If the result is fetched one record at a time, the values with hour:min equal to 00:00 lose the time component:
rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
res_list <- list()
for(i in 1:100) res_list[i] <- fetch(rs,1)
res_list
As a workaround, I'm fetching the result one record at a time, fixing each value, and aggregating them into a data.frame. But this is very time-consuming, especially for large data sets. Any ideas why this is happening and how to deal with this issue?
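For reference, the row-by-row workaround described above might look roughly like this. This is a minimal sketch, not a definitive fix: it assumes `con` is the open connection from the code above, and that re-formatting each fetched value as a character string preserves the time component before it can be mangled.

```r
# Sketch of the slow row-by-row workaround (assumes `con` is open)
rs <- dbSendQuery(con, 'SELECT "DateTime" FROM test_table')
rows <- character(0)
i <- 1
while (!dbHasCompleted(rs)) {
  row <- fetch(rs, 1)                       # one record at a time
  # render the value as text immediately, keeping hours/minutes
  rows[i] <- format(row[1, 1], "%Y-%m-%d %H:%M:%S")
  i <- i + 1
}
dbClearResult(rs)
# aggregate back into a data.frame with a consistent timezone
res <- data.frame(DateTime = as.POSIXct(rows, tz = "UTC"))
```

This loop is exactly the time sink the question complains about, which is why a bulk fix (see the answers below on timezone handling) is preferable for large result sets.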
Thanks in advance!
Answer 1:
First off, the RPostgreSQL project has a mailing list; I suggest you post there.
PostgreSQL has two datetime types: with and without timezone. As I recall, R only maps the latter. I did write some early regression tests for this (see the package source) but have not been that involved with the project of late. But I do recall that POSIXct maps back and forth to the PostgreSQL datetime type just fine.
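A minimal round-trip check along these lines can confirm the POSIXct mapping on your own setup. This is a sketch under assumptions: it presumes an open RPostgreSQL connection `con`, and the table name `ts_roundtrip` is purely illustrative.

```r
# Hypothetical round-trip check (assumes an open connection `con`)
x <- data.frame(ts = as.POSIXct("2013-07-20 12:34:56", tz = "UTC"))
dbWriteTable(con, "ts_roundtrip", x, overwrite = TRUE)
y <- dbGetQuery(con, "SELECT ts FROM ts_roundtrip")

# compare the rendered timestamps in a common timezone
format(x$ts, tz = "UTC") == format(as.POSIXct(y$ts), tz = "UTC")
```

If the comparison fails on a given machine, the session timezone (the `TZ` environment variable) is the first thing to check, as the next answer explains.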
Answer 2:
RPostgreSQL's dbWriteTable with any POSIXct field will create a database field of type timestamp with time zone, always with tz +00, no matter what the POSIXct timezone is. I believe it would be more precise to create timestamp without time zone instead.
The best solution for both dbReadTable and dbWriteTable is to use Sys.setenv(TZ = "UTC"). In my opinion that is too deep a dependency, because many other processes in the R session may require a proper timezone setting.
A much more specific and less invasive approach is to define your own dbReadTable and dbWriteTable that wrap the DBI versions with appropriate preprocessing/postprocessing of POSIXct types. But that is still not an option if you are developing DBI-compliant code or packages (not only Postgres-related).
It would be great to have RPostgreSQL migrated to GitHub for easier contribution.
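A sketch of such wrappers might look like the following. The function names, the choice of UTC as the canonical timezone, and the `time_cols` argument are all illustrative assumptions, not part of DBI:

```r
# Hypothetical wrappers around DBI with POSIXct pre/post-processing

# On write: render POSIXct columns as UTC text so the stored value
# does not depend on the session timezone
myWriteTable <- function(con, name, df, ...) {
  is_time <- vapply(df, inherits, logical(1), what = "POSIXct")
  df[is_time] <- lapply(df[is_time], format,
                        tz = "UTC", format = "%Y-%m-%d %H:%M:%S")
  DBI::dbWriteTable(con, name, df, ...)
}

# On read: re-parse the named columns back into POSIXct in UTC
# (`time_cols` must be supplied by the caller)
myReadTable <- function(con, name, time_cols, ...) {
  df <- DBI::dbReadTable(con, name, ...)
  df[time_cols] <- lapply(df[time_cols], as.POSIXct, tz = "UTC")
  df
}
```

Because the timezone round-trip happens explicitly in the wrappers, the rest of the session can keep whatever TZ setting it needs, which is the advantage over the global Sys.setenv(TZ = "UTC") approach.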
Source: https://stackoverflow.com/questions/17754635/is-there-a-specific-way-to-handle-timestamp-columns-in-r-when-pulling-data-using