问题
This question arises because I wish to make a function for my convenience:
as.numeric_psql <- function(x) {
return(as.numeric(as.integer(x)))
}
to convert boolean values in a remote postgres table into numeric. The step to convert to integer is needed as:
There is no direct cast defined between numeric and boolean. You can use integer as middle-ground. (https://stackoverflow.com/a/19290671/2109289)
Of course this function works as expected locally:
copy_to(con_psql, cars, 'tmp_cars')
tmp_cars_sdf <-
tbl(con_psql, 'tmp_cars')
tmp_cars_sdf %>%
mutate(low_dist = dist < 5) %>%
mutate(low_dist = as.numeric(as.integer(low_dist)))
# # Source: lazy query [?? x 3]
# # Database: postgres 9.5.3
# speed dist low_dist
# <dbl> <dbl> <dbl>
# 1 4 2 1
# 2 4 10 0
# 3 7 4 1
# 4 7 22 0
# 5 8 16 0
cars %>%
mutate(low_dist = dist < 5) %>%
mutate(low_dist = as.numeric_psql(low_dist)) %>%
head(5)
# speed dist low_dist
# 1 4 2 1
# 2 4 10 0
# 3 7 4 1
# 4 7 22 0
# 5 8 16 0
However, it doesn't work when used on the remote data frame, since as.numeric_psql
is not in the list of sql translations, so is passed to the query verbatim:
> tmp_cars_sdf %>%
+ mutate(low_dist = dist < 5) %>%
+ mutate(low_dist = as.numeric_psql(low_dist))
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "as"
LINE 1: SELECT "speed", "dist", as.numeric_psql("low_dist") AS "low_...
^
)
My question is whether there exist a easy way (i.e. not defining a custom sql translation) of getting dplyr to understand that the function as.numeric_psql
is a composition of functions that have existing sql translations, and to use those translations instead.
回答1:
One way to avoid the error is to set up the function to operate on a data frame, rather than inside mutate. For example:
copy_to(con_psql, cars, 'tmp_cars')
tmp_cars_sdf <- tbl(con_psql, 'tmp_cars')
as.numeric_psql <- function(data, x) {
return(data %>% mutate({{x}} := as.numeric(as.integer({{x}}))))
}
tmp_cars_sdf %>%
mutate(low_dist = dist < 5) %>%
as.numeric_psql(low_dist)
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.30.1 [:memory:]
#> speed dist low_dist
#> <dbl> <dbl> <dbl>
#> 1 4 2 1
#> 2 4 10 0
#> 3 7 4 1
#> 4 7 22 0
#> 5 8 16 0
#> 6 9 10 0
#> 7 10 18 0
#> 8 10 26 0
#> 9 10 34 0
#> 10 11 17 0
#> # … with more rows
Note that in your example, in the database version low_dist
already gets coded as integer when it's created, rather than as logical as it would be in a standard R data frame:
tmp_cars_sdf %>%
mutate(low_dist = dist < 5)
#> # Source: lazy query [?? x 3]
#> # Database: sqlite 3.30.1 [:memory:]
#> speed dist low_dist
#> <dbl> <dbl> <int>
#> 1 4 2 1
#> 2 4 10 0
#> 3 7 4 1
#> 4 7 22 0
#> 5 8 16 0
#> 6 9 10 0
#> 7 10 18 0
#> 8 10 26 0
#> 9 10 34 0
#> 10 11 17 0
#> # … with more rows
cars %>%
mutate(low_dist = dist < 5) %>% head
#> speed dist low_dist
#> 1 4 2 TRUE
#> 2 4 10 FALSE
#> 3 7 4 TRUE
#> 4 7 22 FALSE
#> 5 8 16 FALSE
#> 6 9 10 FALSE
来源:https://stackoverflow.com/questions/58211123/function-that-composes-functions-with-existing-sql-translations-in-dbplyr