Question
I'm trying to use dplyr to query a SQL database, matching on provided arguments.
id <- tbl(conn, "My_Table") %>%
  filter(Elem1 == elem1 & Elem2 == elem2 & Elem3 == elem3) %>%
  select(Id) %>%
  collect()
However, any of elem1, elem2, or elem3 might be NA. Ideally, I'd like the query to translate those cases to the SQL IS NULL predicate.
For example, if elem1 is 1, elem2 is NA, and elem3 is 3, I'd like the translated query to be:
SELECT Id FROM My_Table WHERE Elem1 = 1 AND Elem2 IS NULL AND Elem3 = 3
However, my code above converts the WHERE clause to ... AND Elem2 = NULL ..., which doesn't do what I want, since a comparison with NULL is never true in standard SQL. Is there a nice way to solve this problem?
Answer 1:
Assuming you are on SQL Server, you can work around this using COALESCE, like so:
filler_value <- -1
id <- tbl(conn, "My_Table") %>%
  # replace NULL in the columns with the placeholder
  mutate(Elem1 = COALESCE(Elem1, filler_value),
         Elem2 = COALESCE(Elem2, filler_value),
         Elem3 = COALESCE(Elem3, filler_value)) %>%
  # replace NA in the supplied values with the same placeholder
  filter(Elem1 == COALESCE(elem1, filler_value),
         Elem2 == COALESCE(elem2, filler_value),
         Elem3 == COALESCE(elem3, filler_value)) %>%
  select(Id) %>%
  collect()
Here filler_value is chosen so that it has the same data type (text/numeric/date) as your dataset columns, but is not a value that currently appears in them.
The COALESCE function returns the first non-null value from its list of arguments. So first we replace NULL in the Elem_ columns with a placeholder, and then we replace NULL in the elem_ values with the same placeholder. A standard == comparison then makes sense, because a missing column value and a missing argument both become the placeholder and compare equal.
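To see why the placeholder makes the comparison work, here is a small in-memory sketch in plain R. It does not touch a database: R's NA stands in for SQL NULL (the semantics are similar, not identical), and coalesce_one, column_values, and target are hypothetical names made up for this illustration.

```r
# Hypothetical stand-in for COALESCE on a single pair of arguments.
coalesce_one <- function(x, filler) ifelse(is.na(x), filler, x)

filler_value <- -1
column_values <- c(1, NA, 3)  # pretend these are the rows of Elem2
target <- NA                  # the value we are matching on (elem2)

# Plain comparison: anything compared with NA is NA, so no row is selected.
plain_match <- column_values == target

# After filling both sides, the missing row compares equal to the missing target.
filled_match <- coalesce_one(column_values, filler_value) ==
  coalesce_one(target, filler_value)
```

With the plain comparison every element of plain_match is NA, while filled_match flags only the row whose value was missing.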
One of the key ideas here is that, because COALESCE does not have an R-to-SQL translation defined, it is passed through unchanged when the R code is translated to SQL. See this question for more details/an alternative.
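If you would rather not pick a placeholder value, another option is to build each filter condition in R before the query is translated, relying on the fact that dbplyr translates is.na() to IS NULL. This is only a sketch, not tested against SQL Server: match_or_null and find_ids are hypothetical helper names, and conn is assumed to be an open DBI connection as in the question.

```r
library(dplyr)

# Hypothetical helper: build one filter condition for a column name and a value.
# A missing value becomes is.na(<column>), which dbplyr translates to IS NULL;
# any other value becomes an ordinary equality test.
match_or_null <- function(column, value) {
  col <- as.name(column)
  if (is.na(value)) {
    bquote(is.na(.(col)))
  } else {
    bquote(.(col) == .(value))
  }
}

# Hypothetical wrapper around the query from the question.
find_ids <- function(conn, elem1, elem2, elem3) {
  tbl(conn, "My_Table") %>%
    filter(
      !!match_or_null("Elem1", elem1),  # unquote the prebuilt condition
      !!match_or_null("Elem2", elem2),
      !!match_or_null("Elem3", elem3)
    ) %>%
    select(Id) %>%
    collect()
}
```

Calling find_ids(conn, 1, NA, 3) should then filter on Elem2 with IS NULL rather than an equality test. You can replace collect() with show_query() to check the generated SQL on your own backend.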
Source: https://stackoverflow.com/questions/55453060/can-dplyr-for-sql-translate-na-to-is-null-in-filters