Can dplyr for SQL translate == NA to IS NULL in filters?

Question

I'm trying to use dplyr to query a SQL database, matching on provided arguments.

  id <- tbl(conn, "My_Table") %>%
    filter(Elem1 == elem1 & Elem2 == elem2 & Elem3 == elem3) %>%
    select(Id) %>%
    collect()

However, it's possible that any of elem1, elem2, or elem3 might be NA. Ideally, I'd like the query to translate them to the SQL IS NULL statement.

For example, if elem1 is 1, elem2 is NA, and elem3 is 3, I'd like the translated query to be:

SELECT Id FROM My_Table WHERE Elem1 = 1 AND Elem2 IS NULL AND Elem3 = 3

However, my code above translates the WHERE clause to ... AND Elem2 == NULL ..., which never matches, because in SQL a comparison with NULL is never true. Is there a nice way to solve this problem?

Answer 1:

Assuming you are on SQL Server, you can work around this using COALESCE, like so:

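# Placeholder that must not occur in Elem1, Elem2, or Elem3
filler_value <- -1

id <- tbl(conn, "My_Table") %>%
    # Replace NULL in the columns with the placeholder
    mutate(Elem1 = COALESCE(Elem1, filler_value),
           Elem2 = COALESCE(Elem2, filler_value),
           Elem3 = COALESCE(Elem3, filler_value)) %>%
    # Replace NA in the arguments with the same placeholder, then compare
    filter(Elem1 == COALESCE(elem1, filler_value),
           Elem2 == COALESCE(elem2, filler_value),
           Elem3 == COALESCE(elem3, filler_value)) %>%
    select(Id) %>%
    collect()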

Here filler_value must have the same data type (text/numeric/date) as your dataset columns, and must not be a value that currently appears in those columns.

The COALESCE function returns the first non-null value from its list of arguments. So first we replace NULL in the Elem_ columns with a placeholder, and then we replace NULL in the elem_ values with the same placeholder. After that, a standard == comparison behaves as intended, because a missing argument matches exactly the rows where the column was NULL.
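To see what the database would actually receive, the query can be rendered without connecting to anything. The following is a minimal sketch, assuming a simulated table built with dbplyr's lazy_frame() as a stand-in for tbl(conn, "My_Table"); the column values and the single Elem2 comparison are illustrative only, and the exact quoting in the printed SQL depends on the dbplyr version and backend.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for tbl(conn, "My_Table"); lazy_frame() only
# simulates a connection, so no database is needed.
my_table <- lazy_frame(Id = 1L, Elem1 = 1, Elem2 = 1, Elem3 = 1)

elem2 <- NA          # the argument that may be missing
filler_value <- -1   # assumed never to occur in Elem2

query <- my_table %>%
  # NULL in the column becomes the placeholder
  mutate(Elem2 = COALESCE(Elem2, !!filler_value)) %>%
  # NA in the argument is sent as NULL, then becomes the same placeholder
  filter(Elem2 == COALESCE(!!elem2, !!filler_value)) %>%
  select(Id)

# Render the SQL as text instead of executing it
rendered <- as.character(sql_render(query))
cat(rendered, "\n")
```

The rendered statement should contain COALESCE on both sides of the comparison, with the NA argument appearing as NULL inside the second call.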

A key idea here is that, because COALESCE has no R-to-SQL translation defined, it is passed through unchanged when the R code is translated to SQL. See this question for more details and an alternative.
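One possible alternative, which is not necessarily the one the linked question describes, relies on the fact that is.na() does have a translation: dbplyr turns it into IS NULL. The sketch below builds each filter condition in R before the query is translated; match_or_null is a hypothetical helper name, and lazy_frame() again stands in for the real table.

```r
library(dplyr)
library(dbplyr)

# Hypothetical helper: returns the expression is.na(col) when the value is
# NA, and col == value otherwise.
match_or_null <- function(col, value) {
  col <- rlang::sym(col)
  if (is.na(value)) rlang::expr(is.na(!!col)) else rlang::expr(!!col == !!value)
}

# Simulated stand-in for tbl(conn, "My_Table")
my_table <- lazy_frame(Id = 1L, Elem1 = 1, Elem2 = 1, Elem3 = 1)

elem1 <- 1
elem2 <- NA
elem3 <- 3

conditions <- list(
  match_or_null("Elem1", elem1),
  match_or_null("Elem2", elem2),
  match_or_null("Elem3", elem3)
)

query <- my_table %>%
  filter(!!!conditions) %>%   # splice the conditions in; they are combined with AND
  select(Id)

rendered <- as.character(sql_render(query))
cat(rendered, "\n")
```

This avoids choosing a filler value, at the cost of deciding in R, per argument, which comparison to emit. With a real connection, replacing lazy_frame() with tbl(conn, "My_Table") and sql_render() with collect() should run the same filter against the database.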

Source: https://stackoverflow.com/questions/55453060/can-dplyr-for-sql-translate-na-to-is-null-in-filters

Tags

dplyr

dbplyr