I run into problems matching tables where one dataframe contains special characters and the other doesn't. Example: Doña Ana County vs. Dona Ana County
The first problem is that acs::fips.place is badly mangled; it provides, e.g., \\xf1a where it means \xf1a. A bug should be reported to the package maintainer. In the meantime, here is one work-around:
library(dplyr)
library(stringr)

tbl_df(acs::fips.place) %>%
  mutate(COUNTY = scan(text = str_c(COUNTY, collapse = "\n"),
                       sep = "\n",
                       what = "character",
                       allowEscapes = TRUE)) -> fp
Encoding(fp$COUNTY) <- "latin1"
fp %>%
  filter(COUNTY == "Doña Ana County")
Once the escapes have been cleaned up, you can transliterate non-ASCII characters into ASCII substitutions. The stringi package makes this easy:
library(stringi)
fp$COUNTY <- stri_trans_general(fp$COUNTY, "latin-ascii")
fp %>%
  filter(COUNTY == "Dona Ana County")
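Once both columns are plain ASCII, the original matching problem reduces to an ordinary join. A minimal sketch, where `my_data` and its `value` column are hypothetical stand-ins for your second dataframe (the one typed without diacritics):

```r
library(dplyr)
library(stringi)

# Hypothetical second table whose COUNTY column has no diacritics
my_data <- data.frame(COUNTY = "Dona Ana County",
                      value = 42,
                      stringsAsFactors = FALSE)

# Transliterate the accented column, then join on the normalized names
fp %>%
  mutate(COUNTY = stri_trans_general(COUNTY, "latin-ascii")) %>%
  inner_join(my_data, by = "COUNTY")
```

Transliterating inside `mutate()` keeps the normalization local to the join; if you need the accented originals later, copy them to a separate column first.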