R: Converting “special” letters into UTF-8?

Asked by 忘了有多久 on 2021-01-12 18:21 · 2 answers · 1724 views

I run into problems matching tables where one data frame contains special characters and the other doesn't. Example: Doña Ana County vs. Dona Ana County.
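The mismatch is easy to reproduce; the data frames below are made-up stand-ins for the two tables being matched. An exact-match join drops the row because the accented and unaccented spellings are different strings:

```r
# Hypothetical stand-ins for the two tables being matched.
a <- data.frame(county = "Doña Ana County", pop = 1L)
b <- data.frame(county = "Dona Ana County", code = "35013")

# Exact matching fails: "ñ" and "n" are different characters,
# so merge() returns zero rows.
merge(a, b, by = "county")
```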

2 Answers
  •  夕颜 (OP)
     2021-01-12 19:05

    The first problem is that acs::fips.place is badly mangled; it contains, e.g., the literal text \\xf1a where it means \xf1a. A bug should be reported to the package maintainer. In the meantime, here is one workaround:

    library(dplyr)
    library(stringr)

    # Re-parse COUNTY with allowEscapes = TRUE so that literal "\\xf1"
    # text is interpreted as the byte \xf1.
    as_tibble(acs::fips.place) %>%
        mutate(COUNTY = scan(text = str_c(COUNTY, collapse = "\n"),
                             sep = "\n",
                             what = "character",
                             allowEscapes = TRUE)) -> fp
    
    Encoding(fp$COUNTY) <- "latin1"
    
    fp %>%
        filter(COUNTY == "Doña Ana County")
    

    Once the escapes have been cleaned up, you can transliterate non-ASCII characters into ASCII substitutions. The stringi package makes this easy:

    library(stringi)
    fp$COUNTY <- stri_trans_general(fp$COUNTY, "latin-ascii")
    
    fp %>%
        filter(COUNTY == "Dona Ana County") 
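
    Putting the two steps together, the original matching problem can be solved by normalising the key column before joining. A minimal sketch with made-up tables (`fp` here is a one-row stand-in for the cleaned data, and `other_tbl` is hypothetical):

```r
library(dplyr)
library(stringi)

# Hypothetical tables: one key accented, one not.
fp        <- tibble(COUNTY = "Doña Ana County", pop = 1L)
other_tbl <- tibble(COUNTY = "Dona Ana County", fips = "35013")

# Transliterate the accented key to ASCII; the exact join then succeeds.
fp %>%
    mutate(COUNTY = stri_trans_general(COUNTY, "latin-ascii")) %>%
    inner_join(other_tbl, by = "COUNTY")
```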
    
