lexical error: invalid bytes in UTF8 string

Submitted by 回眸只為那壹抹淺笑 on 2020-02-26 02:50:13

Question


I am trying to use the code shown below to extract data from a json file. However, the following error is returned:

Error: lexical error: invalid bytes in UTF8 string.
          fr":"Ces données sont publiées avec un délai de cinq jours
                     (right here) ------^

Inspecting the json file in my browser shows that the data appears as such:

"fr":"Ces donn\u00e9es sont publi�es avec un d\u00e9lai de cinq jours."

Is there a way to write the data while ignoring any UTF8 strings that cause an error?

library(jsonlite)

URL <- paste0("https://www.energy-charts.de/power_unit/month_lignite_unit_2017_12.json")

data <- fromJSON(getURL(URL))

Answer 1:


The problem is that the URL returns data in a latin1 encoding, and your system is defaulting to reading it as UTF-8. You can get it correctly using

library(jsonlite)
library(RCurl)  

URL <- "https://www.energy-charts.de/power_unit/month_lignite_unit_2017_12.json"

data <- fromJSON(getURL(URL, encoding = "latin1"))

I've also corrected some minor errors in your code: you forgot to load RCurl (the package that provides getURL), and paste0 was not needed around a single string.
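
To see why declaring the source encoding helps, here is a small offline sketch that is not part of the original answer. It builds a tiny JSON string by hand with a single Latin-1 byte (0xE9, "é") in it, checks that the string is not valid UTF-8, and then re-encodes it with base R's iconv before parsing. The sample string and the variable names are made up for illustration; the real file is assumed to contain bytes of the same kind.

```r
library(jsonlite)

# Build a JSON document whose "é" is the single Latin-1 byte 0xE9.
# On its own, 0xE9 is not a valid UTF-8 byte sequence.
raw_bytes <- c(charToRaw('{"fr":"publi'), as.raw(0xE9), charToRaw('es"}'))
txt <- rawToChar(raw_bytes)

# Confirm that the string cannot be interpreted as UTF-8.
is_valid_before <- validUTF8(txt)

# Re-encode from Latin-1 to UTF-8, then parse.
txt_utf8 <- iconv(txt, from = "latin1", to = "UTF-8")
is_valid_after <- validUTF8(txt_utf8)

parsed <- fromJSON(txt_utf8)
print(parsed$fr)
```

The same iconv step could be applied to text you have already downloaded, provided the source really is Latin-1. If the encoding is guessed wrong, the parse may still succeed but the accented characters will come out incorrect.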



Source: https://stackoverflow.com/questions/54627177/lexical-error-invalid-bytes-in-utf8-string
