How can I find out the internal code representation of a WINDOWS-1252 character?

温柔的废话 2021-01-05 00:54

I am processing SPSS data from a questionnaire that must have originated in M$ Word. Word automatically changes hyphens into long hyphens, and gets converted into character

3 answers

礼貌的吻别 (original poster)

2021-01-05 01:06

After some head-scratching, a lot of reading of help files, and trial and error, I wrote two little functions that do what I need. The first converts its input to UTF-8 and returns the integer vector of code points for that UTF-8 string; the second does the reverse, turning code points back into a string in the requested encoding.

# Convert a single string to an integer vector of Unicode code points
# Optional encoding specifies the encoding of x, defaults to the current locale
# (localeToCharset() can return more than one candidate, so take the first)
encToInt <- function(x, encoding=localeToCharset()[1]){
    utf8ToInt(iconv(x, encoding, "UTF-8"))
}

# Convert an integer vector of code points to a string
# Optional encoding specifies the encoding of the result, defaults to the current locale
intToEnc <- function(x, encoding=localeToCharset()[1]){
    iconv(intToUtf8(x), "UTF-8", encoding)
}

Some examples:

x <- "\xfa"
encToInt(x)
[1] 250

intToEnc(250)
[1] "ú"
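
The examples above depend on the current locale. For the long hyphen from the question, a minimal sketch that names the source encoding explicitly could look like the following. It assumes that the byte in the data is 0x96 (the en dash in WINDOWS-1252) and that the platform's iconv accepts the encoding name "WINDOWS-1252"; both are assumptions, not something stated in the question.

```r
# Sketch: inspect one WINDOWS-1252 byte without relying on the current locale.
x <- "\x96"                      # assumed input: byte 0x96, the en dash in WINDOWS-1252

charToRaw(x)                     # the byte as stored in the string

# Assumes this platform's iconv recognises the name "WINDOWS-1252"
u <- iconv(x, from = "WINDOWS-1252", to = "UTF-8")
cp <- utf8ToInt(u)               # Unicode code point as an integer
cp

# Reverse direction: code point back to the WINDOWS-1252 byte
back <- iconv(intToUtf8(cp), from = "UTF-8", to = "WINDOWS-1252")
raw_back <- charToRaw(back)
raw_back
```

Here cp is 8211 (U+2013, EN DASH) and raw_back is the single byte 96. If iconv returns NA, the encoding name is probably not supported on that platform; iconvlist() shows the names that are available.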
