I'm on a search-and-destroy mission for anything Amazon finds distasteful. In the past I've dealt with this by using iconv
to convert from "UTF-8" to "latin1"
I looped a bit through iconvlist() and found this (among other combinations):
test <- "Gwena\xeblle M"
iconv(test, "CP1163", "UTF-8")
[1] "Gwenaëlle M"
I realize this is not what you asked for, but it might be possible to find the correct encoding instead.
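For what it's worth, the search over iconvlist() could look roughly like the sketch below. The names `expected` and `candidates` are my own, and it assumes you already know what the decoded string should look like. The set of encodings returned depends on your platform's iconv, so your list of hits may differ.

```r
# Sketch: try every encoding iconv knows about and keep the ones that
# turn the raw bytes into the string we expect.
test <- "Gwena\xeblle M"
expected <- "Gwena\u00eblle M"   # assumption: the intended text, with e-diaeresis

candidates <- Filter(function(enc) {
  out <- tryCatch(
    suppressWarnings(iconv(test, from = enc, to = "UTF-8")),
    error = function(e) NA_character_   # unsupported conversions are skipped
  )
  !is.na(out) && identical(out, expected)
}, iconvlist())

candidates
```

Several encodings will usually match, because many single-byte code pages map 0xEB to the same character. One example string is rarely enough to pin down the true encoding.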
I believe this pattern should work:
pat <- "[\x80-\xFF]"
test <- c("Gwena\xeblle M", "\x92","\xe4","\xe1","\xeb")
gsub(pat, "", test, perl=TRUE)
# [1] "Gwenalle M" "" "" "" ""
Explanation:
It works because the character class "[\x00-\xFF]" would match all characters of the form \x##. But the first half of those, the 0th to 127th (00 to 7F in hex), are the ASCII characters. So it's the second half, the 128th to 255th (80 to FF in hex), that you want to search out and destroy.
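If it helps to see the byte ranges directly, here is a small sketch that splits a string into raw bytes and separates the ASCII half (below 0x80) from the rest. The variable names `bytes` and `high` are just for illustration. This is not a replacement for the gsub() call, only a way to inspect what it is matching.

```r
# Look at the raw bytes and flag those in the 0x80-0xFF range.
x <- "Gwena\xeblle M"
bytes <- charToRaw(x)
high <- as.integer(bytes) >= 0x80   # TRUE for non-ASCII bytes

bytes[high]                # the offending byte(s)
rawToChar(bytes[!high])    # the string with those bytes dropped
```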