How to replace Unicode characters with ASCII

被撕碎了的回忆

I have the following command to replace Unicode characters with ASCII ones.

sed -i 's/Ã/A/g'

The problem is Ã isn't recognized.

4 answers

没有蜡笔的小新

2021-02-15 18:43
You can use iconv:
```
iconv -f utf-8 -t ascii//translit
```
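As a rough sketch of how that might sit in a pipeline: iconv reads standard input (or a file argument) and writes the converted text to standard output. The function name to_ascii_translit is hypothetical. //TRANSLIT is an extension of glibc and GNU libiconv, and the replacement it picks depends on the implementation and the current locale (é may come out as e, ? or 'e), so no exact output is shown here.

```shell
#!/bin/sh
# Hypothetical wrapper around the iconv call from the answer above.
to_ascii_translit() {
    # Reads UTF-8 on stdin, writes an ASCII approximation to stdout.
    iconv -f UTF-8 -t ASCII//TRANSLIT
}

# \303\251 is the octal spelling of the UTF-8 bytes for "é".
# iconv may exit non-zero when a character has no exact ASCII equivalent,
# so the failure branch only reports it instead of aborting the script.
printf 'caf\303\251\n' | to_ascii_translit ||
    echo 'iconv reported characters it could not convert exactly' >&2
```

If the result contains ? where a letter was expected, running under a UTF-8 locale instead of the C locale is worth trying.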
天涯浪人

2021-02-15 19:01
It is possible to use hex values in "sed". First find the bytes that make up the character:
```
echo "Ã" | hexdump -C
00000000  c3 83 0a                                          |...|
00000003
```
OK, that character is the two-byte sequence "c3 83" (the trailing "0a" is the newline added by echo). Let's replace it with the single byte "A":
```
echo "Ã" |sed 's/\xc3\x83/A/g'
A
```
Explanation: \x tells "sed" that a two-digit hex byte value follows. This escape is a GNU sed extension.
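The same idea extends to several characters at once. The sketch below is only an illustration: the function name to_ascii and the choice of three characters (Ã, é, ñ) are assumptions, and it relies on GNU sed's \xHH escape. LC_ALL=C is set so that sed compares raw bytes regardless of the caller's locale.

```shell
#!/bin/sh
# Hypothetical helper: map a few UTF-8 byte sequences to ASCII letters.
# Requires GNU sed for the \xHH escapes.
#   c3 83 = Ã,  c3 a9 = é,  c3 b1 = ñ
to_ascii() {
    LC_ALL=C sed \
        -e 's/\xc3\x83/A/g' \
        -e 's/\xc3\xa9/e/g' \
        -e 's/\xc3\xb1/n/g'
}

# printf octal escapes spell out the same bytes, so this script itself
# stays pure ASCII: \303\203 = Ã, \303\251 = é, \303\261 = ñ.
printf 'S\303\203O caf\303\251 ma\303\261ana\n' | to_ascii
# prints: SAO cafe manana
```

Any character not listed passes through unchanged, so the byte pairs for further characters (found with hexdump -C as above) would need their own -e expressions.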
太阳男子

2021-02-15 19:03

Try setting LANG=C, so that sed works on single bytes instead of multibyte characters, and then delete every byte in the non-ASCII range 0x80-0xFF:
echo "hi ☠ there ☠" | LANG=C sed "s/[\x80-\xFF]//g"
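A slightly more defensive variant is sketched below, assuming GNU sed. The function name strip_non_ascii is hypothetical. It uses LC_ALL instead of LANG because LC_ALL takes precedence over LANG and over every individual LC_* variable, so an LC_ALL or LC_CTYPE already exported in the environment cannot switch sed back to multibyte matching.

```shell
#!/bin/sh
# Hypothetical helper: delete every byte outside the 7-bit ASCII range.
strip_non_ascii() {
    # In the C locale each byte is one character, so the bracket
    # expression matches every byte of a multibyte UTF-8 sequence.
    LC_ALL=C sed 's/[\x80-\xFF]//g'
}

# \342\230\240 is the octal spelling of the UTF-8 bytes e2 98 a0 (U+2620).
printf 'hi \342\230\240 there \342\230\240\n' | strip_non_ascii
```

The spaces around each removed character stay in place, so the output keeps a double space between the two words and a trailing space.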

耶瑟儿～

2021-02-15 19:04
There is also uconv, from ICU.

Examples:
- uconv -x "::NFD; [:Nonspacing Mark:] > ; ::NFC;": removes accents
- uconv -x "::Latin; ::Latin-ASCII;": transliterates Latin text to ASCII
- uconv -x "::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;": transliterates Latin text to ASCII and removes any remaining code points above 0x7F
- ...
echo "À l'école ☠" | uconv -x "::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;" gives: A l'ecole