What are the options to convert ISO-8859-1 / Latin-1 to a String (UTF-8)?

偶尔善良 submitted on 2019-12-07 03:32:22

Question


I scanned the Rust documentation for some way to convert between character encodings but did not find anything. Did I miss something?

Is it supported (directly or indirectly) by the Rust language and its standard libraries or even planned to be in the near future?

As one of the answers suggested, there is an easy solution because u8 can be cast to (Unicode) chars. Since Unicode is a superset of the code points in ISO-8859-1, that is a 1:1 mapping. Each resulting char then encodes to one or two bytes in UTF-8, which is the internal encoding of Strings in Rust.

fn main() {
    println!("{}", 196u8 as char);
    println!("{}", (196u8 as char) as u8);
    println!("{}", 'Ä' as u8);
    println!("{:?}", 'Ä'.to_string().as_bytes());
    println!("{:?}", "Ä".as_bytes());
    println!("{}", 'Ä' == 196u8 as char);
}

gives:

Ä
196
196
[195, 132]
[195, 132]
true

I had not even considered that this would work!
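
To make the byte-count side of that mapping concrete, here is a small sketch. The helper name `utf8_len_of_latin1` is hypothetical and only for illustration; it relies on the standard `char::len_utf8` method. Bytes below 0x80 stay one byte long in UTF-8, while bytes from 0x80 to 0xFF each become two bytes.

```rust
/// Hypothetical helper: number of UTF-8 bytes needed to hold
/// ISO-8859-1 input once it has been converted to a String.
fn utf8_len_of_latin1(bytes: &[u8]) -> usize {
    bytes.iter().map(|&b| (b as char).len_utf8()).sum()
}

fn main() {
    // ASCII range (0x00..=0x7F) needs 1 byte, the rest (0x80..=0xFF) needs 2.
    for &b in [0x41u8, 0x7F, 0x80, 0xC4, 0xFF].iter() {
        println!("0x{:02X} -> {} UTF-8 byte(s)", b, (b as char).len_utf8());
    }
    // "Ä" followed by "A" in ISO-8859-1: 2 + 1 bytes in UTF-8.
    println!("{}", utf8_len_of_latin1(&[0xC4, 0x41]));
}
```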


Answer 1:


Strings in Rust are Unicode (UTF-8), and Unicode code points are a superset of the ISO-8859-1 characters, with each ISO-8859-1 byte value equal to its Unicode code point. This specific conversion is therefore trivial.

fn latin1_to_string(s: &[u8]) -> String {
    s.iter().map(|&c| c as char).collect()
}

We interpret each byte as a Unicode code point and then build a String from these code points.
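
If the reverse direction is also needed, a round trip could look like the sketch below. The function `string_to_latin1` is a hypothetical name and not part of the original answer; it assumes that a char above U+00FF should make the whole conversion fail, since such characters have no ISO-8859-1 representation.

```rust
/// Decode ISO-8859-1 bytes: each byte value is its Unicode code point.
fn latin1_to_string(s: &[u8]) -> String {
    s.iter().map(|&c| c as char).collect()
}

/// Hypothetical helper: encode a string as ISO-8859-1.
/// Returns None if any char lies above U+00FF.
fn string_to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u8) } else { None })
        .collect()
}

fn main() {
    // "Mädchen" encoded as ISO-8859-1 (0xE4 is 'ä').
    let latin1 = [0x4D, 0xE4, 0x64, 0x63, 0x68, 0x65, 0x6E];
    let text = latin1_to_string(&latin1);
    println!("{}", text);
    println!("{:?}", string_to_latin1(&text));
    // The euro sign (U+20AC) does not exist in ISO-8859-1.
    println!("{:?}", string_to_latin1("€"));
}
```

Note that the cast maps bytes 0x80 to 0x9F to the C1 control characters, as ISO-8859-1 defines them. Data labelled Latin-1 that is really Windows-1252 uses those bytes for printable characters and would need a lookup table instead.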




Answer 2:


The standard library does not have a general API for converting between character encodings. Encodings, like date and time, are difficult to get right and need a lot of work, so they are not part of std.

At the time of writing, the crate for dealing with encodings is rust-encoding. You will almost certainly find everything you need there.



Source: https://stackoverflow.com/questions/28169745/what-are-the-options-to-convert-iso-8859-1-latin-1-to-a-string-utf-8
