How to get a '&str' from a NUL-terminated byte slice if the NUL terminator isn't at the end of the slice?

心在旅途 2021-01-19 01:02

While CStr is typically used for FFI, I am reading from a &[u8] that is NUL-terminated and guaranteed to be valid UTF-8, so no checks are needed.

3 answers
  •  生来不讨喜
    2021-01-19 01:20

    Here are three other possible ways of doing this, using mostly functions from std:

    use std::ffi::CStr;
    use std::str;
    
    // Safe wrapper: checks for a NUL byte first, so the CStr path never reads
    // past the end of the slice. Panics if the bytes are not valid UTF-8.
    fn str_from_null_terminated_utf8_safe(s: &[u8]) -> &str {
        if s.iter().any(|&x| x == 0) {
            unsafe { str_from_null_terminated_utf8(s) }
        } else {
            // No NUL byte: treat the whole slice as the string.
            str::from_utf8(s).unwrap()
        }
    }
    
    // Safety: s must contain a NUL byte. Panics if the bytes before it are not valid UTF-8.
    unsafe fn str_from_null_terminated_utf8(s: &[u8]) -> &str {
        CStr::from_ptr(s.as_ptr() as *const _).to_str().unwrap()
    }
    
    // Safety: s must contain a NUL byte, and the bytes before it must be valid UTF-8.
    unsafe fn str_from_null_terminated_utf8_unchecked(s: &[u8]) -> &str {
        str::from_utf8_unchecked(CStr::from_ptr(s.as_ptr() as *const _).to_bytes())
    }
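
For comparison, here is a minimal sketch of a variant that needs no unsafe code and does not panic: it slices at the first NUL byte and lets the caller handle invalid UTF-8. The name str_until_nul is made up for illustration and is not one of the functions benchmarked in this thread; treating a slice without any NUL as "the whole slice is the string" is also an assumption that may not suit every use case.

```rust
use std::str::{self, Utf8Error};

/// Returns the text before the first NUL byte, or the whole slice if it
/// contains no NUL. Invalid UTF-8 is reported as an error instead of panicking.
/// (Hypothetical helper name, chosen for this sketch.)
fn str_until_nul(s: &[u8]) -> Result<&str, Utf8Error> {
    // Find the first NUL; fall back to the full length when there is none.
    let end = s.iter().position(|&b| b == 0).unwrap_or(s.len());
    // Validate only the bytes before the terminator.
    str::from_utf8(&s[..end])
}
```

Calling str_until_nul(b"abcdefghij\0klmnop") yields Ok("abcdefghij"); the bytes after the NUL are never inspected, so they do not need to be valid UTF-8.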
    

    As a slight aside: benchmark results for all the options in this thread:

    With s = b"\0"

    test dtwood::bench_str_from_null_terminated_utf8           ... bench:           9 ns/iter (+/- 0)
    test dtwood::bench_str_from_null_terminated_utf8_safe      ... bench:          10 ns/iter (+/- 3)
    test dtwood::bench_str_from_null_terminated_utf8_unchecked ... bench:           5 ns/iter (+/- 1)
    test ideasman42::bench_str_from_u8_nul_utf8_unchecked      ... bench:           1 ns/iter (+/- 0)
    test ker::bench_str_from_u8_nul_utf8                       ... bench:           4 ns/iter (+/- 0)
    test ker::bench_str_from_u8_nul_utf8_unchecked             ... bench:           1 ns/iter (+/- 0)
    

    With s = b"abcdefghij\0klmnop"

    test dtwood::bench_str_from_null_terminated_utf8           ... bench:          15 ns/iter (+/- 2)
    test dtwood::bench_str_from_null_terminated_utf8_safe      ... bench:          20 ns/iter (+/- 2)
    test dtwood::bench_str_from_null_terminated_utf8_unchecked ... bench:           6 ns/iter (+/- 0)
    test ideasman42::bench_str_from_u8_nul_utf8_unchecked      ... bench:           7 ns/iter (+/- 0)
    test ker::bench_str_from_u8_nul_utf8                       ... bench:          15 ns/iter (+/- 2)
    test ker::bench_str_from_u8_nul_utf8_unchecked             ... bench:           5 ns/iter (+/- 0)
    

    With s = b"abcdefghij" repeated 512 times, followed by b"\0klmnopqrs"

    test dtwood::bench_str_from_null_terminated_utf8           ... bench:         351 ns/iter (+/- 35)
    test dtwood::bench_str_from_null_terminated_utf8_safe      ... bench:       1,987 ns/iter (+/- 274)
    test dtwood::bench_str_from_null_terminated_utf8_unchecked ... bench:         170 ns/iter (+/- 18)
    test ideasman42::bench_str_from_u8_nul_utf8_unchecked      ... bench:       2,466 ns/iter (+/- 292)
    test ker::bench_str_from_u8_nul_utf8                       ... bench:       1,971 ns/iter (+/- 209)
    test ker::bench_str_from_u8_nul_utf8_unchecked             ... bench:       1,828 ns/iter (+/- 205)
    

    So if you're very concerned about performance, it is probably best to benchmark with your particular data set: dtwood::str_from_null_terminated_utf8_unchecked seems to perform better on longer strings (presumably because CStr::from_ptr locates the terminator with an optimized strlen), while ker::str_from_u8_nul_utf8_unchecked does better on short (< 20 character) strings.
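
Benchmarking against your own data could be sketched roughly as below. This is not the harness that produced the numbers above (those look like output from the nightly #[bench] attribute); it is a simple wall-clock loop using only stable std. All function names here (long_input, prefix_before_nul, average_time, time_candidate) are hypothetical, and prefix_before_nul is just one stand-in candidate to time.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical helper: builds the long input from the third benchmark,
// i.e. "abcdefghij" repeated 512 times followed by "\0klmnopqrs".
fn long_input() -> Vec<u8> {
    let mut v = b"abcdefghij".repeat(512);
    v.extend_from_slice(b"\0klmnopqrs");
    v
}

// Stand-in candidate: slice at the first NUL, then validate as UTF-8.
// Panics on invalid UTF-8, which is acceptable for a benchmark input we control.
fn prefix_before_nul(s: &[u8]) -> &str {
    let end = s.iter().position(|&b| b == 0).unwrap_or(s.len());
    std::str::from_utf8(&s[..end]).unwrap()
}

// Runs `f` `iters` times and returns the mean wall-clock time per call.
// `iters` must be greater than zero.
fn average_time<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "iters must be greater than zero");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

// Times the stand-in candidate on `input`. black_box discourages the
// optimizer from hoisting or deleting the call being measured.
fn time_candidate(input: &[u8], iters: u32) -> Duration {
    average_time(iters, || {
        black_box(prefix_before_nul(black_box(input)));
    })
}
```

To use it, build the input once (for example let input = long_input();), call time_candidate(&input, 100_000) in a release build, and print the returned Duration with {:?}. A loop like this is much noisier than a real benchmarking harness, so treat the results as a rough comparison only.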
