How can I find a subsequence in a &[u8] slice?

耶瑟儿~ 2020-12-03 20:45

I have a &[u8] slice over a binary buffer. I need to parse it, but a lot of the methods that I would like to use (such as str::find) don't see

4 answers
  • 2020-12-03 21:23

    I don't think the standard library contains a function for this. Some libc implementations provide memmem, but at the moment the libc crate does not wrap it. You can use the twoway crate, however. rust-bio implements some pattern matching algorithms, too. All of those should be faster than using haystack.windows(..).position(..).
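
    If adding a dependency is not an option, one std-only middle ground is to jump between occurrences of the needle's first byte and only then compare the rest. The sketch below is an illustration, not the algorithm used by twoway or rust-bio. The name find_with_prefilter is hypothetical, returning Some(0) for an empty needle is an assumption, and whether it actually beats the windows approach depends on the data, so measure before relying on it.

```rust
/// Hypothetical helper (not part of std or any crate): searches for `needle`
/// by first locating candidates for its first byte, then comparing the rest.
fn find_with_prefilter(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let (&first, rest) = match needle.split_first() {
        Some(parts) => parts,
        // Assumption: an empty needle matches at offset 0, like str::find("").
        None => return Some(0),
    };

    let mut start = 0;
    // Loop while a full needle still fits at or after `start`.
    while start + needle.len() <= haystack.len() {
        // Last offset at which the needle could still begin.
        let last_start = haystack.len() - needle.len();
        // Next position of the first byte; stop if there is none.
        let offset = haystack[start..=last_start]
            .iter()
            .position(|&b| b == first)?;
        let candidate = start + offset;
        // Compare the remaining bytes of the needle at this candidate.
        if &haystack[candidate + 1..candidate + needle.len()] == rest {
            return Some(candidate);
        }
        start = candidate + 1;
    }
    None
}
```

    Called as find_with_prefilter(b"qwertyuiop", b"tyu"), this returns Some(4), the same offset the windows-based search would report.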

  • 2020-12-03 21:29

    I found the memmem crate useful for this task:

    use memmem::{Searcher, TwoWaySearcher};
    
    let search = TwoWaySearcher::new("dog".as_bytes());
    assert_eq!(
        search.search_in("The quick brown fox jumped over the lazy dog.".as_bytes()),
        Some(41)
    );
    
  • 2020-12-03 21:32

    Here's a simple implementation based on the windows iterator.

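    fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
        // `windows` panics when the size is 0, so treat an empty needle
        // as matching at offset 0 (the same result str::find("") gives).
        if needle.is_empty() {
            return Some(0);
        }
        haystack.windows(needle.len()).position(|window| window == needle)
    }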
    
    fn main() {
        assert_eq!(find_subsequence(b"qwertyuiop", b"tyu"), Some(4));
        assert_eq!(find_subsequence(b"qwertyuiop", b"asd"), None);
    }
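
    Since the goal in the question is parsing, the returned index is usually used to cut the buffer. A minimal sketch, assuming a hypothetical helper named split_once_bytes and a made-up header/body delimiter in the example. The search function is repeated, with an empty-needle guard, so the block compiles on its own.

```rust
/// Windows-based search, repeated here so this block is self-contained.
fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows(0)` would panic; assumption: an empty needle matches at 0.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}

/// Hypothetical helper: splits `buf` around the first occurrence of `delim`,
/// returning the bytes before and after it (the delimiter itself is dropped).
fn split_once_bytes<'a>(buf: &'a [u8], delim: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    let idx = find_subsequence(buf, delim)?;
    Some((&buf[..idx], &buf[idx + delim.len()..]))
}
```

    For example, split_once_bytes(b"Host: example\r\n\r\nbody", b"\r\n\r\n") yields the slices "Host: example" and "body", and None if the delimiter is absent.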
    

    The find_subsequence function can also be made generic:

    fn find_subsequence<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
        // `windows` panics when the size is 0, so handle the empty needle first.
        if needle.is_empty() {
            return Some(0);
        }
        haystack.windows(needle.len()).position(|window| window == needle)
    }
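
    The generic form works for any element type that can be compared, not just bytes, and the same windows iterator can report every match instead of only the first. A minimal sketch, assuming a hypothetical helper named find_all. Returning no matches for an empty needle is a choice made here, not something the original answer specifies.

```rust
/// Hypothetical generic helper: start indices of every match of `needle`
/// in `haystack`, including overlapping ones.
fn find_all<T: PartialEq>(haystack: &[T], needle: &[T]) -> Vec<usize> {
    // `windows(0)` would panic; assumption: an empty needle yields no matches.
    if needle.is_empty() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        // Keep only the windows equal to the needle.
        .filter(|(_, window)| *window == needle)
        // Report the index at which each matching window starts.
        .map(|(i, _)| i)
        .collect()
}
```

    With integers, find_all(&[1, 2, 1, 2, 1], &[1, 2, 1]) gives the offsets 0 and 2, since overlapping matches are counted.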
    
  • 2020-12-03 21:33

    How about Regex on bytes? That looks very powerful. See this Rust playground demo.

    extern crate regex;
    
    use regex::bytes::Regex;
    
    fn main() {
        //see https://doc.rust-lang.org/regex/regex/bytes/
    
        let re = Regex::new(r"say [^,]*").unwrap();
    
        let text = b"say foo, say bar, say baz";
    
        // Collect the start offset of every match.
        // The unwrap is OK here since group 0 (the whole match) is always present.
        let starts: Vec<usize> =
            re.captures_iter(text)
              .map(|c| c.get(0).unwrap().start())
              .collect();
    
        assert_eq!(starts, vec![0, 9, 18]);
    }
    