Using the same iterator multiple times in Rust

前端 未结 2 518
轻奢々
轻奢々 2020-11-28 15:19

Editor\'s note: This code example is from a version of Rust prior to 1.0 when many iterators implemented Copy. Updated versions of this code prod

相关标签:
2条回答
  • 2020-11-28 15:54

    As you identified, each take_while call is duplicating iter, since take_while takes self and the Peekable chars iterator is Copy. (Only true before Rust 1.0 — editor)

    You want to be modifying the iterator each time, that is, for take_while to be operating on an &mut to your iterator. Which is exactly what the .by_ref adaptor is for:

    pub fn split(input: &str) -> Vec<String> {
        let mut bits: Vec<String> = vec![];
        let mut iter = input.chars().peekable();
        loop {
            match iter.peek().map(|c| *c) {
                None => return bits,
                Some(c) => if c.is_digit(10) {
                    bits.push(iter.by_ref().take_while(|c| c.is_digit(10)).collect());
                } else {
                    bits.push(iter.by_ref().take_while(|c| !c.is_digit(10)).collect());
                },
            }
        }
    }
    
    fn main() {
        println!("{:?}", split("123abc456def"))
    }
    

    Prints

    ["123", "bc", "56", "ef"]
    

    However, I imagine this is not correct.

    I would actually recommend writing this as a normal for loop, using the char_indices iterator:

    pub fn split(input: &str) -> Vec<String> {
        let mut bits: Vec<String> = vec![];
        if input.is_empty() {
            return bits;
        }
    
        let mut is_digit = input.chars().next().unwrap().is_digit(10);
        let mut start = 0;
    
        for (i, c) in input.char_indices() {
            let this_is_digit = c.is_digit(10);
            if is_digit != this_is_digit {
                bits.push(input[start..i].to_string());
                is_digit = this_is_digit;
                start = i;
            }
        }
    
        bits.push(input[start..].to_string());
        bits
    }
    

    This form also allows for doing this with much fewer allocations (that is, the Strings are not required), because each returned value is just a slice into the input, and we can use lifetimes to state this:

    pub fn split<'a>(input: &'a str) -> Vec<&'a str> {
        let mut bits = vec![];
        if input.is_empty() {
            return bits;
        }
    
        let mut is_digit = input.chars().next().unwrap().is_digit(10);
        let mut start = 0;
    
        for (i, c) in input.char_indices() {
            let this_is_digit = c.is_digit(10);
            if is_digit != this_is_digit {
                bits.push(&input[start..i]);
                is_digit = this_is_digit;
                start = i;
            }
        }
    
        bits.push(&input[start..]);
        bits
    }
    

    All that changed was the type signature, removing the Vec<String> type hint and the .to_string calls.

    One could even write an iterator like this, to avoid having to allocate the Vec. Something like fn split<'a>(input: &'a str) -> Splits<'a> { /* construct a Splits */ } where Splits is a struct that implements Iterator<&'a str>.

    0 讨论(0)
  • 2020-11-28 16:06

    take_while takes self by value: it consumes the iterator. Before Rust 1.0 it also was unfortunately able to be implicitly copied, leading to the surprising behaviour that you are observing.

    You cannot use take_while for what you are wanting for these reasons. You will need to manually unroll your take_while invocations.

    Here is one of many possible ways of dealing with this:

    pub fn split(input: &str) -> Vec<String> {
        let mut bits: Vec<String> = vec![];
        let mut iter = input.chars().peekable();
        loop {
            let seeking_digits = match iter.peek() {
                None => return bits,
                Some(c) => c.is_digit(10),
            };
            if seeking_digits {
                bits.push(take_while(&mut iter, |c| c.is_digit(10)));
            } else {
                bits.push(take_while(&mut iter, |c| !c.is_digit(10)));
            }
        }
    }
    
    fn take_while<I, F>(iter: &mut std::iter::Peekable<I>, predicate: F) -> String
    where
        I: Iterator<Item = char>,
        F: Fn(&char) -> bool,
    {
        let mut out = String::new();
        loop {
            match iter.peek() {
                Some(c) if predicate(c) => out.push(*c),
                _ => return out,
            }
            let _ = iter.next();
        }
    }
    
    fn main() {
        println!("{:?}", split("test123test"));
    }
    

    This yields a solution with two levels of looping; another valid approach would be to model it as a state machine one level deep only. Ask if you aren’t sure what I mean and I’ll demonstrate.

    0 讨论(0)
提交回复
热议问题