Why do I need to collect into a vector when using `flat_map`?

时光怂恿深爱的人放手 提交于 2020-12-08 08:44:46

问题


I'm working on Project Euler 96 to teach myself Rust. I've written this code to read in the file and convert it into a vector of integers (Playground).

let file = File::open(&args[1]).expect("Sudoku file not found");
let reader = BufReader::new(file);

let x = reader
    .lines()
    .map(|x| x.unwrap())
    .filter(|x| !x.starts_with("Grid"))
    .flat_map(|s| s.chars().collect::<Vec<_>>())  // <-- collect here!
    .map(|x| x.to_digit(10).unwrap())
    .collect::<Vec<_>>();

This all works fine but I'm puzzled why I have to collect into a vector in my flat_map (I'm assuming creating unneeded vectors which will be immediately destroyed is inefficient). If I don't collect, it doesn't compile:

error[E0515]: cannot return value referencing function parameter `s`
  --> src/main.rs:13:23
   |
13 |         .flat_map(|s| s.chars())
   |                       -^^^^^^^^
   |                       |
   |                       returns a value referencing data owned by the current function
   |                       `s` is borrowed here

The example from the docs shows almost the same code, but a collect is not required:

let words = ["alpha", "beta", "gamma"];

// chars() returns an iterator
let merged: String = words.iter()
                          .flat_map(|s| s.chars())
                          .collect();
assert_eq!(merged, "alphabetagamma");

So why is it different in my code?


回答1:


The iterator reader.lines().map(|x| x.unwrap()) iterates over String items, i.e. by value. Consequently, in .flat_map(|s| ...), the variable s has the type String (i.e. owned, not borrowed). In other words: the string is a local variable now and lives in the function. It's a simple rule that you cannot return references to local variables (see this Q&A). But that's exactly what s.chars() does, even if it's a bit hidden.

Taking a look at str::chars:

pub fn chars(&self) -> Chars<'_>

One can see that the string is borrowed. The returned Chars object contains a reference to the original string. That's why we cannot return s.chars() from the closure.

So why is it different in my code?

In the documentation's example, the iterator words.iter() actually iterates over items of the type &&'static str. Calling s.chars() will also return a Chars object that borrows some string, but that string's lifetime is 'static (lives forever), so there is no problem with returning Chars from that closure.

Solution?

It would be great if the standard library had an OwnedChars iterator that consumes a String, works like Chars and drops the string once the iterator is dropped. In that case it's fine to call s.owned_chars() because the returned object does not reference the local s, but owns it. But: such an owned iterator does not exist in the standard library!

I'm assuming creating unneeded vectors which will be immediately destroyed is inefficient

Yes, that is true in a way. But you might have missed that the reader.lines() iterator also creates temporary objects of type String. Those are more or less immediately destroyed as well! So even without the collect in the flat_map you have a bunch of unnecessary allocations. Note that sometimes that's Ok. In this case, I guess that input parsing is very fast in comparison to the actual algorithm you have to implement. So ... just collect? It's probably fine in this case.

If you want to have high performance input parsing, I think you will not be able to avoid a standard loop, in particular in order to avoid unnecessary String allocations. (Playground)

let mut line = String::new();
let mut input = Vec::new();
loop {
    line.clear(); // clear contents, but keep memory buffer

    // TODO: handle IO error properly
    let bytes_read = reader.read_line(&mut line).expect("IO error"); 
    if bytes_read == 0 {
        break;
    }

    if line.starts_with("Grid") {
        continue;
    }

    // TODO: handle invalid input error
    input.extend(line.trim().chars().map(|c| c.to_digit(10).unwrap()));
}



回答2:


In addition to the other answer, note that an owned iterator is easy to write:

struct OwnedChars {
    s: String,
    i: usize,
}

impl Iterator for OwnedChars {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        let c = self.s[self.i..].chars().next()?;

        self.i += c.len_utf8();

        Some(c)
    }
}

fn into_iter(string: String) -> OwnedChars {
    OwnedChars {
        s: string,
        i: 0,
    }
}

fn main() {
    let owned_iter = into_iter("Zażółć gęślą jaźń".into());

    for c in owned_iter {
        println!("{}", c);
    }
}

And then, you don't need to collect:

fn main() {
    use std::io::prelude::*;

    let file = std::fs::File::open(std::env::args().nth(1).unwrap()).expect("Sudoku file not found");
    let reader = std::io::BufReader::new(file);

    let x = reader
        .lines()
        .map(|x| x.unwrap())
        .filter(|x| !x.starts_with("Grid"))
        .flat_map(into_iter)
        .map(|x| x.to_digit(10).unwrap())
        .collect::<Vec<_>>();
}


来源:https://stackoverflow.com/questions/57902478/why-do-i-need-to-collect-into-a-vector-when-using-flat-map

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!