Is it possible to read characters from `io::stdin()` without caching input line-by-line?

Question

This question refers to stable Rust, version 1.2.0.

I just want to iterate over the characters in the standard input of my CLI application. It's perfectly possible to use stdin's read_line method to read into a temporary String instance and then iterate over its chars() iterator.

But I don't like this approach, as it allocates a totally unnecessary String object. According to its documentation, Stdin implements the Read trait, which has a chars() iterator, but that method is marked as unstable (and thus can't be used with a stable compiler version).

Is there an alternative, possibly less obvious, way to read stdin char-by-char without any additional Rust-side buffering?

Answer 1:

You can do this with a single-byte array, reading one byte at a time until the input is exhausted (note that read signals end of input with Ok(0), not with an Err). There is a problem with this approach, however: it breaks down if the input is not ASCII, because a multi-byte UTF-8 character is split into separate bytes. If you expect to run into this problem, it is better to just allocate a String and use its chars iterator, which handles it for you.

Sample code:

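use std::io::{stdin, Read};

fn main() {
    let stdin = stdin();
    // Lock stdin once instead of on every read.
    let mut handle = stdin.lock();
    let mut character = [0u8; 1];
    loop {
        match handle.read(&mut character) {
            // Ok(0) signals end of input; stop instead of looping forever.
            Ok(0) | Err(_) => break,
            // Each byte is printed as a char, which is only meaningful for ASCII.
            Ok(_) => println!("CHAR {:?}", character[0] as char),
        }
    }
}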

Sample output:

Hello World
CHAR Some('H')
CHAR Some('e')
CHAR Some('l')
CHAR Some('l')
CHAR Some('o')
CHAR Some(' ')
CHAR Some('W')
CHAR Some('o')
CHAR Some('r')
CHAR Some('l')
CHAR Some('d')
CHAR Some('\n')
你好世界
CHAR Some('\u{e4}')
CHAR Some('\u{bd}')
CHAR Some('\u{a0}')
CHAR Some('\u{e5}')
CHAR Some('\u{a5}')
CHAR Some('\u{bd}')
CHAR Some('\u{e4}')
CHAR Some('\u{b8}')
CHAR Some('\u{96}')
CHAR Some('\u{e7}')
CHAR Some('\u{95}')
CHAR Some('\u{8c}')
CHAR Some('\n')
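
For comparison, the String-based alternative suggested above might look like the following sketch. It assumes a current stable toolchain (the ? operator and returning a Result from main are newer than Rust 1.2.0), and for_each_char is a hypothetical helper name, not part of the standard library. The String is allocated once and reused for every line, so the cost of the allocation is paid only once.

```rust
use std::io::{self, BufRead};

/// Calls `f` for every char in `input`, reading line by line.
/// (Hypothetical helper, written for illustration only.)
fn for_each_char<R: BufRead, F: FnMut(char)>(mut input: R, mut f: F) -> io::Result<()> {
    let mut line = String::new();
    loop {
        // Keep the allocation, drop the previous line's contents.
        line.clear();
        // read_line returns Ok(0) at end of input and an error on invalid UTF-8.
        if input.read_line(&mut line)? == 0 {
            return Ok(());
        }
        for c in line.chars() {
            f(c);
        }
    }
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    for_each_char(stdin.lock(), |c| println!("CHAR {:?}", c))
}
```

Because read_line validates UTF-8, multi-byte characters arrive as a single char instead of being split into bytes.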

Answer 2:

Aaronepower's answer is correct for the case that you probably care about, ASCII characters. I want to address the question as you phrased it:

I just want to iterate over the characters in the standard input of my CLI application.

In Rust, a char is a 32-bit (4-byte) type that represents a Unicode codepoint. However, the IO abstraction operates on the level of bytes. You need to bring some kind of encoding that maps codepoints to sequences of bytes, and the current winner in that war is UTF-8.
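
To make the difference between a char in memory and its byte encoding concrete, here is a small sketch. utf8_bytes is a hypothetical helper name chosen for this example; it only wraps the standard char::encode_utf8 method.

```rust
use std::mem::size_of;

/// Returns the UTF-8 encoding of `c` as a vector of 1 to 4 bytes.
/// (Hypothetical helper, written for illustration only.)
fn utf8_bytes(c: char) -> Vec<u8> {
    // Four bytes is the longest UTF-8 sequence for a single char.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // In memory, every char has the same fixed size.
    println!("a char occupies {} bytes in memory", size_of::<char>());
    // On the wire, the number of bytes depends on the codepoint.
    for c in "H\u{e9}\u{4f60}\u{1f980}".chars() {
        println!("{:?} (U+{:04X}) -> {:x?}", c, c as u32, utf8_bytes(c));
    }
}
```

The three bytes produced for U+4F60 are the same e4, bd, a0 that show up as separate "characters" in the sample output of the byte-by-byte approach.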

UTF-8 uses at most 4 bytes to represent a single codepoint, but the bit pattern of those bytes differs from the native 32-bit representation of a char. To properly read character-by-character, you will always need some kind of buffer.
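
As a rough illustration of the smallest such buffer, the sketch below reads one lead byte, derives the sequence length from it, and then reads the remaining bytes into a 4-byte array. read_char and utf8_width are hypothetical helper names, and the sketch assumes a current stable toolchain. A more careful version would probably also retry when read fails with ErrorKind::Interrupted.

```rust
use std::io::{self, Read};
use std::str;

/// Length of a UTF-8 sequence judged from its first byte, if that byte
/// can start a sequence at all. (Hypothetical helper.)
fn utf8_width(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7f => Some(1),
        0xc2..=0xdf => Some(2),
        0xe0..=0xef => Some(3),
        0xf0..=0xf4 => Some(4),
        _ => None, // continuation byte or invalid lead byte
    }
}

/// Reads a single char from `input`; returns Ok(None) at end of input.
/// (Hypothetical helper.)
fn read_char<R: Read>(input: &mut R) -> io::Result<Option<char>> {
    let mut buf = [0u8; 4];
    // Read the lead byte; Ok(0) means there is no more input.
    if input.read(&mut buf[..1])? == 0 {
        return Ok(None);
    }
    let width = utf8_width(buf[0])
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8 lead byte"))?;
    // Fetch the continuation bytes (none for ASCII).
    input.read_exact(&mut buf[1..width])?;
    // from_utf8 performs the full validation of the sequence.
    let s = str::from_utf8(&buf[..width])
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(s.chars().next())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut input = stdin.lock();
    while let Some(c) = read_char(&mut input)? {
        println!("CHAR {:?}", c);
    }
    Ok(())
}
```

This keeps the buffer on the stack and never allocates, at the price of up to two read calls per character.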

Then there's the problem of a partial character at the end of your buffer: it has to be moved back to the beginning of the buffer, which is comparatively expensive. The best solution is to amortize that cost over many characters, which is why reading in larger chunks can be faster.
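
The chunked approach could be sketched as follows. for_each_char_chunked is a hypothetical helper name and the 8192-byte chunk size in main is an arbitrary assumption, not a recommendation. After each read, the longest valid UTF-8 prefix is decoded, and any trailing partial character (at most 3 bytes) is copied to the front of the buffer before the next read.

```rust
use std::io::{self, Read};
use std::str;

/// Calls `f` for every char in `input`, reading `chunk` bytes at a time.
/// (Hypothetical helper, written for illustration only.)
fn for_each_char_chunked<R: Read, F: FnMut(char)>(
    mut input: R,
    chunk: usize,
    mut f: F,
) -> io::Result<()> {
    assert!(chunk >= 4, "the buffer must hold at least one full UTF-8 sequence");
    let mut buf = vec![0u8; chunk];
    let mut filled = 0; // number of not-yet-decoded bytes at the front of buf
    loop {
        let n = input.read(&mut buf[filled..])?;
        if n == 0 {
            // End of input: leftover bytes would be a truncated character.
            return if filled == 0 {
                Ok(())
            } else {
                Err(io::Error::new(io::ErrorKind::InvalidData, "truncated UTF-8 at end of input"))
            };
        }
        filled += n;
        // Find the longest prefix that is valid UTF-8.
        let valid = match str::from_utf8(&buf[..filled]) {
            Ok(s) => s.len(),
            // error_len() == None: the data merely stops mid-character.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // Anything else is genuinely invalid input.
            Err(e) => return Err(io::Error::new(io::ErrorKind::InvalidData, e)),
        };
        let text = str::from_utf8(&buf[..valid]).expect("prefix was validated above");
        for c in text.chars() {
            f(c);
        }
        // Move the partial character (0 to 3 bytes) to the front of the buffer.
        buf.copy_within(valid..filled, 0);
        filled -= valid;
    }
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    for_each_char_chunked(stdin.lock(), 8192, |c| println!("CHAR {:?}", c))
}
```

With a large chunk, the copy of at most 3 leftover bytes happens once per read rather than once per character, which is the amortization described above.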

Source: https://stackoverflow.com/questions/32549784/is-it-possible-to-read-characters-from-iostdin-without-caching-input-line

Tags: string, rust, stdin