Parsing number with nom 5.0

Question

I'm trying to parse a large file (tens of GB) in a streaming fashion using nom 5.0. One piece of the parser tries to parse numbers:

use nom::IResult;
use nom::character::streaming::{char, digit1};
// use nom::character::complete::{char, digit1};
use nom::combinator::{map, opt};
use nom::multi::many1;
use nom::sequence::{preceded, tuple};

pub fn number(input: &str) -> IResult<&str, &str> {
    map(
        tuple((
            opt(char('-')),
            many1(digit1),
            opt(preceded(char('.'), many1(digit1)))
        )),
        |_| "0"
    )(input)
}

(Obviously, it should not return "0" for every number; that's just to keep the function as simple as possible.) For this parser, I wrote a test:

#[test]
fn match_positive_integer() {
    let (_, res) = number("0").unwrap();
    assert_eq!("0", res);
}

This test fails with Incomplete(Size(1)), apparently because the "decimals" opt() wants to read data that isn't there. If I switch to the complete versions of the matchers (as in the commented-out line), the test passes.

I assume this will actually work in production, because it will be fed additional data when complaining about incompleteness, but I would still like to create unit tests. Additionally, the issue would occur in production if a number happened to be the very last bit of input in a file. How do I convince a streaming Nom parser that there is no more data available?

Answer 1:

One can argue that the test in its original form is correct: the parser can't yet decide whether the given input is a complete number, so the result really is undecided. In production, especially when reading large files as you do, the buffer of bytes that have been read but not yet parsed might end in the middle of what could be a number, or right after one that could still continue. The parser then needs to preserve its current state and ask for more input so it can retry or continue. Think of Incomplete not as a final error but as "I don't know yet: this could be an error depending on the next byte, so the question is undecidable so far."
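To make the three possible outcomes concrete, here is a minimal hand-rolled sketch that uses only the standard library. The Outcome type and the digits_streaming function are hypothetical names made up for illustration; this is not nom's API, only the same idea in miniature.

```rust
/// Three-way result of a streaming parse step (hypothetical type, not nom's).
#[derive(Debug, PartialEq)]
pub enum Outcome<'a> {
    /// Matched `matched`; `rest` is the unconsumed input.
    Done { matched: &'a str, rest: &'a str },
    /// The buffer ended before the match could be decided.
    Incomplete,
    /// The input definitely does not start with a digit.
    Error,
}

/// Streaming-style "one or more ASCII digits" (illustrative only).
pub fn digits_streaming(input: &str) -> Outcome<'_> {
    match input.find(|c: char| !c.is_ascii_digit()) {
        // The first character is not a digit: a definite mismatch.
        Some(0) => Outcome::Error,
        // A non-digit terminates the run, so the match is decided.
        Some(end) => Outcome::Done { matched: &input[..end], rest: &input[end..] },
        // Only digits (or nothing at all) so far: more digits may follow.
        None => Outcome::Incomplete,
    }
}
```

With this sketch, "0" is Incomplete because another digit could still arrive, "0 " is Done because the space settles the question, and "x1" is an Error.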

You can use the complete combinator on your top-level parser, so that when you really do reach EOF, a pending Incomplete becomes an error. Incomplete results inside the top-level parser should be handled, e.g., by expanding the read buffer by some margin and retrying.
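The "expand the buffer and retry" loop could look roughly like the following sketch. It uses only the standard library; digits and read_number are hypothetical helpers written for this example (not nom functions), and the at_eof flag is an assumption about how the caller tells the parse step that no more data will arrive. The chunk size is assumed to be greater than zero.

```rust
use std::io::{self, Read};

/// Hypothetical parse step: matches leading ASCII digits in `buf`.
/// Returns Ok(Some(len)) for a decided match of `len` bytes,
/// Ok(None) when more input is needed, and Err(()) on a mismatch.
fn digits(buf: &[u8], at_eof: bool) -> Result<Option<usize>, ()> {
    let len = buf.iter().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 && (at_eof || !buf.is_empty()) {
        return Err(()); // no digit at the start: definite mismatch
    }
    if len == buf.len() && !at_eof {
        return Ok(None); // digits reach the end of the buffer: undecided
    }
    Ok(Some(len))
}

/// Reads `source` in `chunk`-sized pieces and returns the leading number,
/// or None if the input does not start with a digit.
pub fn read_number<R: Read>(mut source: R, chunk: usize) -> io::Result<Option<String>> {
    let mut buf = Vec::new();
    let mut at_eof = false;
    loop {
        match digits(&buf, at_eof) {
            Ok(Some(len)) => {
                return Ok(Some(String::from_utf8_lossy(&buf[..len]).into_owned()));
            }
            Err(()) => return Ok(None),
            Ok(None) => {
                // Undecided: grow the buffer by one chunk and retry.
                let old = buf.len();
                buf.resize(old + chunk, 0);
                let n = source.read(&mut buf[old..])?;
                buf.truncate(old + n);
                at_eof = n == 0; // a zero-length read signals end of input
            }
        }
    }
}
```

For example, read_number("12345".as_bytes(), 2) keeps asking for more data until the reader is exhausted and only then accepts "12345" as a finished number. The sketch re-parses the whole buffer after every refill, which keeps it short; a real streaming parser would also discard already-consumed bytes instead of growing the buffer indefinitely.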

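You can wrap the parser in a complete() parser local to the current unit test and test on that. Note that complete() takes the parser itself as its argument and turns Incomplete into an Error rather than into a match, so an input that ends right after the digits is rejected, while a terminated number parses. Something to the tune of

use nom::combinator::complete;

#[test]
fn match_positive_integer() {
    // complete() takes a parser and returns a new parser; it maps Incomplete to Error.
    let parser = complete(number);
    let (_, res) = parser("0 ").unwrap(); // terminated, so decidable
    assert_eq!("0", res);
    assert!(parser("0").is_err()); // input ends mid-number: Error, not Incomplete
}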

Source: https://stackoverflow.com/questions/57914504/parsing-number-with-nom-5-0

Tags

rust

streaming

nom