Question
I'm trying to parse a large file (tens of GB) in a streaming fashion using Nom 5.0. One piece of the parser tries to parse numbers:
use nom::IResult;
use nom::character::streaming::{char, digit1};
// use nom::character::complete::{char, digit1};
use nom::combinator::{map, opt};
use nom::multi::many1;
use nom::sequence::{preceded, tuple};

pub fn number(input: &str) -> IResult<&str, &str> {
    map(
        tuple((
            opt(char('-')),
            many1(digit1),
            opt(preceded(char('.'), many1(digit1))),
        )),
        |_| "0",
    )(input)
}
(Obviously, it should not return "0" for all numbers; that's just to make the function as simple as possible.) For this parser, I wrote a test:
#[test]
fn match_positive_integer() {
    let (_, res) = number("0").unwrap();
    assert_eq!("0", res);
}
This test fails with Incomplete(Size(1)) because the "decimals" opt() wants to read data and it isn't there. If I switch to the complete versions of the matchers (as in the commented-out line), the test passes.
I assume this will actually work in production, because it will be fed additional data when complaining about incompleteness, but I would still like to create unit tests. Additionally, the issue would occur in production if a number happened to be the very last bit of input in a file. How do I convince a streaming Nom parser that there is no more data available?
Answer 1:
One can argue that the test in its original form is correct: the parser can't decide whether the given input is a complete number, so the parse result is in fact still undecided. In production, especially when reading large files as you do, the buffer of already-read but not-yet-parsed bytes might end right in the middle of what could be a number. The parser then needs to preserve its current state and ask for more input so it can retry or continue. Think of Incomplete not as a final error but as "I don't know yet: this could be an error depending on the next byte; the question is undecidable so far."
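To make the "undecided" outcome concrete, here is a minimal sketch that does not use nom at all. The `Step` enum and `digits_streaming` function are hypothetical stand-ins invented for illustration; they only mimic the idea behind a streaming digit matcher and are not nom's API.

```rust
/// Outcome of one streaming parse step (a simplified, hypothetical
/// stand-in for the three cases a streaming parser has to distinguish).
#[derive(Debug, PartialEq)]
enum Step<'a> {
    /// A token was recognised; `rest` is the unparsed remainder.
    Done { token: &'a str, rest: &'a str },
    /// The input definitely does not start with a digit.
    Error,
    /// The buffer ended before the end of the token could be determined.
    Incomplete,
}

/// Streaming-style digit run: a run that reaches the end of the buffer is
/// undecided, because the next chunk might begin with more digits.
fn digits_streaming(input: &str) -> Step<'_> {
    // Index of the first non-digit, or the buffer length if there is none.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == input.len() {
        Step::Incomplete // also covers the empty buffer
    } else if end == 0 {
        Step::Error
    } else {
        Step::Done { token: &input[..end], rest: &input[end..] }
    }
}
```

With this sketch, `digits_streaming("0")` is `Incomplete` while `digits_streaming("0 ")` is `Done`: only a following non-digit byte proves that the number has ended.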
You can use the complete combinator on your top-level parser so that, when you do in fact reach EOF, you error out on that. Incomplete results within the top-level parser should be handled, e.g., by expanding the read buffer by some margin and retrying.
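The refill-and-retry handling could look roughly like the following sketch. It is library-free on purpose: `Step`, `digits_streaming`, and `read_numbers` are hypothetical names, the input format (digit runs separated by a single non-digit byte) is an assumption made for the example, and `chunk` is assumed to be greater than zero. The point is only the control flow: grow the buffer on an undecided result, and treat the buffered bytes as final once the reader reports EOF.

```rust
use std::io::{self, Read};

/// Hypothetical result of one streaming step over a byte buffer.
enum Step {
    Done(usize), // length of the digit run at the start of the buffer
    Error,       // buffer does not start with a digit
    Incomplete,  // buffer ended before the run could be delimited
}

fn digits_streaming(buf: &[u8]) -> Step {
    let end = buf
        .iter()
        .position(|b| !b.is_ascii_digit())
        .unwrap_or(buf.len());
    if end == buf.len() {
        Step::Incomplete
    } else if end == 0 {
        Step::Error
    } else {
        Step::Done(end)
    }
}

/// Collects digit runs from `src`, reading `chunk` more bytes whenever the
/// parser is undecided. Assumes one non-digit separator byte between runs.
fn read_numbers<R: Read>(mut src: R, chunk: usize) -> io::Result<Vec<String>> {
    let mut buf: Vec<u8> = Vec::new();
    let mut out = Vec::new();
    let mut eof = false;
    loop {
        match digits_streaming(&buf) {
            Step::Done(n) => {
                out.push(String::from_utf8_lossy(&buf[..n]).into_owned());
                buf.drain(..n + 1); // drop the run and its separator byte
            }
            Step::Incomplete if !eof => {
                // Undecided: expand the buffer by `chunk` bytes and retry.
                let old = buf.len();
                buf.resize(old + chunk, 0);
                let got = src.read(&mut buf[old..])?;
                buf.truncate(old + got);
                eof = got == 0;
            }
            Step::Incomplete => {
                // EOF reached: no more data can arrive, so whatever is
                // buffered (all digits, possibly nothing) is the last token.
                if !buf.is_empty() {
                    out.push(String::from_utf8_lossy(&buf).into_owned());
                }
                return Ok(out);
            }
            Step::Error => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "expected a digit",
                ));
            }
        }
    }
}
```

For example, calling `read_numbers("12,345,6".as_bytes(), 3)` has to refill several times because the chunks end in the middle of a number, and the trailing `6` is only accepted once the reader signals EOF.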
You can wrap the parser in a complete() parser local to the current unit test and test on that. Something to the tune of:
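#[test]
fn match_positive_integer() {
    // `complete` (nom::combinator::complete) wraps the parser itself,
    // not the result of calling it.
    let (_, res) = complete(number)("0").unwrap();
    assert_eq!("0", res);
}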
Source: https://stackoverflow.com/questions/57914504/parsing-number-with-nom-5-0