Question
I am following Bill Casarin's post on how to parse delimited files with FParsec, and I am dumbing the logic down to understand how the code works. I am parsing a multi-row delimited document into a Cell list list structure (for now), where a Cell is a string or a float. I am a complete newbie at this.
I am having issues parsing the floats. In the typical case (a tab-delimited cell containing a number) it works. However, when a cell happens to be a string that starts with a number, it falls apart.
How do I modify pFloatCell to either parse all the way through to the tab as a float, or parse nothing?
Thank you
type Cell =
    | String of string
    | Float of float
.
.
.
let pStringCell delim =
    manyChars (nonQuotedCellChar delim)
    |>> String

// this is my issue. pfloat parses the string one
// char at a time, and once it starts off with a number
// it is down that path, and errors out
let pFloatCell delim =
    FParsec.CharParsers.pfloat
    |>> Float

let pCell delim =
    (pFloatCell delim) <|> (pStringCell delim)
.
.
.
let ParseTab s =
    let delim = "\t"
    let res = run (csv delim) s
    match res with
    | Success (rows, _, _) -> { IsSuccess = true; ErrorMsg = "Ok"; Result = stripEmpty rows }
    | Failure (s, _, _) -> { IsSuccess = false; ErrorMsg = s; Result = [[]] }
.
.
.
let test() =
    let parsed = ParseTab data
Oops, it was late for me last night; I meant to post the data. This first one works:
let data =
    "s10 Mar 2011 18:28:11 GMT\n"
while this returns an error:
let data =
    "10 Mar 2011 18:28:11 GMT\n"
It returns the following, both with and without ChaosP's recommendation:
ErrorMsg = "Error in Ln: 1 Col: 3\r\n10 Mar 2011 18:28:11 GMT\r\n ^\r\nExpecting: end of file, newline or '\t'\r\n"
It looks as though the attempt is working fine. In the second case pfloat only grabs the 10, since the code for pfloat appears to look only up to the first whitespace. I need to convince pfloat to look all the way up to the next tab or newline, regardless of whether there is a space before it. The alternative is to write my own version of pfloat using Double.Parse, but I would rather rely on the library.
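The diagnosis above can be checked in isolation. The following is a minimal sketch, assuming an F# script with the FParsec NuGet package available; the name leading is made up for illustration. It runs pfloat alone on the failing input to show where it stops.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical name: pfloat on its own, with no user state.
let leading : Parser<float, unit> = pfloat

// pfloat succeeds on the leading "10" and stops there; it does not
// look ahead to the next tab or newline.
match run leading "10 Mar 2011 18:28:11 GMT\n" with
| Success (value, _, pos) ->
    printfn "parsed %g, next column is %d" value pos.Column
| Failure (msg, _, _) ->
    printfn "%s" msg
```

Because pfloat succeeds here, the <|> in pCell never tries pStringCell, and the enclosing parser then fails at the space, which is consistent with the Col: 3 in the error message.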
Answer 1:
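Since the text you're parsing is ambiguous (a cell that starts with digits may be either a number or a string), you'll need to modify your pCell parser so that the float branch succeeds only when the number is followed directly by a separator, and backtracks otherwise.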
let sep delim =
    skipString delim <|> skipAnyOf "\r\n" <|> eof

let pCell delim =
    attempt (pFloatCell delim .>> sep delim) <|> (pStringCell delim .>> sep delim)
This also means you'll need to modify whichever parser uses pCell, because pCell now consumes the separator itself.
let pCells delim =
    many (pCell delim)
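To try the idea on the two inputs from the question, here is a self-contained sketch, assuming an F# script with the FParsec NuGet package. The helper cellChar is a hypothetical stand-in for nonQuotedCellChar from the original post, which is not shown in the question, and pFloatCell drops its unused delim argument.

```fsharp
#r "nuget: FParsec"
open FParsec

type Cell =
    | String of string
    | Float of float

// Hypothetical stand-in for nonQuotedCellChar: any char that is
// neither the delimiter nor a line break.
let cellChar (delim: string) : Parser<char, unit> =
    noneOf (delim + "\r\n")

// A cell ends at the delimiter, at a line break, or at end of input.
let sep (delim: string) : Parser<unit, unit> =
    skipString delim <|> skipAnyOf "\r\n" <|> eof

let pStringCell delim = manyChars (cellChar delim) |>> String
let pFloatCell : Parser<Cell, unit> = pfloat |>> Float

// attempt rewinds the input if the float is not followed directly by
// a separator, so the string branch can re-read the whole cell.
let pCell delim =
    attempt (pFloatCell .>> sep delim) <|> (pStringCell delim .>> sep delim)

for input in [ "s10 Mar 2011 18:28:11 GMT\n"
               "10 Mar 2011 18:28:11 GMT\n"
               "3.5\tabc" ] do
    printfn "%A" (run (pCell "\t") input)
```

The sketch runs pCell once per input on purpose. At end of input pStringCell can match an empty cell and sep can match eof, so pCell may succeed without consuming anything; FParsec's many raises an exception in that situation to avoid looping forever, which is worth keeping in mind when wiring pCell into many (pCell delim).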
Note
The .>> operator is actually quite simple. Think of it as a leap-frog operator: both parsers are applied in sequence, and the result of the left-hand side is returned while the result of the right-hand side is discarded.
    Parser<'a, 'b> -> Parser<'c, 'b> -> Parser<'a, 'b>
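As a small illustration of the note, the sketch below contrasts .>> with its mirror image >>. on a toy input. It assumes an F# script with the FParsec NuGet package; the names keepLeft and keepRight are made up for this example.

```fsharp
#r "nuget: FParsec"
open FParsec

// .>> keeps the left result (the float) and discards the right one (';').
let keepLeft : Parser<float, unit> = pfloat .>> pchar ';'

// >>. is the mirror image: it keeps the right result instead.
let keepRight : Parser<char, unit> = pfloat >>. pchar ';'

printfn "%A" (run keepLeft "42;")
printfn "%A" (run keepRight "42;")
```

In both cases the whole input is consumed; the operators differ only in which of the two results is passed on.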
Source: https://stackoverflow.com/questions/5630012/fparsec-how-to-parse-date-in-fparsec-newbie