问题
The task is find particular key-value pairs and parse them. The pairs can occur in any order. My partially working attempt:
open FParsec
type Parser<'a> = Parser<'a, unit>
type Status = Running | Done
type Job =
{ Id: int
Status: Status
Count: int }
let ws = spaces
let jobId: Parser<int> = ws >>. skipStringCI "Job id" >>. ws >>. skipChar '=' >>. ws >>. pint32
let status: Parser<Status> =
ws >>. skipStringCI "Status" >>. ws >>. skipChar '=' >>. ws >>. (
(skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done))
let count: Parser<int> = ws >>. skipStringCI "Count" >>. ws >>. skipChar '=' >>. ws >>. pint32
let parse: Parser<Job> = parse {
do! skipCharsTillStringCI "Job id" false 1000
let! id = jobId
do! skipCharsTillStringCI "Status" false 1000
let! status = status
do! skipCharsTillStringCI "Count" false 1000
let! count = count
return { Id = id; Status = status; Count = count }}
[<EntryPoint>]
let main argv =
let sample = """
Some irrelevant text.
Job id = 33
Some other text.
Status = Done
And another text.
Count = 10
Trailing text.
"""
printfn "%A" (run parse sample)
0
(*
result:
Success: {Id = 33;
Status = Done;
Count = 10;}
*)
So, it works but it has two problems: obvious duplication ("Job id" in jobId function and "Job id" in the top-level parser and so on), and it expects "Job id", "Status" and "Count" to be sequenced in this particular order, which is wrong by the requirement.
I have a strong feeling that there's an elegant solution for this.
Thanks!
回答1:
The first problem (duplication) can be solved with a minor refactoring. The basic idea is wrapping each parser into a wrapper that would do skipping.
Note that this code is yet far from perfection, I just tried to make refactoring as small as possible.
let jobId: Parser<int> = pint32
let status: Parser<Status> =
(skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done)
let count: Parser<int> = pint32
let skipAndParse prefix parser =
skipCharsTillStringCI prefix false 1000
>>. ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser
let parse: Parser<Job> = parse {
let! id = skipAndParse "Job id" jobId
let! status = skipAndParse "Status" status
let! count = skipAndParse "Count" count
return { Id = id; Status = status; Count = count }}
The second problem is more complicated. If you want the data lines to appear in a free order, you must consider the case when
- not all data lines present;
- a certain data line appears twice or more;
To mitigate this, you need to produce a list
of data lines found, analyze if everything required is there, and decide what to do with any possible duplicates.
Note that each data line can not afford to have "skip" part anymore, since it may skip an informative line before the actual parser.
let skipAndParse2 prefix parser =
ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser
// Here, you create a DU that will say which data line was found
type Result =
| Id of int
| Status of Status
| Count of int
| Irrelevant of string
// here's a combinator parser
let parse2 =
// list of possible data line parsers
// Note they are intentionally reordered
[
skipAndParse2 "Count" count |>> Count
skipAndParse2 "Status" status |>> Status
skipAndParse2 "Job id" jobId |>> Id
// the trailing one would skip a line in case if it has not
// been parsed by any of prior parsers
// a guard rule is needed because of specifics of
// restOfLine behavior at the end of input: namely, it would
// succeed without consuming an input, which leads
// to an infinite loop. Actually FParsec handles this and
// raises an exception
restOfLine true .>> notFollowedByEof |>> Irrelevant
]
|> List.map attempt // each parser is optional
|> choice // on each iteration, one of the parsers must succeed
|> many // a loop
Running the code:
let sample = "
Some irrelevant text.\n\
Job id = 33\n\
Some other text.\n\
Status = Done\n\
And another text.\n\
Count = 10\n\
Trailing text.\n\
"
sample |> run parse2 |> printfn "%A "
will produce the following output:
Success: [Irrelevant ""; Irrelevant "Some irrelevant text."; Id 33;
Irrelevant ""; Irrelevant "Some other text."; Status Done; Irrelevant "";
Irrelevant "And another text."; Count 10; Irrelevant ""]
Further processing requires filtering Irrelevant
elements, checking for duplicates or missing items, and forming the Job
record, or raising errors.
UPDATE: a simple example of further processing to hide out Result
and returning Job option
instead:
// naive implementation of the record maker
// return Job option
// ignores duplicate fields (uses the first one)
// returns None if any field is missing
let MakeJob arguments =
let a' =
arguments
|> List.filter (function |Irrelevant _ -> false | _ -> true)
try
let theId = a' |> List.pick (function |Id x -> Some x | _ -> None)
let theStatus = a' |> List.pick (function |Status x -> Some x | _ -> None)
let theCount = a' |> List.pick (function |Count x -> Some x | _ -> None)
Some { Id=theId; Status = theStatus; Count = theCount }
with
| :?System.Collections.Generic.KeyNotFoundException -> None
To use it, simply add the following line to the code of parse2
:
|>> MakeJob
来源:https://stackoverflow.com/questions/20018453/fparsec-how-to-combine-parsers-so-that-they-will-be-matched-in-arbitrary-order