Question
I'm trying to get my head around HXT, a Haskell library for parsing XML that uses arrows. For my specific use case I'd rather not use `deep`, as there are cases where `<outer_tag><payload_tag>value</payload_tag></outer_tag>` must be treated differently from `<outer_tag><inner_tag><payload_tag>value</payload_tag></inner_tag></outer_tag>`, and `deep` would find `payload_tag` in both. However, I ran into some weirdness: something that feels like it should work doesn't.
I've managed to come up with a test case based on this example from the docs:
```haskell
{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
module Main where

import Text.XML.HXT.Core

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)

-- Uses deep to find fname/lname anywhere below each <guest>.
getGuest = deep (isElem >>> hasName "guest") >>>
  proc x -> do
    fname <- getText <<< getChildren <<< deep (hasName "fname") -< x
    lname <- getText <<< getChildren <<< deep (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }

-- deep replaced by one explicit getChildren per field: still works.
getGuest' = deep (isElem >>> hasName "guest") >>>
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") <<< getChildren -< x
    lname <- getText <<< getChildren <<< (hasName "lname") <<< getChildren -< x
    returnA -< Guest { firstName = fname, lastName = lname }

-- getChildren factored out in front of the proc block: returns [].
getGuest'' = deep (isElem >>> hasName "guest") >>> getChildren >>>
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") -< x
    lname <- getText <<< getChildren <<< (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }

driver finalArrow = runX (readDocument [withValidate no] "guestbook.xml" >>> finalArrow)

main = do
  guests <- driver getGuest
  print "getGuest"
  print guests
  guests' <- driver getGuest'
  print "getGuest'"
  print guests'
  guests'' <- driver getGuest''
  print "getGuest''"
  print guests''
```
Going from `getGuest` to `getGuest'`, I replace each `deep` with the appropriate number of `getChildren` steps (here, one). The resulting function still works. I then factor that `getChildren` out of the `do` block, as in `getGuest''`, but this causes the resulting function to fail. The output is:

```
"getGuest"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest'"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest''"
[]
```
I feel like this should be a valid transformation to perform, but my understanding of arrows is a little shaky. Am I doing something wrong? Is this a bug that I should report?
I'm using HXT version 9.3.1.3 (the latest at the time of writing). `ghc --version` prints "The Glorious Glasgow Haskell Compilation System, version 7.4.1". I've also tested on a box with GHC 7.6.3 and got the same result.
The XML file has the following repetitive structure (the full file can be found here):

```xml
<guestbook>
  <guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
  </guest>
  <guest>
    <fname>Henry</fname>
    <lname>Ford</lname>
  </guest>
  <guest>
    <fname>Andrew</fname>
    <lname>Carnegie</lname>
  </guest>
</guestbook>
```
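Answer 1:
In `getGuest''` you have

```haskell
... (hasName "fname") -< x
... (hasName "lname") -< x
```

That is, you are restricting to the case where `x` is an `fname` element and `x` is also an `lname` element, which isn't satisfied by any `x`! Because `getChildren` now runs before the `proc` block, each `x` is a single child of `<guest>`, so it can match at most one of the two names.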
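To make this concrete without depending on HXT, here is a minimal sketch that models HXT's list arrows with `Kleisli []` from `base`. That is an assumption for illustration: `Kleisli []` has the same "one input, zero or more outputs" behaviour, but it is not HXT's own type. `Node` and this `hasName` are hypothetical toy stand-ins. Each child of `<guest>` is fed to the `proc` block separately, the way `getChildren` would emit them.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Hypothetical toy XML node (not HXT's XmlTree).
data Node = Elem String [Node] | Text String
  deriving (Show, Eq)

-- Toy stand-in for HXT's hasName: keep the node only if it is an element
-- with the given name, otherwise produce no output.
hasName :: String -> Kleisli [] Node Node
hasName n = Kleisli keep
  where
    keep e@(Elem m _) | m == n = [e]
    keep _                     = []

-- Both filters receive the same x, so a result requires x to be named
-- "fname" and "lname" at the same time.
bothNames :: Kleisli [] Node (Node, Node)
bothNames = proc x -> do
  a <- hasName "fname" -< x
  b <- hasName "lname" -< x
  returnA -< (a, b)

-- The children of one <guest> element, as getChildren would emit them.
guestChildren :: [Node]
guestChildren =
  [ Text "\n"
  , Elem "fname" [Text "John"]
  , Text "\n"
  , Elem "lname" [Text "Steinbeck"]
  , Text "\n"
  ]

-- One list of results per child.
results :: [[(Node, Node)]]
results = map (runKleisli bothNames) guestChildren
```

Loaded in GHCi, `results` evaluates to five empty lists: no single child survives both filters.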
Answer 2:
I've worked out the specific reason why the refactored arrow is interpreted the way it is. The following arrow translation found here provides a base to work from:
```haskell
addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = proc x -> do
    y <- f -< x
    z <- g -< x
    returnA -< y + z
```

Becomes:

```haskell
addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = arr (\ x -> (x, x)) >>>
    first f >>> arr (\ (y, x) -> (x, y)) >>>
    first g >>> arr (\ (z, y) -> y + z)
```
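The two forms can be compared directly. The sketch below assumes only `base`; `Kleisli []` is used as a stand-in for HXT's list arrows, and `twoOutputs`, `oneOutput` and `noOutput` are hypothetical example arrows. With plain functions every step yields exactly one value, so the pair always gets through; with a list arrow the pair only survives if both `f` and `g` produce output for the same `x`.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- addA in proc notation, as in the quoted example.
addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = proc x -> do
  y <- f -< x
  z <- g -< x
  returnA -< y + z

-- The hand-desugared version from the quoted translation.
addA' :: Arrow a => a b Int -> a b Int -> a b Int
addA' f g = arr (\x -> (x, x)) >>>
            first f >>> arr (\(y, x) -> (x, y)) >>>
            first g >>> arr (\(z, y) -> y + z)

-- Hypothetical list arrows: each input yields zero or more outputs.
twoOutputs, oneOutput, noOutput :: Kleisli [] Int Int
twoOutputs = Kleisli (\x -> [x, x + 1])
oneOutput  = Kleisli (\x -> [10 * x])
noOutput   = Kleisli (const [])
```

For example, `addA (+ 1) (* 2) 5` is `16` with the function arrow, `runKleisli (addA twoOutputs oneOutput) 1` gives `[11,12]`, and `runKleisli (addA twoOutputs noOutput) 1` gives `[]`; `addA'` agrees in each case. An arrow that produces nothing for `x` removes the whole tuple, which is exactly what happens to `getGuest''`.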
From this we can, by analogy, derive:

```haskell
getGuest''' = preproc >>>
    arr (\ x -> (x, x)) >>>
    first f >>> arr (\ (y, x) -> (x, y)) >>>
    -- after the swap and first g the tuple is (lname, fname)
    first g >>> arr (\ (z, y) -> Guest {firstName = y, lastName = z})
  where preproc = deep (isElem >>> hasName "guest") >>> getChildren
        f = getText <<< getChildren <<< (hasName "fname")
        g = getText <<< getChildren <<< (hasName "lname")
```
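A self-contained sketch of `getGuest'''` can be checked without HXT. Assumptions: `Kleisli []` stands in for HXT's list arrows, and `Node`, `getChildren`, `hasName`, `getText`, `deep` and `guestbook` below are hypothetical toy re-implementations named after the HXT combinators, not HXT's code. It compares the desugared pipeline with the `proc` version of `getGuest''`.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Hypothetical toy XML tree (not HXT's XmlTree).
data Node = Elem String [Node] | Text String
  deriving (Show, Eq)

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)

-- Assumption: Kleisli [] behaves like HXT's list arrows (one input, zero or more outputs).
type LA = Kleisli []

children :: Node -> [Node]
children (Elem _ cs) = cs
children (Text _)    = []

-- Toy stand-ins named after the HXT combinators.
getChildren :: LA Node Node
getChildren = Kleisli children

-- Only elements carry names here, so no separate isElem filter is needed.
hasName :: String -> LA Node Node
hasName n = Kleisli keep
  where
    keep e@(Elem m _) | m == n = [e]
    keep _                     = []

getText :: LA Node String
getText = Kleisli txt
  where
    txt (Text s) = [s]
    txt _        = []

-- Like HXT's deep: return the topmost nodes on which the filter succeeds.
deep :: LA Node Node -> LA Node Node
deep (Kleisli f) = Kleisli go
  where
    go t = case f t of
      [] -> concatMap go (children t)
      rs -> rs

guestbook :: Node
guestbook = Elem "guestbook" [guest "John" "Steinbeck", guest "Henry" "Ford"]
  where
    guest fn ln = Elem "guest"
      [Text "\n", Elem "fname" [Text fn], Text "\n", Elem "lname" [Text ln], Text "\n"]

-- The hand-desugared pipeline: x is one child of <guest>, never the <guest> itself.
getGuest''' :: LA Node Guest
getGuest''' = preproc >>>
    arr (\x -> (x, x)) >>>
    first f >>> arr (\(y, x) -> (x, y)) >>>
    first g >>> arr (\(z, y) -> Guest { firstName = y, lastName = z })
  where
    preproc = deep (hasName "guest") >>> getChildren
    f = getText <<< getChildren <<< hasName "fname"
    g = getText <<< getChildren <<< hasName "lname"

-- The proc-notation version from the question, for comparison.
getGuest'' :: LA Node Guest
getGuest'' = deep (hasName "guest") >>> getChildren >>>
  proc x -> do
    fname <- getText <<< getChildren <<< hasName "fname" -< x
    lname <- getText <<< getChildren <<< hasName "lname" -< x
    returnA -< Guest { firstName = fname, lastName = lname }
```

In GHCi, `runKleisli getGuest''' guestbook` and `runKleisli getGuest'' guestbook` both return `[]`, matching the empty output in the question.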
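In HXT, arrows can be pictured as streams of values flowing through filters. `arr (\x -> (x, x))` does not "split the stream" into two independent streams, as I'd hoped. Instead it turns each value into a tuple; the tuples are filtered by `f`, and the survivors are then filtered by `g`. Because `preproc` ends with `getChildren`, each `x` is a single child of a `<guest>` element, and no child is both an `<fname>` (needed by `f`) and an `<lname>` (needed by `g`). As `f` and `g` are mutually exclusive, there are no survivors.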
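The examples with `getChildren` inside the `do` block worked because the tuple stream contained values from further up the XML document, namely whole `<guest>` elements like

```xml
<guest>
  <fname>John</fname>
  <lname>Steinbeck</lname>
</guest>
```

From such an element both `getChildren >>> hasName "fname"` and `getChildren >>> hasName "lname"` can succeed, so the two filters were not mutually exclusive.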
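The same toy model can show the working variant next to one way of factoring out the shared steps without changing what `x` is: keep `x` as the whole `<guest>` element and run the two field extractors side by side with `&&&`. As before, this is a sketch that assumes `Kleisli []` as a stand-in for HXT's list arrows; `Node`, the combinators and `guestbook` are hypothetical toy re-implementations, and `field` is a hypothetical helper.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Hypothetical toy XML tree (not HXT's XmlTree).
data Node = Elem String [Node] | Text String
  deriving (Show, Eq)

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)

-- Assumption: Kleisli [] behaves like HXT's list arrows.
type LA = Kleisli []

children :: Node -> [Node]
children (Elem _ cs) = cs
children (Text _)    = []

getChildren :: LA Node Node
getChildren = Kleisli children

hasName :: String -> LA Node Node
hasName n = Kleisli keep
  where
    keep e@(Elem m _) | m == n = [e]
    keep _                     = []

getText :: LA Node String
getText = Kleisli txt
  where
    txt (Text s) = [s]
    txt _        = []

-- Like HXT's deep: return the topmost nodes on which the filter succeeds.
deep :: LA Node Node -> LA Node Node
deep (Kleisli f) = Kleisli go
  where
    go t = case f t of
      [] -> concatMap go (children t)
      rs -> rs

guestbook :: Node
guestbook = Elem "guestbook" [guest "John" "Steinbeck", guest "Henry" "Ford"]
  where
    guest fn ln = Elem "guest"
      [Text "\n", Elem "fname" [Text fn], Text "\n", Elem "lname" [Text ln], Text "\n"]

-- getChildren stays inside each binding, so x is the whole <guest> element.
getGuest' :: LA Node Guest
getGuest' = deep (hasName "guest") >>>
  proc x -> do
    fname <- getText <<< getChildren <<< hasName "fname" <<< getChildren -< x
    lname <- getText <<< getChildren <<< hasName "lname" <<< getChildren -< x
    returnA -< Guest { firstName = fname, lastName = lname }

-- One way to factor out the shared steps: both extractors see the same <guest>.
getGuestFactored :: LA Node Guest
getGuestFactored =
  deep (hasName "guest") >>> (field "fname" &&& field "lname") >>> arr (uncurry Guest)
  where
    -- Hypothetical helper: the text of the named child element.
    field n = getChildren >>> hasName n >>> getChildren >>> getText
```

Both `runKleisli getGuest' guestbook` and `runKleisli getGuestFactored guestbook` return the two guests. The design point is that `&&&` hands the same `<guest>` element to each extractor, so each one does its own `getChildren`; moving `getChildren` in front of the split would reproduce the empty result.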
Source: https://stackoverflow.com/questions/21995888/is-factoring-an-arrow-out-of-arrow-do-notation-a-valid-transformation