How to parse GPX file using Haskell's xml-conduit?

心已入冬 提交于 2019-12-10 19:14:54

问题


I'd like to use xml-conduit to parse GPX files. So far I've got the following:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import Data.Text           as T
import Text.XML
import Text.XML.Cursor

data Trkpt = Trkpt {
  trkptLat :: Text,
  trkptLon :: Text,
  trkptEle :: Text,
  trkptTime :: Text
  } deriving (Show)

trkptsFromFile path =
  gpxTrkpts . fromDocument <$> Text.XML.readFile def path

gpxTrkpts =
  child >=> element "{http://www.topografix.com/GPX/1/0}trk" >=>
  child >=> element "{http://www.topografix.com/GPX/1/0}trkseg" >=>
  child >=> element "{http://www.topografix.com/GPX/1/0}trkpt" >=>
  child >=> \e -> do
    let ele  = T.concat $ element "{http://www.topografix.com/GPX/1/0}ele" e >>= descendant >>= content
    let time = T.concat $ element "{http://www.topografix.com/GPX/1/0}time" e >>= descendant >>= content
    let lat  = T.concat $ attribute "lat" e
    let lon  = T.concat $ attribute "lon" e
    return $ Trkpt lat lon ele time

A sample GPX file is here.

I'm getting strange results where the parsed text is mostly empty, with some sporadic actual values, although the original GPX file data is all valid. When there is an actual value, it is only in one of the fields of the record.

I'm quite certain I'm not using the xml-conduit API properly. What am I doing wrong?


回答1:


Two issues. Firstly, there is a typo in the namespace; it should be http://www.topografix.com/GPX/1/1. Secondly, your final Kleisli arrow (\e -> do -- etc.) is acting on the children of the trkpt elements, rather than on the trkpt themselves. Here is a gpxTrkpts which should do what you want:

gpxTrkpts =
  child >=> element "{http://www.topografix.com/GPX/1/1}trk" >=>
  child >=> element "{http://www.topografix.com/GPX/1/1}trkseg" >=>
  child >=> element "{http://www.topografix.com/GPX/1/1}trkpt" >=>
  \e -> do
    let cs = child e
        ele  = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}ele" >>= descendant >>= content
        time = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}time" >>= descendant >>= content
        lat  = T.concat $ attribute "lat" e
        lon  = T.concat $ attribute "lon" e
    return $ Trkpt lat lon ele time



回答2:


@duplode has pointed out the problem. Here are some more comments.

  1. How about using the gpx-conduit package

  2. Here is some code which can help debug parsing problems:

Code:

{-# LANGUAGE OverloadedStrings #-}
module Lib2 where

import qualified Data.Text           as T
import Data.Text (Text)
import Text.XML
import Text.XML.Cursor
import qualified Filesystem.Path.CurrentOS as Path
import Control.Monad

showNode (NodeElement e)     = "NodeEement " ++ T.unpack (nameLocalName $ elementName e)
showNode (NodeInstruction _) = "NodeInstruction ..."
showNode (NodeContent t)     = "NodeContent " ++ show t
showNode (NodeComment _)     = "NodeComment"

testParser parser =  do
  content <- Text.XML.readFile def (Path.decodeString "sample.xml")
  let nodes = map node $ parser (fromDocument content)
  forM_ nodes $ \n -> putStrLn (showNode n)

Use it in ghci like this:

ghci> :set -XOverloadedStrings
ghci> :l Lib2
Lib2> testParser child
NodeContent "\n  "
NodeEement metadata
NodeContent "\n  "
NodeEement trk
NodeContent "\n  "
NodeEement extensions
NodeContent "\n"

Lib2> testParser $ child >=> element "trk"
Lib2> testParser $ child >=> laxElement "trk"
NodeEement trk

Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg"
NodeElement trkseg
Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg" >=> child >=> laxElement "trkpt"
NodeEement trkpt
NodeEement trkpt
NodeEement trkpt
NodeEement trkpt
Lib2>


来源:https://stackoverflow.com/questions/31897737/how-to-parse-gpx-file-using-haskells-xml-conduit

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!