Scala: XML Attribute parsing

误落风尘 2021-01-02 02:02

I'm trying to parse an RSS feed that looks like this for the attribute "date":



    
        

        
4 Answers
  • 2021-01-02 02:32

    The "y" in <y:c> is a namespace prefix, not part of the element name. Also, attributes are selected with a leading '@'. Try this:

    println((rssFeed \\ "channel" \\ "item" \ "c" \ "@date").toString)
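
    To make the namespace point concrete, here is a minimal sketch, assuming the scala-xml module is on the classpath. The feed and the namespace URI are hypothetical stand-ins, since the original feed is not shown. It checks that the element's label is "c" while "y" is kept separately as the prefix, so a search for "y:c" finds nothing:

    ```scala
    import scala.xml._

    object PrefixDemo {
      // Hypothetical feed; the namespace URI is a placeholder, not from the question.
      val feed: Elem =
        <rss version="2.0" xmlns:y="http://example.com/y">
          <channel>
            <item>
              <y:c date="AA"></y:c>
            </item>
          </channel>
        </rss>

      // The first <y:c> element found anywhere below <rss>.
      val c: Node = (feed \\ "c").head

      val label: String = c.label            // local name, without the prefix
      val prefix: String = c.prefix          // the namespace prefix on its own
      val date: String = (c \ "@date").text  // attribute value via the '@' selector

      // Element selectors compare against the label, so the prefixed name does not match.
      val byPrefixedName: NodeSeq = feed \\ "y:c"
    }
    ```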
    
  • 2021-01-02 02:34

    Also, think about the difference between \ and \\. \\ matches any descendant, not just a direct child, like this (note that it jumps from channel to c without going through item):

    scala> (rssFeed \\ "channel" \\ "c" \ "@date").text
    res20: String = AA
    

    Or this sort of thing if you just want all the <c> elements and don't care about their parents:

    scala> (rssFeed \\ "c" \ "@date").text            
    res24: String = AA
    

    And this specifies an exact path:

    scala> (rssFeed \ "channel" \ "item" \ "c" \ "@date").text
    res25: String = AA
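
    The same distinction can be checked in a small self-contained sketch, assuming scala-xml is available. The feed below is a hypothetical one with three <y:c> elements and a placeholder namespace URI. The dates are read one element at a time, so the result does not depend on how the '@' selector treats a sequence with several nodes:

    ```scala
    import scala.xml._

    object PathDemo {
      // Hypothetical feed; the namespace URI is a placeholder.
      val feed: Elem =
        <rss version="2.0" xmlns:y="http://example.com/y">
          <channel>
            <item>
              <y:c date="AA"></y:c>
              <y:c date="AB"></y:c>
              <y:c date="AC"></y:c>
            </item>
          </channel>
        </rss>

      // \ only inspects direct children: <rss> has no <item> child.
      val directItem: NodeSeq = feed \ "item"

      // \\ searches every descendant, so it finds all three <c> elements.
      val anyC: NodeSeq = feed \\ "c"

      // An exact child-by-child path reaches the same elements.
      val exact: NodeSeq = feed \ "channel" \ "item" \ "c"

      // Read the attribute from each element individually.
      val dates: List[String] = anyC.map(c => (c \ "@date").text).toList
    }
    ```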
    
  • 2021-01-02 02:35

    Think about using sequence comprehensions, too. They're useful for dealing with XML, particularly if you need complicated conditions.

    For the simple case:

    for {
      c <- rssFeed \\ "@date"
    } yield c
    

    This gives you the date attribute from every element in rssFeed.

    But if you want something more complex:

    val rssFeed = <rss version="2.0">
                    <channel>
                      <item>
                        <y:c date="AA"></y:c>
                        <y:c date="AB"></y:c>
                        <y:c date="AC"></y:c>
                      </item>
                    </channel>
                  </rss>
    
    val sep = "\n----\n"
    
    for {
      channel <- rssFeed \ "channel"
      item <- channel \ "item"
      y <- item \ "c"
      date <- y \ "@date" if date.text == "AA"
    } yield {
      val s = List(channel, item, y, date).mkString(sep)
      println(s)
    }
    

    This prints:

        <channel>
                            <item>
                              <y:c date="AA"></y:c>
                              <y:c date="AB"></y:c>
                              <y:c date="AC"></y:c>
                            </item>
                          </channel>
        ----
        <item>
                              <y:c date="AA"></y:c>
                              <y:c date="AB"></y:c>
                              <y:c date="AC"></y:c>
                            </item>
        ----
        <y:c date="AA"></y:c>
        ----
        AA
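
    If you want the matching values back rather than printed output, a comprehension can also yield them. The following is a minimal sketch under the same assumptions (scala-xml on the classpath, a hypothetical feed with a placeholder namespace URI); the filter condition is only an illustration:

    ```scala
    import scala.xml._

    object ComprehensionDemo {
      // Hypothetical feed; the namespace URI is a placeholder.
      val feed: Elem =
        <rss version="2.0" xmlns:y="http://example.com/y">
          <channel>
            <item>
              <y:c date="AA"></y:c>
              <y:c date="AB"></y:c>
              <y:c date="AC"></y:c>
            </item>
          </channel>
        </rss>

      // Collect every date except "AB"; the guard can be any Boolean expression.
      val dates: List[String] = (for {
        item <- feed \\ "item"
        c    <- item \ "c"
        date = (c \ "@date").text
        if date != "AB"
      } yield date).toList
    }
    ```

    Here ComprehensionDemo.dates evaluates to List("AA", "AC").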
    
  • 2021-01-02 02:45

    Attributes are retrieved using the "@attrName" selector. Thus, your selector should actually be something like the following:

    println((rssFeed \\ "channel" \\ "item" \ "c" \ "@date").text)
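
    It may also help to see what the selector returns when the attribute is absent. This is a minimal sketch on a hypothetical single element, assuming scala-xml is available: the '@' selector yields an empty result whose text is "", while Node.attribute returns an Option that makes the missing case explicit.

    ```scala
    import scala.xml._

    object AttrDemo {
      // Hypothetical element with a single attribute.
      val c: Elem = <c date="AA"></c>

      // The '@' selector and its shorthand \@ both give the attribute text.
      val present: String = (c \ "@date").text
      val shorthand: String = c \@ "date"

      // A missing attribute produces an empty result, so .text is "".
      val missingText: String = (c \ "@missing").text

      // Node.attribute returns Option[Seq[Node]], which distinguishes "absent" from "empty".
      val asOption: Option[String] =
        c.attribute("date").map(_.map(_.text).mkString)
      val missingOption: Option[String] =
        c.attribute("missing").map(_.map(_.text).mkString)
    }
    ```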
    