XML parsing in Python [closed]

Posted by 空扰寡人 on 2019-12-17 19:15:10

Question


I'd like to parse a simple, small XML file using Python; however, work on PyXML seems to have ceased. I'd like to use Python 2.6 if possible. Can anyone recommend an XML parser that works with 2.6?

Thanks


Answer 1:


If it's small and simple, then just use the standard library:

from xml.dom.minidom import parse
doc = parse("filename.xml")

This will return a DOM tree implementing the standard Document Object Model API.
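As a minimal sketch of what working with that DOM tree can look like, the snippet below reads attributes from a small document. The XML content and the names `SAMPLE` and `servers` are made up for illustration, and `parseString` is used instead of `parse` only so the example needs no file on disk.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, inlined so the sketch is self-contained.
SAMPLE = """<config>
  <server name="alpha" port="8080"/>
  <server name="beta" port="9090"/>
</config>"""

doc = parseString(SAMPLE)

# documentElement is the root element of the DOM tree.
root = doc.documentElement

# getElementsByTagName returns every descendant element with that tag name.
servers = [
    (node.getAttribute("name"), int(node.getAttribute("port")))
    for node in root.getElementsByTagName("server")
]
```

Only calls that exist in both Python 2.6 and Python 3 are used here, so the same sketch should carry over to newer interpreters.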

If you later need to do complex things like schema validation or XPath querying, then I recommend the third-party lxml module, which is a wrapper around the popular libxml2 C library.




Answer 2:


For most of my tasks I have used minidom, the lightweight DOM implementation in the standard library. This example is from the official documentation:

from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name

with open('c:\\temp\\mydata.xml') as datasource:
    dom2 = parse(datasource)   # parse an open file; the file is closed afterwards

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
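Once a document is parsed, the text usually has to be pulled out of the text nodes by hand. A rough sketch using the same `dom3` string follows; `get_text` is a hypothetical helper written for this example, not something minidom provides.

```python
from xml.dom.minidom import parseString

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
root = dom3.documentElement

def get_text(node):
    """Concatenate the data of all direct text-node children of node."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )

# Text on either side of <empty/> is stored in two separate text nodes.
text = get_text(root)

# Element children can be told apart from text by their nodeType.
child_tags = [
    child.tagName for child in root.childNodes
    if child.nodeType == child.ELEMENT_NODE
]
```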



Answer 3:


Here is also a very good example of how to use minidom, along with explanations.




Answer 4:


Would lxml suit your needs? It's the first tool I turn to for XML parsing.




Answer 5:


A few years ago, I wrote a library for working with structured XML. It makes XML simpler by making some limiting assumptions.

You could use XML for something like a word processor document, where you have a complicated soup of content with XML tags embedded all over the place; for that kind of document my library would not be a good fit.

But if you are using XML for something like a config file, my library is rather convenient. You define classes that describe the structure of the XML you want, and once you have the classes done, there is a method to slurp in XML and parse it. The actual parsing is done by xml.dom.minidom, but then my library extracts the data and puts it in the classes.

The best part: you can declare a "Collection" type that will be a Python list with zero or more other XML elements inside it. This is great for things like Atom or RSS feeds (which was the original reason I designed the library).
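To make the general idea concrete, here is a rough sketch of the pattern described above: a class that mirrors the XML structure, parsing done by xml.dom.minidom, and repeated elements collected into a plain Python list. This is not the library's actual API; `Feed`, `from_xml`, and `_text_of` are hypothetical names invented for this illustration.

```python
from xml.dom.minidom import parseString

def _text_of(element):
    """Hypothetical helper: join the direct text-node children of element."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )

class Feed(object):
    """Hypothetical class mirroring a <feed> with one title and many entries."""

    def __init__(self, title, entries):
        self.title = title
        self.entries = entries  # plain Python list holding zero or more items

    @classmethod
    def from_xml(cls, xml_text):
        # minidom does the parsing; this method only extracts the data.
        root = parseString(xml_text).documentElement
        title = _text_of(root.getElementsByTagName("title")[0])
        entries = [_text_of(e) for e in root.getElementsByTagName("entry")]
        return cls(title, entries)

feed = Feed.from_xml(
    "<feed><title>News</title><entry>first</entry><entry>second</entry></feed>"
)
```

A feed with no `<entry>` elements simply yields an empty list, which matches the "zero or more" behaviour described above.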

Here's the URL: http://home.avvanta.com/~steveha/xe.html

I'd be happy to answer questions if you have any.



Source: https://stackoverflow.com/questions/1373707/xml-parsing-in-python
