minidom | 易学教程

Using minidom to parse XML

Question: Hi, I'm having trouble understanding the minidom module for Python. I have XML that looks like this: <Show> <name>Dexter</name> <totalseasons>7</totalseasons> <Episodelist> <Season no="1"> <episode> <epnum>1</epnum> <seasonnum>01</seasonnum> <prodnum>101</prodnum> <airdate>2006-10-01</airdate> <link>http://www.tvrage.com/Dexter/episodes/408409</link> <title>Dexter</title> </episode> <episode> <epnum>2</epnum> <seasonnum>02</seasonnum> <prodnum>102</prodnum> <airdate>2006-10-08</airdate> <link>http
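
No answer survives on this page. As a rough sketch of the usual minidom pattern for a document like this one: parse it, walk down with getElementsByTagName, read attributes with getAttribute, and collect an element's text from its child text nodes. The child_text helper is a hypothetical name, and the XML is an abbreviated copy of the question's (the second episode's remaining fields are left out because the page cuts off there).

```python
from xml.dom import minidom

# Abbreviated copy of the question's XML (the original is cut off on this page).
SHOW_XML = """<Show>
  <name>Dexter</name>
  <totalseasons>7</totalseasons>
  <Episodelist>
    <Season no="1">
      <episode>
        <epnum>1</epnum>
        <seasonnum>01</seasonnum>
        <airdate>2006-10-01</airdate>
        <title>Dexter</title>
      </episode>
      <episode>
        <epnum>2</epnum>
        <seasonnum>02</seasonnum>
        <airdate>2006-10-08</airdate>
      </episode>
    </Season>
  </Episodelist>
</Show>"""

def child_text(parent, tag):
    """Hypothetical helper: text of the first <tag> element under parent, or None."""
    matches = parent.getElementsByTagName(tag)
    if not matches:
        return None
    return "".join(n.data for n in matches[0].childNodes if n.nodeType == n.TEXT_NODE)

doc = minidom.parseString(SHOW_XML)
show = doc.documentElement
show_name = child_text(show, "name")
total_seasons = int(child_text(show, "totalseasons"))

episodes = []
for season in show.getElementsByTagName("Season"):
    season_no = season.getAttribute("no")  # attribute values are always strings
    for episode in season.getElementsByTagName("episode"):
        episodes.append({
            "season": season_no,
            "epnum": child_text(episode, "epnum"),
            "airdate": child_text(episode, "airdate"),
            "title": child_text(episode, "title"),  # None when the tag is missing
        })
```

Note that getElementsByTagName searches all descendants, not only direct children, so child_text(show, "name") relies on the show's <name> being the first <name> in document order.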

Add to original XML file from a for loop in Python

Question: I have a master XML file called vs_origonal_M.xml, and I want to add every occurrence of a certain child element (<location> </location> <location> </location> . . . <location> </location>) until all the files have been processed. I am doing this by first opening the directory, then making a list of all the files in it and checking whether each one really is an XML file, and then taking a certain child out. Then (here's where I am stuck) I need to open the master file and insert this child right under the
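
The question stops before saying exactly where the children should go, so the sketch below assumes they are appended as the last children of the master file's root element and written to a separate output file rather than overwriting the master. It treats a file as XML when its name ends in .xml; collect_locations and merge_locations are hypothetical names, and the demo files other than vs_origonal_M.xml are made up. The DOM specification expects a node from another document to be copied in with importNode before it is appended, which is what the loop does.

```python
import os
import tempfile
from xml.dom import minidom

def collect_locations(source_dir):
    """Yield every <location> element found in the .xml files of source_dir."""
    for filename in sorted(os.listdir(source_dir)):
        if not filename.lower().endswith(".xml"):
            continue  # assumption: "is an XML file" is decided by the extension
        doc = minidom.parse(os.path.join(source_dir, filename))
        yield from doc.getElementsByTagName("location")

def merge_locations(master_path, source_dir, output_path):
    """Append copies of all collected <location> elements to the master's root."""
    master = minidom.parse(master_path)
    root = master.documentElement
    for location in collect_locations(source_dir):
        # importNode(node, True) makes a deep copy owned by the master document.
        root.appendChild(master.importNode(location, True))
    with open(output_path, "w", encoding="utf-8") as out:
        master.writexml(out, encoding="utf-8")

# Demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    os.mkdir(src)
    demo_files = {
        "a.xml": "<r><location>A</location></r>",
        "b.xml": "<r><x><location>B</location></x><location>C</location></r>",
        "notes.txt": "not xml, skipped",
    }
    for name, body in demo_files.items():
        with open(os.path.join(src, name), "w", encoding="utf-8") as f:
            f.write(body)
    master_path = os.path.join(tmp, "vs_origonal_M.xml")
    with open(master_path, "w", encoding="utf-8") as f:
        f.write("<master></master>")
    merged_path = os.path.join(tmp, "merged.xml")
    merge_locations(master_path, src, merged_path)
    merged = minidom.parse(merged_path)
    merged_texts = [n.firstChild.data for n in merged.getElementsByTagName("location")]
```

If the children belong somewhere other than the root, replace root with whichever element getElementsByTagName finds in the master document.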

How to get whole text of an Element in xml.minidom?

Question: I want to get the whole text of an Element in order to parse some XHTML: <div id='asd'> <pre>skdsk</pre> </div> With E being the div element in the above example, I want to get <pre>skdsk</pre>. How?

Answer 1: Strictly speaking:

    from xml.dom.minidom import parseString
    tree = parseString("<div id='asd'><pre>skdsk</pre></div>")
    root = tree.firstChild
    node = root.childNodes[0]
    print(node.toxml())

In practice, though, I'd recommend looking at the http://www.crummy.com/software/BeautifulSoup/ library. Finding the
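
Answer 1 serializes only the first child. A small generalization, sketched here under the assumption that "whole text" means the markup of everything inside the element, is to join toxml() over all child nodes. inner_xml is a hypothetical helper name, not part of minidom.

```python
from xml.dom.minidom import parseString

def inner_xml(element):
    """Serialize every child node of element (but not the element's own tags)."""
    return "".join(child.toxml() for child in element.childNodes)

doc = parseString("<div id='asd'> <pre>skdsk</pre> </div>")
div = doc.documentElement
result = inner_xml(div)
```

Whitespace between tags is kept as text nodes, so result includes the spaces around <pre>skdsk</pre>; call .strip() if they are not wanted.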

OverflowError: size does not fit in an int while parsing big XML with DOM

Question: I have a pretty big XML file, and I need to get all the nodes (information on different companies) that contain a specific parameter. The XML is about 12 GB unpacked. <Companies xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ...> <Company id="782634892" source="abcd"> <attribution>abcde</attribution> <name xml:lang="en">company name</name> <Phones> <Phone type="phone" hide="0"> <formatted>+1800111</formatted> <country>1</country> <prefix>800</prefix> <number>111</number> </Phone> </Phones>
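
The question is cut off before any answer. One standard-library option that avoids building a DOM for the whole 12 GB file is xml.dom.pulldom: it streams parse events and builds a full DOM subtree only for the elements passed to expandNode. The filter used here (a source attribute equal to a given value) is a hypothetical stand-in for the question's "specific parameter", iter_matching_companies is a made-up name, and the sample XML is a cut-down illustration.

```python
import io
from xml.dom import pulldom

# Cut-down illustration of the question's structure (not real data).
SAMPLE = b"""<Companies>
  <Company id="1" source="abcd"><name>First</name></Company>
  <Company id="2" source="other"><name>Second</name></Company>
  <Company id="3" source="abcd"><name>Third</name></Company>
</Companies>"""

def iter_matching_companies(stream, wanted_source):
    """Yield fully built <Company> elements whose source attribute matches.

    Only matching companies are expanded into DOM subtrees; every other
    element is streamed past as events and then discarded.
    """
    events = pulldom.parse(stream)
    for event, node in events:
        if (event == pulldom.START_ELEMENT
                and node.tagName == "Company"
                and node.getAttribute("source") == wanted_source):
            events.expandNode(node)  # read the rest of this element into node
            yield node

# For the real file, pass a binary file object, e.g. open("companies.xml", "rb")
# (hypothetical file name).
matches = []
for company in iter_matching_companies(io.BytesIO(SAMPLE), "abcd"):
    name = company.getElementsByTagName("name")[0].firstChild.data
    matches.append((company.getAttribute("id"), name))
```

Memory stays low only as long as the caller does not keep every yielded element alive; extract the fields you need and let each node go.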

Python: Using minidom to search for nodes with a certain text

Question: I am currently faced with XML that looks like this: <ID>345754</ID> This is contained within a hierarchy. I have parsed the XML and want to find the ID node by searching for "345754".

Answer 1:

    xmldoc = minidom.parse('your.xml')
    matchingNodes = [node for node in xmldoc.getElementsByTagName("id") if node.nodeValue == '345754']

See also: "How to get whole text of an Element in xml.minidom?" and "All nodeValue fields are None when parsing XML".

Answer 2: vartec's answer needs correcting (sorry I'm not sure I can
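
Answer 2 is cut off before its correction, so the following is only one plausible working version, not a reconstruction of it. As written, Answer 1 finds nothing for two reasons: XML tag names are case-sensitive, so getElementsByTagName("id") does not match <ID>, and an element's nodeValue is always None (the number lives in the element's child text node). The text_content helper is a hypothetical name, and the inline document stands in for 'your.xml'.

```python
from xml.dom import minidom

def text_content(node):
    """Concatenate the data of all text and CDATA nodes at or below node."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_content(child) for child in node.childNodes)

xmldoc = minidom.parseString(
    "<root><record><ID>345754</ID></record><record><ID>1</ID></record></root>")
# Search for "ID" (exact case) and compare the element's text, not its nodeValue.
matching_nodes = [node for node in xmldoc.getElementsByTagName("ID")
                  if text_content(node).strip() == "345754"]
```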

Python minidom and UTF-8 encoded XML with hash references

Question: I am experiencing some difficulty in my home project, where I need to parse a SOAP request. The SOAP request is generated with gSOAP and involves string parameters with special characters like the Danish letters "æøå". gSOAP builds SOAP requests with UTF-8 encoding by default, but instead of sending the special characters in raw form (i.e. bytes C3A6 for the special character "æ") it sends what I think are called character hash references (i.e. &#195;&#166;). I don't completely understand why gSOAP does it this
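
A short sketch reproduces the effect. A numeric character reference names a Unicode code point, so when each UTF-8 byte is sent as its own reference, any conforming XML parser, minidom included, decodes &#195;&#166; as the two characters "Ã¦" rather than "æ". A common workaround, assuming the references always encode UTF-8 bytes one byte at a time, is to re-encode the decoded text as Latin-1 (which maps code points 0-255 back to single bytes) and decode those bytes as UTF-8. The helper names are hypothetical; text containing code points above 255 or byte sequences that are not valid UTF-8 is returned unchanged.

```python
from xml.dom import minidom

# Each UTF-8 byte of "æøå" (C3 A6, C3 B8, C3 A5) sent as a decimal reference.
RAW = (b'<?xml version="1.0" encoding="UTF-8"?>'
       b'<name>&#195;&#166;&#195;&#184;&#195;&#165;</name>')

def element_text(element):
    """Join the data of the element's direct text-node children."""
    return "".join(n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE)

def undo_byte_references(text):
    """Reinterpret code points 0-255 as raw bytes and decode them as UTF-8."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not the one-reference-per-byte pattern; leave it alone

doc = minidom.parseString(RAW)
garbled = element_text(doc.documentElement)  # 'Ã¦Ã¸Ã¥'
fixed = undo_byte_references(garbled)        # 'æøå'
```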

Python minidom: #text node disappears when appending it to new parent node

Question: I have XML that looks like this: <example> <para> <phrase>child_0</phrase> child_1 <phrase>child_2</phrase> </para> </example> and I want it to look like this: <foo> <phrase>child_0</phrase> child_1 <phrase>child_2</phrase> </foo> Simple, right? I create a new parent node (<foo>) and then iterate through the <para> node and append the children to the new <foo> node. What's strange is that child_1 (a text node) disappears when I try to do so. If I simply iterate through the <para> node
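
The answers are cut off, but the symptom matches a known minidom behavior: appendChild first removes the node from its current parent, and para.childNodes is the same live list the loop is walking, so each move shifts the remaining children left and the loop skips the node that follows. Iterating over a snapshot (list(para.childNodes)) avoids this. The sketch uses a whitespace-free version of the question's XML; move_children is a hypothetical helper, and swapping <para> for <foo> with replaceChild is an added choice, not part of the question.

```python
from xml.dom import minidom

XML = ("<example><para><phrase>child_0</phrase> child_1 "
       "<phrase>child_2</phrase></para></example>")

def move_children(source, target, copy_first=True):
    """Move every child node of source to the end of target.

    appendChild() detaches each node from source, shrinking the live
    source.childNodes list; iterating over a copy avoids skipping nodes.
    """
    children = list(source.childNodes) if copy_first else source.childNodes
    for child in children:
        target.appendChild(child)

# Naive version: iterating the live list skips the " child_1 " text node.
doc = minidom.parseString(XML)
para = doc.getElementsByTagName("para")[0]
foo = doc.createElement("foo")
move_children(para, foo, copy_first=False)
naive_result = foo.toxml()

# Fixed version: iterate over a snapshot, then put <foo> where <para> was.
doc = minidom.parseString(XML)
para = doc.getElementsByTagName("para")[0]
foo = doc.createElement("foo")
move_children(para, foo)
para.parentNode.replaceChild(foo, para)
fixed_result = foo.toxml()
```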

Parsing document with python minidom

Question: I have the following XML document that I have to parse using Python's minidom: <?xml version="1.0" encoding="UTF-8"?> <root> <bash-function activated="True"> <name>lsal</name> <description>List directory content (-al)</description> <code>ls -al</code> </bash-function> <bash-function activated="True"> <name>lsl</name> <description>List directory content (-l)</description> <code>ls -l</code> </bash-function> </root> Here is the code (the essential part) where I am trying to parse it: from modules
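
The asker's own code is cut off after "from modules", so this is an independent sketch of reading that document with minidom, not a fix of their code. The child_text helper and the dictionary layout are illustrative choices.

```python
from xml.dom import minidom

XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
  <bash-function activated="True">
    <name>lsal</name>
    <description>List directory content (-al)</description>
    <code>ls -al</code>
  </bash-function>
  <bash-function activated="True">
    <name>lsl</name>
    <description>List directory content (-l)</description>
    <code>ls -l</code>
  </bash-function>
</root>"""

def child_text(parent, tag):
    """Text of the first <tag> element below parent ('' if it has no text)."""
    element = parent.getElementsByTagName(tag)[0]
    return "".join(n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE)

doc = minidom.parseString(XML)
functions = []
for fn in doc.getElementsByTagName("bash-function"):
    functions.append({
        # Attribute values are strings, so compare against "True" explicitly.
        "activated": fn.getAttribute("activated") == "True",
        "name": child_text(fn, "name"),
        "description": child_text(fn, "description"),
        "code": child_text(fn, "code"),
    })
```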

Using urllib and minidom to fetch XML data

Question: I'm trying to fetch data from an XML service, this one: http://xmlweather.vedur.is/?op_w=xml&type=forec&lang=is&view=xml&ids=1 I'm using urllib and minidom and I can't seem to make it work. I've used minidom with files but not with URLs. This is the code I'm trying to use:

    xmlurl = 'http://xmlweather.vedur.is'
    xmlpath = xmlurl + '?op_w=xml&type=forec&lang=is&view=xml&ids=' + str(location)
    xmldoc = minidom.parse(urllib.urlopen(xmlpath))

Can anyone help me?

Answer 1: The following should work (or at least
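
Answer 1 is cut off. The asker's code uses Python 2's urllib.urlopen; in Python 3 that function lives in urllib.request, and urllib.parse.urlencode can build the query string instead of concatenating it by hand. The sketch below assumes Python 3; forecast_url and fetch_forecast are hypothetical names, and this page does not confirm that the vedur.is service still answers at that address.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.dom import minidom

BASE_URL = "http://xmlweather.vedur.is/"

def forecast_url(location):
    """Build the forecast URL from the question for one station id."""
    query = urlencode({
        "op_w": "xml",
        "type": "forec",
        "lang": "is",
        "view": "xml",
        "ids": location,
    })
    return BASE_URL + "?" + query

def fetch_forecast(location, timeout=10):
    """Download the XML and parse it; minidom.parse accepts file-like objects."""
    with urlopen(forecast_url(location), timeout=timeout) as response:
        return minidom.parse(response)

url = forecast_url(1)
# fetch_forecast(1) needs network access, so it is not called here.
```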

Python minidom and UTF-8 encoded XML with hash references

I am experiencing some difficulty in my home project, where I need to parse a SOAP request. The SOAP request is generated with gSOAP and involves string parameters with special characters like the Danish letters "æøå". gSOAP builds SOAP requests with UTF-8 encoding by default, but instead of sending the special characters in raw form (i.e. bytes C3A6 for the special character "æ") it sends what I think are called character hash references (i.e. &#195;&#166;). I don't completely understand why gSOAP does it this way, as I can see that it has marked the incoming payload as UTF-8 encoded anyway (Content-Type: