I am working on an XML SAX parser to parse XML files, and below is my code.
XML file code:
Registered Nurse-Epilepsy&
To get the content of an element, you need to override the characters method. Add this to your handler class:
    def characters(self, data):
        print(data)
Be careful with this, though: the parser is not required to give you all the data in a single chunk, so characters may be called several times for one element. You should collect the chunks in an internal buffer and read it when needed. In most of my xml/sax code I do something like this:
    import xml.sax.handler

    class MyHandler(xml.sax.handler.ContentHandler):
        def __init__(self):
            xml.sax.handler.ContentHandler.__init__(self)
            self._charBuffer = []

        def _flushCharBuffer(self):
            # Join the collected chunks and reset the buffer
            s = ''.join(self._charBuffer)
            self._charBuffer = []
            return s

        def characters(self, data):
            self._charBuffer.append(data)
... and then I call the flush method at the end of the elements whose data I need.
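To see the buffering at work, here is a minimal sketch in Python 3. The handler name TitleHandler, the title tag, and the sample document are assumptions made up for illustration; the chunks list exists only to let you inspect what the parser delivered.

```python
import xml.sax
import xml.sax.handler


class TitleHandler(xml.sax.handler.ContentHandler):
    """Collects the text of every <title> element (hypothetical tag name)."""

    def __init__(self):
        super().__init__()
        self._charBuffer = []
        self.chunks = []  # raw pieces as delivered, for inspection only
        self.titles = []

    def _flushCharBuffer(self):
        s = ''.join(self._charBuffer)
        self._charBuffer = []
        return s

    def startElement(self, name, attrs):
        # Discard anything buffered before this element opened
        self._flushCharBuffer()

    def characters(self, data):
        self.chunks.append(data)
        self._charBuffer.append(data)

    def endElement(self, name):
        text = self._flushCharBuffer()
        if name == 'title':
            self.titles.append(text)


handler = TitleHandler()
xml.sax.parseString(b"<jobs><title>Nurse &amp; Midwife</title></jobs>", handler)
print(handler.chunks)  # may be several pieces, depending on the parser
print(handler.titles)
```

However the parser splits the text, joining the buffer at the end of the element gives the complete string, which is why reading the data in endElement is safer than reading it in characters.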
For your whole use case, assuming you have a file containing multiple job descriptions and want a list of jobs, with each job being a dictionary of its fields, do something like this:
    import xml.sax
    import xml.sax.handler

    class MyHandler(xml.sax.handler.ContentHandler):
        def __init__(self):
            xml.sax.handler.ContentHandler.__init__(self)
            self._charBuffer = []
            self._result = []

        def _getCharacterData(self):
            data = ''.join(self._charBuffer)
            self._charBuffer = []
            return data.strip()  # remove strip() if whitespace is important

        def parse(self, f):
            xml.sax.parse(f, self)
            return self._result

        def characters(self, data):
            self._charBuffer.append(data)

        def startElement(self, name, attrs):
            # Each <job> starts a new dictionary of fields
            if name == 'job':
                self._result.append({})

        def endElement(self, name):
            # Assumes every other element is a field inside a <job>
            if name != 'job':
                self._result[-1][name] = self._getCharacterData()

    jobs = MyHandler().parse("job-file.xml")  # a list of all jobs
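If you want to try this without a file, the following Python 3 sketch parses an in-memory document. The sample XML, the element names, and the JobHandler class are assumptions for illustration. It also adds an _inJob flag, which is not part of the handler above, so that a wrapping root element such as jobs is not stored as a field of the last job.

```python
import xml.sax
import xml.sax.handler

# Hypothetical sample document; the element names are assumptions.
SAMPLE = b"""<jobs>
  <job><title>Nurse &amp; Midwife</title><city>Boston</city></job>
  <job><title>Lab Technician</title><city>Denver</city></job>
</jobs>"""


class JobHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self._charBuffer = []
        self._inJob = False  # True while the parser is inside a <job>
        self.jobs = []

    def _getCharacterData(self):
        data = ''.join(self._charBuffer).strip()
        self._charBuffer = []
        return data

    def characters(self, data):
        self._charBuffer.append(data)

    def startElement(self, name, attrs):
        if name == 'job':
            self._inJob = True
            self.jobs.append({})

    def endElement(self, name):
        if name == 'job':
            self._inJob = False
            self._charBuffer = []  # drop whitespace left before </job>
        elif self._inJob:
            self.jobs[-1][name] = self._getCharacterData()
        else:
            self._charBuffer = []  # ignore text outside any job


handler = JobHandler()
xml.sax.parseString(SAMPLE, handler)
for job in handler.jobs:
    print(job)
```

Each job ends up as its own dictionary keyed by element name, so handler.jobs[0]['title'] holds the first job's title with the entity already decoded.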
If you just need to parse a single job at a time, you can simplify the list part and throw away the startElement method: just set _result to a dict and assign to it directly in endElement.
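A rough sketch of that single-job variant in Python 3 might look like the following. The class name SingleJobHandler, the parseString helper method, and the sample input are hypothetical and only there to make the example self-contained.

```python
import xml.sax
import xml.sax.handler


class SingleJobHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self._charBuffer = []
        self._result = {}  # one dict instead of a list of dicts

    def _getCharacterData(self):
        data = ''.join(self._charBuffer).strip()
        self._charBuffer = []
        return data

    def characters(self, data):
        self._charBuffer.append(data)

    def endElement(self, name):
        # No startElement needed: every non-job element is a field
        if name != 'job':
            self._result[name] = self._getCharacterData()

    def parseString(self, data):
        # Hypothetical helper: parse bytes held in memory
        xml.sax.parseString(data, self)
        return self._result


job = SingleJobHandler().parseString(
    b"<job><title>Lab Technician</title><city>Denver</city></job>"
)
print(job)
```

Use a fresh handler instance for each document, since _result is filled in place and would otherwise carry fields over from the previous job.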