MemoryError with minidom in Python

Posted by 蓝咒 on 2019-12-22 13:59:57

Question


I'm getting a MemoryError with the minidom parser in Python. I'm reading 8000 small files (most under 50 KB), and the error appears after about 2500 of them have been read:

Traceback (most recent call last):
  File "C:\eclipse\plugins\org.python.pydev.debug_2.4.0.2012020116\pysrc\pydevd.py", line 1307, in <module>
    debugger.run(setup['file'], None, None)
  File "C:\eclipse\plugins\org.python.pydev.debug_2.4.0.2012020116\pysrc\pydevd.py", line 1060, in run
    pydev_imports.execfile(file, globals, locals) #execute the script
  File "C:\Users\calculator_2012.py", line 81, in <module>
    file_content, economicFlow, elementaryFlow = XML_reader(spoldFile)
  File "C:\Users\XML_reader.py", line 10, in XML_reader
    xmltree = parse(spold_filename)
  File "C:\Python27\lib\xml\dom\minidom.py", line 1914, in parse
    return expatbuilder.parse(file)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 294, in character_data_handler_cdata
    _append_child(self.curNode, node)
  File "C:\Python27\lib\xml\dom\minidom.py", line 274, in _append_child
    def _append_child(self, node):
  File "C:\eclipse\plugins\org.python.pydev.debug_2.4.0.2012020116\pysrc\pydevd.py", line 942, in trace_dispatch
    traceback.print_exc()
  File "C:\Python27\lib\traceback.py", line 232, in print_exc
    print_exception(etype, value, tb, limit, file)
MemoryError

Can anyone suggest a parser that doesn't leak memory?


Answer 1:


I also suggest the built-in cElementTree. Minidom has a lot of issues :/

Otherwise lxml is also quite good and has more features.
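
As a rough sketch of what switching to ElementTree might look like, the snippet below parses one file at a time, copies out plain strings, and lets the tree go out of scope before the next file is read. The function names `read_flows` and `read_many`, the `exchange` tag, and the `name` attribute are hypothetical placeholders, not taken from the question's files. It assumes Python 3, where `xml.etree.ElementTree` picks up the C accelerator on its own, so no separate `cElementTree` import is needed.

```python
import xml.etree.ElementTree as ET


def read_flows(path, tag="exchange"):
    """Return the 'name' attribute of every <tag> element in one file.

    The tag and attribute names are placeholders for illustration only.
    """
    tree = ET.parse(path)
    names = [el.get("name") for el in tree.getroot().iter(tag)]
    # Only plain strings are returned, so nothing keeps the tree alive
    # once this function returns.
    return names


def read_many(paths):
    """Parse the files one at a time so only one tree exists at once."""
    return {path: read_flows(path) for path in paths}
```

Returning plain values rather than element objects is the point of the design: holding on to nodes from every file would keep all 8000 trees in memory regardless of which parser is used.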




Answer 2:


It seems minidom can occupy a lot of memory.

I tried to parse a 56 MB file, and reading it took 8 GB of memory.

You do the math.
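
To check the overhead on your own data, one possible approach is to measure peak allocation during a parse with `tracemalloc`. This is a sketch that assumes Python 3.4 or later (the question's Python 2.7 has no `tracemalloc`); the helper name `peak_parse_memory` and the synthetic sample document are made up for illustration. `tracemalloc` only counts memory allocated through Python's allocators, so treat the result as an estimate rather than the process's full footprint.

```python
import tracemalloc
from xml.dom import minidom


def peak_parse_memory(xml_text):
    """Return (input_size_bytes, peak_traced_bytes) for one minidom parse."""
    data = xml_text.encode("utf-8")
    tracemalloc.start()
    try:
        doc = minidom.parseString(data)
        _, peak = tracemalloc.get_traced_memory()
        # unlink() breaks the DOM's reference cycles so it can be freed promptly.
        doc.unlink()
    finally:
        tracemalloc.stop()
    return len(data), peak


if __name__ == "__main__":
    # Synthetic document; real files will give different ratios.
    sample = "<root>" + "<item id='1'>text</item>" * 10000 + "</root>"
    size, peak = peak_parse_memory(sample)
    print("input: %d bytes, peak: %d bytes, ratio: %.1fx" % (size, peak, peak / size))
```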




Answer 3:


libxml2 is faster and doesn't leak memory, but its API is too C-like to be pleasant to use from Python, and its documentation is horrible.

lxml is a layer over libxml2 that exposes the etree API, so you get the Pythonic interface with libxml2 performance.

http://lxml.de/

edit:typos




Answer 4:


Well, the traceback suggests you're running under the PyDev debugger, so have you tried running it inside Eclipse without the debugger?



Source: https://stackoverflow.com/questions/11127529/memoryerror-with-minidom-in-python
