Empty list returned from ElementTree findall

Asked by 百般思念 on 2019-11-27 08:05:47
I'm new to XML parsing and Python, so bear with me. I'm using lxml to parse a wiki dump, but all I want is, for each page, its title and text. For now I've got this:

    from xml.etree import ElementTree as etree

    def parser(file_name):
        document = etree.parse(file_name)
        titles = document.findall('.//title')
        print titles

At the moment titles comes back empty. I've looked at previous answers like this one, ElementTree findall() returning empty list, and at the lxml documentation, but most things seemed tailored toward parsing HTML. This is a section of my XML:

    <mediawiki xmlns="http://www
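The empty result here is almost certainly caused by that default namespace: the xmlns on the root element puts every descendant in that namespace, so ElementTree stores each tag as '{namespace-URI}tag' and a bare './/title' matches nothing. Below is a minimal namespace-aware sketch, assuming the usual MediaWiki export layout of page elements with title and revision/text children (the function name is hypothetical, and since the snippet above is truncated, the namespace URI is read from the root tag rather than hard-coded):

    from xml.etree import ElementTree as etree

    def print_titles_and_text(file_name):
        document = etree.parse(file_name)
        root = document.getroot()
        # A namespaced root tag looks like '{http://...}mediawiki';
        # pull the URI out of the braces instead of hard-coding it.
        uri = root.tag[root.tag.find('{') + 1:root.tag.find('}')]
        ns = {'mw': uri}
        # Every descendant lives in the same default namespace, so the
        # path expressions must be namespace-qualified as well.
        for page in root.findall('mw:page', ns):
            title = page.find('mw:title', ns)
            text = page.find('mw:revision/mw:text', ns)
            print(title.text if title is not None else None,
                  text.text if text is not None else None)

Equivalently, the URI can be embedded directly in the path, e.g. root.findall('{%s}page' % uri), without the namespaces dictionary.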
