Empty list returned from ElementTree findall

Asked by 百般思念 on 2019-11-27 08:05:47
I'm new to XML parsing and Python, so bear with me. I'm using lxml to parse a wiki dump, but all I want is, for each page, its title and text. For now I've got this:

    from xml.etree import ElementTree as etree

    def parser(file_name):
        document = etree.parse(file_name)
        titles = document.findall('.//title')
        print titles

At the moment titles comes back empty. I've looked at previous answers like this one, ElementTree findall() returning empty list, and at the lxml documentation, but most things seemed tailored toward parsing HTML. This is a section of my XML:

    <mediawiki xmlns="http://www
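The empty result here is almost certainly caused by that default namespace: the xmlns on the root element puts every descendant in that namespace, so ElementTree stores each tag as '{namespace-URI}tag' and a bare './/title' matches nothing. Below is a minimal namespace-aware sketch, assuming the usual MediaWiki export layout of page elements with title and revision/text children (the function name is hypothetical, and since the snippet above is truncated, the namespace URI is read from the root tag rather than hard-coded):

    from xml.etree import ElementTree as etree

    def print_titles_and_text(file_name):
        document = etree.parse(file_name)
        root = document.getroot()
        # A namespaced root tag looks like '{http://...}mediawiki';
        # pull the URI out of the braces instead of hard-coding it.
        uri = root.tag[root.tag.find('{') + 1:root.tag.find('}')]
        ns = {'mw': uri}
        # Every descendant lives in the same default namespace, so the
        # path expressions must be namespace-qualified as well.
        for page in root.findall('mw:page', ns):
            title = page.find('mw:title', ns)
            text = page.find('mw:revision/mw:text', ns)
            print(title.text if title is not None else None,
                  text.text if text is not None else None)

Equivalently, the URI can be embedded directly in the path, e.g. root.findall('{%s}page' % uri), without the namespaces dictionary.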
