How to preserve namespace information when parsing HTML with lxml?
Question

>>> from lxml.etree import HTML, tostring
>>> tostring(HTML('<fb:like>'))
'<html><body><like/></body></html>'

Note how the tag changes from <fb:like> to plain <like>. This makes it much harder to process pages that use XFBML with lxml. (The same thing happens to <g:plusone></g:plusone>.) Any help is appreciated.

Answer 1:

Try adding the missing namespace prefix definitions. Without them, lxml drops the namespace prefixes, supposedly to make things easier for you. Most likely the sites you try to