How do I generate a table of contents for HTML text in Python?

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-23 03:50:06

Question


Assume that I have some HTML code, like this (generated from Markdown or Textile or something):

<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->

How could I generate a table of contents for it using Python?


Answer 1:


Use an HTML parser such as lxml or BeautifulSoup to find all heading elements (<h1> through <h6>).
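The parser approach does not strictly require a third-party library. As a minimal sketch that swaps in Python's built-in html.parser module in place of lxml or BeautifulSoup, the headings can be collected in document order as (level, text) pairs. HeadingCollector and extract_headings are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Heading tags treated as table-of-contents entries (an assumption: h1..h6).
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Hypothetical helper: collect (level, text) pairs for each <h1>..<h6>, in order."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.headings = []  # finished (level, text) tuples
        self._level = None  # level of the heading currently open, or None
        self._parts = []    # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase, so <H2> matches too
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        # keep text from the heading itself and from nested tags like <em>
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # collapse runs of whitespace, including newlines inside the heading
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def extract_headings(html_text):
    """Return a list of (level, text) tuples for the headings in html_text."""
    parser = HeadingCollector()
    parser.feed(html_text)
    parser.close()
    return parser.headings


sample = """<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>"""

# Print an indented outline, two spaces per level below h1
for level, text in extract_headings(sample):
    print("  " * (level - 1) + text)
```

In this sketch a heading whose closing tag is missing is silently dropped, because it is only recorded when its </hN> end tag is seen. A real parser such as lxml repairs broken markup more forgivingly.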




Answer 2:


Here's an example using lxml and XPath:

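from lxml import html

# lxml's HTML parser accepts fragments with several top-level elements,
# which the XML parser behind etree.parse would reject
doc = html.parse("test.html")
for node in doc.xpath('//h1|//h2|//h3|//h4|//h5|//h6'):
    # text_content() also includes text from nested tags such as <em>
    print(node.tag, node.text_content())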
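The loop above only prints each heading; it does not yet produce a table of contents. One way to get there, sketched here under stated assumptions, is to nest the (level, text) pairs by heading level and render them as linked <ul> lists. The pairs could come from the lxml loop (for example int(node.tag[1]) for the level). TocEntry, slugify, build_tree and render_toc are hypothetical names, and the slug scheme is an assumption, not a standard.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    level: int
    text: str
    children: list = field(default_factory=list)


def slugify(text):
    """Hypothetical anchor scheme: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "section"


def build_tree(headings):
    """Make each heading a child of the nearest preceding heading with a smaller level.

    Assumes levels are 1..6, so the level-0 root is never popped.
    """
    root = TocEntry(level=0, text="")
    stack = [root]
    for level, text in headings:
        entry = TocEntry(level, text)
        # pop siblings and deeper entries; whatever remains on top is the parent
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(entry)
        stack.append(entry)
    return root.children


def render_toc(entries):
    """Render entries as nested <ul> lists, each sub-list inside its parent's <li>."""
    if not entries:
        return ""
    items = []
    for entry in entries:
        link = '<a href="#%s">%s</a>' % (slugify(entry.text), html.escape(entry.text))
        items.append("<li>%s%s</li>" % (link, render_toc(entry.children)))
    return "<ul>%s</ul>" % "".join(items)


headings = [(1, "A header"), (2, "Another header"),
            (2, "Different header"), (1, "Another toplevel header")]
print(render_toc(build_tree(headings)))
```

Building a tree first keeps skipped levels (an h3 directly under an h1) nested sensibly and makes the output valid HTML, since every sub-list sits inside an <li>. The links only work if each heading in the document carries a matching id attribute (e.g. <h2 id="another-header">), which this sketch does not add; duplicate heading texts would also produce duplicate anchors.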


Source: https://stackoverflow.com/questions/2210265/how-do-i-generate-a-table-of-contents-for-html-text-in-python
