Extracting blocks of text from ReST documents by :ref:?

Posted by 蹲街弑〆低调 on 2019-12-06 16:40:59
Chris

I'm not sure how you could do this other than by subclassing and customising the Docutils parser. If you just need the relevant section of reStructuredText and don't mind losing some of the markup, then you can try the following. Alternatively, the processed markup (i.e. reStructuredText converted to HTML or LaTeX) for a particular section is very easy to get. See my answer to this question for an example of extracting a part of the processed XML. Let me know if this is what you want. Anyway, here goes...

You can manipulate reStructuredText very easily using Docutils. First, build the Docutils document tree (doctree) representation of the reStructuredText using the Docutils publish_doctree function. This doctree can be traversed easily and searched for particular document elements, e.g. sections, with particular attributes. The easiest way to look up a particular section reference is to inspect the ids attribute of the doctree itself. doctree.ids is simply a dictionary that maps every reference id to the corresponding element of the document.

from docutils.core import publish_doctree

s = """.. _my_boring_section:

Introductory prose
------------------

blah blah blah

.. _my_interesting_section:

About this dialog
-----------------

talk about stuff which is relevant in contextual help
"""

# Parse the above string to a Docutils document tree:
doctree = publish_doctree(s)

# Get element in the document with the reference id `my-interesting-section`:
ids = 'my-interesting-section'

try:
    section = doctree.ids[ids]
except KeyError:
    # Do some exception handling here...
    raise KeyError('No section with ids {0}'.format(ids))

# Can also make sure that the element we found was in fact a section:
import docutils.nodes
isinstance(section, docutils.nodes.section) # Should be True

# Finally, get the section text:
print(section.astext())

# This will print:
# About this dialog
#
# talk about stuff which is relevant in contextual help

Now the markup has been lost. If there is nothing too fancy, it would be easy to insert some dashes under the first line of the result above to get back to your section heading. I'm not sure what you would need to do for more complicated inline markup. Hopefully the above is a good starting point for you though.
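
As a rough sketch of the "insert some dashes" idea: the helper below takes the plain text returned by section.astext() and puts an underline back under the first line. The name restore_heading and the underline_char parameter are made up for illustration, and it assumes the title and body are separated by a blank line, as in the output above. It does nothing for inline markup.

```python
def restore_heading(text, underline_char='-'):
    """Re-add a reST underline beneath the first line of `text`.

    Hypothetical helper: assumes `text` looks like 'Title\n\nbody...',
    i.e. the title is everything before the first blank line.
    """
    title, sep, body = text.partition('\n\n')
    underline = underline_char * len(title)
    if not sep:
        # Only a title was present, no body text.
        return title + '\n' + underline + '\n'
    return title + '\n' + underline + '\n\n' + body


plain = 'About this dialog\n\ntalk about stuff which is relevant in contextual help'
print(restore_heading(plain))
```

The underline is made the same length as the title, which is the usual way to write a reStructuredText section heading.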

Note: When querying doctree.ids, the id I pass is slightly different from the definition in the reStructuredText: the leading underscore has been removed and all other underscores have been replaced by hyphens. This is how Docutils normalises references. It would be really straightforward to write a function to convert reStructuredText references to Docutils' internal representation. Otherwise, I'm sure if you dig through Docutils you can find the routine that does this.
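
Here is a minimal sketch of such a conversion function, using only the standard library. The name rst_label_to_id is hypothetical, and the rules are an approximation for plain ASCII labels: lowercase, collapse runs of non-alphanumeric characters into a single hyphen, then strip leading hyphens/digits and trailing hyphens. I believe the routine Docutils itself uses is docutils.nodes.make_id, which also handles non-ASCII characters, so prefer that if it is available to you.

```python
import re


def rst_label_to_id(label):
    """Approximate the Docutils id for a reST target label.

    Hypothetical helper covering ASCII labels only; it is not the
    Docutils implementation.
    """
    ident = label.lower()
    # Any run of characters outside a-z and 0-9 becomes one hyphen.
    ident = re.sub(r'[^a-z0-9]+', '-', ident)
    # Ids may not start with a hyphen or digit, nor end with a hyphen.
    ident = re.sub(r'^[-0-9]+|-+$', '', ident)
    return ident


print(rst_label_to_id('_my_interesting_section'))
```

The result can then be used as the key when looking up doctree.ids, as in the example above.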
