Markdown: how to show a preview (such as the first N words)

会有一股神秘感。 submitted on 2019-12-02 08:39:36
Let's use this **sample document** with _various_ types of [Markdown](http://daringfireball.net/projects/markdown/) markup.

Now, let's assume you take the first 20 chars. You would get:

Let's use this **sam

and 80 chars gives you:

Let's use this **sample document** with _various_ types of [Markdown](http://dar

While those char lengths are arbitrary and probably not lengths you would use, the point is that each of them breaks the Markdown syntax. A better approach would be to parse the document to HTML, then break out the beginning of the HTML document.
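To make the problem concrete, here is a minimal Python sketch of the naive approach: slicing the raw Markdown source at a fixed offset. The names `sample` and `naive_preview` are just illustrative, not part of any library.

```python
# The sample document from above, as a raw Markdown string.
sample = (
    "Let's use this **sample document** with _various_ types of "
    "[Markdown](http://daringfireball.net/projects/markdown/) markup."
)


def naive_preview(markdown_text, limit):
    """Cut the raw Markdown source at a fixed character count."""
    return markdown_text[:limit]


short = naive_preview(sample, 20)
longer = naive_preview(sample, 80)

print(short)   # the bold marker ** is opened but never closed
print(longer)  # the cut lands inside the link URL
```

Both previews are syntactically broken Markdown, which is why slicing the source text is not a good basis for a preview.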

Of course, you would probably want to use an HTML document model of some sort rather than splitting on raw char length, for the same reasons. Why not simply take the first paragraph? If the paragraph is too long, break on the Nth char, but count only the chars in the body text, not the chars which make up the HTML markup. How to do that depends on which tool/library you are using to handle the HTML, and this is not the place to make tool recommendations (and I'm not very familiar with Ruby/Rails; I'm more of a Python guy).

Note that the second example I give above breaks the Markdown in the middle of a URL for a link. If you first convert the Markdown to HTML and break only counting text chars, then the URL will remain intact even if the link text (label) gets truncated. In that case, though, it might be better to truncate the text after the end of the link. That depends on how complicated you want to make your code.
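As a rough illustration of counting only body text, here is a sketch using Python's standard-library `html.parser`. It assumes the Markdown has already been converted to HTML by whatever tool you use. The names `TruncatingParser`, `truncate_html`, `VOID_TAGS` and `html_doc` are my own, the void-tag list is deliberately partial, and mismatched closing tags are simply ignored, so treat this as a starting point rather than a complete solution.

```python
from html import escape
from html.parser import HTMLParser

# Partial list of tags that never take a closing tag (assumption: enough for a demo).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TruncatingParser(HTMLParser):
    """Re-emit HTML, stopping once `limit` body-text characters are written."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # text characters still allowed
        self.out = []            # pieces of the output document
        self.open_tags = []      # tags opened but not yet closed

    def handle_starttag(self, tag, attrs):
        if self.remaining <= 0:
            return
        rendered = "".join(
            ' {}="{}"'.format(name, escape(value or "", quote=True))
            for name, value in attrs
        )
        self.out.append("<{}{}>".format(tag, rendered))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.remaining <= 0:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if self.remaining <= 0:
            return
        chunk = data[: self.remaining]  # markup never counts, only text
        self.remaining -= len(chunk)
        self.out.append(escape(chunk, quote=False))

    def result(self):
        # Close whatever is still open, innermost element first.
        closing = "".join("</{}>".format(tag) for tag in reversed(self.open_tags))
        return "".join(self.out) + closing


def truncate_html(html, limit):
    parser = TruncatingParser(limit)
    parser.feed(html)
    parser.close()
    return parser.result()


html_doc = (
    "<p>Let's use this <strong>sample document</strong> with <em>various</em> "
    'types of <a href="http://daringfireball.net/projects/markdown/">Markdown</a> '
    "markup.</p>"
)

print(truncate_html(html_doc, 20))
# <p>Let's use this <strong>sampl</strong></p>
```

Cutting at 57 text chars lands in the middle of the link label, yet the `href` survives whole because attribute values are never counted.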

A natural next step is to ask: why not do all that with the Markdown text instead of converting the entire document to HTML first? You could, but then you would be implementing your own Markdown parser, unless you happen to use a Markdown parser which gives you access to its internals (through some plug-in API) or outputs a parse tree. If you are using a parser which returns a parse tree, you could truncate the parse tree, then pass it on to the renderer. Short of that, using parsed HTML is probably the best option.

Either way, let's work through an example. The HTML for the above example would look something like this:

<p>Let's use this <strong>sample document</strong> with <em>various</em> types of <a href="http://daringfireball.net/projects/markdown/">Markdown</a> markup.</p>

Now, let's represent that document as some sort of pseudo document object (using JSON):

[{
    "type": "element",
    "tag": "p",
    "children":
        [
            {
                "type": "text",
                "text": "Let's use this "
            },
            {
                "type": "element",
                "tag": "strong",
                "children":
                    [
                        {
                            "type": "text",
                            "text": "sample document"
                        }
                    ]
            },
            {
                "type": "text",
                "text": " with "
            },
            {
                "type": "element",
                "tag": "em",
                "children":
                    [
                        {
                            "type": "text",
                            "text": "various"
                        }
                    ]
            },
            {
                "type": "text",
                "text": " types of "
            },
            {
                "type": "element",
                "tag": "a",
                "href": "http://daringfireball.net/projects/markdown/",
                "children":
                    [
                        {
                            "type": "text",
                            "text": "Markdown"
                        }
                    ]
            },
            {
                "type": "text",
                "text": " markup."
            }
        ]
}]

Now, just loop through that document (and its children), only counting chars for the "text" field of "text" types until you reach your maximum. Then truncate any additional elements after that in the document. When the document is rendered (using a proper HTML renderer), all the HTML elements will be properly closed. Obviously, the exact process would depend on what sort of document object the document is contained in (which may depend on the HTML parser and/or Markdown parser you are using).
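The loop described above could be sketched in Python roughly like this, working on the same kind of pseudo document object (plain dicts and lists). The function names `truncate_tree` and `render`, and the rule that any extra key on an element becomes an HTML attribute, are assumptions made for this illustration; a real parser will hand you its own node types.

```python
from html import escape


def truncate_tree(nodes, limit):
    """Return (truncated copy of nodes, number of chars still unused)."""
    kept = []
    for node in nodes:
        if limit <= 0:
            break  # everything after this point is dropped
        if node["type"] == "text":
            text = node["text"][:limit]
            limit -= len(text)
            kept.append({"type": "text", "text": text})
        else:
            # Copy the element itself, then recurse into its children.
            clone = {k: v for k, v in node.items() if k != "children"}
            clone["children"], limit = truncate_tree(node.get("children", []), limit)
            kept.append(clone)
    return kept, limit


def render(nodes):
    """Serialize the tree; every element is closed, so the output is well formed."""
    parts = []
    for node in nodes:
        if node["type"] == "text":
            parts.append(escape(node["text"], quote=False))
            continue
        attrs = "".join(
            ' {}="{}"'.format(key, escape(str(value)))
            for key, value in node.items()
            if key not in ("type", "tag", "children")
        )
        parts.append("<{0}{1}>{2}</{0}>".format(node["tag"], attrs, render(node["children"])))
    return "".join(parts)


document = [{
    "type": "element", "tag": "p", "children": [
        {"type": "text", "text": "Let's use this "},
        {"type": "element", "tag": "strong", "children": [
            {"type": "text", "text": "sample document"}]},
        {"type": "text", "text": " with "},
        {"type": "element", "tag": "em", "children": [
            {"type": "text", "text": "various"}]},
        {"type": "text", "text": " types of "},
        {"type": "element", "tag": "a",
         "href": "http://daringfireball.net/projects/markdown/", "children": [
            {"type": "text", "text": "Markdown"}]},
        {"type": "text", "text": " markup."},
    ],
}]

preview, unused = truncate_tree(document, 20)
print(render(preview))
# <p>Let's use this <strong>sampl</strong></p>
```

The function builds a new tree instead of editing the original in place, so the full document is still available for rendering the complete page.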

In any event, the document truncated to 20 chars would result in this:

[{
    "type": "element",
    "tag": "p",
    "children":
        [
            {
                "type": "text",
                "text": "Let's use this "
            },
            {
                "type": "element",
                "tag": "strong",
                "children":
                    [
                        {
                            "type": "text",
                            "text": "sampl"
                        }
                    ]
            }
        ]
}]

Which would render as:

<p>Let's use this <strong>sampl</strong></p>

Note that the text only (Let's use this sampl) counts as 20 chars.

While the above examples use chars, you could certainly use the same principles and count words instead.
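For example, a word-based variant might look something like the following Python sketch. It walks the same kind of dict-based pseudo document and treats any run of non-whitespace as a word. The names `take_words` and `truncate_words` are made up for this illustration, and note one simplification: a word split across two nodes (say, half of it bold) is counted twice.

```python
import re


def take_words(text, max_words):
    """Return (prefix of text holding at most max_words words, words used)."""
    end = 0
    used = 0
    for match in re.finditer(r"\S+", text):
        if used == max_words:
            break
        end = match.end()
        used += 1
    if used < max_words:
        # The whole node fits; keep it intact, trailing whitespace included.
        return text, used
    return text[:end], used


def truncate_words(nodes, max_words):
    """Return (truncated copy of nodes, number of words still unused)."""
    kept = []
    for node in nodes:
        if max_words <= 0:
            break
        if node["type"] == "text":
            text, used = take_words(node["text"], max_words)
            max_words -= used
            kept.append({"type": "text", "text": text})
        else:
            clone = {k: v for k, v in node.items() if k != "children"}
            clone["children"], max_words = truncate_words(node["children"], max_words)
            kept.append(clone)
    return kept, max_words


doc = [{
    "type": "element", "tag": "p", "children": [
        {"type": "text", "text": "Let's use this "},
        {"type": "element", "tag": "strong", "children": [
            {"type": "text", "text": "sample document"}]},
        {"type": "text", "text": " with more text."},
    ],
}]

first_four, _ = truncate_words(doc, 4)
# first_four keeps "Let's use this " plus a <strong> node holding only "sample"
```

Cutting at a word boundary also avoids previews that end mid-word, such as the "sampl" in the 20 char example.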
