Regex for Markdown Emphasis

若如初见. Submitted on 2021-02-10 14:51:39

Question


I'm trying to match Markdown emphasis in the following text:

_this should match_
__this shouldn't__
_ neither should this _
_nor this _
this _should match_as well_
__       (double underscore, shouldn't match)

The problem with my own attempts, and with other solutions on Stack Overflow, is that they still match the third line:

_ neither should this _

Is there a way to check for my particular use case? I'm targeting browser applications, and since Firefox and Safari don't support lookbehind yet, is there a way to do this without lookbehinds?
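If lookbehind support is the deciding factor, one option is to detect it at runtime instead of by browser name. This is a minimal sketch under that assumption; the helper name `supportsLookbehind` is hypothetical. It relies on the fact that an engine without lookbehind throws a `SyntaxError` when asked to compile a pattern containing `(?<=...)`.

```javascript
// Hypothetical helper: reports whether the current JavaScript engine accepts
// lookbehind syntax such as (?<=...).
function supportsLookbehind() {
  try {
    // Use the RegExp constructor rather than a literal: an unsupported regex
    // literal is a parse-time SyntaxError that would break the whole script.
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false; // engine lacks lookbehind
    throw e; // anything else is unexpected, so let it propagate
  }
}

console.log(supportsLookbehind()); // true on Node.js 10 and later
```

A caller could use the result to choose between a lookbehind-based pattern and a lookahead-only one, but a lookahead-only pattern that works everywhere avoids the branch entirely.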

Here's the regex pattern that I've come up with so far: /(_)((?!\1|\s).*)?\1/

I'm able to satisfy almost all of my checks; however, my pattern still matches:

_nor this _
__       (double underscore, shouldn't match)    
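Inspecting the capture groups shows why these two still match. In this small sketch (the variable name `asker` is only for illustration), the `(?!\1|\s)` lookahead checks just the character right after the opening underscore; nothing checks the character before the closing one, and because group 2 is optional, `__` on its own satisfies `(_)` followed by `\1`.

```javascript
// The pattern from the question, unchanged. No g flag, so exec() always
// starts searching at index 0 and keeps no state between calls.
const asker = /(_)((?!\1|\s).*)?\1/;

for (const line of ["_nor this _", "__"]) {
  const m = asker.exec(line);
  // m[0] is the whole match; m[2] is the optional "content" group.
  console.log(JSON.stringify(line), "->", JSON.stringify(m && m[0]), "group 2:", JSON.stringify(m && m[2]));
}
// "_nor this _" -> "_nor this _" group 2: "nor this "
// "__" -> "__" group 2: undefined
```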

So, is there a way to ensure that there is at least one character between the underscores, and that the underscores are not separated from the text by a space?

Link to the RegExr playground: regexr.com/5300j

Example:

const regex = /(_)((?!\1|\s).*)?\1/gm;
const str = `_this should match_
__this shouldn't__
_ neither should this _
_nor this _
this _should match_as well_
__
_ neither should this _`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

Answer 1:


You may use either of these patterns:

\b_(?![_\s])(.*?[^_\s])_\b
\b_(?![_\s])(.*?[^_\s])_(?!\S)

See the regex demo
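A minimal sketch of running both patterns over the sample lines from the question, assuming a runtime with `String.prototype.matchAll` (ES2020). The helper `findEmphasis` is hypothetical. Both patterns use only lookahead, so they sidestep the lookbehind concern raised in the question.

```javascript
// The two patterns from the answer; matchAll requires the g flag.
const boundaryEnd = /\b_(?![_\s])(.*?[^_\s])_\b/g;
const whitespaceEnd = /\b_(?![_\s])(.*?[^_\s])_(?!\S)/g;

// Sample lines from the question (without the parenthetical comment).
const sample = [
  "_this should match_",
  "__this shouldn't__",
  "_ neither should this _",
  "_nor this _",
  "this _should match_as well_",
  "__",
].join("\n");

// Hypothetical helper: returns the full match and the Group 1 text for each hit.
function findEmphasis(regex, text) {
  return [...text.matchAll(regex)].map((m) => ({ match: m[0], text: m[1] }));
}

// Both calls find the same two matches:
// "_this should match_" and "_should match_as well_".
console.log(findEmphasis(boundaryEnd, sample));
console.log(findEmphasis(whitespaceEnd, sample));
```

Because `.` and `[^_\s]` never match a line break, a match cannot run across lines, so the `m` flag is not needed here.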

Details

  • \b - no word char (letter, digit, _) allowed immediately before the match
  • _ - an underscore
  • (?![_\s]) - no _ or whitespace chars are allowed immediately after _
  • (.*?[^_\s]) - Group 1:
    • .*? - any 0 or more chars other than line break chars, as few as possible
    • [^_\s] - any 1 char other than _ and whitespace
  • _ - an underscore
  • \b - no word char allowed immediately after the _.
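To tie each bullet to its token, the first pattern can be assembled from annotated pieces. This is only a readability sketch; the `parts` array is an illustration, not part of the answer. `String.raw` keeps the backslashes literal, so the pieces join into exactly the pattern shown above.

```javascript
// Each piece corresponds to one bullet in the breakdown.
const parts = [
  String.raw`\b`,          // no word char (letter, digit, _) right before the opening _
  "_",                     // opening underscore
  String.raw`(?![_\s])`,   // next char must not be _ or whitespace
  String.raw`(.*?[^_\s])`, // Group 1: as few non-line-break chars as possible, ending in a char other than _ or whitespace
  "_",                     // closing underscore
  String.raw`\b`,          // no word char right after the closing _
];

const emphasis = new RegExp(parts.join(""));

console.log(emphasis.source); // \b_(?![_\s])(.*?[^_\s])_\b
console.log(emphasis.exec("this _should match_as well_")[1]); // should match_as well
```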

Note that (?!\S) fails the match if there is a non-whitespace char immediately to the right of the current location, so it only succeeds before whitespace or at the end of the string; it acts as a right-hand whitespace boundary.
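The practical difference between the two variants shows up when punctuation follows the closing underscore. The inputs below are illustrative: `\b` accepts any non-word char (or the end of input) after the closing `_`, while `(?!\S)` accepts only whitespace or the end of input.

```javascript
// Non-global copies of the two patterns, so test() keeps no lastIndex state.
const endsAtBoundary = /\b_(?![_\s])(.*?[^_\s])_\b/;
const endsAtSpace = /\b_(?![_\s])(.*?[^_\s])_(?!\S)/;

for (const s of ["_emphasis_.", "(_emphasis_)", "_emphasis_ here"]) {
  console.log(JSON.stringify(s), endsAtBoundary.test(s), endsAtSpace.test(s));
}
// "_emphasis_." true false
// "(_emphasis_)" true false
// "_emphasis_ here" true true
```

The `\b` version is the better fit if emphasis can be followed by punctuation; the `(?!\S)` version is stricter and only accepts emphasis that ends a word followed by whitespace.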



Source: https://stackoverflow.com/questions/61346949/regex-for-markdown-emphasis
