Matching end of line position using m flag with different line ending styles

Submitted by 霸气de小男生 on 2020-01-11 10:17:27

Question


I'm trying to wrap each line that starts with "## " in HTML tags. I'm trying to achieve a GitHub/Stack Overflow-like syntax for text formatting.

This is what I got:

$value = preg_replace('/^## (.*)$/m', '<p>$1</p>', $value);

After googling for quite a while, this seems to be the right solution. However, either it doesn't work as expected or I just don't understand something.

Example text:

## Some header 1

Some text that doesn't need to be altered

## Some header 2

And this is the result:

<p>Some header 1
</p>

Some text that doesn't need to be altered

<p>Some header 2</p>

As you can see, the second header gets processed fine as it's at the end of the text. The first header, however, gets an extra new line at the end before the closing tag. How do I get rid of that?
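A minimal PHP reproduction of the behavior described above. It assumes the text uses Windows-style \r\n line endings (the question does not say so; it is a plausible cause of this symptom) and a PHP build whose PCRE uses the common LF-only newline default. json_encode() is only there to make the hidden \r visible.

```php
<?php
// Assumption: the text was submitted with Windows-style CRLF line endings.
$value = "## Some header 1\r\n\r\n"
       . "Some text that doesn't need to be altered\r\n\r\n"
       . "## Some header 2";

// The pattern from the question, unchanged.
$result = preg_replace('/^## (.*)$/m', '<p>$1</p>', $value);

// json_encode() escapes control characters, so the stray \r shows up
// inside the first <p> element but not inside the last one.
echo json_encode($result, JSON_UNESCAPED_SLASHES), "\n";
```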


Answer 1:


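It seems that with your current PCRE settings, the dot matches every character except LF (\n, line feed). It therefore also matches CR (\r, carriage return), which is a line break character too: with Windows-style \r\n line endings, (.*) captures the trailing \r, and $ only matches right before the \n.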
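To see the cause in isolation: under an LF-only newline convention the dot refuses \n but accepts \r. The sketch below spells that convention out with (*LF) so it does not depend on how PHP's PCRE was built; that this is the asker's actual setting is the answer's inference, not something the question confirms.

```php
<?php
// (*LF) makes only "\n" a newline, the convention the answer assumes.
$dotMatchesLf = preg_match('/(*LF)^a.b$/', "a\nb"); // 0: the dot stops at LF
$dotMatchesCr = preg_match('/(*LF)^a.b$/', "a\rb"); // 1: CR is an ordinary char here

// On a CRLF-terminated line, (.*) therefore runs up to the LF and keeps the CR.
preg_match('/(*LF)^## (.*)$/m', "## Some header 1\r\nnext line", $m);

var_dump($dotMatchesLf, $dotMatchesCr);
echo json_encode($m[1]), "\n"; // "Some header 1\r"
```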

PCRE lets you override the default newline convention (and with it the behavior of the . and $ metacharacters). To make . match any character except CR and LF, put the corresponding verb at the very start of the pattern:

'/(*ANYCRLF)^## (.*)$/m'
  ^^^^^^^^^^

In multiline mode, $ then asserts the end of a line before \r\n (as well as before a lone \r or \n), so the \r is no longer captured.
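Applying the fixed pattern to sample text. The CRLF sample is an assumption about the asker's input; the LF-only and CR-only variants are included to show that (*ANYCRLF) handles all three line-ending styles the same way.

```php
<?php
// The pattern from the answer: (*ANYCRLF) must come first in the pattern.
$pattern = '/(*ANYCRLF)^## (.*)$/m';

$samples = [
    'CRLF' => "## Some header 1\r\n\r\nSome text\r\n\r\n## Some header 2",
    'LF'   => "## Some header 1\n\nSome text\n\n## Some header 2",
    'CR'   => "## Some header 1\r\rSome text\r\r## Some header 2",
];

$results = [];
foreach ($samples as $style => $value) {
    // Line endings are left untouched; only the header text is wrapped.
    $results[$style] = preg_replace($pattern, '<p>$1</p>', $value);
    echo $style, ': ', json_encode($results[$style], JSON_UNESCAPED_SLASHES), "\n";
}
```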

See more about this and other verbs at rexegg.com:

By default, when PCRE is compiled, you tell it what to consider a line break when it encounters a . (as the dot doesn't match line breaks unless in DOTALL mode), as well as how the ^ and $ anchors behave in multiline mode. You can override this default with the following modifiers:

(*CR) Only a carriage return is considered to be a line break
(*LF) Only a line feed is considered to be a line break (as on Unix)
(*CRLF) Only a carriage return followed by a line feed is considered to be a line break (as on Windows)
(*ANYCRLF) Any of the above three is considered to be a line break
(*ANY) Any Unicode newline sequence is considered to be a line break
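A small comparison of the verbs listed above on a two-line CRLF string (a hypothetical input chosen for illustration). It collects what the header pattern captures under each convention; json_encode() makes any captured \r visible.

```php
<?php
$text = "## A\r\n## B"; // two headers separated by a Windows line break

$captured = [];
foreach (['(*CR)', '(*LF)', '(*CRLF)', '(*ANYCRLF)', '(*ANY)'] as $verb) {
    // The newline verb must be at the very start of the pattern.
    preg_match_all('/' . $verb . '^## (.*)$/m', $text, $m);
    $captured[$verb] = $m[1];
    echo str_pad($verb, 11), json_encode($m[1]), "\n";
}
// (*CR):  only "A"; the second line begins with the leftover \n, so "^## " fails there.
// (*LF):  "A\r" and "B"; the dot swallows the CR (the asker's symptom).
// Others: "A" and "B"; CRLF is recognized as a single line break.
```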

For instance, (*CR)\w+.\w+ matches Line1\nLine2 because the dot is able to match the \n, which is not considered to be a line break.
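The (*CR) example can be checked directly in PHP. The (*LF) run is added here only for contrast; it spells out the LF-only convention instead of relying on the build default.

```php
<?php
$subject = "Line1\nLine2";

// Under (*CR) only "\r" is a line break, so the dot may consume the "\n".
preg_match('/(*CR)\w+.\w+/', $subject, $crMatch);

// Under (*LF) the dot cannot cross "\n"; the engine backtracks and
// settles for a match inside the first line.
preg_match('/(*LF)\w+.\w+/', $subject, $lfMatch);

echo json_encode($crMatch[0]), "\n"; // "Line1\nLine2"
echo json_encode($lfMatch[0]), "\n"; // "Line1"
```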



Source: https://stackoverflow.com/questions/45615555/matching-end-of-line-position-using-m-flag-with-different-line-ending-styles
