Question
This question is supplementary to: Recursive processing of markup using Regular Expression and DOMDocument
The code supplied by the selected answer has been a great help in understanding how to build a basic syntax tree. However, I am now having trouble tightening the regular expressions so that they match only my syntax, rather than any { followed by an arbitrary character other than another {. Ideally, I would like it to match only my syntax, which is:
{<anchor>}
{!image!}
{*strong*}
{/emphasis/}
{|code|}
{-strikethrough-}
{>small<}
Two tags, a and small, also require differing end tags. I have tried modifying $re_closetag from the original code sample to reflect this, but it still matches too much as text.
For example:
http://www.google.com/>} bang
smäll<} boom
My test string is:
tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3
Answer 1:
You can either control this in the RE itself or after a match.
In the RE, to control which tags may be "open", modify this part of $re_next:
(?:\{(?P<opentag>[^{\s])) # match an open tag
#which is "{" followed by anything other than whitespace or another "{"
Currently it looks for any character which is not { or whitespace. Simply change it to this:
(?:\{(?P<opentag>[<!*/|>-]))
Now it looks for only your specific open tags.
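As a quick check of that character class, here is a minimal stand-alone sketch. The helper first_open_tag and the sample strings are hypothetical and not part of the original parser; only the pattern fragment comes from the answer.

```php
<?php
// The open-tag fragment from above, wrapped in delimiters so it can be tested alone.
// The trailing "-" inside the class is a literal hyphen; "u" enables UTF-8 mode.
$re_open = '~\{(?P<opentag>[<!*/|>-])~u';

// Hypothetical helper: returns the tag character of the first open tag, or null.
function first_open_tag($re, $s) {
    return preg_match($re, $s, $m) ? $m['opentag'] : null;
}

// "{{" and "{@" are not open tags any more, so they fall through to null.
$samples = array('{*strong*}', '{{ 汉字 }}', '{@nope@}', 'x {>small<}');
foreach ($samples as $s) {
    var_export(first_open_tag($re_open, $s));
    echo "\n";
}
```

Anything that returns null here would be left for the text branch of the parser to consume.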
The close-tag portion only matches a single character at a time, depending on which tag is open in the current context. (This is what the $opentag argument is for.) So to match a pair of differing characters, simply change the $opentag value passed to the recursive call. E.g.:
if (isset($m['opentag']) && $m['opentag'][1] !== -1) {
list($newopen, $_) = $m['opentag'];
// the close character to look for in the new context;
// only < and > close with a different character than they open with
$newclose = $newopen;
if ($newopen==='>') $newclose = '<';
else if ($newopen==='<') $newclose = '>';
list($subast, $offset) = str_to_ast($s, $offset, array(), $newclose);
// label the node with the open character, not the swapped close character
$ast[] = array($newopen, $subast);
} else if (isset($m['text']) && $m['text'][1] !== -1) {
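The open-to-close swap in the fragment above could also be kept in a small lookup, which may be easier to extend if more asymmetric tags are added later. This is only a sketch; close_char_for is a hypothetical helper name, not something from the original code.

```php
<?php
// Hypothetical helper: maps an open-tag character to the character that closes it.
function close_char_for($open) {
    // Only the anchor and small tags close with a different character.
    $pairs = array('<' => '>', '>' => '<');
    // Every other tag closes with the same character it opened with.
    return isset($pairs[$open]) ? $pairs[$open] : $open;
}

var_export(close_char_for('<'));
echo "\n";
var_export(close_char_for('*'));
echo "\n";
```

The return value would be passed as the last argument of the recursive str_to_ast call.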
Alternatively, you can keep the RE as-is and decide what to do with the match after the fact. For example, if you match a @ character but {@ is not an allowed open tag, you can either raise a parse error or simply treat it as a text node (attaching array('#text', '{@') to the ast), or anything in between.
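A minimal sketch of that after-the-fact check follows, assuming the permissive RE is kept. The function classify_open, the $allowed list, and the 'open' marker are hypothetical names used for illustration; only the array('#text', ...) node shape comes from the answer.

```php
<?php
// Hypothetical post-match step: decide whether a matched character is a real open tag.
function classify_open($char, array $allowed) {
    if (in_array($char, $allowed, true)) {
        // A recognised tag: the caller would recurse into a new context here.
        return array('open', $char);
    }
    // Not one of our tags: keep the matched characters as literal text.
    return array('#text', '{' . $char);
}

$allowed = array('<', '!', '*', '/', '|', '-', '>');
var_export(classify_open('@', $allowed));
echo "\n";
var_export(classify_open('*', $allowed));
echo "\n";
```

Raising a parse error instead would be a one-line change in the fallback branch, for example throwing an exception rather than returning the text node.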
Source: https://stackoverflow.com/questions/15934295/parsing-markup-into-abstract-syntax-tree-using-regular-expression