I have this function to parse bbcode -> html:
$this->text = preg_replace(array(
'/\[b\](.*?)\[\/b\]/ms',
'/\[i\](.*?)\[\/i\]/ms',
Don't.
Instead, store both the original unparsed text and the processed parsed text. Yes, this doubles the storage requirement, but it also makes it blindingly easy to:
If you know for certain that the HTML you want to de-bbcode was en-bbcoded using your method, then do the following:
Switch the two arrays you pass to preg_replace.
In the array with the HTML code, do the following for every element: Prepend # to the string. Append #s. Replace \1 (and \2 and so on) with (.*?).
For the array with the bbcodes, do the following with every element: Remove / at the beginning and /ms at the end. Replace \s with a space. Remove all \. Remove all ?. Replace the first (.*) in the string with $1 and the second with $2.
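As a rough illustration, the steps above can be applied by hand to the two tags from the question. The HTML replacement strings ('<b>\1</b>' and '<i>\1</i>') are an assumption, since the question does not show them, and the function name html_to_bbcode is hypothetical. The sketch only round-trips HTML that was produced by exactly these rules.

```php
<?php
// Forward direction, as in the question. The replacement strings are
// ASSUMED; the original post does not show them.
$bbcodePatterns = array(
    '/\[b\](.*?)\[\/b\]/ms',
    '/\[i\](.*?)\[\/i\]/ms',
);
$htmlReplacements = array(
    '<b>\1</b>',
    '<i>\1</i>',
);

// Reverse direction, derived by hand from the arrays above:
// HTML strings become patterns (# delimiter, s modifier, \1 -> (.*?)),
// BBCode patterns become replacements (delimiters and escapes removed,
// the capture group replaced by $1).
$htmlPatterns = array(
    '#<b>(.*?)</b>#s',
    '#<i>(.*?)</i>#s',
);
$bbcodeReplacements = array(
    '[b]$1[/b]',
    '[i]$1[/i]',
);

// Hypothetical helper name; a thin wrapper around preg_replace.
function html_to_bbcode($html, array $patterns, array $replacements)
{
    return preg_replace($patterns, $replacements, $html);
}

// Round trip: BBCode -> HTML -> BBCode.
$html   = preg_replace($bbcodePatterns, $htmlReplacements, '[b]bold[/b] and [i]italic[/i]');
$bbcode = html_to_bbcode($html, $htmlPatterns, $bbcodeReplacements);
```

Using # as the delimiter on the HTML side avoids having to escape the / in closing tags.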
This should do. If any problems: Ask ;)
It's pretty safe to say it's nigh impossible to build a reliable way to convert HTML to BBCode with just a slew of regexes. Use a parser (DOMDocument, for instance), remove invalid elements and attributes using XPath queries and inspection, and then recursively walk the tree, building a BBCode string as you go (or simply skip invalid tags and attributes during the walk).
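A minimal sketch of the simpler variant (walk the tree and ignore anything not on a whitelist) might look like the following. The function names node_to_bbcode and html_fragment_to_bbcode and the tag map are hypothetical. Attribute handling, the XPath cleanup pass, and character-encoding handling are left out.

```php
<?php
// Hypothetical helper: recursively turn a DOM node into a BBCode string.
function node_to_bbcode(DOMNode $node)
{
    // ASSUMED whitelist: HTML tag name => BBCode tag name.
    $map = array('b' => 'b', 'strong' => 'b', 'i' => 'i', 'em' => 'i');

    // Text nodes contribute their text as-is.
    if ($node instanceof DOMText) {
        return $node->nodeValue;
    }
    // Comments, processing instructions etc. are dropped.
    if (!($node instanceof DOMElement)) {
        return '';
    }

    // Convert the children first, then wrap them if the tag is whitelisted.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= node_to_bbcode($child);
    }

    $name = strtolower($node->nodeName);
    if (isset($map[$name])) {
        return '[' . $map[$name] . ']' . $inner . '[/' . $map[$name] . ']';
    }
    // Unknown tag: keep the content, drop the tag itself.
    return $inner;
}

// Hypothetical entry point: parse an HTML fragment and convert it.
function html_fragment_to_bbcode($html)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so there is a single known root element to walk.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    return node_to_bbcode($root);
}

$result = html_fragment_to_bbcode('<strong>bold</strong> and <em>italic</em>');
```

Because unknown tags are skipped but their text is kept, markup such as a span around plain text simply disappears from the output instead of breaking the conversion.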