There was an earlier question about a CamelCase regex. Combining it with tchrist's post, I'm wondering what the correct UTF-8 CamelCase regex would be.
Starting wit
I really can’t tell what you’re trying to do, but this should be closer to what your original intent seems to have been. I still can’t tell what you mean to do with it, though.
    m{
        \b
        \p{Upper}      # start with uppercase code point (NOT LETTER)
        \w*            # optional ident chars
        # note that upper and lower are not related to letters
        (?: \p{Lower} \w* \p{Upper}
          | \p{Upper} \w* \p{Lower}
        )
        \w*
        \b
    }x
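To see what that pattern accepts, here is a minimal sketch that anchors it against whole strings. The helper name is_camel and the sample words are my own assumptions for illustration, not part of the original question.

```perl
use strict;
use warnings;

binmode(STDOUT, ':encoding(UTF-8)');

# The pattern from the answer, compiled once with /x.
my $camel = qr{
    \b
    \p{Upper}                       # uppercase code point, not necessarily a letter
    \w*
    (?: \p{Lower} \w* \p{Upper}
      | \p{Upper} \w* \p{Lower}
    )
    \w*
    \b
}x;

# Hypothetical helper: 1 if the whole string is a single match, else 0.
sub is_camel {
    my ($s) = @_;
    return $s =~ /\A$camel\z/ ? 1 : 0;
}

my @words = (
    'CamelCase',
    "\x{391}\x{3BB}\x{3C6}\x{3B1}\x{392}\x{3AE}\x{3C4}\x{3B1}",   # Greek, mixed case
    'camelCase',
    'Camel',
    'ALLCAPS',
);

for my $word (@words) {
    printf "%s: %s\n", $word, is_camel($word) ? 'match' : 'no match';
}
```

In effect the pattern asks for an uppercase first code point followed, in either order, by at least one more uppercase and one lowercase code point. So CamelCase and the Greek sample match, while camelCase, Camel and ALLCAPS do not.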
Never use [a-z]. And in fact, don’t use \p{Lowercase_Letter} or \p{Ll}, since those are not the same as the more desirable and more correct \p{Lowercase} and \p{Lower}.
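A small sketch of why the distinction matters: some code points carry the Lowercase property without being in the Ll general category. The helper name classify is hypothetical, and the result depends on the Unicode tables shipped with your perl, so treat this as an illustration rather than a guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of the three tests a single character passes.
sub classify {
    my ($ch) = @_;
    my %result = (
        lower => ($ch =~ /\A\p{Lower}\z/ ? 1 : 0),   # Lowercase property
        ll    => ($ch =~ /\A\p{Ll}\z/    ? 1 : 0),   # general category Lowercase_Letter
        ascii => ($ch =~ /\A[a-z]\z/     ? 1 : 0),   # ASCII range only
    );
    return \%result;
}

my @samples = (
    [ "\x{E9}",   'U+00E9 LATIN SMALL LETTER E WITH ACUTE (Ll)'  ],
    [ "\x{2170}", 'U+2170 SMALL ROMAN NUMERAL ONE (Nl)'          ],
    [ "\x{24D0}", 'U+24D0 CIRCLED LATIN SMALL LETTER A (So)'     ],
);

for my $sample (@samples) {
    my ($ch, $name) = @$sample;
    my $r = classify($ch);
    printf "%-45s Lower=%d Ll=%d [a-z]=%d\n",
        $name, $r->{lower}, $r->{ll}, $r->{ascii};
}
```

The Roman numeral and the circled letter are lowercase by property but are not letters at all, which is what the "NOT LETTER" comment in the pattern is getting at.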
And remember that \w is really just an alias for
[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Letter_Number}\p{Connector_Punctuation}]
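As a spot check (not a proof of equivalence over all of Unicode), the sketch below compares \w against that bracketed class on a handful of characters. The helper name compare_word and the sample set are assumptions of mine. The /u flag needs perl 5.14 or later.

```perl
use strict;
use warnings;

# The class quoted above, written out explicitly.
my $word_class = qr/[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Letter_Number}\p{Connector_Punctuation}]/u;

# Hypothetical helper: returns (matches \w, matches the explicit class) as 0/1 flags.
sub compare_word {
    my ($ch) = @_;
    my $by_w     = $ch =~ /\A\w\z/u           ? 1 : 0;
    my $by_class = $ch =~ /\A$word_class\z/u  ? 1 : 0;
    return ($by_w, $by_class);
}

my @samples = (
    [ 'a',        'ASCII letter'                      ],
    [ "\x{3BB}",  'GREEK SMALL LETTER LAMDA'          ],
    [ "\x{301}",  'COMBINING ACUTE ACCENT (Mark)'     ],
    [ "\x{663}",  'ARABIC-INDIC DIGIT THREE (Nd)'     ],
    [ "\x{2160}", 'ROMAN NUMERAL ONE (Nl)'            ],
    [ '_',        'LOW LINE (Pc)'                     ],
    [ "\x{203F}", 'UNDERTIE (Pc)'                     ],
    [ '-',        'HYPHEN-MINUS (not a word char)'    ],
);

for my $sample (@samples) {
    my ($ch, $name) = @$sample;
    my ($by_w, $by_class) = compare_word($ch);
    printf "%-35s \\w=%d class=%d\n", $name, $by_w, $by_class;
}
```

Note that \w covers marks, non-ASCII digits and connector punctuation, so the \w* parts of the CamelCase pattern will happily run through underscores and combining accents.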