UTF-8-correct regex for CamelCase (WikiWord) in Perl

清酒与你 2021-01-19 17:39

There was an earlier question about a CamelCase regex. Combined with tchrist's post, I'm wondering what the correct UTF-8 CamelCase regex is.

Starting wit

1 answer
  •  抹茶落季
    2021-01-19 18:10

    I really can’t tell what you’re trying to do, but this should be closer to what your original intent seems to have been. I still can’t tell what you mean to do with it, though.

    m{
        \b
        \p{Upper}      #  start with uppercase code point (NOT LETTER)
    
        \w*            #  optional ident chars 
    
        # note that upper and lower are not related to letters
        (?:  \p{Lower} \w* \p{Upper}
          |  \p{Upper} \w* \p{Lower}
        )
    
        \w*
    
        \b
    }x
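
To see what the pattern above accepts, here is a minimal sketch that compiles it with `qr` and tries it on a few words. The names `$camel_re` and `is_camel` and the sample words are assumptions made for illustration, not part of the original answer, and the `\A ... \z` anchoring is added only so that each sample is tested as a whole word.

```perl
use v5.14;
use warnings;

# The pattern from the answer, compiled once.
# $camel_re and is_camel are hypothetical names chosen for this sketch.
my $camel_re = qr{
    \b
    \p{Upper}      # start with an uppercase code point
    \w*            # optional ident chars
    (?:  \p{Lower} \w* \p{Upper}
      |  \p{Upper} \w* \p{Lower}
    )
    \w*
    \b
}x;

# Returns 1 if the whole string matches the pattern, else 0.
sub is_camel {
    my ($word) = @_;
    return $word =~ /\A$camel_re\z/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $word ("CamelCase", "WikiWord", "\x{C9}coleNormale",
              "Camel", "CAMEL", "camelCase") {
    printf "%s: %s\n", $word, is_camel($word) ? "match" : "no match";
}
```

The first three samples match because each starts with an uppercase code point and contains a later lowercase-then-uppercase run. `Camel` and `CAMEL` fail because neither alternative can be satisfied, and `camelCase` fails because it does not start with an uppercase code point.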
    

    Never use [a-z]. And in fact, don’t use \p{Lowercase_Letter} or \p{Ll}, since those are not the same as the more desirable and more correct \p{Lowercase} and \p{Lower}.
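
The difference between the case properties and the general categories shows up on code points that are cased but are not letters. The sketch below is an illustration under that assumption; the helper name `case_props` and the choice of sample code points (Roman numerals and a circled letter) are mine, not the answer's.

```perl
use v5.14;
use warnings;

# Hypothetical helper: reports which properties a single character has.
sub case_props {
    my ($ch) = @_;
    return {
        Lower => ($ch =~ /\A\p{Lower}\z/ ? 1 : 0),  # Lowercase property
        Ll    => ($ch =~ /\A\p{Ll}\z/    ? 1 : 0),  # General_Category=Lowercase_Letter
        Upper => ($ch =~ /\A\p{Upper}\z/ ? 1 : 0),  # Uppercase property
        Lu    => ($ch =~ /\A\p{Lu}\z/    ? 1 : 0),  # General_Category=Uppercase_Letter
    };
}

# U+2177 SMALL ROMAN NUMERAL EIGHT, U+2167 ROMAN NUMERAL EIGHT,
# U+24D0 CIRCLED LATIN SMALL LETTER A, and plain ASCII "a" for contrast.
for my $cp (0x2177, 0x2167, 0x24D0, 0x61) {
    my $p = case_props(chr $cp);
    printf "U+%04X Lower=%d Ll=%d Upper=%d Lu=%d\n",
        $cp, @{$p}{qw(Lower Ll Upper Lu)};
}
```

The Roman numerals have general category Nl and the circled letter has So, so `\p{Ll}` and `\p{Lu}` miss them even though they carry the Lowercase or Uppercase property.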

    And remember that \w is really just an alias for

    [\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Letter_Number}\p{Connector_Punctuation}]
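
As a quick spot check, the class above can be compared with `\w` on a handful of code points. This is only a sketch: the names `$explicit` and `agrees` and the sample list are assumptions, and agreement on a few samples does not prove the two are identical for every code point or every Perl version.

```perl
use v5.14;
use warnings;

# The class exactly as written in the answer.
my $explicit = qr/[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Letter_Number}\p{Connector_Punctuation}]/;

# Hypothetical helper: true if \w and the explicit class give the
# same verdict for one character.
sub agrees {
    my ($ch) = @_;
    my $by_w     = $ch =~ /\A\w\z/         ? 1 : 0;
    my $by_class = $ch =~ /\A$explicit\z/  ? 1 : 0;
    return $by_w == $by_class ? 1 : 0;
}

# A letter, a digit, an underscore, U+0301 COMBINING ACUTE ACCENT,
# U+2167 ROMAN NUMERAL EIGHT, U+203F UNDERTIE, then three non-word
# characters: hyphen, space, U+00D7 MULTIPLICATION SIGN.
my @samples = ("a", "7", "_", "\x{301}", "\x{2167}", "\x{203F}",
               "-", " ", "\x{D7}");

for my $ch (@samples) {
    printf "U+%04X %s\n", ord($ch), agrees($ch) ? "same" : "DIFFERENT";
}
```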
    
