Splitting CamelCase with regex

Question

I have this code to split CamelCase by regular expression:

Regex.Replace(input, "(?<=[a-z])([A-Z])", " $1", RegexOptions.Compiled).Trim();

However, it doesn't split this correctly: ShowXYZColours

It produces Show XYZColours instead of Show XYZ Colours

How do I get the desired result?

Answer 1:

Unicode-aware

(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})

Breakdown:

(?=               # look-ahead: a position followed by...
  \p{Lu}\p{Ll}    #   an uppercase letter and then a lowercase letter
)                 #
|                 # or
(?<=              # look-behind: a position after...
  \p{Ll}          #   a lowercase letter
)                 #
(?=               # look-ahead: a position followed by...
  \p{Lu}          #   an uppercase letter
)                 #

Use it with your regex engine's split function.

EDIT: Of course you can replace \p{Lu} with [A-Z] and \p{Ll} with [a-z] if that's what you need or your regex engine does not understand Unicode categories.
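
As a rough sketch of how this pattern could be used in .NET (the names CamelCaseDemo and SplitCamelCase are made up for illustration), the string can be split with Regex.Split and the pieces rejoined with spaces. The Trim call is there because Regex.Split returns an empty first element when the pattern matches at position 0, which happens for input that starts with an uppercase letter followed by a lowercase one:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CamelCaseDemo
{
    // Boundary before an uppercase+lowercase pair,
    // or between a lowercase and an uppercase letter.
    private const string Boundary = @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})";

    // Hypothetical helper: split at every boundary, then rejoin with spaces.
    public static string SplitCamelCase(string input)
    {
        string[] parts = Regex.Split(input, Boundary);
        // Trim drops the leading space produced by an empty first element.
        return string.Join(" ", parts).Trim();
    }

    public static void Main()
    {
        Console.WriteLine(SplitCamelCase("ShowXYZColours")); // Show XYZ Colours
    }
}
```

Swapping \p{Lu} and \p{Ll} for [A-Z] and [a-z] in the Boundary constant should give the ASCII-only variant mentioned in the edit.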

Answer 2:

You can use something like this:

(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])

CODE:

string strRegex = @"(?<=[a-z])([A-Z])|(?<=[A-Z])([A-Z][a-z])";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = @"ShowXYZColours";
string strReplace = @" $1$2";

return myRegex.Replace(strTargetString, strReplace);

OUTPUT:

Show XYZ Colours

Answer 3:

Using Tomalak's regex (the Unicode-aware pattern above) with .NET's System.Text.RegularExpressions creates an empty entry at position 0 of the resulting array:

Regex.Split("ShowXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")

{string[4]}
    [0]: ""
    [1]: "Show"
    [2]: "XYZ"
    [3]: "Colors"

It works for camelCase, though (as opposed to PascalCase):

Regex.Split("showXYZColors", @"(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})")

{string[3]}
    [0]: "show"
    [1]: "XYZ"
    [2]: "Colors"
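
One possible way to avoid the empty first entry, sketched here as an assumption rather than a tested recommendation, is to put a (?<!^) guard in front of the first alternative so that it cannot match at position 0. The second alternative needs no guard, since its look-behind already fails at the start of the string. The names PascalCaseSplitDemo and SplitWords are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PascalCaseSplitDemo
{
    // Same boundaries as above, but (?<!^) rejects a match at position 0,
    // so PascalCase input no longer yields a leading empty string.
    private const string Boundary = @"(?<!^)(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})";

    // Hypothetical helper: returns the words without an empty first entry.
    public static string[] SplitWords(string input) => Regex.Split(input, Boundary);

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitWords("ShowXYZColors"))); // Show|XYZ|Colors
        Console.WriteLine(string.Join("|", SplitWords("showXYZColors"))); // show|XYZ|Colors
    }
}
```

Filtering the array afterwards, for example with LINQ's Where(s => s.Length > 0), would be an alternative that leaves the pattern unchanged.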

Answer 4:

You can try this:

Regex.Replace(input, "((?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z]))", " $1").Trim();

Example:

Regex.Replace("TheCapitalOfTheUAEIsAbuDhabi", "((?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z]))", " $1").Trim();

Output: The Capital Of The UAE Is Abu Dhabi

Source: https://stackoverflow.com/questions/21326963/splitting-camelcase-with-regex

Tags

regex

camelcasing