I'm using the regex
System.Text.RegularExpressions.Regex.Replace(stringToSplit, "([A-Z])", " $1").Trim()
to split strings by capital letters.
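For reference, here is a minimal sketch of that call wrapped in a helper. The class and method names are hypothetical. Because every capital letter gets a space, an acronym such as USA is broken apart letter by letter:

```csharp
using System.Text.RegularExpressions;

public static class CapitalSplitter
{
    // Put a space before every capital A-Z, then trim the leading space
    // that a capital at the start of the string produces.
    public static string SplitOnEveryCapital(string stringToSplit) =>
        Regex.Replace(stringToSplit, "([A-Z])", " $1").Trim();
}

// CapitalSplitter.SplitOnEveryCapital("WakeOnBoot") returns "Wake On Boot"
// CapitalSplitter.SplitOnEveryCapital("USAToday")   returns "U S A Today"
```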
My version that also handles simple arithmetic expressions:
using System.Text.RegularExpressions;

private static string InjectSpaces(string s)
{
    var patterns = new string[] {
        @"(?<=[^A-Z&])[A-Z]",          // capital preceded by any non-capital except ampersand
        @"(?<=[A-Z])[A-Z](?=[a-z])",   // capital preceded by capital and followed by lowercase letter
        @"[\+\-\*\/\=]",               // arithmetic operators
        @"(?<=[\+\-\*\/\=])[0-9\(]"    // 0-9 or open paren preceded by arithmetic operator
    };
    // One capturing group around the whole alternation, so $1 is the entire match.
    var pattern = $"({string.Join("|", patterns)})";
    return Regex.Replace(s, pattern, " $1");
}
Note: I didn't read the question carefully enough. "USAToday" will return only "Today", so this answer isn't the right one.
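using System.Collections.Generic;
using System.Text.RegularExpressions;

public static List<string> SplitOnCamelCase(string text)
{
    List<string> list = new List<string>();
    // Each match is one uppercase letter followed by one or more lowercase letters.
    Regex regex = new Regex(@"(\p{Lu}\p{Ll}+)");
    foreach (Match match in regex.Matches(text))
    {
        list.Add(match.Value);
    }
    return list;
}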
This will match "WakeOnBoot" as "Wake", "On", "Boot", but doesn't return anything for all-caps words such as NMI or TLA.
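One way to keep the acronyms, sketched here as a variant rather than the answer's own code: add a second alternative that matches a run of capitals not followed by a lowercase letter. The class name and the pattern are illustrative assumptions, checked by hand only against the examples mentioned in this thread.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Alternative 1: an optional capital followed by lowercase letters ("Today", "camel").
    // Alternative 2: a run of capitals not followed by a lowercase letter ("USA", "NMI");
    // backtracking gives the last capital back when it starts the next word.
    private static readonly Regex WordPattern =
        new Regex(@"\p{Lu}?\p{Ll}+|\p{Lu}+(?!\p{Ll})");

    public static List<string> SplitOnCamelCase(string text) =>
        WordPattern.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
}
```

With this pattern, "USAToday" yields "USA" and "Today", "NMI" yields "NMI", and "WakeOnBoot" still yields "Wake", "On", "Boot".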
Insert a space before any uppercase character that is not followed by another uppercase character:
Regex.Replace(stringToSplit, "([A-Z])(?![A-Z])", " $1")
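Here is a minimal sketch of that replacement in context. The class name is hypothetical, and the trailing Trim() is an added assumption, because a leading capital followed by a lowercase letter also gets a space. Checked by hand on a few inputs, it splits "USAToday" as intended. It does, however, leave a one-letter word attached to the previous word, and it splits the last letter off an acronym at the end of the string, since end-of-string counts as "not followed by a capital":

```csharp
using System.Text.RegularExpressions;

public static class LookaheadSplitter
{
    // Space before a capital only when the next character is not another capital
    // (the end of the string counts as "not a capital"). Trim() is an added assumption.
    public static string Split(string stringToSplit) =>
        Regex.Replace(stringToSplit, "([A-Z])(?![A-Z])", " $1").Trim();
}

// LookaheadSplitter.Split("USAToday")   returns "USA Today"
// LookaheadSplitter.Split("TodayILive") returns "TodayI Live"
// LookaheadSplitter.Split("InTheUSA")   returns "In The US A"
```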
Edit:
I just noticed that you're using this for enumerations. I really don't encourage using string representations of enumerations like this, and the problem at hand is a good reason why. Have a look at this instead: http://www.refactoring.com/catalog/replaceTypeCodeWithClass.html
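Here is a minimal sketch of what the linked refactoring could look like in this case. It assumes the enumeration holds publication names; the Publication class and its members are hypothetical. Each value carries its display text explicitly, so no regex is needed:

```csharp
// Hypothetical replacement for an enum member such as USAToday:
// a closed set of instances, each storing its own display name.
public sealed class Publication
{
    public static readonly Publication UsaToday = new Publication("USA Today");
    public static readonly Publication WallStreetJournal = new Publication("Wall Street Journal");

    public string DisplayName { get; }

    // Private constructor: no instances exist beyond the static fields above.
    private Publication(string displayName) => DisplayName = displayName;

    public override string ToString() => DisplayName;
}

// Publication.UsaToday.ToString() returns "USA Today"
```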
((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))
or its Unicode-aware cousin
((?<=\p{Ll})\p{Lu}|\p{Lu}(?=\p{Ll}))
when replaced globally with " $1", handles
TodayILiveInTheUSAWithSimon USAToday IAmSOOOBored
yielding
Today I Live In The USA With Simon USA Today I Am SOOO Bored
In a second step you'd have to trim the string, because a capital at the very start (the "T" in "Today") also gets a space in front of it.
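Here is the same replacement in C#, as a minimal sketch using the Unicode-aware pattern. The class and method names are hypothetical. Trim() performs the second step:

```csharp
using System.Text.RegularExpressions;

public static class LookaroundSplitter
{
    // Match a capital preceded by a lowercase letter,
    // or a capital followed by a lowercase letter.
    private static readonly Regex Boundary =
        new Regex(@"((?<=\p{Ll})\p{Lu}|\p{Lu}(?=\p{Ll}))");

    // Insert a space before each match, then trim the leading space.
    public static string Split(string text) => Boundary.Replace(text, " $1").Trim();
}

// LookaroundSplitter.Split("TodayILiveInTheUSAWithSimon USAToday IAmSOOOBored")
// returns "Today I Live In The USA With Simon USA Today I Am SOOO Bored"
```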
You might think about changing the enumerations. Microsoft's coding guidelines suggest Pascal-casing acronyms as though they were words: XmlDocument, HtmlWriter, etc. Two-letter acronyms don't follow this rule, though: System.IO.
So you should be using UsaToday, and your problem will disappear.
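To illustrate, once acronyms are Pascal-cased in the enum, the question's original regex works unchanged. The Source enum and the EnumText helper below are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical enum whose member names Pascal-case acronyms as if they were words.
public enum Source { UsaToday, XmlDocument, HtmlWriter }

public static class EnumText
{
    // The question's original approach: a space before every capital, then trim.
    public static string ToDisplayText(Source value) =>
        Regex.Replace(value.ToString(), "([A-Z])", " $1").Trim();
}

// EnumText.ToDisplayText(Source.UsaToday)    returns "Usa Today"
// EnumText.ToDisplayText(Source.XmlDocument) returns "Xml Document"
```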
For splitting a string on its capital letters, and much more, you can try Humanizer, a free NuGet package. It can also save you trouble with letters, sentences, numbers, quantities and more, in many languages. Check it out at: https://www.nuget.org/packages/Humanizer/