Regular expression, split string by capital letter but ignore TLA

小蘑菇 2020-11-27 13:26

I'm using the regex

System.Text.RegularExpressions.Regex.Replace(stringToSplit, "([A-Z])", " $1").Trim()

to split strings by capital letter ...

7 answers
  • 2020-11-27 13:29

    My version that also handles simple arithmetic expressions:

    private string InjectSpaces(string s)
    {
        var patterns = new string[] {
            @"(?<=[^A-Z,&])[A-Z]",          // match capital preceded by any character other than a capital, comma, or ampersand
            @"(?<=[A-Z])[A-Z](?=[a-z])",    // match capital preceded by capital and followed by lowercase letter
            @"[\+\-\*\/\=]",                // match arithmetic operators and the equals sign
            @"(?<=[\+\-\*\/\=])[0-9,\(]"    // match digit, comma, or open paren preceded by one of those operators
        };
        var pattern = $"({string.Join("|", patterns)})";
        return Regex.Replace(s, pattern, " $1");
    }
    
  • 2020-11-27 13:31

    Note: I didn't read the question carefully enough. For USAToday this returns only "Today", so this answer isn't the right one.

        public static List<string> SplitOnCamelCase(string text)
        {
            List<string> list = new List<string> ();
            Regex regex = new Regex(@"(\p{Lu}\p{Ll}+)");
            foreach (Match match in regex.Matches(text))
            {
                list.Add (match.Value);
            }
            return list;
        }
    

    This splits "WakeOnBoot" into "Wake", "On" and "Boot", and returns nothing for all-caps input such as NMI or TLA.
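
    To make the limitation concrete, here is a minimal sketch that runs the same pattern over a few inputs. The class name `CamelCaseDemo` and the method name `Words` are made up for this illustration; only the pattern comes from the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseDemo
{
    // Same pattern as in the answer: one uppercase letter followed by
    // one or more lowercase letters.
    private static readonly Regex Word = new Regex(@"\p{Lu}\p{Ll}+");

    // Hypothetical helper: returns every match as a separate word.
    public static List<string> Words(string text)
    {
        return Word.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
    }
}
```

    With this helper, `string.Join(" ", CamelCaseDemo.Words("WakeOnBoot"))` gives "Wake On Boot", `CamelCaseDemo.Words("USAToday")` contains only "Today" (the "USA" prefix is dropped), and `CamelCaseDemo.Words("TLA")` is empty.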

  • 2020-11-27 13:38

    Match any uppercase character that is not followed by another uppercase character:

    Regex.Replace(input, "([A-Z])(?![A-Z])", " $1")
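
    A small sketch of how this negative-lookahead pattern behaves, assuming the result is trimmed afterwards as in the question. The class name `LookaheadSplit` and the method name `Split` are hypothetical, and the `Trim()` call is an addition, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadSplit
{
    // Hypothetical wrapper around the pattern from the answer.
    public static string Split(string input)
    {
        // Put a space before every capital that is NOT followed by another
        // capital, then drop the leading space this can produce.
        return Regex.Replace(input, "([A-Z])(?![A-Z])", " $1").Trim();
    }
}
```

    `LookaheadSplit.Split("MyNameIsSimon")` gives "My Name Is Simon" and `LookaheadSplit.Split("USAToday")` gives "USA Today". Two edge cases are worth knowing: the end of the string also counts as "not followed by a capital", so `Split("TLA")` gives "TL A", and a single-letter word stays attached to the previous word, so `Split("TodayILive")` gives "TodayI Live".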
    

    Edit:

    I just noticed that you're using this for enumerations. I really don't encourage using string representations of enumerations like this, and the problem at hand is a good reason why. Have a look at this instead: http://www.refactoring.com/catalog/replaceTypeCodeWithClass.html

  • 2020-11-27 13:39
    ((?<=[a-z])[A-Z]|[A-Z](?=[a-z]))
    

    or its Unicode-aware cousin

    ((?<=\p{Ll})\p{Lu}|\p{Lu}(?=\p{Ll}))
    

    when replaced globally with

    " $1"
    

    handles

    TodayILiveInTheUSAWithSimon
    USAToday
    IAmSOOOBored
    

    yielding

     Today I Live In The USA With Simon
    USA Today
    I Am SOOO Bored
    

    In a second step you'd have to trim the string.
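
    Both steps can be sketched together in C#, using the Unicode-aware pattern from above. The class name `PascalCaseSplitter` and the method name `Split` are made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class PascalCaseSplitter
{
    // A capital preceded by a lowercase letter, or a capital followed by
    // a lowercase letter (the Unicode-aware pattern from the answer).
    private static readonly Regex Boundary =
        new Regex(@"((?<=\p{Ll})\p{Lu}|\p{Lu}(?=\p{Ll}))");

    // Hypothetical helper: step 1 inserts the spaces, step 2 trims.
    public static string Split(string input)
    {
        return Boundary.Replace(input, " $1").Trim();
    }
}
```

    `PascalCaseSplitter.Split("TodayILiveInTheUSAWithSimon")` gives "Today I Live In The USA With Simon", `Split("USAToday")` gives "USA Today", and an all-caps input such as "TLA" comes back unchanged because none of its capitals has a lowercase neighbour.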

  • 2020-11-27 13:40

    You might think about changing the enumerations: the MS coding guidelines suggest Pascal-casing acronyms as though they were words, as in XmlDocument, HtmlWriter, etc. Two-letter acronyms don't follow this rule, though (for example System.IO).

    So you should be using UsaToday, and your problem will disappear.

  • 2020-11-27 13:48

    I hope this helps with splitting a string by its capital letters, and with much more. You can try Humanizer, a free NuGet package. It will save you a lot of trouble with letters, sentences, numbers, quantities and more, in many languages. Check it out at: https://www.nuget.org/packages/Humanizer/
