Is there an elegant way to parse a word and add spaces before capital letters?

遥遥无期 2020-12-01 09:43

I need to parse some data, and I want to convert

AutomaticTrackingSystem

to

Automatic Tracking System

7 answers
  • 2020-12-01 09:54

    Try this:

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;
    
    class MainClass
    {
        public static void Main (string[] args)
        {
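            // Applied to the reversed string. Alternatives, in order: a reversed word
            // (lowercase run ending in its capital), a run of two or more capitals,
            // a single capital, or a run of two or more non-letters.
            var rx = new Regex
                    (@"([a-z]+[A-Z]|[A-Z][A-Z]+|[A-Z]|[^A-Za-z][^A-Za-z]+)");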
    
            string[] tests = {
            "AutomaticTrackingSystem",
            "XMLEditor",
            "AnXMLAndXSLT2.0Tool",
            "NumberOfABCDThings",
            "AGoodMan",
            "CodeOfAGoodMan"
            };
    
            foreach(string t in tests)
            {
                string y = Reverse(t);
                string x = Reverse( rx.Replace(y, @" $1") );
                Console.WriteLine("\n\n{0} -- {1}",y,x);    
            }
    
        }
    
        static string Reverse(string s)
        {
            var ca = s.ToCharArray();
            Array.Reverse(ca);
            string t = new string(ca);
            return t;
        }
    
    }
    

    Output:

    metsySgnikcarTcitamotuA -- Automatic Tracking System 
    
    
    rotidELMX -- XML Editor 
    
    
    looT0.2TLSXdnALMXnA -- An XML And XSLT 2.0 Tool 
    
    
    sgnihTDCBAfOrebmuN -- Number Of ABCD Things 
    
    
    naMdooGA -- A Good Man 
    
    
    naMdooGAfOedoC -- Code Of A Good Man 
    

    It works by scanning the string backward and treating each capital letter as the terminator of a word. I wish Regex had an option for scanning the string backward, so that the separate string reversal above would not be needed :-)

  • I've just written a function to do exactly this. :)

    Replace ([a-z])([A-Z]) with $1 $2 (or \1 \2 in other languages).

    I've also got a replace for ([A-Z]+)([A-Z][a-z]) too - this converts things like "NumberOfABCDThings" into "Number Of ABCD Things"

    So in C# this would look something like:

    Regex r1 = new Regex(@"([a-z])([A-Z])");
    Regex r2 = new Regex(@"([A-Z]+)([A-Z][a-z])");
    
    // InputString holds the text to convert
    string NewString = r1.Replace(InputString, "$1 $2");
    NewString = r2.Replace(NewString, "$1 $2");
    

    (although possibly there's a more concise way of writing that)

    If you might have punctuation or numbers, I guess you could try ([^A-Z])([A-Z]) for the first match.

    Another way of writing those regexes is to use lookbehind and lookahead to match just the position and insert a space there, i.e. (?<=[a-z])(?=[A-Z]) and (?<=[A-Z]+)(?=[A-Z][a-z]), in both cases replacing with just " ". I'm not sure whether that method has any advantages, but it's an interesting way. :)
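
    The lookaround idea can be sketched as a single pattern. This is only an illustration, not the answerer's original function: the class name CamelCaseSpacer and the method InsertSpaces are made up for the example, and the lookbehind is written as (?<=[A-Z]) rather than (?<=[A-Z]+), which should be equivalent because only the immediately preceding character has to be a capital.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    // Hypothetical helper, named only for this example.
    static class CamelCaseSpacer
    {
        // Zero-width pattern: it matches positions, not characters.
        //   (?<=[a-z])(?=[A-Z])        lowercase before, capital after
        //   (?<=[A-Z])(?=[A-Z][a-z])   capital before, capitalised word after
        static readonly Regex Boundary = new Regex(
            @"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

        public static string InsertSpaces(string input)
        {
            // Nothing is consumed, so the replacement is just the space itself.
            return Boundary.Replace(input, " ");
        }
    }
    ```

    With this sketch, CamelCaseSpacer.InsertSpaces("NumberOfABCDThings") gives "Number Of ABCD Things". Digits and punctuation are not handled here, so something like "XSLT2.0Tool" is left unsplit.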

  • 2020-12-01 09:57

    If you seek to keep acronyms intact, replace "([^A-Z])([A-Z])" with "\1 \2"; otherwise replace "(.)([A-Z])" with "\1 \2". (In .NET the replacement string is written "$1 $2".)
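
    A rough C# sketch of those two replacements, for comparison. The class and method names are invented for the example. One thing worth knowing: "(.)([A-Z])" consumes both characters of each match, so in a run of capitals it only splits at every other boundary. The third method is a suggested variant (not part of the answer above) that uses a lookahead so every capital gets its space.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    // Hypothetical demo class, named only for this example.
    static class AcronymDemo
    {
        // Space before a capital only when the previous character is not a capital.
        public static string KeepAcronyms(string input)
        {
            return Regex.Replace(input, "([^A-Z])([A-Z])", "$1 $2");
        }

        // The second pattern as given: both characters are consumed,
        // so adjacent capitals are paired up and some boundaries are skipped.
        public static string SplitPairs(string input)
        {
            return Regex.Replace(input, "(.)([A-Z])", "$1 $2");
        }

        // Suggested variant: the capital is only looked at, not consumed,
        // so the next match can start on it.
        public static string SplitEveryCapital(string input)
        {
            return Regex.Replace(input, "(.)(?=[A-Z])", "$1 ");
        }
    }
    ```

    For "XMLEditor", KeepAcronyms leaves the string unchanged (there is no non-capital in front of any capital), SplitPairs gives "X ML Editor", and SplitEveryCapital gives "X M L Editor".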

  • 2020-12-01 09:59

    You can use lookarounds, e.g.:

    string[] tests = {
       "AutomaticTrackingSystem",
       "XMLEditor",
    };
    
    Regex r = new Regex(@"(?!^)(?=[A-Z])");
    foreach (string test in tests) {
       Console.WriteLine(r.Replace(test, " "));
    }
    

    This prints (as seen on ideone.com):

    Automatic Tracking System
    X M L Editor
    

    The regex (?!^)(?=[A-Z]) consists of two assertions:

    • (?!^) - i.e. we're not at the beginning of the string
    • (?=[A-Z]) - i.e. we're just before an uppercase letter

    Related questions

    • How do I convert CamelCase into human-readable names in Java?
    • How does the regular expression (?<=#)[^#]+(?=#) work?

    References

    • regular-expressions.info/Lookarounds

    Splitting the difference

    Here's where using assertions really makes a difference: when you have several different rules, and/or you want to Split instead of Replace. This example combines both:

    string[] tests = {
       "AutomaticTrackingSystem",
       "XMLEditor",
       "AnXMLAndXSLT2.0Tool",
    };
    
    Regex r = new Regex(
       @"  (?<=[A-Z])(?=[A-Z][a-z])    # UC before me, UC lc after me
        |  (?<=[^A-Z])(?=[A-Z])        # Not UC before me, UC after me
        |  (?<=[A-Za-z])(?=[^A-Za-z])  # Letter before me, non letter after me
        ",
       RegexOptions.IgnorePatternWhitespace
    );
    foreach (string test in tests) {
       foreach (string part in r.Split(test)) {
          Console.Write("[" + part + "]");
       }
       Console.WriteLine();
    }
    

    This prints (as seen on ideone.com):

    [Automatic][Tracking][System]
    [XML][Editor]
    [An][XML][And][XSLT][2.0][Tool]
    

    Related questions

    • Java split is eating my characters.
      • Has many examples of splitting on zero-width matching assertions
  • 2020-12-01 10:02

    Just use this LINQ one-liner (it works perfectly for me):

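    // Requires: using System.Linq;
    public static string SpaceCamelCase(string input)
    {
        // Prefix every uppercase character with a space, then drop the leading space.
        return input.Aggregate(string.Empty, (old, x) => $"{old}{(char.IsUpper(x) ? " " : "")}{x}").TrimStart(' ');
    }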
    
  • 2020-12-01 10:06

    Apparently there is an option for matching in reverse, RegexOptions.RightToLeft :-) With it we can eliminate the string reversal. Here's another way to do it:

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;
    
    class MainClass
    {
        public static void Main (string[] args)
        {
            Regex ry = new Regex
                  (@"([A-Z][a-z]+|[A-Z]+[A-Z]|[A-Z]|[^A-Za-z]+[^A-Za-z])", 
                  RegexOptions.RightToLeft);
    
    
            string[] tests = {
            "AutomaticTrackingSystem",
            "XMLEditor",
            "AnXMLAndXSLT2.0Tool",
            "NumberOfABCDThings",
            "AGoodMan",
            "CodeOfAGoodMan"
            };
    
    
            foreach(string t in tests)
            {
                Console.WriteLine("\n\n{0} -- {1}", t, ry.Replace(t, " $1"));   
            }
    
        }
    
    
    }
    

    Output:

    AutomaticTrackingSystem --  Automatic Tracking System
    
    
    XMLEditor --  XML Editor
    
    
    AnXMLAndXSLT2.0Tool --  An XML And XSLT 2.0 Tool
    
    
    NumberOfABCDThings --  Number Of ABCD Things
    
    
    AGoodMan --  A Good Man
    
    
    CodeOfAGoodMan --  Code Of A Good Man
    