Is there an elegant way to parse a word and add spaces before capital letters?

I need to parse some data, and I want to convert

AutomaticTrackingSystem

to

Automatic Tracking System

7 answers

刺人心

2020-12-01 09:54

Try this:

using System;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass
{
    public static void Main (string[] args)
    {
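        // The input is reversed before matching, so each alternative describes a
        // reversed chunk: a lowercase run ending in a capital, a run of capitals,
        // a single capital, or a run of non-letters.
        var rx = new Regex
                (@"([a-z]+[A-Z]|[A-Z][A-Z]+|[A-Z]|[^A-Za-z][^A-Za-z]+)");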

        string[] tests = {
        "AutomaticTrackingSystem",
        "XMLEditor",
        "AnXMLAndXSLT2.0Tool",
        "NumberOfABCDThings",
        "AGoodMan",
        "CodeOfAGoodMan"
        };

        foreach(string t in tests)
        {
            string y = Reverse(t);
            string x = Reverse( rx.Replace(y, @" $1") );
            Console.WriteLine("\n\n{0} -- {1}",y,x);    
        }

    }

    static string Reverse(string s)
    {
        var ca = s.ToCharArray();
        Array.Reverse(ca);
        string t = new string(ca);
        return t;
    }

}

Output:

metsySgnikcarTcitamotuA -- Automatic Tracking System 


rotidELMX -- XML Editor 


looT0.2TLSXdnALMXnA -- An XML And XSLT 2.0 Tool 


sgnihTDCBAfOrebmuN -- Number Of ABCD Things 


naMdooGA -- A Good Man 


naMdooGAfOedoC -- Code Of A Good Man

It works by scanning the string backward and treating the capital letter as the terminator of each word. I wish Regex had a parameter for scanning the string backward, so that the separate string reversal above wouldn't be needed :-)

不要未来只要你来

2020-12-01 09:56
I've just written a function to do exactly this. :)

Replace ([a-z])([A-Z]) with $1 $2 (or \1 \2 in other languages).

I've also got a replace for ([A-Z]+)([A-Z][a-z]) - this converts things like "NumberOfABCDThings" into "Number Of ABCD Things".

So in C# this would look something like:
```
Regex r1 = new Regex(@"([a-z])([A-Z])");
Regex r2 = new Regex(@"([A-Z]+)([A-Z][a-z])");

// inputString holds the text to convert, e.g. "NumberOfABCDThings"
string newString = r1.Replace(inputString, "$1 $2");
newString = r2.Replace(newString, "$1 $2");
```
(although possibly there's a more concise way of writing that)

If you might have punctuation or numbers, I guess you could try ([^A-Z])([A-Z]) for the first match.

Hmmm, another way of writing those regexes, using lookbehind and lookahead, is to match just the position and insert a space, i.e. (?<=[a-z])(?=[A-Z]) and (?<=[A-Z]+)(?=[A-Z][a-z]), in both cases replacing with just " ". I'm not sure whether that method has any advantages, but it's an interesting way. :)
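
For what it's worth, here is a minimal sketch of that lookaround variant with both positions combined into one pattern. The names LookaroundSpacer and AddSpaces are made up for illustration. The lookbehind is written as (?<=[A-Z]) instead of (?<=[A-Z]+), on the assumption that only the character immediately before the position matters.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundSpacer
{
    // Zero-width positions where a space should be inserted:
    //   1. between a lowercase letter and an uppercase letter ("c|T" in "AutomaticTracking")
    //   2. between an uppercase letter and an uppercase letter that starts
    //      a capitalized word ("L|Ed" in "XMLEditor")
    private static readonly Regex Boundary =
        new Regex(@"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");

    // Inserts a single space at every matched position; no characters are consumed.
    public static string AddSpaces(string input)
    {
        return Boundary.Replace(input, " ");
    }
}
```

Calling LookaroundSpacer.AddSpaces("NumberOfABCDThings") should give "Number Of ABCD Things". Because nothing is consumed, one pass is enough and no $1 $2 groups are needed.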

野趣味

2020-12-01 09:57

If you want to keep acronyms intact, replace "([^A-Z])([A-Z])" with "\1 \2"; otherwise, replace "(.)([A-Z])" with "\1 \2".
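
A rough C# sketch of the two replacements above, using .NET's $1 $2 syntax in place of \1 \2. The class and method names are hypothetical. One thing to watch: "(.)([A-Z])" consumes two characters per match, so in a run of capitals only every other one gets a space. The third method is a swapped-in lookbehind variant, not part of the answer above, that avoids consuming the preceding character.

```csharp
using System.Text.RegularExpressions;

public static class CapitalSpacer
{
    // Space only where a non-capital is followed by a capital,
    // so runs of capitals (acronyms) are left untouched.
    public static string KeepAcronyms(string input)
        => Regex.Replace(input, "([^A-Z])([A-Z])", "$1 $2");

    // The second pattern as given. Each match consumes two characters,
    // so "XMLEditor" becomes "X ML Editor", not "X M L Editor".
    public static string SplitEveryCapital(string input)
        => Regex.Replace(input, "(.)([A-Z])", "$1 $2");

    // Assumed alternative: a lookbehind checks that some character precedes
    // the capital without consuming it, so every capital after the first
    // character gets its own space.
    public static string SplitBeforeEachCapital(string input)
        => Regex.Replace(input, "(?<=.)[A-Z]", " $0");
}
```

Note that KeepAcronyms leaves "XMLEditor" unchanged, since no non-capital precedes a capital there; the acronym stays glued to the word that follows it.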

天涯浪人

2020-12-01 09:59

You can use lookarounds, e.g.:

string[] tests = {
   "AutomaticTrackingSystem",
   "XMLEditor",
};

Regex r = new Regex(@"(?!^)(?=[A-Z])");
foreach (string test in tests) {
   Console.WriteLine(r.Replace(test, " "));
}

This prints (as seen on ideone.com):

Automatic Tracking System
X M L Editor

The regex (?!^)(?=[A-Z]) consists of two assertions:

(?!^) - i.e. we're not at the beginning of the string
(?=[A-Z]) - i.e. we're just before an uppercase letter

References

regular-expressions.info/Lookarounds

Splitting the difference

Here's where using assertions really makes a difference: when you have several different rules, and/or you want to Split instead of Replace. This example combines both:

string[] tests = {
   "AutomaticTrackingSystem",
   "XMLEditor",
   "AnXMLAndXSLT2.0Tool",
};

Regex r = new Regex(
   @"  (?<=[A-Z])(?=[A-Z][a-z])    # UC before me, UC lc after me
    |  (?<=[^A-Z])(?=[A-Z])        # Not UC before me, UC after me
    |  (?<=[A-Za-z])(?=[^A-Za-z])  # Letter before me, non letter after me
    ",
   RegexOptions.IgnorePatternWhitespace
);
foreach (string test in tests) {
   foreach (string part in r.Split(test)) {
      Console.Write("[" + part + "]");
   }
   Console.WriteLine();
}

This prints (as seen on ideone.com):

[Automatic][Tracking][System]
[XML][Editor]
[An][XML][And][XSLT][2.0][Tool]
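
To get back to the spaced string the question asks for, the split parts can simply be joined. A small sketch using the same three rules; the names SplitJoiner and AddSpaces are invented for illustration:

```csharp
using System.Text.RegularExpressions;

public static class SplitJoiner
{
    // Same three zero-width rules as in the Split example above.
    private static readonly Regex Boundary = new Regex(
        @"  (?<=[A-Z])(?=[A-Z][a-z])    # UC before me, UC lc after me
          | (?<=[^A-Z])(?=[A-Z])        # Not UC before me, UC after me
          | (?<=[A-Za-z])(?=[^A-Za-z])  # Letter before me, non letter after me
        ",
        RegexOptions.IgnorePatternWhitespace);

    // Split at every boundary, then glue the pieces together with single spaces.
    public static string AddSpaces(string input)
        => string.Join(" ", Boundary.Split(input));
}
```

With this, SplitJoiner.AddSpaces("AnXMLAndXSLT2.0Tool") should return "An XML And XSLT 2.0 Tool". Each rule needs a character on both sides of the position, so no match can occur at the very start or end of the input.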

Related questions

Java split is eating my characters.
- Has many examples of splitting on zero-width matching assertions

南旧

2020-12-01 10:02

Just use this LINQ one-liner (works perfectly for me):

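// Requires: using System.Linq;
// Note: puts a space before every capital, so "XMLEditor" becomes "X M L Editor".
public static string SpaceCamelCase(string input)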
{
    return input.Aggregate(string.Empty, (old, x) => $"{old}{(char.IsUpper(x) ? " " : "")}{x}").TrimStart(' ');
}
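
If the repeated string concatenation inside Aggregate is a concern for long inputs, the same logic could be sketched with a StringBuilder. This is an assumed variant, not the one-liner above; the class name CamelCaseSpacer is made up, and the behavior is intended to be identical.

```csharp
using System.Text;

public static class CamelCaseSpacer
{
    public static string SpaceCamelCase(string input)
    {
        // Worst case every character is a capital, which doubles the length.
        var sb = new StringBuilder(input.Length * 2);
        foreach (char c in input)
        {
            // Put a space in front of every uppercase character...
            if (char.IsUpper(c))
            {
                sb.Append(' ');
            }
            sb.Append(c);
        }
        // ...and drop any leading spaces, as the one-liner does.
        return sb.ToString().TrimStart(' ');
    }
}
```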

礼貌的吻别

2020-12-01 10:06

Apparently, there's an option for reverse regex matching :-) We can now eliminate the string reversal. Here's another way to do it:

using System;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass
{
    public static void Main (string[] args)
    {
        Regex ry = new Regex
              (@"([A-Z][a-z]+|[A-Z]+[A-Z]|[A-Z]|[^A-Za-z]+[^A-Za-z])", 
              RegexOptions.RightToLeft);


        string[] tests = {
        "AutomaticTrackingSystem",
        "XMLEditor",
        "AnXMLAndXSLT2.0Tool",
        "NumberOfABCDThings",
        "AGoodMan",
        "CodeOfAGoodMan"
        };


        foreach(string t in tests)
        {
            Console.WriteLine("\n\n{0} -- {1}", t, ry.Replace(t, " $1"));   
        }

    }


}

Output:

AutomaticTrackingSystem --  Automatic Tracking System


XMLEditor --  XML Editor


AnXMLAndXSLT2.0Tool --  An XML And XSLT 2.0 Tool


NumberOfABCDThings --  Number Of ABCD Things


AGoodMan --  A Good Man


CodeOfAGoodMan --  Code Of A Good Man
