How to split a string while preserving line endings?

谎友^ 2021-01-01 06:02

I have a block of text and I want to get its lines without losing the \r and \n at the end. Right now, I have the following (suboptimal) code:

6 Answers
  • 2021-01-01 06:37

    You can achieve this with a regular expression. Here's an extension method that does it:

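        using System.Text.RegularExpressions;

        public static class StringExtensions
        {
            // Matches runs of non-delimiter characters followed by the delimiter (or end of line).
            // The delimiter sits inside a character class, so it is treated as a set of single
            // characters: fine for "\r\n" or ",", but a lone '\r' in the text is dropped, and
            // metacharacters such as ']' or '\' in the delimiter would need escaping.
            // Empty lines are skipped because of the + quantifier.
            public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
            {
                MatchCollection matches = Regex.Matches(input, @"[^" + delimiter + "]+(" + delimiter + "|$)", RegexOptions.Multiline);
                string[] result = new string[matches.Count];
                for (int i = 0; i < matches.Count; i++)
                {
                    result[i] = matches[i].Value;
                }
                return result;
            }
        }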
    

    I'm not sure if this is a better solution. Yours is very compact and simple.

  • 2021-01-01 06:39

    As always, extension method goodies :)

    using System;
    using System.Collections.Generic;

    public static class StringExtensions
    {
        // Splits on the separator, then re-attaches it to every piece except the last.
        public static IEnumerable<string> SplitAndKeep(this string s, string separator)
        {
            string[] obj = s.Split(new string[] { separator }, StringSplitOptions.None);

            for (int i = 0; i < obj.Length; i++)
            {
                string result = i == obj.Length - 1 ? obj[i] : obj[i] + separator;
                yield return result;
            }
        }
    }
    

    Usage:

        string text = "One,Two,Three,Four";
        foreach (var s in text.SplitAndKeep(","))
        {
            Console.WriteLine(s);
        }

    Output:

        One,
        Two,
        Three,
        Four
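    One thing to watch for in the question's \r\n case: when the text ends with the separator, String.Split returns an empty final piece, so SplitAndKeep yields a trailing "" element. A minimal sketch of a variant that skips that empty last piece, assuming you don't want it; the names LineExtensions and SplitAndKeepLines are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class LineExtensions
{
    // Hypothetical variant of SplitAndKeep: re-attaches the separator to every
    // piece except the last, and skips the last piece when it is empty
    // (which happens when the input ends with the separator).
    public static IEnumerable<string> SplitAndKeepLines(this string s, string separator)
    {
        string[] parts = s.Split(new[] { separator }, StringSplitOptions.None);
        for (int i = 0; i < parts.Length; i++)
        {
            bool isLast = i == parts.Length - 1;
            if (isLast && parts[i].Length == 0)
                yield break; // nothing follows the final separator
            yield return isLast ? parts[i] : parts[i] + separator;
        }
    }
}
```

    For example, "One\r\nTwo\r\n".SplitAndKeepLines("\r\n") yields only "One\r\n" and "Two\r\n".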

  • 2021-01-01 06:42

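    If you are only splitting on the newline (\n), then do something like this:

    // Requires System.Linq. Every element gets "\r\n", including the last one,
    // and any '\r' already in the text is kept, so this suits \n-only input.
    string[] lines = tbIn.Text.Split('\n')
                         .Select(t => t + "\r\n").ToArray();


    Edit: Regex.Split allows you to split on a string:

    // Requires System.Text.RegularExpressions. As above, the last element
    // also gets "\r\n" appended even if the text did not end with one.
    string[] lines = Regex.Split(tbIn.Text, "\r\n")
                 .Select(t => t + "\r\n").ToArray();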
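    Regex.Split has one more documented behavior that fits this question: if the pattern contains capturing parentheses, the captured text is included in the result array. A sketch that uses it to keep each \r\n and glue it back onto its line, so no terminator is added to a last line that lacked one (CaptureSplit and SplitLinesKeepingCrLf are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureSplit
{
    // Splits on "\r\n" but keeps each terminator attached to its line.
    // Because the pattern has a capturing group, Regex.Split also returns the
    // captured "\r\n" strings: line, "\r\n", line, "\r\n", ..., line.
    public static string[] SplitLinesKeepingCrLf(string text)
    {
        string[] parts = Regex.Split(text, @"(\r\n)");
        var lines = new List<string>();
        for (int i = 0; i < parts.Length; i += 2)
        {
            // Glue the captured terminator (if there is one) back onto the line.
            string line = i + 1 < parts.Length ? parts[i] + parts[i + 1] : parts[i];
            // Only the final piece can be empty here (input ending in "\r\n",
            // or empty input); blank lines still carry their "\r\n".
            if (line.Length > 0)
                lines.Add(line);
        }
        return lines.ToArray();
    }
}
```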
    
  • 2021-01-01 06:44

    Something along the lines of using this regular expression: [^\n\r]*\r\n

    Then use Regex.Matches(). The problem is you need the value of each match (the whole match, since this pattern has no capturing group) and have to build your string list from that. In Python you'd just use the map() function. Not sure of the best way to do it in .NET; you take it from there ;-)
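    In .NET, LINQ's Select plays the role of Python's map(). A minimal sketch along those lines; it extends the pattern above with a second alternative, [^\r\n]+\z, so that a last line without \r\n is not dropped (that alternative and the names MatchLines and SplitLines are assumptions of this sketch):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchLines
{
    // First alternative: the answer's pattern, a line ending in "\r\n" (blank lines included).
    // Second alternative (added for this sketch): a final line with no terminator.
    private static readonly Regex LinePattern = new Regex(@"[^\r\n]*\r\n|[^\r\n]+\z");

    public static string[] SplitLines(string text)
    {
        // Cast<Match>() works on both .NET Framework and .NET Core;
        // each Match.Value is a whole line including its "\r\n".
        return LinePattern.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToArray();
    }
}
```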

  • 2021-01-01 06:48

    Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:

    string[] lines = tbIn.Text.Split('\n');
    for (int i = 0; i < lines.Length; ++i)
    {
        // Splitting on '\n' leaves a trailing '\r' on each line; put the '\n' back after it.
        lines[i] = lines[i].Replace("\r", "\r\n");
    }
    

    ... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
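    A minimal sketch of that manual IndexOf() approach, not the answerer's own code (ManualSplit and SplitKeepingEndings are hypothetical names). It searches for '\n' rather than '\r', which keeps the preceding '\r' inside each line and also copes with bare \n endings; blank lines are kept, and a last line without a terminator is returned as-is:

```csharp
using System.Collections.Generic;

public static class ManualSplit
{
    // Walks the string with IndexOf, cutting right after each '\n'.
    public static string[] SplitKeepingEndings(string text)
    {
        var lines = new List<string>();
        int start = 0;
        while (start < text.Length)
        {
            int newline = text.IndexOf('\n', start);
            if (newline < 0)
            {
                // Last line has no terminator: take the rest as-is.
                lines.Add(text.Substring(start));
                break;
            }
            // Include the '\n' (and any '\r' just before it) in the line.
            lines.Add(text.Substring(start, newline - start + 1));
            start = newline + 1;
        }
        return lines.ToArray();
    }
}
```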

    One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?

  • 2021-01-01 06:53

    The following seems to do the job:

    string[] lines = Regex.Split(tbIn.Text, @"(?<=\r\n)(?!$)");
    

    (?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.

    (?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.
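    Wrapped up for reuse, the pattern might look like the sketch below (LookbehindSplit and SplitLines are hypothetical names). Because both lookarounds are zero-width, Regex.Split removes nothing, and blank lines come through as separate "\r\n" elements:

```csharp
using System.Text.RegularExpressions;

public static class LookbehindSplit
{
    // Split at every position that follows "\r\n", except the end of the input.
    // The lookarounds consume no characters, so each piece keeps its own "\r\n".
    private static readonly Regex SplitPoint = new Regex(@"(?<=\r\n)(?!$)");

    public static string[] SplitLines(string text) => SplitPoint.Split(text);
}
```

    Without RegexOptions.Multiline, $ also matches just before a final '\n', so (?!\z) is the stricter choice if the text might end in a stray '\n'.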
