Can I use LINQ to strip repeating spaces from a string?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-23 15:29:31

Question


A quick brain teaser: given a string

This  is a string with  repeating   spaces

What would be the LINQ expression to end up with

This is a string with repeating spaces

Thanks!

For reference, here's one non-LINQ way:

private static IEnumerable<char> RemoveRepeatingSpaces(IEnumerable<char> text)
{
  bool isSpace = false;
  foreach (var c in text)
  {
    if (isSpace && char.IsWhiteSpace(c)) continue;

    isSpace = char.IsWhiteSpace(c);
    yield return c;
  }
}
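The iterator above returns `IEnumerable<char>`, so getting a `string` back takes one more step. A minimal sketch, assuming the method lives in a static class (the class name `SpaceHelpers` and the `Collapse` wrapper are hypothetical, for illustration only). Note that it collapses any run of whitespace (`char.IsWhiteSpace`) and keeps the first character of each run, so a tab followed by spaces stays a single tab.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SpaceHelpers
{
    // The question's iterator, unchanged apart from being public.
    public static IEnumerable<char> RemoveRepeatingSpaces(IEnumerable<char> text)
    {
        bool isSpace = false;
        foreach (var c in text)
        {
            // Skip a whitespace char that directly follows another whitespace char.
            if (isSpace && char.IsWhiteSpace(c)) continue;

            isSpace = char.IsWhiteSpace(c);
            yield return c;
        }
    }

    // Hypothetical convenience wrapper: materialize the chars back into a string.
    public static string Collapse(string s) =>
        new string(RemoveRepeatingSpaces(s).ToArray());
}
```

With this, `SpaceHelpers.Collapse("This  is a string with  repeating   spaces")` should return `"This is a string with repeating spaces"`.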

Answer 1:


Since nobody seems to have given a satisfactory answer, I came up with one. Here's a string-based solution (.NET 4):

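public static string RemoveRepeatedSpaces(this string s)
{
    // Guard: s[0] would throw on an empty string.
    if (string.IsNullOrEmpty(s)) return s;

    // Pair each char with its successor; a space that follows a space becomes null,
    // which string.Join skips (renders as nothing).
    return s[0] + string.Join("",
           s.Zip(
               s.Skip(1),
               (x, y) => x == y && y == ' ' ? (char?)null : y));
}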

However, this is just a general case of removing repeated elements from a sequence, so here's the generalized version:

public static IEnumerable<T> RemoveRepeatedElements<T>(
                             this IEnumerable<T> s, T dup)
{
    return s.Take(1).Concat(
            s.Zip(
                s.Skip(1),
                (x, y) => x.Equals(y) && y.Equals(dup) ? (object)null : y)
            .OfType<T>());
}

Of course, that's really just a more specific version of a function that removes all consecutive duplicates from its input stream:

public static IEnumerable<T> RemoveRepeatedElements<T>(this IEnumerable<T> s)
{
    return s.Take(1).Concat(
            s.Zip(
                s.Skip(1),
                (x, y) => x.Equals(y) ? (object)null : y)
            .OfType<T>());
}
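Because the generalized versions work on any `IEnumerable<T>`, they can be exercised on integers as well as characters. A self-contained sketch reproducing the two overloads above (the class name `SequenceExtensions` is hypothetical). One caveat: `OfType<T>()` is what drops the `null` placeholders, so for reference types any genuine `null` elements in the input are dropped as well, and `x.Equals(y)` throws if `x` is `null`.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SequenceExtensions
{
    // Drops an element when it equals its predecessor AND equals dup.
    public static IEnumerable<T> RemoveRepeatedElements<T>(this IEnumerable<T> s, T dup)
    {
        return s.Take(1).Concat(
            s.Zip(s.Skip(1), (x, y) => x.Equals(y) && y.Equals(dup) ? (object)null : y)
             .OfType<T>()); // OfType<T> filters out the null placeholders
    }

    // Drops every element that equals its predecessor.
    public static IEnumerable<T> RemoveRepeatedElements<T>(this IEnumerable<T> s)
    {
        return s.Take(1).Concat(
            s.Zip(s.Skip(1), (x, y) => x.Equals(y) ? (object)null : y)
             .OfType<T>());
    }
}
```

For example, `new[] { 1, 1, 2, 2, 2, 3, 1 }.RemoveRepeatedElements()` yields 1, 2, 3, 1, while `new[] { 0, 0, 5, 5, 0, 0, 0 }.RemoveRepeatedElements(0)` yields 0, 5, 5, 0 (only the repeated zeros collapse). Note that `s` is enumerated more than once (by `Take` and twice inside `Zip`), which matters for sequences that are expensive or impossible to re-enumerate.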

And obviously you can implement the first function in terms of the second:

public static string RemoveRepeatedSpaces(this string s)
{
    return string.Join("", s.RemoveRepeatedElements(' '));
}

BTW, I benchmarked my last function against the regex version (Regex.Replace(s, " +", " ")) and they were within nanoseconds of each other, so the LINQ overhead is no worse than the regex overhead. When I generalized it to remove all consecutive duplicate characters, the equivalent regex (Regex.Replace(s, "(.)\\1+", "$1")) was 3.5 times slower than my LINQ version (string.Join("", s.RemoveRepeatedElements())).
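Before timing the two "remove all consecutive duplicates" expressions, it is worth confirming they compute the same thing. A sketch, reproducing the one-argument `RemoveRepeatedElements` so the block compiles on its own (`DedupCheck`, `ViaLinq` and `ViaRegex` are hypothetical names). One difference shows up: `.` does not match `\n` by default, so the regex leaves repeated newlines alone while the LINQ version collapses them; on input without line breaks they agree.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DedupCheck
{
    // Same logic as the one-argument RemoveRepeatedElements above (non-extension here).
    public static IEnumerable<T> RemoveRepeatedElements<T>(IEnumerable<T> s)
    {
        return s.Take(1).Concat(
            s.Zip(s.Skip(1), (x, y) => x.Equals(y) ? (object)null : y)
             .OfType<T>());
    }

    public static string ViaLinq(string s) => string.Join("", RemoveRepeatedElements(s));

    // "(.)\\1+" captures one char plus its immediate repeats; "$1" keeps a single copy.
    public static string ViaRegex(string s) => Regex.Replace(s, "(.)\\1+", "$1");
}
```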

I also tried the "ideal" procedural solution:

public static string RemoveRepeatedSpaces(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    char lastChar = '\0';
    foreach (char c in s)
        if (c != ' ' || lastChar != ' ')
            sb.Append(lastChar = c);
    return sb.ToString();
}

This is more than 5 times faster than a regex!
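To reproduce a comparison like this, a rough harness along the following lines could be used. It is only a sketch: the input, iteration count and the `SpaceBench` name are arbitrary assumptions, and `Stopwatch` timings are sensitive to JIT warm-up and GC, so a dedicated tool such as BenchmarkDotNet gives more trustworthy numbers. No particular results are implied.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class SpaceBench
{
    // The procedural version from the answer above.
    public static string Procedural(string s)
    {
        var sb = new StringBuilder(s.Length);
        char lastChar = '\0';
        foreach (char c in s)
            if (c != ' ' || lastChar != ' ')
                sb.Append(lastChar = c); // append and remember in one step
        return sb.ToString();
    }

    public static string ViaRegex(string s) => Regex.Replace(s, " +", " ");

    // Runs f `iterations` times and returns the elapsed milliseconds.
    public static double Time(Func<string, string> f, string input, int iterations)
    {
        f(input); // warm-up call so JIT compilation is not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            f(input);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }

    public static void Run()
    {
        // Arbitrary input: the question's sentence repeated many times.
        var sb = new StringBuilder();
        for (int i = 0; i < 1000; i++)
            sb.Append("This  is a string with  repeating   spaces ");
        string input = sb.ToString();

        double proc = Time(Procedural, input, 1000);
        double rx = Time(ViaRegex, input, 1000);
        Console.WriteLine($"Procedural: {proc:F1} ms");
        Console.WriteLine($"Regex:      {rx:F1} ms");
    }
}
```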




Answer 2:


This is not a LINQ-type task; use a regex:

string output = Regex.Replace(input," +"," ");

Of course, you could use LINQ to apply this to a collection of strings.
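Applying the regex across a collection is a one-line `Select`. A minimal sketch (the `CollectionCleanup` class and `CollapseSpaces` method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CollectionCleanup
{
    // Lazily applies the same regex replacement to every string in the sequence.
    public static IEnumerable<string> CollapseSpaces(IEnumerable<string> lines) =>
        lines.Select(line => Regex.Replace(line, " +", " "));
}
```

For example, `CollectionCleanup.CollapseSpaces(new[] { "a  b", "c   d e" })` should yield `"a b"` and `"c d e"`.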




Answer 3:


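public static string TrimInternal(this string text)
{
  // Keep a char unless it is whitespace that is either at index 0
  // or directly follows another whitespace char.
  var trimmed = text.Where((c, index) => !char.IsWhiteSpace(c) || (index != 0 && !char.IsWhiteSpace(text[index - 1])));
  return new string(trimmed.ToArray());
}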
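It is worth checking how the index test behaves at the edges: leading whitespace is removed entirely (index 0 is dropped, and so is every whitespace char after it), a trailing run is collapsed to one character rather than trimmed, and each run keeps its first whitespace character, whatever kind it is. A self-contained copy with a few checks (the class name `TrimDemo` is hypothetical):

```csharp
using System.Linq;

public static class TrimDemo
{
    // Same logic as TrimInternal above.
    public static string TrimInternal(this string text)
    {
        // The Where overload with an index lets each char look at its predecessor.
        var trimmed = text.Where((c, index) =>
            !char.IsWhiteSpace(c) || (index != 0 && !char.IsWhiteSpace(text[index - 1])));
        return new string(trimmed.ToArray());
    }
}
```

So `"  a  b ".TrimInternal()` should give `"a b "`: the leading spaces go, the trailing space stays.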



Answer 4:


In practice, I would probably just use your original solution or regular expressions (if you want a quick & simple solution). A geeky approach that uses lambda functions would be to define a fixed point operator:

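// Needs: using System; using System.Collections.Generic;
T FixPoint<T>(T initial, Func<T, T> f) {
   T current = initial;
   do { 
     initial = current;
     current = f(initial);
     // `!=` does not compile for an unconstrained T, so use the default comparer.
   } while (!EqualityComparer<T>.Default.Equals(initial, current));
   return current;
}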

This keeps calling the operation f until it returns the same value that it got as an argument. You can think of the operator as a generalized loop; it is quite useful, though I guess it is too geeky to be included in the .NET BCL. Then you can write:

string res = FixPoint(original, s => s.Replace("  ", " "));

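It is not as efficient as your original version: each pass scans the whole string but only halves every run of spaces, so a run of n spaces needs roughly log2(n) passes, plus a final pass that confirms nothing changed. Unless the runs of spaces are very long, it should work fine.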
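The number of passes can be checked directly by counting how often `f` runs. A small sketch (the `FixPointDemo` class and `Collapse` helper are hypothetical; the termination test uses `EqualityComparer<T>.Default` because `!=` is not available on an unconstrained type parameter):

```csharp
using System;
using System.Collections.Generic;

public static class FixPointDemo
{
    // Fixed-point operator: apply f until its output equals its input.
    public static T FixPoint<T>(T initial, Func<T, T> f)
    {
        T current = initial;
        do
        {
            initial = current;
            current = f(initial);
        } while (!EqualityComparer<T>.Default.Equals(initial, current));
        return current;
    }

    // Collapses spaces and reports how many times the step function ran.
    public static string Collapse(string original, out int calls)
    {
        int count = 0; // out parameters cannot be captured by a lambda, so use a local
        string result = FixPoint(original, s => { count++; return s.Replace("  ", " "); });
        calls = count;
        return result;
    }
}
```

For `"a" + new string(' ', 8) + "b"` the run shrinks 8 → 4 → 2 → 1, and a fourth call confirms nothing changed, so `calls` ends up as 4.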




Answer 5:


LINQ is by definition about enumerable sequences (i.e. collections, lists, arrays). You could treat your string as a collection of chars and select the non-space ones, but this is definitely not a job for LINQ.




Answer 6:


Paul Creasey's answer is the way to go.

If you want to treat tabs as whitespace as well, go with:

text = Regex.Replace(text, "[ \t]+", " ");
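The character class for "spaces and tabs" should be `[ \t]`, not `[ |\t]`: inside `[...]` the `|` is a literal pipe rather than alternation, so `[ |\t]+` would also turn pipes into spaces. A quick demonstration (the `TabDemo` class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class TabDemo
{
    // Inside [...] the '|' is literal, so this class matches space, '|' and tab.
    public static string WithPipe(string s) => Regex.Replace(s, "[ |\t]+", " ");

    // Matches only spaces and tabs.
    public static string SpacesAndTabs(string s) => Regex.Replace(s, "[ \t]+", " ");
}
```

If newlines and other Unicode whitespace should collapse too, `@"\s+"` is the usual pattern.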

UPDATE:

The most logical way to solve this problem while satisfying the "using LINQ" requirement has been suggested by both Hasan and Ani. However, notice that these solutions involve accessing a character in a string by index.

The spirit of the LINQ approach is that it can be applied to any enumerable sequence. Any reasonably efficient solution to this problem requires maintaining some kind of state (with Ani's and Hasan's solutions this is easy to miss, because the state is already held in the string itself and reached by index). For that reason, a generic approach that accepts any sequence of items is likely to be much more straightforward to implement using procedural code.

This procedural code may then be abstracted into a method that looks like a LINQ-style method, of course. But I would not recommend tackling a problem like this with the attitude of "I want to use LINQ in this solution" from the get-go, because it will impose very awkward restrictions on your code.

For what it's worth, here's how I'd implement the general idea.

public static IEnumerable<T> StripConsecutives<T>(this IEnumerable<T> source, T value, IEqualityComparer<T> comparer)
{
    // null-checking omitted for brevity

    using (var enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
        {
            yield return enumerator.Current;
        }
        else
        {
            yield break;
        }

        T prev = enumerator.Current;
        while (enumerator.MoveNext())
        {
            T current = enumerator.Current;
            if (comparer.Equals(prev, value) && comparer.Equals(current, value))
            {
                // This is a consecutive occurrence of value --
                // moving on...
            }
            else
            {
                yield return current;
            }
            prev = current;
        }
    }
}
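To illustrate the point about arbitrary sequences, here is the method applied both to a string and to a lazily generated sequence that gets enumerated exactly once. The sketch is a compacted copy of `StripConsecutives` plus a hypothetical convenience overload that defaults to `EqualityComparer<T>.Default`; `Consecutives`, `Generate` and `Starts` are illustrative names, not part of the original answer.

```csharp
using System.Collections.Generic;

public static class Consecutives
{
    // StripConsecutives as above, with the empty if-branch folded into a negation.
    public static IEnumerable<T> StripConsecutives<T>(this IEnumerable<T> source, T value, IEqualityComparer<T> comparer)
    {
        using (var enumerator = source.GetEnumerator())
        {
            if (!enumerator.MoveNext())
                yield break;
            yield return enumerator.Current; // the first item is always kept

            T prev = enumerator.Current;
            while (enumerator.MoveNext())
            {
                T current = enumerator.Current;
                // Yield unless both this item and the previous one equal `value`.
                if (!(comparer.Equals(prev, value) && comparer.Equals(current, value)))
                    yield return current;
                prev = current;
            }
        }
    }

    // Hypothetical convenience overload using the default comparer.
    public static IEnumerable<T> StripConsecutives<T>(this IEnumerable<T> source, T value) =>
        source.StripConsecutives(value, EqualityComparer<T>.Default);

    public static int Starts; // counts how many times Generate() has been enumerated

    public static IEnumerable<int> Generate()
    {
        Starts++; // runs on the first MoveNext of each enumeration
        yield return 7; yield return 7; yield return 1; yield return 7; yield return 7;
    }
}
```

With `using System.Linq;`, `new string("This  is a string with  repeating   spaces".StripConsecutives(' ').ToArray())` gives the collapsed sentence, and `Consecutives.Generate().StripConsecutives(7)` yields 7, 1, 7 while starting the source only once.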


Source: https://stackoverflow.com/questions/3595583/can-i-use-linq-to-strip-repeating-spaces-from-a-string
