In C# what\'s the best way to remove blank lines i.e., lines that contain only whitespace from a string? I\'m happy to use a Regex if that\'s the best solution.
EDIT
Im not sure is it efficient but =)
List<string> strList = myString.Split(new string[] { "\n" }, StringSplitOptions.None).ToList<string>();
myString = string.Join("\n", strList.Where(s => !string.IsNullOrWhiteSpace(s)).Distinct().ToList());
Using LINQ:
var result = string.Join("\r\n",
multilineString.Split(new string[] { "\r\n" }, ...None)
.Where(s => !string.IsNullOrWhitespace(s)));
If you're dealing with large inputs and/or inconsistent line endings you should use a StringReader and do the above old-school with a foreach loop instead.
Alright this answer is in accordance to the clarified requirements specified in the bounty:
I also need to remove any trailing newlines, and my Regex-fu is failing. My bounty goes to anyone who can give me a regex which passes this test: StripWhitespace("test\r\n \r\nthis\r\n\r\n") == "test\r\nthis"
So Here's the answer:
(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|(\r?\n)+\z
Or in the C# code provided by @Chris Schmich:
string fix = Regex.Replace("test\r\n \r\nthis\r\n\r\n", @"(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|(\r?\n)+\z", string.Empty, RegexOptions.Multiline);
Now let's try to understand it. There are three optional patterns in here which I am willing to replace with string.empty
.
(?<=\r?\n)(\s*$\r?\n)+
- matches one to unlimited lines containing only white space and preceeded by a line break (but does not match the first preceeding line breaks).(?<=\r?\n)(\r?\n)+
- matches one to unlimited empty lines with no content that are preceeded by a line break (but does not match the first preceeding line breaks).(\r?\n)+\z
- matches one to unlimited line breaks at the end of the tested string (trailing line breaks as you called them)That satisfies your test perfectly! But also satisfies both \r\n
and \n
line break styles! Test it out! I believe this will be the most correct answer, although simpler expression would pass your specified bounty test, this regex passes more complex conditions.
EDIT: @Will pointed out a potential flaw in the last pattern match of the above regex in that it won't match multiple line breaks containing white space at the end of the test string. So let's change that last pattern to this:
\b\s+\z
The \b is a word boundry (beginning or END of a word), the \s+ is one or more white space characters, the \z is the end of the test string (end of "file"). So now it will match any assortment of whitespace at the end of the file including tabs and spaces in addition to carriage returns and line breaks. I tested both of @Will's provided test cases.
So all together now, it should be:
(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|\b\s+\z
EDIT #2: Alright there is one more possible case @Wil found that the last regex doesn't cover. That case is inputs that have line breaks at the beginning of the file before any content. So lets add one more pattern to match the beginning of the file.
\A\s+
- The \A
match the beginning of the file, the \s+
match one or more white space characters.
So now we've got:
\A\s+|(?<=\r?\n)(\s*$\r?\n)+|(?<=\r?\n)(\r?\n)+|\b\s+\z
So now we have four patterns for matching:
\r\n \r\n\t\r\n
)\r\n\r\n
)I'll go with:
public static string RemoveEmptyLines(string value) {
using (StringReader reader = new StringReader(yourstring)) {
StringBuilder builder = new StringBuilder();
string line;
while ((line = reader.ReadLine()) != null) {
if (line.Trim().Length > 0)
builder.AppendLine(line);
}
return builder.ToString();
}
}
Here's another option: use the StringReader
class. Advantages: one pass over the string, creates no intermediate arrays.
public static string RemoveEmptyLines(this string text) {
var builder = new StringBuilder();
using (var reader = new StringReader(text)) {
while (reader.Peek() != -1) {
string line = reader.ReadLine();
if (!string.IsNullOrWhiteSpace(line))
builder.AppendLine(line);
}
}
return builder.ToString();
}
Note: the IsNullOrWhiteSpace
method is new in .NET 4.0. If you don't have that, it's trivial to write on your own:
public static bool IsNullOrWhiteSpace(string text) {
return string.IsNullOrEmpty(text) || text.Trim().Length < 1;
}
off the top of my head...
string fixed = Regex.Replace(input, "\s*(\n)","$1");
turns this:
fdasdf asdf [tabs] [spaces] asdf
into this:
fdasdf asdf asdf