In C# what\'s the best way to remove blank lines i.e., lines that contain only whitespace from a string? I\'m happy to use a Regex if that\'s the best solution.
EDIT
string outputString;
using (StringReader reader = new StringReader(originalString)
using (StringWriter writer = new StringWriter())
{
string line;
while((line = reader.ReadLine()) != null)
{
if (line.Trim().Length > 0)
writer.WriteLine(line);
}
outputString = writer.ToString();
}
In response to Will's bounty, which expects a solution that takes "test\r\n \r\nthis\r\n\r\n"
and outputs "test\r\nthis"
, I've come up with a solution that makes use of atomic grouping (aka Nonbacktracking Subexpressions on MSDN). I recommend reading those articles for a better understanding of what's happening. Ultimately the atomic group helped match the trailing newline characters that were otherwise left behind.
Use RegexOptions.Multiline
with this pattern:
^\s+(?!\B)|\s*(?>[\r\n]+)$
Here is an example with some test cases, including some I gathered from Will's comments on other posts, as well as my own.
string[] inputs =
{
"one\r\n \r\ntwo\r\n\t\r\n \r\n",
"test\r\n \r\nthis\r\n\r\n",
"\r\n\r\ntest!",
"\r\ntest\r\n ! test",
"\r\ntest \r\n ! "
};
string[] outputs =
{
"one\r\ntwo",
"test\r\nthis",
"test!",
"test\r\n ! test",
"test \r\n ! "
};
string pattern = @"^\s+(?!\B)|\s*(?>[\r\n]+)$";
for (int i = 0; i < inputs.Length; i++)
{
string result = Regex.Replace(inputs[i], pattern, "",
RegexOptions.Multiline);
Console.WriteLine(result == outputs[i]);
}
EDIT: To address the issue of the pattern failing to clean up text with a mix of whitespace and newlines, I added \s*
to the last alternation portion of the regex. My previous pattern was redundant and I realized \s*
would handle both cases.
string corrected =
System.Text.RegularExpressions.Regex.Replace(input, @"\n+", "\n");
Eh. Well, after all that, I couldn't find one that would hit all the corner cases I could figure out. The following is my latest incantation of a regex that strips
(?<=(\r\n)|^)\s*\r\n|\r\n\s*$
which essentially says:
The first half catches all whitespace at the start of the string until the first non-whitespace line, or all whitespace between non-whitespace lines. The second half snags the remaining whitespace in the string, including the last non-whitespace line's newline.
Thanks to all who tried to help out; your answers helped me think through everything I needed to consider when matching.
*(This regex considers a newline to be \r\n
, and so will have to be adjusted depending on the source of the string. No options need to be set in order to run the match.)
s = Regex.Replace(s, @"^[^\n\S]*\n", "");
[^\n\S]
matches any character that's not a linefeed or a non-whitespace character--so, any whitespace character except \n
. But most likely the only characters you have to worry about are space, tab and carriage return, so this should work too:
s = Regex.Replace(s, @"^[ \t\r]*\n", "");
And if you want it to catch the last line, without a final linefeed:
s = Regex.Replace(s, @"^[ \t\r]*\n?", "");
If you want to remove lines containing any whitespace (tabs, spaces), try:
string fix = Regex.Replace(original, @"^\s*$\n", string.Empty, RegexOptions.Multiline);
Edit (for @Will): The simplest solution to trim trailing newlines would be to use TrimEnd on the resulting string, e.g.:
string fix =
Regex.Replace(original, @"^\s*$\n", string.Empty, RegexOptions.Multiline)
.TrimEnd();