To make things simple:
string streamR = sr.ReadLine(); // sr.ReadLine() returns:
// one "two two"
There's just one small problem with Squazz' answer: it works for his string, but not once you add more items. E.g.
string myString = "WordOne \"Word Two\" Three";
In that case, removing the last quotation mark would give us four results, not three.
That's easily fixed, though: count the escape characters and strip the last one only if the count is odd (adapt as per your requirements).
// Note: an instance method always wins over an extension method, so
// myString.Split(' ', '"') binds to the built-in String.Split(params char[]).
// Call this one statically (or rename it) to make sure it is actually used.
public static List<String> Split(this string myString, char separator, char escapeCharacter)
{
    int nbEscapeCharacters = myString.Count(c => c == escapeCharacter);
    if (nbEscapeCharacters % 2 != 0) // odd number of escape characters
    {
        int lastIndex = myString.LastIndexOf(escapeCharacter);
        myString = myString.Remove(lastIndex, 1); // remove the last escape character
    }
    var result = myString.Split(escapeCharacter)
        .Select((element, index) => index % 2 == 0 // If even index
            ? element.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries) // Split the item
            : new string[] { element }) // Keep the entire item
        .SelectMany(element => element).ToList();
    return result;
}
I also turned it into an extension method and made separator and escape character configurable.
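To show the difference between a balanced and an unbalanced input, here is a minimal, self-contained sketch of the same logic. The class name QuotedSplitExtensions and the method name SplitQuoted are made up for this example; the method is renamed only so that the extension-method call syntax does not resolve to the built-in String.Split(params char[]).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; the name is an assumption for this sketch.
public static class QuotedSplitExtensions
{
    // Same logic as the method above, renamed (hypothetical name) so that
    // input.SplitQuoted(' ', '"') cannot bind to String.Split(params char[]).
    public static List<string> SplitQuoted(this string myString, char separator, char escapeCharacter)
    {
        int count = myString.Count(c => c == escapeCharacter);
        if (count % 2 != 0) // odd number of escape characters
        {
            // Drop the last, unmatched escape character.
            myString = myString.Remove(myString.LastIndexOf(escapeCharacter), 1);
        }

        return myString.Split(escapeCharacter)
            .Select((element, index) => index % 2 == 0
                ? element.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries) // outside quotes
                : new[] { element })                                                        // inside quotes
            .SelectMany(element => element)
            .ToList();
    }
}

public static class SplitQuotedDemo
{
    public static void Main()
    {
        // Balanced quotes: the quoted part stays together.
        Console.WriteLine(string.Join("|", "WordOne \"Word Two\" Three".SplitQuoted(' ', '"')));
        // prints: WordOne|Word Two|Three

        // One unmatched quote: it is removed, so everything is split on spaces.
        Console.WriteLine(string.Join("|", "WordOne \"Word Two".SplitQuoted(' ', '"')));
        // prints: WordOne|Word|Two
    }
}
```

Whether an unmatched quote should be dropped, as here, or treated as opening a quoted run that extends to the end of the line depends on your requirements.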
string input = "one \"two two\" three \"four four\" five six";
var parts = Regex.Matches(input, @"[\""].+?[\""]|[^ ]+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
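Note that the pattern above keeps the surrounding quotation marks in each match. If you want them stripped, one possible variant is to use capture groups, as in the sketch below. The class name RegexQuotedSplit and the method name Tokenize are made up for this example, and the sketch assumes quotes only appear at token boundaries and are never escaped inside a quoted run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; names are assumptions for this sketch.
public static class RegexQuotedSplit
{
    public static List<string> Tokenize(string input)
    {
        // Group 1: text between a pair of quotes (quotes excluded).
        // Group 2: a run of non-whitespace characters.
        return Regex.Matches(input, @"""([^""]*)""|(\S+)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value)
            .ToList();
    }

    public static void Main()
    {
        string input = "one \"two two\" three \"four four\" five six";
        Console.WriteLine(string.Join("|", Tokenize(input)));
        // prints: one|two two|three|four four|five|six
    }
}
```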
A custom parser might be more suitable for this.
This is something I wrote once when I had a specific (and very strange) parsing requirement that involved parentheses and spaces, but it is generic enough that it should work with virtually any delimiter and text qualifier.
public static IEnumerable<String> ParseText(String line, Char delimiter, Char textQualifier)
{
    if (line == null)
        yield break;
    else
    {
        Char prevChar = '\0';
        Char nextChar = '\0';
        Char currentChar = '\0';
        Boolean inString = false;
        StringBuilder token = new StringBuilder();
        for (int i = 0; i < line.Length; i++)
        {
            currentChar = line[i];
            if (i > 0)
                prevChar = line[i - 1];
            else
                prevChar = '\0';
            if (i + 1 < line.Length)
                nextChar = line[i + 1];
            else
                nextChar = '\0';
            if (currentChar == textQualifier && (prevChar == '\0' || prevChar == delimiter) && !inString)
            {
                inString = true;
                continue;
            }
            if (currentChar == textQualifier && (nextChar == '\0' || nextChar == delimiter) && inString)
            {
                inString = false;
                continue;
            }
            if (currentChar == delimiter && !inString)
            {
                yield return token.ToString();
                token = token.Remove(0, token.Length);
                continue;
            }
            token = token.Append(currentChar);
        }
        yield return token.ToString();
    }
}
The usage would be:
var parsedText = ParseText(streamR, ' ', '"');
You can even do that without Regex: a LINQ expression with String.Split can do the job.
First split your string on the quotation mark ("), then split only the elements with an even index in the resulting array on spaces.
var result = myString.Split('"')
.Select((element, index) => index % 2 == 0 // If even index
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // Split the item
: new string[] { element }) // Keep the entire item
.SelectMany(element => element).ToList();
For the string:
This is a test for "Splitting a string" that has white spaces, unless they are "enclosed within quotes"
It gives the result:
This
is
a
test
for
Splitting a string
that
has
white
spaces,
unless
they
are
enclosed within quotes
string myString = "WordOne \"Word Two\"";
var result = myString.Split('"')
.Select((element, index) => index % 2 == 0 // If even index
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // Split the item
: new string[] { element }) // Keep the entire item
.SelectMany(element => element).ToList();
Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();
How do you define a quoted portion of the string?
We will assume that the string before the first " is non-quoted.
Then, the string between the first " and the second " is quoted. The string between the second " and the third " is non-quoted. The string between the third and the fourth is quoted, ...
The general rule is: Each string between the (2*n-1)th (odd number) " and the (2*n)th (even number) " is quoted. (1)
What is the relation with String.Split?
String.Split with the default StringSplitOptions (StringSplitOptions.None) creates a list containing one string and then adds a new string to the list for each splitting character found.
So, the string before the first " is at index 0 in the split array, the string between the first and second " is at index 1, the string between the second and third is at index 2, ...
The general rule is: The string between the nth and (n+1)th " is at index n in the array. (2)
Given (1) and (2), we can conclude that quoted portions are at odd indices in the split array.
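The parity rule can be checked directly by printing each piece returned by Split('"') together with its index. This is only an illustrative sketch; the class name ParityDemo and the method name Describe are made up for this example.

```csharp
using System;
using System.Linq;

// Hypothetical demo class; names are assumptions for this sketch.
public static class ParityDemo
{
    // Labels each piece of the split array as "quoted" (odd index) or "plain" (even index).
    public static string[] Describe(string input)
    {
        return input.Split('"')
            .Select((piece, index) =>
            {
                string kind = index % 2 == 1 ? "quoted" : "plain";
                return index + ":" + kind + ":[" + piece + "]";
            })
            .ToArray();
    }

    public static void Main()
    {
        foreach (var line in Describe("a \"b c\" d \"e\""))
            Console.WriteLine(line);
        // prints:
        // 0:plain:[a ]
        // 1:quoted:[b c]
        // 2:plain:[ d ]
        // 3:quoted:[e]
        // 4:plain:[]
    }
}
```

The trailing empty piece at index 4 appears because the input ends with a quotation mark; StringSplitOptions.None keeps empty entries, which is what preserves the odd/even alignment.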
You can use the TextFieldParser class that is part of the Microsoft.VisualBasic.FileIO namespace. (You'll need to add a reference to Microsoft.VisualBasic to your project.):
string inputString = "This is \"a test\" of the parser.";
using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(inputString)))
{
using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser(ms))
{
tfp.Delimiters = new string[] { " " };
tfp.HasFieldsEnclosedInQuotes = true;
string[] output = tfp.ReadFields();
for (int i = 0; i < output.Length; i++)
{
Console.WriteLine("{0}:{1}", i, output[i]);
}
}
}
Which generates the output:
0:This
1:is
2:a test
3:of
4:the
5:parser.
OP wanted to
... remove all spaces EXCEPT for the spaces found between quotation marks
The solution from Cédric Bignon almost did this, but didn't take into account that there could be an odd number of quotation marks. Removing the excess quotation mark before splitting ensures that we only stop splitting where an element really is enclosed in quotation marks. Note that the snippet below removes the last quotation mark unconditionally, so it assumes the input contains an odd number of them.
string myString = "WordOne \"Word Two";
int placement = myString.LastIndexOf("\"", StringComparison.Ordinal);
if (placement >= 0)
myString = myString.Remove(placement, 1);
var result = myString.Split('"')
.Select((element, index) => index % 2 == 0 // If even index
? element.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries) // Split the item
: new string[] { element }) // Keep the entire item
.SelectMany(element => element).ToList();
Console.WriteLine(result[0]);
Console.WriteLine(result[1]);
Console.ReadKey();
Credit for the logic goes to Cédric Bignon; I only added a safeguard.