I need to split a string on all whitespace; the result should contain ONLY the words themselves.
How can I do this in VB.NET?
Tabs, newlines, etc. must all be treated as separators.
If you want to avoid regex, you can do it like this:
"Lorem ipsum dolor sit amet, consectetur adipiscing elit"
.Split()
.Where(x => x != string.Empty)
Visual Basic equivalent:
"Lorem ipsum dolor sit amet, consectetur adipiscing elit" _
.Split() _
.Where(Function(X$) X <> String.Empty)
The Where() is important since, if your string has multiple whitespace characters next to each other, it removes the empty strings that would otherwise result from the Split().
At the time of writing, the currently accepted answer (https://stackoverflow.com/a/1563000/49241) does not take this into account.
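For comparison, the empty entries can also be dropped without LINQ by passing StringSplitOptions.RemoveEmptyEntries to Split. The following is only a minimal sketch of that variant; the module and function names (WhitespaceSplit, SplitWords) are made up for illustration and do not come from any of the answers here.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module WhitespaceSplit
    ' Hypothetical helper: splits on any whitespace and drops empty entries.
    ' Passing Nothing as the separator array makes String.Split fall back
    ' to whitespace characters as delimiters.
    Public Function SplitWords(ByVal input As String) As String()
        Return input.Split(DirectCast(Nothing, Char()), StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        ' Two spaces, a tab and a CR+LF pair between the words.
        Dim words = SplitWords("Lorem  ipsum" & vbTab & "dolor" & vbCrLf & "sit")
        Console.WriteLine(String.Join("|", words)) ' Lorem|ipsum|dolor|sit
    End Sub
End Module
```

Because the result is already a String(), no ToArray call is needed with this variant.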
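I used the solution given by Adam Ralph, plus the VB.NET comment below by P57, with one odd exception: I had to add .ToList.ToArray on the end.
Like so:
.Split().Where(Function(x) x <> String.Empty).ToList.ToArray
Without that, I kept getting "Unable to cast object of type 'WhereArrayIterator`1[System.String]' to type 'System.String[]'." This happens because Where() returns an IEnumerable(Of String) rather than an array, so the result cannot be assigned to a String() directly. Note that .ToArray on its own is enough to convert it; the .ToList step is not strictly required.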
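Dim words As String = "This is a list of words, with: a bit of punctuation" + _
vbTab + "and a tab character." + vbNewLine
' vbNewLine is two characters (CR + LF) and CChar keeps only the first, so list both explicitly.
Dim split As String() = words.Split(New Char() {" "c, CChar(vbTab), CChar(vbCr), CChar(vbLf)}, _
StringSplitOptions.RemoveEmptyEntries) ' drop the empty strings left by adjacent separators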
String.Split() will split on every single whitespace character, so the result will usually contain empty strings. The Regex solution Ruben Farias has given is the correct way to do it. I have upvoted his answer, but I want to add a small note dissecting the regex:
\s is a character class that matches all whitespace characters.
In order to split the string correctly when it contains multiple whitespace characters between words, we need to add a quantifier (or repetition operator) to the specification to match all whitespace between words. The correct quantifier to use in this case is +, meaning "one or more" occurrences of a given specification. While the syntax "\s+" is sufficient here, I prefer the more explicit "[\s]+".
String.Split() (no parameters) does split on all whitespace (including LF/CR)
Try this:
Regex.Split("your string here", "\s+")
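One thing to watch with the \s+ pattern discussed above: if the input starts or ends with whitespace, Regex.Split returns an empty string as the first or last element. A possible way around that is to trim the input first. This is a rough sketch only; RegexSplitDemo and SplitOnWhitespace are hypothetical names chosen for the example.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexSplitDemo
    ' Hypothetical helper: splits on runs of whitespace using the \s+ pattern.
    Public Function SplitOnWhitespace(ByVal input As String) As String()
        ' Trim first: otherwise leading or trailing whitespace would produce
        ' an empty first or last element in the result.
        Dim trimmed = input.Trim()
        ' Regex.Split on an empty string returns one empty element, so handle it here.
        If trimmed.Length = 0 Then Return New String() {}
        Return Regex.Split(trimmed, "\s+")
    End Function

    Sub Main()
        Dim parts = SplitOnWhitespace("  one" & vbTab & vbTab & "two" & vbCrLf & "three ")
        Console.WriteLine(String.Join("|", parts)) ' one|two|three
    End Sub
End Module
```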