Question
I am using a tab (\t) as the delimiter, and I know there are some empty fields in my data, e.g.:
one->two->->three
Here -> stands for a tab. As you can see, an empty field is still correctly surrounded by tabs. The data is read in a loop:
while ((strLine = br.readLine()) != null) {
    StringTokenizer st = new StringTokenizer(strLine, "\t");
    String test = st.nextToken();
    ...
}
Yet Java ignores this "empty string" and skips the field.
Is there a way to circumvent this behaviour and force Java to read the empty fields anyway?
Answer 1:
There is an RFE in Sun's bug database about this StringTokenizer issue, with the status "Will not fix".
The evaluation of this RFE states, and I quote:
With the addition of the java.util.regex package in 1.4.0, we have basically obsoleted the need for StringTokenizer. We won't remove the class for compatibility reasons. But regex gives you simply what you need.
It then suggests using the String#split(String) method.
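To make the difference concrete, here is a minimal sketch comparing the two approaches on the sample line from the question. The class and method names are hypothetical and chosen only for illustration; only StringTokenizer and String#split come from the discussion above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the names and the sample line are illustrative only.
class TokenizerVsSplit {

    // Collects tokens the same way the loop in the question does.
    // Consecutive delimiters are treated as one, so empty fields vanish.
    static List<String> withTokenizer(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "\t");
        while (st.hasMoreTokens()) {
            fields.add(st.nextToken());
        }
        return fields;
    }

    // String#split keeps an empty field that sits between two delimiters.
    static List<String> withSplit(String line) {
        return Arrays.asList(line.split("\t"));
    }

    public static void main(String[] args) {
        String line = "one\ttwo\t\tthree";
        System.out.println(withTokenizer(line)); // [one, two, three]
        System.out.println(withSplit(line));     // [one, two, , three]
    }
}
```

Note that the single-argument split still drops empty fields at the very end of a line, so it is not a complete fix on its own.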
Answer 2:
Thank you all. You are right, and thanks to the reference in the first answer I was able to find a solution:
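import java.io.File;
import java.util.Arrays;
import java.util.Scanner;

Scanner s = new Scanner(new File("data.txt"));
while (s.hasNextLine()) {
    String line = s.nextLine();
    // A negative limit keeps empty fields, including trailing ones.
    String[] items = line.split("\t", -1);
    System.out.println(items[5]);
    //System.out.println(Arrays.toString(items));
}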
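The -1 passed as the second argument matters when a line ends with empty fields. The following sketch, with a hypothetical class name and a made-up sample line, shows what the limit changes:

```java
import java.util.Arrays;

// Hypothetical demo class; the sample line is made up for illustration.
class SplitLimitDemo {

    // Default limit (0): trailing empty strings are removed from the result.
    static String[] splitDefault(String line) {
        return line.split("\t");
    }

    // Negative limit: every field is kept, including trailing empty ones.
    static String[] splitKeepTrailing(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        String line = "one\t\tthree\t\t";
        System.out.println(Arrays.toString(splitDefault(line)));      // [one, , three]
        System.out.println(Arrays.toString(splitKeepTrailing(line))); // [one, , three, , ]
    }
}
```

With a fixed number of columns per line, the negative limit keeps the array length stable, so an index such as items[5] refers to the same column on every line.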
Answer 3:
You can use StringUtils.splitPreserveAllTokens() from Apache Commons. It does exactly what you need.
Answer 4:
I would use Guava's Splitter, which doesn't need all the big regex machinery and is better behaved than String's split() method:
Iterable<String> parts = Splitter.on('\t').split(string);
Answer 5:
As you can see in the Javadoc (http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html), you can use the constructor public StringTokenizer(String str, String delim, boolean returnDelims) with returnDelims set to true, so that it returns each delimiter as a separate token.
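For illustration only, here is a rough sketch of how empty fields could be reconstructed from the delimiter tokens. The class name, method name, and bookkeeping flag are assumptions of this sketch, not part of the StringTokenizer API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper; only the three-argument constructor comes from the Javadoc.
class ReturnDelimsDemo {

    static List<String> fields(String line) {
        List<String> fields = new ArrayList<>();
        // returnDelims = true: each tab comes back as its own one-character token.
        StringTokenizer st = new StringTokenizer(line, "\t", true);
        // The start of the line is treated as if a delimiter had just been seen.
        boolean lastWasDelim = true;
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals("\t")) {
                if (lastWasDelim) {
                    fields.add(""); // two delimiters in a row enclose an empty field
                }
                lastWasDelim = true;
            } else {
                fields.add(tok);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            fields.add(""); // a trailing delimiter (or an empty line) ends with an empty field
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(fields("one\ttwo\t\tthree")); // [one, two, , three]
    }
}
```

The extra state tracking is the price of this approach: the caller has to infer empty fields from the positions of the delimiters.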
Edit:
Don't do it this way. As @npe already pointed out, StringTokenizer shouldn't be used any more. See the Javadoc:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
Source: https://stackoverflow.com/questions/11409320/java-stringtokenizer-nexttoken-skips-over-empty-fields