Replicating String.split with StringTokenizer

心在旅途 2021-02-06 14:03

Encouraged by this, and by the fact that I have billions of strings to parse, I tried to modify my code to accept a StringTokenizer instead of a String[].

The only t

9 Answers
  •  再見小時候
    2021-02-06 15:01

    Are you actually only tokenizing on commas? If so, I'd write my own tokenizer - it may well end up being even more efficient than the more general-purpose StringTokenizer, which can look for multiple delimiters, and you can make it behave however you'd like. For such a simple use case, the implementation can be simple too.

    If it would be useful, you could even implement Iterable and get enhanced-for-loop support with strong typing instead of the Enumeration support provided by StringTokenizer. Let me know if you want any help coding such a beast up - it really shouldn't be too hard.

    Additionally, I'd try running performance tests on your actual data before leaping too far from an existing solution. Do you have any idea how much of your execution time is actually spent in String.split? I know you have a lot of strings to parse, but if you're doing anything significant with them afterwards, I'd expect that work to take much more time than the splitting.
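
A minimal sketch of the hand-written comma tokenizer suggested above, with Iterable support so it works in an enhanced for loop. The class name CommaTokenizer and its behavior are assumptions for illustration, not code from the original answer. In particular, this version returns empty fields, including a trailing one, which differs from both String.split(",") (which drops trailing empty strings) and StringTokenizer (which skips empty tokens), so adjust it to whatever your data needs.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper (not from the original post): splits on ',' only.
final class CommaTokenizer implements Iterable<String> {
    private final String input;

    CommaTokenizer(String input) {
        this.input = input;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            // Index where the next token starts; once it passes
            // input.length(), every token has been returned.
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos <= input.length();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Find the next comma, or run to the end of the input.
                int comma = input.indexOf(',', pos);
                int end = (comma == -1) ? input.length() : comma;
                String token = input.substring(pos, end);
                // Step past the comma (or past the end, which ends iteration).
                pos = end + 1;
                return token;
            }
        };
    }
}
```

With this in place, `for (String field : new CommaTokenizer(line)) { ... }` visits each field in order without building a String[] first. Whether that is actually faster than String.split on your data is something to measure rather than assume.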
