(This post is about programming in the High Frequency Trading (HFT) style.)
I recently saw on a forum (I think they were discussing Java) that if you have to parse a lot of string data, it's better to work on the one big chunk of text than to break it up into separate string objects.
He's saying that if you break a chunk of text up into separate string objects, those string objects have worse locality than the large array of text. Each string, and the array of characters it contains, is going to be somewhere else in memory; they can be spread all over the place. The memory cache will likely have to thrash in and out to access the various strings as you process the data. In contrast, the one large array has the best possible locality: all the data is in one area of memory, and cache thrashing will be kept to a minimum.
There are limits to this, of course: if the text is very, very large, and you only need to parse out part of it, then those few small strings might fit better in the cache than the large chunk of text.
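A minimal sketch of the parse-in-place idea: read numeric fields straight out of one large byte array instead of splitting it into per-field String objects. The class and method names here are illustrative, not from the original post, and the comma-separated layout is an assumed example format.

```java
import java.nio.charset.StandardCharsets;

// Sketch: parse integers directly from one contiguous byte[] so the data
// stays in a single, cache-friendly region of memory. No String objects
// (and no per-field char[] copies) are allocated while parsing.
public class InPlaceParser {
    // Parse an unsigned decimal integer from buf[start, end), no allocation.
    static int parseInt(byte[] buf, int start, int end) {
        int value = 0;
        for (int i = start; i < end; i++) {
            value = value * 10 + (buf[i] - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] data = "123,4567,89\n".getBytes(StandardCharsets.US_ASCII);
        int fieldStart = 0;
        int sum = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == ',' || data[i] == '\n') {
                sum += parseInt(data, fieldStart, i); // no String created per field
                fieldStart = i + 1;
            }
        }
        System.out.println(sum); // 123 + 4567 + 89 = 4779
    }
}
```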
There are lots of other reasons to use byte[] or char* instead of Strings for HFT. In Java, Strings consist of 16-bit chars and are immutable. byte[] or ByteBuffer are easily recycled, have good cache locality, and can live off the heap (direct), saving a copy and avoiding character encoders. This all assumes you are using ASCII data.
char* buffers or ByteBuffers can also be mapped to network adapters to save yet another copy (with some fiddling in the ByteBuffer case).
In HFT you are rarely dealing with large amounts of data at once. Ideally you want to process data as soon as it comes down the socket, i.e. one packet at a time (about 1.5 KB).
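The one-packet-at-a-time loop can be sketched as below. A java.nio Pipe stands in for a real non-blocking SocketChannel so the example is self-contained; the buffer is sized around a typical Ethernet MTU payload and reused across reads. Names and the message format are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

// Sketch: handle each packet as soon as it arrives, reading into one
// reused direct buffer sized around the MTU payload (~1.5 KB).
public class PacketAtATime {
    // Read one chunk from the channel into the reused buffer and return
    // its payload as text (the String here is only for demonstration;
    // a real hot path would parse the bytes in place).
    static String readOnePacket(ReadableByteChannel ch, ByteBuffer packet) throws Exception {
        packet.clear();            // recycle the buffer for this read
        int n = ch.read(packet);   // process as soon as data is available
        packet.flip();
        byte[] msg = new byte[n];
        packet.get(msg);
        return new String(msg, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open(); // stand-in for a SocketChannel
        pipe.sink().write(ByteBuffer.wrap("TICK,101\n".getBytes(StandardCharsets.US_ASCII)));

        ByteBuffer packet = ByteBuffer.allocateDirect(1536); // ~one MTU payload
        System.out.print(readOnePacket(pipe.source(), packet)); // prints "TICK,101"
    }
}
```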