I am working on a system, written in C++, running on a Xeon on Linux, that needs to run as fast as possible. There is a large data structure (basically an array of structs)
Good (long) article about organizing data structures to take cache and RAM hierarchy into account from GNU's libc maintainer: https://lwn.net/Articles/250967/ (full PDF here: http://www.akkadia.org/drepper/cpumemory.pdf)
You might want to head over to http://agner.org/optimize/ and grab the optimization PDFs available there - there's a lot of good (low-level) information in there. Pretty focused on assembly language level, but there's lessons to be learned for C/C++ programmers as well.
Volume 3, "The microarchitecture of Intel, AMD and VIA CPUs" should be of interest :-)
Today’s CPUs fetch memory in chunks of (typically) 64 bytes, called cache lines. When you read a particular memory location, the entire cache line is fetched from the main memory into the cache.
More here : http://igoro.com/archive/gallery-of-processor-cache-effects/
Old SO question that has some info that might be of use to you (in particular the first answer where to look for Linux CPU info - responder doesn't mention line size proper, but 'other info' on top of associativity etc). Question is for x86, but answers are more general. Worth a look.
Where is the L1 memory cache of Intel x86 processors documented?
A cache line for any current Xeon processor is 64 bytes. One other thing that you might want to think about is the TLB. If you are really doing random accesses across 10GB of memory then you are likely to have a lot of TLB misses which can potentially be as costly as cache misses. You can get work around with with large pages, but it's something to keep in mind.