I need a way to do key-value lookups across (potentially) hundreds of GB of data. Ideally it would be something based on a distributed hash table that works nicely with Java. It should be
Open Source Cache Solutions in Java
Oracle Coherence (used to be Tangosol)
JCache JSR
DNS has the capability to do this. I don't know how large each of your records is (8GB made up of lots of small records?), but it may work.
Distributed hash tables include Tapestry, Chord, and Pastry. One of these should suit your needs.
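To give a feel for what these systems do at their core, here is a minimal sketch of consistent hashing on a ring, the key-to-node mapping idea that Chord is built around. It is only an illustration, not Chord itself: there is no networking, routing, or replication, and the class name HashRing, its methods, and the node names are all made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not taken from Chord, Pastry, or Tapestry.
class HashRing {
    // Ring position -> node name, kept sorted so a key's successor can be found.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Map a string to a ring position using the first 8 bytes of its SHA-1 digest.
    static long position(String s) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    void addNode(String node) {
        ring.put(position(node), node);
    }

    void removeNode(String node) {
        ring.remove(position(node));
    }

    // A key belongs to the first node at or after its position, wrapping around the ring.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Long, String> successor = ring.ceilingEntry(position(key));
        return (successor != null ? successor : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("node-a"); // node names are placeholders
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println("key1 is stored on " + ring.nodeFor("key1"));
    }
}
```

The useful property is that when a node leaves, only the keys that node owned move to another node; every other key stays where it was, which matters a lot when the data set is hundreds of GB.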
You should probably specify whether it needs to be persistent or not, in memory or not, etc. You could try memcached: http://www.danga.com/memcached/
nmdb sounds like it's exactly what you need: a distributed, in-memory cache with persistent on-disk storage. Current back-ends include qdbm, Berkeley DB, and (recently added after a quick email to the developer) Tokyo Cabinet. Key/value size is limited, but I believe that limit can be lifted if you don't need TIPC support.
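The general shape of "in-memory cache in front of a persistent store" can be sketched in a few lines of Java. This is not nmdb's code or API, just the pattern under some loud assumptions: the class name CachedStore is invented, a plain HashMap stands in for the on-disk back-end, and the cache is a small LRU built on LinkedHashMap.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of nmdb or any of its back-ends.
class CachedStore {
    // Stand-in for the persistent on-disk database (assumption for this sketch).
    private final Map<String, String> backend = new HashMap<>();
    // Bounded in-memory cache that evicts the least recently used entry.
    private final Map<String, String> cache;
    // Counts lookups that had to go to the back-end, to make cache hits visible.
    int backendReads = 0;

    CachedStore(final int capacity) {
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    // Write-through: every write goes to the back-end and to the cache.
    void put(String key, String value) {
        backend.put(key, value);
        cache.put(key, value);
    }

    // Read-through: serve from the cache, fall back to the back-end on a miss.
    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            backendReads++;
            value = backend.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        CachedStore store = new CachedStore(2);
        store.put("key1", "value1");
        System.out.println(store.get("key1"));
    }
}
```

With hundreds of GB of data the cache only ever holds the hot subset, so how well this works depends on how skewed your key access pattern is.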
You might want to check out Hazelcast. It is distributed/partitioned, super lite, easy and free.
import com.hazelcast.core.Hazelcast;
java.util.Map<String, String> map = Hazelcast.getMap("mymap");
map.put("key1", "value1");
Regards,
-talip