I need a way to do key-value lookups across (potentially) hundreds of GB of data. Ideally something based on a distributed hashtable, that works nicely with Java. It should be
nmdb sounds like its exactly what you need. Distributed, in memory cache, with a persistent on-disk storage. Current back-ends include qdbm, berkeley db, and (recently added after a quick email to the developer) tokyo cabinet. key/value size is limited though, but I believe that can be lifted if you don't need TICP support.