Following up on this question, it seems that a file- or disk-based Map
implementation may be the right solution to the problems I mentioned there.
UPDATE (some 4 years after the first post): beware that in newer versions of Ehcache, persistence of cache items is available only in the paid product. Thanks @boday for pointing this out.
Ehcache is great. It gives you the flexibility to keep the map in memory, on disk, or in memory with overflow to disk. If you use this very simple wrapper for java.util.Map, using it is blindingly simple:
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import net.sf.ehcache.Cache;
import net.sf.ehcache.Element;

/**
 * Minimal java.util.Map view over an Ehcache 2.x Cache.
 * entrySet() and values() are not supported; keySet() returns a snapshot copy.
 */
public class EhCacheMapAdapter<K, V> implements Map<K, V> {

    private final Cache ehCache;

    public EhCacheMapAdapter(Cache ehCache) {
        this.ehCache = ehCache;
    }

    @Override
    public void clear() {
        ehCache.removeAll();
    }

    @Override
    public boolean containsKey(Object key) {
        return ehCache.isKeyInCache(key);
    }

    @Override
    public boolean containsValue(Object value) {
        return ehCache.isValueInCache(value);
    }

    @Override
    public Set<Entry<K, V>> entrySet() {
        throw new UnsupportedOperationException();
    }

    @SuppressWarnings("unchecked")
    @Override
    public V get(Object key) {
        if (key == null) return null;
        Element element = ehCache.get(key);
        return element == null ? null : (V) element.getObjectValue();
    }

    @Override
    public boolean isEmpty() {
        return ehCache.getSize() == 0;
    }

    @SuppressWarnings("unchecked")
    @Override
    public Set<K> keySet() {
        // Snapshot copy of the keys: changes to this set do not write through to the cache
        List<K> keys = ehCache.getKeys();
        return new HashSet<K>(keys);
    }

    @Override
    public V put(K key, V value) {
        // Map contract: replace any existing mapping and return the previous value
        V previous = get(key);
        ehCache.put(new Element(key, value));
        return previous;
    }

    @Override
    public V remove(Object key) {
        V previous = get(key);
        ehCache.remove(key);
        return previous;
    }

    @Override
    public int size() {
        return ehCache.getSize();
    }

    @Override
    public Collection<V> values() {
        throw new UnsupportedOperationException();
    }

    @Override
    public void putAll(Map<? extends K, ? extends V> m) {
        for (Map.Entry<? extends K, ? extends V> entry : m.entrySet()) {
            put(entry.getKey(), entry.getValue());
        }
    }
}
I came across jdbm2 a few weeks ago. Usage is very simple; you should be able to get it working in half an hour. One drawback is that any object put into the map must be serializable, i.e. implement Serializable. Other cons are listed on their website.
However, object persistence databases like this are not a permanent solution for storing objects of your own Java classes. If you change the fields of a class, you may no longer be able to retrieve previously stored objects from the map, because Java serialization rejects data written by an incompatible version of the class. They are ideal for storing standard serializable classes such as String, Integer, etc.
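To make the "must implement Serializable" requirement concrete, here is a minimal plain-JDK sketch (not jdbm2's API) of what such a store does with your objects: it serializes them to bytes and reads them back. The class names are hypothetical. Declaring serialVersionUID explicitly is the usual mitigation for the class-evolution problem: without it the JVM derives an ID from the class structure, so almost any field change makes old data unreadable. Even with it, only compatible changes (such as adding a field) survive; changing a field's type still breaks old data.

```java
import java.io.*;

public class SerialVersionDemo {

    /** Hypothetical value class stored in a persistent map. */
    public static class Customer implements Serializable {
        // Fixed ID: stored instances stay readable after compatible changes such as adding a field
        private static final long serialVersionUID = 1L;

        public final String name;
        public final int age;

        public Customer(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Serializes a value to bytes, roughly what a disk-backed map does on put. */
    public static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Reads a value back, roughly what a disk-backed map does on get. */
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```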
Have you heard of prevalence frameworks?
EDIT: some clarifications on the term.
As James Gosling now says, no SQL database is as efficient as in-memory storage. Prevalence frameworks (the best known being Prevayler and Space4J) are built on this idea of in-memory storage that can optionally be persisted to disk. How do they work? It's deceptively simple: a single storage object contains all persistent entities, and that storage can only be changed through serializable operations. Putting an object into storage is therefore a Put operation performed in an isolated context. Because the operation is serializable, it can (depending on configuration) also be written to disk for long-term persistence. The main data repository, however, is memory, which gives very fast access times at the cost of high memory usage.
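A minimal prevalence-style sketch of the idea above, in plain Java. It is not Prevayler's or Space4J's API; all class and method names are hypothetical. Every change is a serializable Command that is appended to a journal file before being applied to the in-memory map, and on restart the journal is replayed to rebuild the state. Each record is length-prefixed so the journal can be appended to across restarts; production frameworks typically also write periodic snapshots so the journal does not grow forever.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class PrevalenceSketch implements Closeable {

    /** Hypothetical command type: the only way the in-memory storage may change. */
    public interface Command extends Serializable {
        void executeOn(Map<String, String> storage);
    }

    /** Hypothetical Put operation, serializable so it can be journaled to disk. */
    public static final class Put implements Command {
        private static final long serialVersionUID = 1L;
        private final String key;
        private final String value;

        public Put(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public void executeOn(Map<String, String> storage) {
            storage.put(key, value);
        }
    }

    private final Map<String, String> storage = new HashMap<>();
    private final DataOutputStream journal;

    public PrevalenceSketch(File journalFile) throws IOException, ClassNotFoundException {
        replay(journalFile);
        // Append mode: new commands go after the ones already on disk
        journal = new DataOutputStream(new FileOutputStream(journalFile, true));
    }

    /** Rebuilds the in-memory state by re-executing every journaled command in order. */
    private void replay(File journalFile) throws IOException, ClassNotFoundException {
        if (!journalFile.exists()) return;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(journalFile)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfJournal) {
                    return;
                }
                byte[] record = new byte[length];
                in.readFully(record);
                try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(record))) {
                    ((Command) ois.readObject()).executeOn(storage);
                }
            }
        }
    }

    /** Writes the command to the journal first, then applies it in memory. */
    public synchronized void execute(Command command) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buffer)) {
            oos.writeObject(command);
        }
        byte[] record = buffer.toByteArray();
        journal.writeInt(record.length);
        journal.write(record);
        journal.flush();
        command.executeOn(storage);
    }

    /** Reads are served straight from memory. */
    public synchronized String get(String key) {
        return storage.get(key);
    }

    @Override
    public void close() throws IOException {
        journal.close();
    }
}
```

Reads never touch the disk; only writes pay the cost of journaling, which is the trade-off the answer describes.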
Another advantage is that, because of their simplicity, these frameworks rarely contain more than about ten classes.
Considering your question, Space4J immediately came to mind, as it supports "passivation" of rarely used objects: their index keys are kept in memory, but the objects themselves stay on disk until they are used.
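The passivation idea can be sketched as an in-memory index of keys, each pointing at a file that holds the serialized value, which is only read back from disk when requested. This is a hypothetical illustration in plain Java, not Space4J's implementation; a real one would also keep recently used objects in memory and persist the index itself.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical passivating store: keys stay in memory, values live on disk until read. */
public class PassivatingStore<V extends Serializable> {

    private final Path directory;
    // In-memory index: key -> file holding the passivated value
    private final Map<String, Path> index = new HashMap<>();
    private int nextId = 0;

    public PassivatingStore(Path directory) throws IOException {
        this.directory = Files.createDirectories(directory);
    }

    /** Writes the value to its own file; only the key and file path stay in memory. */
    public synchronized void put(String key, V value) throws IOException {
        Path file = index.get(key);
        if (file == null) {
            file = directory.resolve("obj-" + nextId++ + ".ser");
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(value);
        }
        index.put(key, file); // register only after the value is on disk
    }

    /** Loads ("activates") the value from disk on demand. */
    @SuppressWarnings("unchecked")
    public synchronized V get(String key) throws IOException, ClassNotFoundException {
        Path file = index.get(key);
        if (file == null) return null;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (V) in.readObject();
        }
    }

    /** Answered from the in-memory index without touching the disk. */
    public synchronized boolean containsKey(String key) {
        return index.containsKey(key);
    }
}
```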
Note that you can also find more information on the c2wiki.
We have a similar solution implemented using Xapian. It's fast and scalable, it provides almost all the search functionality you requested, and it's free, multiplatform, and of course purgeable.
Berkeley DB Java Edition has a Collections API. Within that API, StoredMap in particular is a drop-in replacement for a ConcurrentHashMap. You'll need to create the Environment and Database before creating the StoredMap, but the Collections tutorial should make that pretty easy.
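A minimal sketch of that setup, assuming Berkeley DB Java Edition (je.jar) is on the classpath and that String keys and values are enough. The class name `StoredMapExample`, the database name `myMap`, and the environment directory are hypothetical. `StringBinding` is a tuple binding that converts Strings to and from the stored bytes.

```java
import java.io.File;
import java.util.Map;

import com.sleepycat.bind.tuple.StringBinding;
import com.sleepycat.collections.StoredMap;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class StoredMapExample {

    private final Environment env;
    private final Database db;
    private final StoredMap<String, String> map;

    public StoredMapExample(File home) throws DatabaseException {
        home.mkdirs(); // JE requires the environment directory to exist already

        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        env = new Environment(home, envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        db = env.openDatabase(null, "myMap", dbConfig); // null: no transaction

        // writeAllowed = true so put/remove are permitted on the map view
        map = new StoredMap<>(db, new StringBinding(), new StringBinding(), true);
    }

    /** The persistent map, usable anywhere a java.util.Map is expected. */
    public Map<String, String> map() {
        return map;
    }

    /** Close the Database before its Environment. */
    public void close() throws DatabaseException {
        db.close();
        env.close();
    }
}
```

Because StoredMap implements the standard map interfaces, code written against Map can use it unchanged; just remember to close the Database and Environment on shutdown.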
Per your requirements, Berkeley DB is designed to be easy to use, and I think you'll find that it has exceptional scalability and performance. Berkeley DB is available under an open source license, it's persistent and platform independent, and it allows you to search for data. The data can certainly be purged/deleted as needed. Berkeley DB has a long list of other features that you may find highly useful to your application, especially as your requirements change and grow with the success of the application.
If you decide to use Berkeley DB Java Edition, please be sure to ask questions on the BDB JE Forum. There's an active developer community that's happy to help answer questions and resolve problems.
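The google-collections library, now part of Guava (http://code.google.com/p/guava-libraries/), has some really useful Map tools. MapMaker in particular originally let you make concurrent hash maps with timed eviction, soft values that the garbage collector will sweep up if you're running out of heap, and computing functions. In current Guava releases these caching features have moved from MapMaker to CacheBuilder, so the equivalent today looks like this:
import java.util.Map;
import java.util.concurrent.TimeUnit;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

LoadingCache<String, String> cache = CacheBuilder.newBuilder()
        .softValues()                           // values may be reclaimed under memory pressure
        .expireAfterWrite(30, TimeUnit.MINUTES) // timed eviction
        .build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) {
                // Work out what the value should be.
                // A loader must not return null; this placeholder just echoes the key.
                return key;
            }
        });
String value = cache.getUnchecked("some-key"); // computed on first access, then cached
Map<String, String> view = cache.asMap();      // ConcurrentMap view, if you need a Map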
That gives you a cache that cleans up after itself and computes its values on demand. If you're able to compute values like that, great; otherwise it would pair perfectly with Redis (http://redis.io/), which you'd be writing into (to be fair, Redis would probably be fast enough on its own!).