Yet another question about which NoSQL database to choose. However, I haven't yet found anyone asking about it for this purpose: message storage...
I have an Erlang chat server...
I can't speak to Riak at all, but I'd question your choice to discard Mongo. It's quite good as long as you leave journaling turned off and don't completely starve it of RAM.
I know quite a lot about HBase, and it sounds like it would meet your needs easily. Might be overkill depending on how many users you have. It trivially supports things like storing many messages per user, and has functionality for automatic expiration of writes. Depending on how you architect your schema it may or may not be atomic, but that shouldn't matter for your use case.
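As a sketch of the "many messages per user" layout the answer alludes to: HBase keeps rows sorted by row key, so a common pattern is a key of user id plus a reversed, zero-padded timestamp, making a prefix scan return newest messages first. The key scheme below is my own illustration (not from the answer), with HBase's sorted storage simulated by a plain sorted list:

```python
MAX_TS = 10**13  # assumed upper bound on millisecond timestamps

def message_rowkey(user_id: str, ts_millis: int) -> str:
    """Compose an HBase-style row key: user id plus a reversed,
    zero-padded timestamp, so a prefix scan yields newest-first."""
    return f"{user_id}:{MAX_TS - ts_millis:013d}"

# Simulate HBase's sorted-by-rowkey storage with a sorted list of keys.
keys = sorted(
    message_rowkey("alice", ts)
    for ts in (1_700_000_000_000, 1_700_000_100_000, 1_700_000_050_000)
)

# The first key in scan order belongs to the newest message.
newest = keys[0]
print(newest)  # alice:8299999900000
```

Automatic expiration would then be a column-family TTL rather than anything you code by hand.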
The downsides are that there is a lot of overhead to set it up correctly. You need to know Hadoop, get HDFS running, make sure your namenode is reliable, etc. before standing up HBase.
I can't speak for Cassandra or HBase, but let me address the Riak part.
Yes, Riak would be appropriate for your scenario (and I've seen several companies and social networks use it for a similar purpose).
To implement this, you would need the plain Riak Key/Value operations, plus some sort of indexing engine. Your options are (in rough order of preference):
CRDT Sets. If your 1-N collection is reasonably small (say, fewer than 50 messages per user), you can store the keys of the child collection in a CRDT Set data type.
Riak Search. If your collection size is large, and especially if you need to search your objects on arbitrary fields, you can use Riak Search. It spins up Apache Solr in the background, and indexes your objects according to a schema you define. It has pretty awesome searching, aggregation and statistics, geospatial capabilities, etc.
Secondary Indexes. You can run Riak on top of an eLevelDB storage back end, and enable Secondary Index (2i) functionality.
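The first option above (a CRDT Set of child keys, read back with a client-side multi-fetch) can be sketched in-memory like this; the dicts stand in for Riak buckets and the names are hypothetical, not Riak's actual client API:

```python
# In-memory stand-ins for Riak: a K/V bucket for messages and a
# Set "data type" holding each user's message keys (option 1 above).
messages = {}         # message bucket: msg_key -> message body
user_msg_keys = {}    # CRDT-Set stand-in: user -> set of message keys

def store_message(user, msg_key, body):
    messages[msg_key] = body                            # plain K/V put
    user_msg_keys.setdefault(user, set()).add(msg_key)  # Set add

def fetch_messages(user):
    """Read the key set, then multi-fetch the message objects."""
    keys = user_msg_keys.get(user, set())
    # Time-sortable keys (timestamp prefix) give chronological order.
    return [messages[k] for k in sorted(keys)]

store_message("alice", "2024-01-01T10:00#m1", "hi")
store_message("alice", "2024-01-01T10:05#m2", "hello back")
print(fetch_messages("alice"))  # ['hi', 'hello back']
```

The important property is that the set only holds keys, so each message stays an independently fetchable object.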
Run a few performance tests to pick the fastest approach.
As far as schema, I would recommend using two buckets (for the setup you describe): a User bucket, and a Message bucket.
Index the message bucket, either by associating a Search index with it or by storing a user_key via 2i. This lets you do all of the required operations, and the message log does not have to fit into memory.
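A minimal sketch of that two-bucket schema, with a 2i-style index simulated by a dict (all names here are illustrative, not the Riak client API):

```python
import bisect

# Two "buckets" plus a 2i-style index on user_key (stand-ins only).
users = {"alice": {"name": "Alice"}}   # User bucket
message_bucket = {}                    # Message bucket: msg_key -> object
index_user_key = {}                    # 2i stand-in: user_key -> sorted (ts, key)

def put_message(msg_key, user_key, text, ts):
    message_bucket[msg_key] = {"user_key": user_key, "text": text, "ts": ts}
    # The store maintains the index entry as part of the write.
    bisect.insort(index_user_key.setdefault(user_key, []), (ts, msg_key))

def recent_messages(user_key, limit=10):
    """Index query + multi-fetch: the full log never has to be loaded."""
    entries = index_user_key.get(user_key, [])[-limit:]
    return [message_bucket[k]["text"] for _, k in entries]

put_message("m1", "alice", "hi", 100)
put_message("m2", "alice", "how are you?", 200)
print(recent_messages("alice"))  # ['hi', 'how are you?']
```

With a real 2i query you would page through keys for a user_key and multi-fetch only the page you need.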
I'd recommend using a distributed key/value store like Riak or Couchbase (both offer a multiFetch capability, client-side) and keeping the whole message log for each user serialized (as binary Erlang terms or JSON/BSON) as one value.
With your use cases it would look like this: posting a message is a fetch-append-store of the user's log, and showing the history is a single fetch.
The obvious limitation: the message log has to fit in memory.
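The serialize-the-whole-log approach above can be sketched like this (a dict stands in for the K/V store, JSON for the serialization; function names are my own):

```python
import json

# Stand-in for a Riak/Couchbase K/V store: one serialized log per user.
kv = {}

def append_message(user, sender, text, ts):
    """Post a message = read-modify-write of the user's whole log."""
    log = json.loads(kv.get(user, "[]"))
    log.append({"from": sender, "text": text, "ts": ts})
    kv[user] = json.dumps(log)          # the entire log is one value

def message_history(user):
    """Show history = a single fetch plus one deserialize."""
    return json.loads(kv.get(user, "[]"))

append_message("alice", "bob", "hi", 100)
append_message("alice", "carol", "hello", 200)
print([m["text"] for m in message_history("alice")])  # ['hi', 'hello']
```

Note the write path deserializes and reserializes the whole log on every append, which is exactly why the log must fit in memory.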
If you decide to store each message individually, the distributed database will have to sort them after retrieval if you want them in time order, so it will hardly help you handle larger-than-memory datasets. If you need that, you will end up with a trickier scheme anyway.