I\'m trying to figure out how to model data in Riak. Let\'s say you are building something like a CMS with two features, news and products. You need to be able to store this
An even more efficient approach to this than using key filtering (as per Kev Burns's recommendation) is to use Secondary Indexes or Riak Search, to model this scenario.
Take a look at my answers to Which clustered NoSQL DB for a Message Storing purpose? and Links in Riak: what can they do/not do, compared to graph databases? for a discussion of similar cases.
You have several decisions to make, depending on your use case. In all cases, you would start out with a company bucket, so that each company has a unique key.
1) Whether to store the items of interest in 2 separate buckets (news and products) or in one (something like items_of_interest) depends on your preference and ease of querying. If you're always going to be querying for both news and products for a company in a single query, you might as well store them in a single bucket. But I recommend using 2 separate ones, to keep easier track of them, especially if you'll have something like separate tabs or pages for "Company X - Products" and "Company X - News". And if you need to combine them into a single feed, you would make 2 queries (one for news and one for products), and combine them in the client code (by date or whatever).
2) If a news/product item can have one and only one company that it belongs to, create a secondary index on company_key for each item. That way, you can easily fetch all news or products for a company via a secondary index (2i) query for that company.
3) If there's a many-to-many relationship (if a news/product item can belong to several companies (perhaps the news item is about a joint venture for 2 separate companies)), then I recommend modeling the relationship as a separate Riak object. For example, you could create a mentions bucket, and for each company mentioned in a news story, you would insert a Mention object, with its own unique key, a secondary index for company_key, and the value would contain a type ('news' or 'product') and an item_key (news key or product key). Extracting relationships to separate Riak objects like this allows you to do a lot of interesting things -- tag them arbitrarily using Riak Search, query them for subscription event notifications, etc.
I'd create 2 buckets: news and products. Then I'd prefix keys in each bucket with client names. I'd probably also include dates in news keys for easy date ranging.
news/acme_2011-02-23_01
news/acme_2011-02-23_02
news/bigcorp_2011-02-21_01
And optionally prefix product names with category names
products/acme_blacksmithing_anvil
products/bigcorp_databases_oracle
Then in your map/reduce you could use key filtering:
// BigCorp News items
{
"inputs":{
"bucket":"news",
"key_filters":[["starts_with", "bigcorp"]]
}
// ... rest of mapreduce job
}
// Acme Blacksmithing items
{
"inputs":{
"bucket":"products",
"key_filters":[["starts_with", "acme_blacksmithing"]]
}
// ... rest of mapreduce job
}
// News for all clients from Feb 12th to 19th
{
"inputs":{
"bucket":"news",
"key_filters":[["tokenize", "_", 2],
["between", "2011-02-12", "2011-02-19"]]
}
// ... rest of mapreduce job
}