How to structure data in Riak?

Submitted by Anonymous (unverified) on 2019-12-03 01:29:01

Question:

I'm trying to figure out how to model data in Riak. Let's say you are building something like a CMS with two features, news and products. You need to be able to store this information for multiple clients X and Y. How would you typically structure this?

  1. One bucket per client and then two keys news and products. Store multiple objects under each key and then use map/reduce to order them.

  2. Store both the news and the products in the same bucket, but with a new autogenerated key for each news item and product item. That is, one bucket for X and one for Y.

  3. One bucket per client/feature combination, that is, the buckets would be X-news, X-products, Y-news and Y-products. Then use map/reduce on the whole bucket to return the results in order.

Which would be the best way to handle this problem?

Answer 1:

I'd create 2 buckets: news and products. Then I'd prefix keys in each bucket with client names. I'd probably also include dates in news keys for easy date ranging.

    news/acme_2011-02-23_01
    news/acme_2011-02-23_02
    news/bigcorp_2011-02-21_01

Optionally, prefix product names with category names:

    products/acme_blacksmithing_anvil
    products/bigcorp_databases_oracle

Then in your map/reduce you could use key filtering:

    // BigCorp News items
    {
      "inputs":{
         "bucket":"news",
         "key_filters":[["starts_with", "bigcorp"]]
      }
      // ... rest of mapreduce job
    }

    // Acme Blacksmithing items
    {
      "inputs":{
         "bucket":"products",
         "key_filters":[["starts_with", "acme_blacksmithing"]]
      }
      // ... rest of mapreduce job
    }

    // News for all clients from Feb 12th to 19th
    {
      "inputs":{
         "bucket":"news",
         "key_filters":[["tokenize", "_", 2],
                        ["between", "2011-02-12", "2011-02-19"]]
      }
      // ... rest of mapreduce job
    }
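As a rough sketch of why this key layout works, the Python below builds keys in the client_date_sequence form and emulates the tokenize/between pair locally. The helper names (news_key, news_between, matches_between) are made up for illustration, and the emulation assumes the between filter is inclusive at both ends. It only checks the key design and does not talk to Riak.

```python
import json

def news_key(client, date, seq):
    """Build a key such as 'acme_2011-02-23_01' (client_date_sequence)."""
    return f"{client}_{date}_{seq:02d}"

def news_between(start, end):
    """MapReduce inputs selecting news keys whose date token lies in [start, end]."""
    return {
        "inputs": {
            "bucket": "news",
            "key_filters": [["tokenize", "_", 2], ["between", start, end]],
        }
    }

def matches_between(key, start, end):
    """Local emulation of tokenize + between (assumed inclusive), for checking keys."""
    token = key.split("_")[1]  # token 2 in 1-based counting is index 1 here
    return start <= token <= end  # ISO dates compare correctly as plain strings

keys = [news_key("acme", "2011-02-13", 1), news_key("bigcorp", "2011-02-21", 1)]
selected = [k for k in keys if matches_between(k, "2011-02-12", "2011-02-19")]

print(json.dumps(news_between("2011-02-12", "2011-02-19")))
print(selected)
```

Two things the sketch makes visible: the date range only works because YYYY-MM-DD strings sort in date order, and a client name containing an underscore would shift the token positions, so the separator has to be kept out of client names.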


Answer 2:

An even more efficient approach than key filtering (Kev Burns's recommendation) is to model this scenario with Secondary Indexes or Riak Search.

Take a look at my answers to "Which clustered NoSQL DB for a Message Storing purpose?" and "Links in Riak: what can they do/not do, compared to graph databases?" for a discussion of similar cases.

You have several decisions to make, depending on your use case. In all cases, you would start out with a company bucket, so that each company has a unique key.

1) Whether to store the items of interest in 2 separate buckets (news and products) or in one (something like items_of_interest) depends on your preference and ease of querying. If you're always going to query for both news and products for a company in a single query, you might as well store them in a single bucket. But I recommend using 2 separate ones, to keep track of them more easily, especially if you'll have something like separate tabs or pages for "Company X - Products" and "Company X - News". And if you need to combine them into a single feed, you would make 2 queries (one for news and one for products), and combine them in the client code (by date or whatever).
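Combining the two result sets in client code could look roughly like the following. This is only a sketch: the record shape (dicts with type, date and title fields) and the function name combined_feed are assumptions, and the two input lists stand in for whatever the news and products queries return.

```python
from heapq import merge
from operator import itemgetter

def combined_feed(news, products, limit=10):
    """Merge two query results into one feed, newest first."""
    by_date = itemgetter("date")
    # heapq.merge expects each input to be sorted in the merge order already
    news_sorted = sorted(news, key=by_date, reverse=True)
    products_sorted = sorted(products, key=by_date, reverse=True)
    merged = merge(news_sorted, products_sorted, key=by_date, reverse=True)
    return list(merged)[:limit]

# Hypothetical results of the two per-company queries
news = [
    {"type": "news", "date": "2011-02-21", "title": "Quarterly results"},
    {"type": "news", "date": "2011-02-23", "title": "New office"},
]
products = [
    {"type": "product", "date": "2011-02-22", "title": "Anvil"},
]

feed = combined_feed(news, products)
for item in feed:
    print(item["date"], item["type"], item["title"])
```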

2) If a news/product item can have one and only one company that it belongs to, create a secondary index on company_key for each item. That way, you can easily fetch all news or products for a company via a secondary index (2i) query for that company.
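For a sense of what this looks like over Riak's HTTP interface, here is a sketch that only builds the requests and does not send them. The base URL (a local node on port 8098), the index name company_key_bin and both helper functions are assumptions for illustration. The header and path shapes are as I recall them from the Riak HTTP API, so check them against the docs for your version, and note that 2i needs a backend that supports it.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8098"  # assumption: local node, default HTTP port

def store_request(bucket, key, company_key):
    """Describe the PUT that stores an item tagged with its owning company."""
    return {
        "method": "PUT",
        "url": f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}",
        "headers": {
            "Content-Type": "application/json",
            # binary (string) secondary index named company_key
            "x-riak-index-company_key_bin": company_key,
        },
    }

def index_query_url(bucket, company_key):
    """URL of the exact-match 2i query returning all keys for one company."""
    return (
        f"{BASE_URL}/buckets/{quote(bucket, safe='')}"
        f"/index/company_key_bin/{quote(company_key, safe='')}"
    )

request = store_request("news", "story-001", "acme")
print(request["url"])
print(request["headers"])
print(index_query_url("news", "acme"))
```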

3) If there's a many-to-many relationship, that is, if a news/product item can belong to several companies (perhaps the news item is about a joint venture between 2 separate companies), then I recommend modeling the relationship as a separate Riak object. For example, you could create a mentions bucket, and for each company mentioned in a news story, you would insert a Mention object with its own unique key and a secondary index for company_key. Its value would contain a type ('news' or 'product') and an item_key (news key or product key). Extracting relationships to separate Riak objects like this allows you to do a lot of interesting things: tag them arbitrarily using Riak Search, query them for subscription event notifications, etc.
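The shape of such a Mention object can be sketched in plain Python. The class and helper names here are hypothetical, and the list comprehension in items_for_company merely stands in for the 2i query on company_key that Riak would run against the mentions bucket.

```python
from dataclasses import dataclass
from uuid import uuid4

@dataclass(frozen=True)
class Mention:
    key: str          # unique key of the Mention object itself
    company_key: str  # value that would go into the company_key secondary index
    item_type: str    # 'news' or 'product'
    item_key: str     # key of the news or product item being linked

def mentions_for(item_type, item_key, company_keys):
    """Create one Mention per company that an item belongs to."""
    return [Mention(uuid4().hex, c, item_type, item_key) for c in company_keys]

def items_for_company(mentions, company_key, item_type):
    """Stand-in for the 2i lookup: item keys of one type linked to a company."""
    return [
        m.item_key
        for m in mentions
        if m.company_key == company_key and m.item_type == item_type
    ]

# A joint-venture story that belongs to two companies
mentions = mentions_for("news", "story-001", ["acme", "bigcorp"])
mentions += mentions_for("product", "anvil", ["acme"])

print(items_for_company(mentions, "bigcorp", "news"))
print(items_for_company(mentions, "acme", "product"))
```

Fetching a company's feed then takes two steps: look up its Mention objects by index, then fetch each item by item_key from the news or products bucket.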


