Is HBase meaningful if it's not running in a distributed environment?

那年仲夏 submitted on 2019-12-03 16:18:23

Use the right tool for the job.

There are a lot of non-RDBMS, or BASE, systems (Basically Available, Soft state, Eventually consistent), as opposed to ACID systems (Atomicity, Consistency, Isolation, Durability), to choose from here and here.

I've used traditional RDBMSs, and although you can store CLOBs/BLOBs in them, they don't have built-in indexes designed specifically for searching those objects.

You want to do most of the work (calculating the weighted frequency for each tuple found) when inserting a document.
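Here's a minimal sketch of doing that work at insert time. It assumes the "tuples" are adjacent word pairs (bigrams) and that "weighted frequency" means a pair's count divided by the total number of pairs in the document; both are my assumptions, and `TupleIndex`, `insert_document`, and `lookup` are hypothetical names for an in-memory stand-in for whatever store you actually use.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_pairs(tokens):
    """Adjacent word pairs (bigrams); assumed here to be the 'tuples'."""
    return list(zip(tokens, tokens[1:]))


class TupleIndex:
    """Hypothetical in-memory index: tuple -> {document_id: weighted frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def insert_document(self, doc_id, text):
        pairs = word_pairs(tokenize(text))
        if not pairs:
            return  # fewer than two words: no tuples to index
        total = len(pairs)
        for pair, count in Counter(pairs).items():
            # Weighted frequency = occurrences / total tuples in this document.
            # It is computed once here, so a search is just a dictionary lookup.
            self.postings[pair][doc_id] = count / total

    def lookup(self, pair):
        return dict(self.postings.get(pair, {}))


index = TupleIndex()
index.insert_document("doc1", "big data is big data")
print(index.lookup(("big", "data")))  # {'doc1': 0.5}
```

The point of the design is that the expensive counting happens once per document, while searches only read precomputed weights.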

You might also want to do some work scoring the usefulness of each (documentId,searchWord) pair after each search.

That way your search results keep getting better over time.
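One way to sketch that feedback loop, assuming you get some "was this result useful?" signal after each search (for example, whether the user clicked the document): keep a score per (documentId, searchWord) pair and nudge it with an exponential moving average. The class, its parameters, and the EMA update are all hypothetical choices for illustration, not something prescribed above.

```python
class FeedbackScores:
    """Hypothetical usefulness score for each (document_id, search_word) pair."""

    def __init__(self, alpha=0.2, initial=0.5):
        self.alpha = alpha      # how far a single search outcome moves the score
        self.initial = initial  # neutral score for pairs with no feedback yet
        self.scores = {}

    def record(self, doc_id, search_word, useful):
        """Update the pair's score after a search; `useful` is e.g. 'user clicked it'."""
        key = (doc_id, search_word)
        reward = 1.0 if useful else 0.0
        old = self.scores.get(key, self.initial)
        # Exponential moving average: recent feedback outweighs older feedback.
        self.scores[key] = (1 - self.alpha) * old + self.alpha * reward

    def score(self, doc_id, search_word):
        return self.scores.get((doc_id, search_word), self.initial)

    def rank(self, search_word, doc_ids):
        """Order candidate documents by their learned usefulness for this word."""
        return sorted(doc_ids, key=lambda d: self.score(d, search_word), reverse=True)


fb = FeedbackScores(alpha=0.5)
fb.record("doc1", "hbase", useful=True)
fb.record("doc2", "hbase", useful=False)
print(fb.rank("hbase", ["doc2", "doc1"]))  # ['doc1', 'doc2']
```

A smaller `alpha` makes scores change slowly and resist noisy feedback; a larger one adapts faster.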

You also want to store a score or weight for each search and weighted scores for similarity to other searches.

It's likely that some searches are more common than others, and that users sometimes phrase a query poorly even though they mean to run one of those common searches.
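As a rough sketch of storing a weight per search and matching a poorly phrased query to a common one: the version below counts how often each search is run and uses Jaccard similarity on word sets as the similarity score, breaking ties by popularity. Jaccard and the 0.5 threshold are swapped-in assumptions of mine; `SearchLog` and its methods are hypothetical.

```python
import re
from collections import Counter


def words(query):
    """Set of lowercase words in a query, ignoring order and punctuation."""
    return frozenset(re.findall(r"[a-z0-9']+", query.lower()))


def jaccard(a, b):
    """Similarity of two word sets: shared words / all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class SearchLog:
    """Hypothetical store of past searches with a popularity count for each."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # minimum similarity to count as "the same search"
        self.counts = Counter()     # normalized query text -> times searched

    def record(self, query):
        self.counts[" ".join(query.lower().split())] += 1

    def suggest(self, query):
        """Most similar past search (ties go to the more popular one), or None."""
        q = words(query)
        best, best_key = None, None
        for past, count in self.counts.items():
            sim = jaccard(q, words(past))
            if sim >= self.threshold and (best_key is None or (sim, count) > best_key):
                best, best_key = past, (sim, count)
        return best


log = SearchLog()
for _ in range(3):
    log.record("hbase standalone mode")
log.record("hbase cluster setup")
print(log.suggest("standalone HBase"))  # hbase standalone mode
```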

Inserting a document should also cause some change to the search weight indexes.
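To see why, take a common weighting scheme such as inverse document frequency, idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. Every insert changes N (and df for the terms it contains), so every term's weight shifts. This is a minimal sketch assuming IDF-style weights; `IdfWeights` is a hypothetical name, and returning 0.0 for unseen terms is an arbitrary choice.

```python
import math
import re
from collections import Counter


class IdfWeights:
    """Hypothetical corpus statistics used to weight search terms."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()  # term -> number of documents containing it

    def insert_document(self, text):
        terms = set(re.findall(r"[a-z0-9']+", text.lower()))
        self.num_docs += 1
        self.doc_freq.update(terms)  # each distinct term counts once per document

    def idf(self, term):
        """Inverse document frequency log(N / df); 0.0 for terms never seen."""
        df = self.doc_freq[term]
        return math.log(self.num_docs / df) if df else 0.0


w = IdfWeights()
w.insert_document("hbase on hadoop")
w.insert_document("mongodb stores documents")
before = w.idf("hbase")  # log(2 / 1)
w.insert_document("hbase standalone mode")
after = w.idf("hbase")   # log(3 / 2): the term is now less distinctive
print(after < before)  # True
```

Note that the third insert also raises the weight of "hadoop" (from log 2 to log 3) even though that document never mentions it.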

The more I think about it, the more complex the solution becomes. You need to start with a good design. The more factors your design anticipates, the better the outcome.

MapReduce seems like a great way of generating the tuples. If you can package a Scala job into a jar file (I'm not sure, since I haven't used Scala before and am a JVM n00b), it'd be a simple matter to ship it to the cluster and write a small wrapper to run it as a MapReduce job.
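I can't speak to the Scala/jar side, but the shape of the job is simple. Below is a local, single-process simulation of the map, shuffle, and reduce phases in plain Python; it is not Hadoop's API (on a real cluster the framework performs the shuffle, and you'd write the mapper and reducer against its interfaces). As before, it assumes the tuples are adjacent word pairs, and all function names are hypothetical.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit ((word1, word2), 1) for each adjacent word pair in one document.

    doc_id is the input key; it is unused here, but a mapper could emit it
    instead of 1 to build (tuple, documentId) postings.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    for pair in zip(tokens, tokens[1:]):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: total occurrences of one word pair across all documents."""
    return key, sum(values)


def run_job(documents):
    """Run map -> shuffle -> reduce locally over {doc_id: text}."""
    mapped = chain.from_iterable(map_phase(d, t) for d, t in documents.items())
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


counts = run_job({"a": "big data big data", "b": "big data tools"})
print(counts[("big", "data")])  # 3
```

Because each mapper only looks at one document and each reducer only at one key, both phases parallelize across the cluster without coordination.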

As for storing the tuples once you've generated them, if tuples are all you're storing, you might also consider a document-based database like MongoDB.

In general, it sounds like you're doing something more statistical with the texts. Have you considered simply using Lucene or Solr to do what you're doing instead of writing your own?
