I have a solution that can be parallelized, but I don\'t (yet) have experience with hadoop/nosql, and I\'m not sure which solution is best for my needs. In theory, if I had unl
It is serious problem without immidiate good solution in the open source space. In commercial space MPP databases like greenplum/netezza should do. Ideally you would need google's Dremel (engine behind BigQuery). We are developing open source clone, but it will take some time... Regardless of the engine used I think solution should include holding the whole dataset in memory - it should give an idea what size of cluster you need.