Question
I need to build an analytics server for large scale (seven figures and up) quickly and cheaply.
Piwik would be the easy choice, but from what I've gathered so far, Piwik is rather hard to scale and can require rather hefty servers to handle the load.
My second idea would be to create a quick and dirty Node.js server that just pushes everything to Amazon DynamoDB, so that one can start gathering data from day one and build the UI later on. That would be quick to create and scale (vertically and horizontally). However, I'm wondering whether DynamoDB is the right choice for such a use (gather data, generate reports)?
Answer 1:
I'm using DynamoDB professionally and would not use it for your application.
DynamoDB truly has tons of constraints. Among them, you can have only one hash_key and, optionally, one range_key.
You may do some "analytics" for items grouped under a given hash_key using query, but really nothing fancy. For complex queries, you would have to use scan or EMR, which are slow and expensive and have a couple of drawbacks due to throttling.
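To make the query versus scan distinction concrete, here is a toy in-memory model in Python. It is only a sketch and not the DynamoDB API: ToyTable, items_read, and the sample event fields (site, ts, browser) are hypothetical names invented for illustration, and items_read is just a rough stand-in for the read cost you would pay.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a table keyed by (hash_key, range_key)."""

    def __init__(self):
        # hash_key -> list of (range_key, item), kept sorted by range_key
        self._partitions = defaultdict(list)
        # Number of items touched so far; a rough proxy for read cost.
        self.items_read = 0

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        rows.append((range_key, item))
        rows.sort(key=lambda row: row[0])

    def query(self, hash_key, start, end):
        """Return items under one hash_key whose range_key is in [start, end]."""
        rows = self._partitions.get(hash_key, [])
        keys = [range_key for range_key, _ in rows]
        lo = bisect_left(keys, start)
        hi = bisect_right(keys, end)
        self.items_read += hi - lo  # only the matching slice is read
        return [item for _, item in rows[lo:hi]]

    def scan(self, predicate):
        """Filter on any attribute, at the price of reading every item."""
        results = []
        for rows in self._partitions.values():
            for _, item in rows:
                self.items_read += 1
                if predicate(item):
                    results.append(item)
        return results


table = ToyTable()
for site in ("site-a", "site-b", "site-c"):
    for ts in range(100):
        browser = "firefox" if ts % 2 else "chrome"
        table.put(site, ts, {"site": site, "ts": ts, "browser": browser})

# Cheap: one hash_key, a contiguous range of range_key values.
recent = table.query("site-a", 90, 99)
query_cost = table.items_read

# Expensive: a question about a non-key attribute forces a full pass.
table.items_read = 0
firefox = table.scan(lambda item: item["browser"] == "firefox")
scan_cost = table.items_read

print(len(recent), query_cost, len(firefox), scan_cost)
```

In this toy setup the query touches only the ten items it returns, while the scan touches all 300 items to answer a question about browser, which is the kind of report an analytics UI asks for all the time.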
Nonetheless, NoSQL seems a good choice, at least for the prototyping stage of your application. But I would recommend MongoDB instead. You can index any field, run complex queries, and not worry about throttling. Sharding and replication are not too hard to set up.
MongoDB has a strong ecosystem and community, which DynamoDB does not have (yet), as it is much younger. MongoDB also has hosted offerings that would let you bootstrap your application as quickly as you would with DynamoDB.
Answer 2:
Piwik scales up to millions of pages and tens of thousands of tracked websites per month. See their docs: http://piwik.org/docs/optimize/ and: http://piwik.org/blog/2012/07/piwik-high-scale-performance-report-as-of-july-2012/
Source: https://stackoverflow.com/questions/12401460/how-to-quickly-build-large-scale-analytics-server