Question
We have microservices which work with different, but related, data. For example, ads and their stats. We want to be able to filter, sort and aggregate this related data for the UI (and not only for it). For example, we want to show a user the ads which have 'car' in their text and which have more than 100 clicks.
Challenges:
- There could be a lot of data. Some users have millions of rows after filtering
- Services don't have all the data. For example, to the statistics service an ad without stats == a non-existent ad; it doesn't know anything about such ads. But sorting and filtering should work anyway (an ad without stats should be treated as an ad with zero clicks)
Requirements:
- Eventual consistency within a couple of seconds is OK
- Data loss is not acceptable
- 5 to 10 seconds for filtering and sorting for big clients with millions of rows is OK
Solutions we could think of:
- Load all the data required by the query from all services and filter and sort it in memory.
- Push updates from the services to Elasticsearch (or something similar). Elasticsearch handles the query and returns the ids of the matching entities, which are then loaded from the services (see the sketch after this list).
- One big database for all services which holds everything.
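A minimal sketch of the second option, assuming the Python elasticsearch client (8.x) and a hypothetical denormalized `ads` index holding each ad's text and click count; the index name, field names and the commented-out `ads_service` helper are illustrative, not part of the original question.

```python
# Sketch of option 2: query Elasticsearch for matching ids, then load the
# entities from the owning services. Index and field names are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def find_ad_ids(user_id: str, text: str, min_clicks: int, size: int = 100) -> list[str]:
    """Return ids of the user's ads whose text matches and whose clicks exceed a threshold."""
    resp = es.search(
        index="ads",
        query={
            "bool": {
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"match": {"text": text}},
                    {"range": {"clicks": {"gt": min_clicks}}},
                ]
            }
        },
        sort=[{"clicks": {"order": "desc"}}],
        size=size,
    )
    return [hit["_id"] for hit in resp["hits"]["hits"]]

# The ids are then resolved against the services that own the full entities, e.g.:
# ads = ads_service.get_many(find_ad_ids("user-42", "car", 100))
```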
What should we pay attention to? Are there other ways to solve our problem?
Answer 1:
You could use CQRS. In this architecture, the model used for writing data is split from the model used to read/query data. The write model is the canonical source of information, the source of truth.
The write model publishes events that are interpreted/projected by one or more read models, in an eventually consistent manner. Those events could even be published to a message queue and consumed by external read models (other microservices). There is no 1:1 mapping from write to read: you can have one model for writes and three models for reads. Each read model is optimized for its use case. This is the part that interests you: a speed-optimized read model.
An optimized read model has everything it needs when it answers the queries. The data is fully denormalized (which means it needs no joins) and already indexed, as sketched below.
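A minimal sketch of such a projection, assuming two event streams (ad events from the ads service, click events from the stats service) feeding one denormalized read model; the event shapes, handler names and in-memory store are assumptions for illustration. Note that an ad which never receives stats still gets a row with zero clicks, which addresses the "ad without stats" concern from the question.

```python
# Projection: consume events from both services and keep one denormalized,
# query-ready row per ad. Event shapes and the storage stand-in are hypothetical.
from dataclasses import dataclass

@dataclass
class AdRow:
    ad_id: str
    user_id: str
    text: str
    clicks: int  # stays 0 until the stats service reports anything

read_model: dict[str, AdRow] = {}  # stand-in for an indexed table / search index

def on_ad_created(event: dict) -> None:
    # From the ads service: the ad exists even if stats never arrive.
    read_model[event["ad_id"]] = AdRow(
        ad_id=event["ad_id"],
        user_id=event["user_id"],
        text=event["text"],
        clicks=0,
    )

def on_clicks_updated(event: dict) -> None:
    # From the stats service: merge the counter into the same row.
    row = read_model.get(event["ad_id"])
    if row is not None:
        row.clicks = event["clicks"]

def ads_with_text_and_clicks(user_id: str, text: str, min_clicks: int) -> list[AdRow]:
    # A query is now a single filtered scan over one collection, no cross-service joins.
    return sorted(
        (r for r in read_model.values()
         if r.user_id == user_id and text in r.text and r.clicks > min_clicks),
        key=lambda r: r.clicks,
        reverse=True,
    )
```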
A read model can have its data sharded. You do this in order to keep each collection small (a small collection is faster than a big one). In your case, you could shard by user: each user would have its own collection of statistics (i.e. a table in SQL or a document collection in NoSQL). You can use the built-in sharding of the database, or you can shard it manually by splitting into separate collections (tables), as in the sketch below.
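A minimal sketch of the manual variant, assuming one statistics table per user on the read side; the table-naming scheme and the use of SQLite as a stand-in store are assumptions, not part of the answer.

```python
# Manual sharding by user: route each user's statistics to its own table,
# so any single query only scans that user's (much smaller) collection.
import sqlite3  # stand-in for whatever store backs the read model

def stats_table_for(user_id: int) -> str:
    # One table per user; a real system might instead hash users into N buckets.
    return f"ad_stats_user_{user_id}"

def record_clicks(conn: sqlite3.Connection, user_id: int, ad_id: str, clicks: int) -> None:
    table = stats_table_for(user_id)  # derived from a numeric id, never raw user input
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (ad_id TEXT PRIMARY KEY, clicks INTEGER)"
    )
    conn.execute(
        f"INSERT INTO {table} (ad_id, clicks) VALUES (?, ?) "
        f"ON CONFLICT(ad_id) DO UPDATE SET clicks = excluded.clicks",
        (ad_id, clicks),
    )
```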
"Services don't have all the data."
A read model could subscribe to many sources of truth (i.e. microservices or event streams).
One particular case that works very well with CQRS is Event Sourcing; it has the advantage that you have the events from the beginning of time, without the need to store them in a persistent message queue.
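A minimal sketch of what that buys you: with the full history kept, a new (or corrected) read model can be rebuilt at any time by replaying every event from the start. The `event_store` iterator and the handler registry are hypothetical; the handlers could be the projection functions sketched earlier.

```python
# Replay the full event history into a freshly created read model.
# 'event_store' is a hypothetical ordered iterator over all stored events.
def rebuild(event_store, handlers: dict) -> None:
    for event in event_store:
        handler = handlers.get(event["type"])
        if handler is not None:
            handler(event)

# e.g. rebuild(event_store, {"AdCreated": on_ad_created, "ClicksUpdated": on_clicks_updated})
```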
P.S. I could not think of a use case in which a read model could not be made fast enough, given enough hardware resources.
Source: https://stackoverflow.com/questions/48458627/how-to-filter-and-sort-data-from-multiple-microservices