We have microservices that work with different but related data; for example, ads and their stats. We want to be able to filter, sort, and aggregate this related data for the UI.
You could use CQRS. In this architectural pattern, the model used for writing data is split from the model used to read/query data. The write model is the canonical source of information; it is the source of truth.
The write model publishes events that are interpreted/projected by one or more read models, in an eventually consistent manner. Those events could even be published to a message queue and consumed by external read models (other microservices). There is no 1:1 mapping from write to read: you can have one model for writes and three models for reads. Each read model is optimized for its use case. This is the part that interests you: a speed-optimized read model.
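As a minimal sketch of that split (all names here, such as `AdCreated`, `AdStatsRecorded`, and `StatsProjection`, are hypothetical, and the "bus" is just an in-process list of events):

```python
from dataclasses import dataclass

# --- events published by the write model (the source of truth) ---

@dataclass
class AdCreated:
    ad_id: str
    user_id: str
    title: str

@dataclass
class AdStatsRecorded:
    ad_id: str
    impressions: int
    clicks: int

# --- one read model, projected from those events ---

class StatsProjection:
    """Keeps a denormalized, query-ready view of ads and their stats."""

    def __init__(self):
        self.by_ad = {}  # ad_id -> one flat, join-free row

    def apply(self, event):
        if isinstance(event, AdCreated):
            self.by_ad[event.ad_id] = {
                "user_id": event.user_id,
                "title": event.title,
                "impressions": 0,
                "clicks": 0,
            }
        elif isinstance(event, AdStatsRecorded):
            row = self.by_ad[event.ad_id]
            row["impressions"] += event.impressions
            row["clicks"] += event.clicks

    def top_ads_by_clicks(self, user_id, limit=10):
        rows = [r for r in self.by_ad.values() if r["user_id"] == user_id]
        return sorted(rows, key=lambda r: r["clicks"], reverse=True)[:limit]

# The projection consumes events (eventually consistent) and answers queries.
projection = StatsProjection()
for event in [AdCreated("ad-42", "user-7", "Summer sale"),
              AdStatsRecorded("ad-42", 1200, 90)]:
    projection.apply(event)
print(projection.top_ads_by_clicks("user-7"))
```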
An optimized read model has everything it needs to answer its queries: the data is fully denormalized (so it needs no joins) and already indexed.
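For instance, assuming MongoDB via pymongo as the read store (any document or relational store would do; the collection and field names are made up), the projection writes self-contained documents and indexes the exact access paths the UI uses:

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017")
stats = client["read_models"]["ad_stats"]

# One self-contained document per ad: everything a query needs, no joins.
stats.insert_one({
    "ad_id": "ad-42",
    "user_id": "user-7",
    "title": "Summer sale",
    "impressions": 1200,
    "clicks": 90,
    "ctr": 0.075,  # precomputed at projection time, not at query time
})

# Index the access paths the UI actually uses (filter by user, sort by clicks).
stats.create_index([("user_id", ASCENDING), ("clicks", DESCENDING)])
```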
A read model can also have its data sharded. You do this to keep each collection small (a small collection is faster than a big one). In your case, you could shard by user: each user would have their own collection of statistics (i.e. a table in SQL or a document collection in NoSQL). You can use the built-in sharding of the database, or you can shard manually by splitting the data into separate collections (tables).
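If built-in sharding is not an option, manual sharding by user can be as simple as a routing function; a sketch, again assuming pymongo and a hypothetical `ad_stats_<user_id>` naming scheme:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["read_models"]

def stats_collection_for(user_id: str):
    """Route all reads and writes for one user to that user's own collection."""
    return db[f"ad_stats_{user_id}"]

# The projector writes to the user's shard...
stats_collection_for("user-7").insert_one({"ad_id": "ad-42", "clicks": 90})
# ...and the query side reads from the same shard, so scans stay small.
top = stats_collection_for("user-7").find().sort("clicks", -1).limit(10)
```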
> Services don't have all the data.
A read model can subscribe to many sources of truth (e.g. multiple microservices or event streams).
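For example, the speed-optimized read model in your case could merge the ads service's stream with the stats service's stream into a single denormalized view; a sketch with two stand-in generators in place of real streams:

```python
import itertools

def ads_events():
    """Stand-in for the ads service's event stream."""
    yield {"type": "AdCreated", "ad_id": "ad-42", "title": "Summer sale"}

def stats_events():
    """Stand-in for the stats service's event stream."""
    yield {"type": "AdStatsRecorded", "ad_id": "ad-42", "clicks": 90}

view = {}

# One projection subscribed to both sources, merging them into one view.
for event in itertools.chain(ads_events(), stats_events()):
    row = view.setdefault(event["ad_id"], {"title": None, "clicks": 0})
    if event["type"] == "AdCreated":
        row["title"] = event["title"]
    elif event["type"] == "AdStatsRecorded":
        row["clicks"] += event["clicks"]

print(view)  # {'ad-42': {'title': 'Summer sale', 'clicks': 90}}
```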
One particular case that works very well with CQRS is event sourcing; it has the advantage that you have the events from the beginning of time, without needing to store them in a persistent message queue.
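That means a brand-new read model can be bootstrapped at any time by replaying the whole history; a sketch with a hypothetical in-memory `event_store`:

```python
event_store = [
    {"type": "AdCreated", "ad_id": "ad-42", "title": "Summer sale"},
    {"type": "AdStatsRecorded", "ad_id": "ad-42", "clicks": 90},
    {"type": "AdStatsRecorded", "ad_id": "ad-42", "clicks": 30},
]

def rebuild_view(events):
    """Replay every event from the start to (re)build a read model from scratch."""
    view = {}
    for event in events:
        row = view.setdefault(event["ad_id"], {"title": None, "clicks": 0})
        if event["type"] == "AdCreated":
            row["title"] = event["title"]
        elif event["type"] == "AdStatsRecorded":
            row["clicks"] += event["clicks"]
    return view

print(rebuild_view(event_store))  # {'ad-42': {'title': 'Summer sale', 'clicks': 120}}
```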
P.S. I can't think of a use case where a read model could not be made fast enough, given enough hardware resources.