I am looking for a good, up-to-date, "decision-helping" explanation of how to choose a NoSQL database engine for storing all the events in a CQRS-designed application.
I am currently a newcomer to all things NoSQL (but learning), so please be clear, and do not hesitate to explain your point of view in (almost too much) detail. This post may also serve other newcomers like me.
This database will:
Be able to insert 2 to 10 rows per update requested by the front end (in my case, updates are frequent). Think of thousands of updates per minute; how would it scale?
Critically need to be consistent and failure-safe, since the events are the source of truth of the application
Not need any links between entities (as an RDBMS does), except maybe a user ID/GUID (I don't know if that is critical or needed yet)
Receive events containing 3 to 10 "columns" (a sequence ID, an event name, a datetime, a JSON/binary-encoded parameter bag, some context information...). Without wanting to steer your answer toward a column-oriented type of database, it may well be document-oriented if that fits all the other requirements
Be used as a queue, or feed an external messaging system like RabbitMQ or ZeroMQ (I haven't worked on that part yet; feel free to argue/explain that too...), since the view projections will be built from the events
Need some kind of filtering by sequence ID, such as
SELECT * FROM events WHERE sequence_id > last_sequence_id
so that subscribers (or queue systems) can synchronize from a given point
I have heard of HBase for CQRS event storage, but maybe MongoDB could fit? Or even Elasticsearch (I would not bet on that one...)? I'm also open to an RDBMS for consistency and availability... but what about the partition-tolerance part?
Really, I'm lost; I need arguments to make a pertinent choice.
https://geteventstore.com/ is a database designed specifically for event streams.
They take the consistency and reliability of the source of truth (your events) very seriously, and I use it myself to read and write thousands of events per second.
I have a working, in-production implementation of MongoDB as an event store. It is used by a CQRS + event-sourcing web-based CRM application.
In order to provide a 100% transaction-less but transaction-like guarantee for persisting multiple events in one go (all events or none of them), I use a single MongoDB document as an events commit, with the events as nested documents. As you know, MongoDB has document-level locking, so a single-document write is atomic.
For concurrency I use optimistic locking, with a version property for each aggregate stream. An aggregate stream is identified by the pair (aggregate class x aggregate ID).
The event store also keeps the commits in relative order, using a sequence property on each commit that is incremented on every commit and protected by optimistic locking.
Each commit contains the following:
- aggregateId: string, probably a GUID
- aggregateClass: string
- version: integer, incremented for each (aggregateId x aggregateClass)
- sequence: integer, incremented for each commit
- createdAt: UTCDateTime
- authenticatedUserId: string or null
- events: list of EventWithMetadata
Each EventWithMetadata contains the event class/type and the payload as a string (the serialized version of the actual event).
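For illustration, here is a minimal sketch (in Python) of what one such commit document could look like; the concrete values and exact field shapes are my assumptions, not the production schema:

import datetime
import uuid

# A hypothetical commit document following the field list above.
commit = {
    "aggregateId": str(uuid.uuid4()),
    "aggregateClass": "Customer",
    "version": 42,        # per (aggregateClass x aggregateId) stream
    "sequence": 100456,   # global, incremented per commit
    "createdAt": datetime.datetime.now(datetime.timezone.utc),
    "authenticatedUserId": None,
    "events": [
        # EventWithMetadata: event class/type plus serialized payload.
        {"eventClass": "CustomerRenamed", "payload": '{"newName": "ACME"}'},
    ],
}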
The MongoDB collection has the following indexes:
- aggregateId, aggregateClass, version (unique)
- events.eventClass, sequence
- sequence
- other indexes for query optimization
These indexes are used to enforce the general event store rules (no commit may be stored twice for the same version of an aggregate) and for query optimization (a client can select only certain events, by type, from all streams).
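A sketch of that index setup, assuming the pymongo driver (the answer does not say which driver is used; marking the sequence index unique is my assumption of one way its optimistic locking could be enforced):

from pymongo import ASCENDING, MongoClient

client = MongoClient()
commits = client["eventstore"]["commits"]

# Enforces "one commit per version of an aggregate stream".
commits.create_index(
    [("aggregateId", ASCENDING), ("aggregateClass", ASCENDING), ("version", ASCENDING)],
    unique=True,
)
# Lets clients read only certain event types, in global order.
commits.create_index([("events.eventClass", ASCENDING), ("sequence", ASCENDING)])
# Global ordering; unique here is an assumption, not stated in the answer.
commits.create_index([("sequence", ASCENDING)], unique=True)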
You could use sharding by aggregateId to scale, if you strip the global ordering of events (the sequence property) and move that responsibility to an event publisher, but this complicates things, as the event publisher needs to stay synchronized (even in case of failure!) with the event store. I recommend doing it only if you need it.
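Putting the pieces together, here is a minimal sketch of an append with optimistic locking and a subscriber catch-up read (again with pymongo; the helper names and the simplified retry policy are hypothetical):

import datetime
from pymongo import ASCENDING, DESCENDING, MongoClient
from pymongo.errors import DuplicateKeyError

client = MongoClient()
commits = client["eventstore"]["commits"]

def append_commit(aggregate_id, aggregate_class, expected_version, events, user_id=None):
    # Next global sequence, derived from the last stored commit.
    last = commits.find_one(sort=[("sequence", DESCENDING)])
    next_sequence = (last["sequence"] + 1) if last else 1
    try:
        # A single insert_one call is atomic, so all nested events
        # are persisted together or not at all.
        commits.insert_one({
            "aggregateId": aggregate_id,
            "aggregateClass": aggregate_class,
            "version": expected_version + 1,
            "sequence": next_sequence,
            "createdAt": datetime.datetime.now(datetime.timezone.utc),
            "authenticatedUserId": user_id,
            "events": events,  # list of {"eventClass": ..., "payload": ...}
        })
    except DuplicateKeyError:
        # A concurrent writer won the race on the version (or sequence)
        # unique index: reload the aggregate and retry.
        raise

def read_from(last_sequence):
    # The document-store equivalent of the question's
    # SELECT * FROM events WHERE sequence_id > last_sequence_id
    return commits.find({"sequence": {"$gt": last_sequence}}).sort("sequence", ASCENDING)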
Benchmarks for this implementation (on an Intel i7 with 8 GB of RAM):
- total aggregate write time was 7.99 s, speed: 12516 events written per second
- total aggregate read time was 1.43 s, speed: 35036 events read per second
- total read-model read time was 3.26 s, speed: 30679 events read per second
I've noticed that MongoDB was slow at counting the number of events in the event store. I don't know why, but I don't care, as I don't need this feature.
I recommend using MongoDB as an event store.
I have a .NET Core event sourcing implementation project: https://github.com/jacqueskang/EventSourcing
I started with relational databases (SQL Server and MySQL) using Entity Framework Core, then moved to AWS, so I wrote a DynamoDB extension.
My experience is that a relational DB can do the job perfectly well, but it depends on your requirements and your technical stack. If your project is cloud-based, then the best option is probably your cloud provider's NoSQL database, such as AWS DynamoDB or Azure Cosmos DB, which are strong on performance and provide additional features (e.g., DynamoDB Streams can trigger a notification or a Lambda function).
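To illustrate the same optimistic-append idea on DynamoDB, here is a minimal sketch using boto3 (the table layout and the names are my assumptions, not taken from the linked project):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
# Hypothetical table: partition key "streamId", sort key "version".
table = dynamodb.Table("events")

def append_event(stream_id, expected_version, event_class, payload):
    try:
        # The condition makes the write fail if another writer has
        # already stored this version of the stream (optimistic locking).
        table.put_item(
            Item={
                "streamId": stream_id,
                "version": expected_version + 1,
                "eventClass": event_class,
                "payload": payload,
            },
            ConditionExpression="attribute_not_exists(version)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            # Concurrent write detected; reload state and retry.
            raise
        raise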
Source: https://stackoverflow.com/questions/43408599/choosing-a-nosql-database-for-storing-events-in-a-cqrs-designed-application