Choosing a NoSQL database for storing events in a CQRS designed application


Question


I am looking for a good, up-to-date, decision-helping explanation of how to choose a NoSQL database engine for storing all the events in a CQRS-designed application.

I am a newcomer to all things NoSQL (but learning): please be clear, and do not hesitate to explain your point of view in an (almost too) precise manner. This post may serve other newcomers like me.

This database will:

  • Be able to insert 2 to 10 rows per update requested by the front end (in my case, updates are frequent). Think of thousands of updates per minute: how would it scale?

  • Critically need to be consistent and failure-safe, since the events are the source of truth of the application

  • Not need any links between entities (as an RDBMS does), except maybe a user ID/GUID (I don't know whether that's critical or needed yet)

  • Receive events containing 3 to 10 "columns" (a sequence ID, an event name, a datetime, a JSON/binary-encoded parameter bag, some context information...). Without steering you toward a column-oriented type of database, it may be document-oriented if that fits all the other requirements

  • Be used as a queue, or feed an external AMQP system like RabbitMQ or ZeroMQ (I haven't worked out that part yet, so if you could also argue/explain that...), since view projections will be built upon the events

  • Need some kind of filtering by sequence ID, like SELECT * FROM events WHERE sequence_id > last_sequence_id, so that subscribers (or queue systems) can synchronize from a given point (see the sketch just below)
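For instance, a subscriber catching up from its last known position might look like this (a minimal sketch, assuming a MongoDB-style document store; the collection and field names are of my own invention):

    # Sketch: a subscriber resumes from the last sequence ID it has seen.
    # Assumes a MongoDB collection "events" with an integer "sequence_id".
    from pymongo import ASCENDING, MongoClient

    events = MongoClient("mongodb://localhost:27017")["eventstore"]["events"]

    def catch_up(last_sequence_id):
        """Yield every event stored after the given sequence number, in order."""
        cursor = events.find(
            {"sequence_id": {"$gt": last_sequence_id}}
        ).sort("sequence_id", ASCENDING)
        yield from cursor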

I have heard of HBase for CQRS event storing, but maybe MongoDB could fit? Or even Elasticsearch (I would not bet on that one...)? I am also open to an RDBMS for consistency and availability... but what about the partition-tolerance part?

Really I'm lost, I need arguments to make a pertinent choice.


Answer 1:


https://geteventstore.com/ is a database designed specifically for event streams.

They take the consistency and reliability of the source of truth (your events) very seriously, and I use it myself to read/write thousands of events a second.
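To give a feel for the write path, here is a minimal sketch of appending one event over Event Store's classic HTTP API (the stream name and payload are made up; it assumes a local node on the default port 2113):

    # Sketch: append a single event to an Event Store stream over HTTP.
    # Assumes a local Event Store node exposing the classic HTTP API.
    import json
    import uuid

    import requests

    response = requests.post(
        "http://localhost:2113/streams/order-123",   # hypothetical stream
        data=json.dumps({"orderId": "123", "amount": 42.0}),
        headers={
            "Content-Type": "application/json",
            "ES-EventType": "OrderPlaced",       # event type metadata
            "ES-EventId": str(uuid.uuid4()),     # enables idempotent retries
        },
    )
    response.raise_for_status()  # expect 201 Created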




Answer 2:


I have a working, in-production implementation of MongoDB as an event store. It is used by a CQRS + event-sourcing, web-based CRM application.

In order to provide a transaction-like, all-or-nothing guarantee for persisting multiple events in one go (all events or none of them) without using transactions, I use a single MongoDB document as an events commit, with the events as nested documents. As you know, MongoDB writes are atomic at the document level.
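A minimal sketch of that idea in pymongo (the helper and its signature are illustrative, not my actual code; the field names follow the commit layout listed below):

    # Sketch: persist one commit document holding N nested events.
    # A single insert_one is atomic, so either every event in the
    # commit becomes visible or none of them does.
    import datetime

    from pymongo import MongoClient

    commits = MongoClient()["eventstore"]["commits"]

    def persist_commit(aggregate_id, aggregate_class, new_version,
                       next_sequence, events, user_id=None):
        """events: list of (event_class_name, serialized_payload) pairs."""
        commits.insert_one({
            "aggregateId": aggregate_id,
            "aggregateClass": aggregate_class,
            "version": new_version,
            "sequence": next_sequence,
            "createdAt": datetime.datetime.utcnow(),
            "authenticatedUserId": user_id,
            "events": [
                {"eventClass": cls, "payload": payload}
                for cls, payload in events
            ],
        })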

For concurrency I use optimistic locking, with a version property on each aggregate stream. An aggregate stream is identified by the pair (aggregate class, aggregate ID).

The event store also keeps the commits in a global order, using a sequence number that is incremented on each commit and protected by optimistic locking.

Each commit contains the following:

  • aggregateId: string, probably a GUID
  • aggregateClass: string
  • version: integer, incremented for each aggregateId x aggregateClass
  • sequence: integer, incremented for each commit
  • createdAt: UTCDateTime
  • authenticatedUserId: string or null
  • events: list of EventWithMetadata

Each EventWithMetadata contains the event class/type and the payload as string (the serialized version of the actual event).

The MongoDB collection has the following indexes:

  • aggregateId, aggregateClass, version as unique
  • events.eventClass, sequence
  • sequence
  • other indexes for query optimization

These indexes are used to enforce the general event-store rule (no two commits may be stored for the same version of an aggregate) and for query optimization (a client can select only certain events, by type, from all streams); a sketch follows.
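A sketch of how those indexes double as the optimistic-concurrency check (pymongo again; two writers that loaded the same aggregate version race to insert, and the unique index rejects the loser):

    # Sketch: the unique index doubles as the optimistic-locking guard.
    from pymongo import ASCENDING, MongoClient
    from pymongo.errors import DuplicateKeyError

    commits = MongoClient()["eventstore"]["commits"]
    commits.create_index(
        [("aggregateId", ASCENDING),
         ("aggregateClass", ASCENDING),
         ("version", ASCENDING)],
        unique=True,
    )
    commits.create_index([("events.eventClass", ASCENDING),
                          ("sequence", ASCENDING)])
    commits.create_index([("sequence", ASCENDING)])

    def try_commit(commit_document):
        """Return True on success, False on a concurrency conflict."""
        try:
            commits.insert_one(commit_document)
            return True
        except DuplicateKeyError:
            # Another writer committed this aggregate version first:
            # reload the aggregate, reapply the command, and retry.
            return False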

You could shard by aggregateId to scale out, if you strip the global ordering of events (the sequence property) and move that responsibility to an event publisher. But this complicates things, because the event publisher needs to stay synchronized with the event store (even in case of failure!). I recommend doing it only if you need it.
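For completeness, enabling that sharding would look roughly like this (a sketch only; it assumes a sharded cluster reached through a mongos router, and the database/collection names used above):

    # Sketch: shard the commits collection on a hashed aggregateId so
    # each aggregate's commits stay on one shard (ordering per stream
    # is preserved; global ordering across shards is given up).
    from pymongo import MongoClient

    admin = MongoClient("mongodb://mongos-host:27017").admin
    admin.command("enableSharding", "eventstore")
    admin.command("shardCollection", "eventstore.commits",
                  key={"aggregateId": "hashed"})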

Benchmarks for this implementation (on an Intel i7 with 8 GB of RAM):

  • total aggregate write time: 7.99 s (12,516 events written per second)
  • total aggregate read time: 1.43 s (35,036 events read per second)
  • total read-model read time: 3.26 s (30,679 events read per second)

I've noticed that MongoDB is slow at counting the number of events in the event store. I don't know why, but I don't care, since I don't need that feature.

I recommend using MongoDB as an event store.




Answer 3:


I have a .NET Core event-sourcing implementation project: https://github.com/jacqueskang/EventSourcing

I started with relational databases (SQL Server and MySQL) using Entity Framework Core, then moved to AWS and wrote a DynamoDB extension.

My experience is that a relational DB can do the job perfectly well, but it depends on your requirements and your technical stack. If your project is cloud-based, then the best option is probably your cloud provider's NoSQL database, such as AWS DynamoDB or Azure Cosmos DB, which perform well and provide additional features (e.g. a DynamoDB stream can trigger a notification or a Lambda function); a rough sketch of that wiring follows.
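A rough sketch of that DynamoDB feature with boto3 (the table layout and function name are hypothetical, and the Lambda function is assumed to exist already):

    # Sketch: create an events table with a stream enabled, then wire
    # the stream to a Lambda that builds the read-model projections.
    import boto3

    dynamodb = boto3.client("dynamodb")
    awslambda = boto3.client("lambda")

    table = dynamodb.create_table(
        TableName="events",
        KeySchema=[
            {"AttributeName": "aggregateId", "KeyType": "HASH"},
            {"AttributeName": "version", "KeyType": "RANGE"},
        ],
        AttributeDefinitions=[
            {"AttributeName": "aggregateId", "AttributeType": "S"},
            {"AttributeName": "version", "AttributeType": "N"},
        ],
        BillingMode="PAY_PER_REQUEST",
        StreamSpecification={"StreamEnabled": True,
                             "StreamViewType": "NEW_IMAGE"},
    )
    dynamodb.get_waiter("table_exists").wait(TableName="events")

    # Every write to the table now appears on the stream; this mapping
    # invokes the projection-building Lambda with batches of new events.
    awslambda.create_event_source_mapping(
        EventSourceArn=table["TableDescription"]["LatestStreamArn"],
        FunctionName="build-projections",   # hypothetical Lambda function
        StartingPosition="TRIM_HORIZON",
    )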



Source: https://stackoverflow.com/questions/43408599/choosing-a-nosql-database-for-storing-events-in-a-cqrs-designed-application
