Master Data Management Strategies in Microservices paradigm

Posted by 两盒软妹~` on 2021-01-21 07:32:31

Question


I am working on migrating a huge monolithic application to the microservices paradigm. Needless to say, identifying the domains, mapping them to different microservices, and orchestrating them has been quite a task. The previous application shared its master data in a single schema, and in the new paradigm that becomes difficult to manage. My choices are:

  1. Replicate the same master data in each microservice: Pros: when cached in the application it is fast and needs no lookups; each application acts as its own source of truth. Cons: any update to the master data in one service can leave the copies inconsistent, which causes serious problems when services communicate with each other using this data.
  2. Host the master data as a separate microservice: Pros: single source of master data. Cons: a hit on performance, since every lookup is a service call over the wire.
  3. Create a distributed cache and expose it to multiple microservices: this would break the "Single Source of Truth" principle of data for microservices, but as a write-through implementation it could ensure performance and consistency.

Any thoughts on the above, or any implementation strategies, would really help...

Vaibhav


Answer 1:


The solution to this particular problem or dilemma depends on some information about your current architecture.

  • How do your micro-services communicate with each other? Are you using Commands/Queries as direct calls, and events over some queue?

  • How big is your master data? Is it some sort of configuration, or a small amount of cached data that is used as constants or settings?

If one of your communication mechanisms is asynchronous, with events coming from a queue, and you are not dealing with a huge amount of data that changes very frequently, then my recommendation would be to:

1. Create a dedicated master-data-micro-service. This micro-service would be the owner of your master data. It would be the only one that allows direct changes to the Entities inside it.

2. Publish events to a queue for every change to an Entity in the master-data-micro-service. Whenever someone creates, updates or deletes an entity in the master-data-micro-service, you would publish an event about that change to some queue.

3. Subscribe to master-data-micro-service events. Every other micro-service that needs master data would subscribe to the events of the Entities it uses and save them locally in its own database. This data, or subset of the master data, would be kept as a copy for local use. These local Entities may only be changed by such events, that is, when their "source of truth", the master-data-micro-service, publishes that they have changed. Any other kind of change would be forbidden, as it would make the local copy diverge from its source of truth in the master-data-micro-service.
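The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: `InMemoryBus` stands in for whatever queue you actually use and delivers events synchronously, and the names `MasterDataEvent`, `MasterDataService`, `OrderService` and the "country" entity are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MasterDataEvent:
    """Hypothetical event shape; a real system would define its own schema."""
    entity_type: str               # e.g. "country"
    entity_id: str
    action: str                    # "created", "updated" or "deleted"
    payload: Optional[dict] = None


class InMemoryBus:
    """Stand-in for a real message queue; delivers events synchronously."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[MasterDataEvent], None]]] = defaultdict(list)

    def subscribe(self, entity_type: str, handler: Callable[[MasterDataEvent], None]) -> None:
        self._subscribers[entity_type].append(handler)

    def publish(self, event: MasterDataEvent) -> None:
        for handler in self._subscribers[event.entity_type]:
            handler(event)


class MasterDataService:
    """Step 1 and 2: owns the master data and publishes every change."""

    def __init__(self, bus: InMemoryBus) -> None:
        self._bus = bus
        self._entities: Dict[Tuple[str, str], dict] = {}

    def upsert(self, entity_type: str, entity_id: str, data: dict) -> None:
        key = (entity_type, entity_id)
        action = "updated" if key in self._entities else "created"
        self._entities[key] = dict(data)
        self._bus.publish(MasterDataEvent(entity_type, entity_id, action, dict(data)))

    def delete(self, entity_type: str, entity_id: str) -> None:
        if self._entities.pop((entity_type, entity_id), None) is not None:
            self._bus.publish(MasterDataEvent(entity_type, entity_id, "deleted"))


class OrderService:
    """Step 3: keeps a local copy of only the entities it needs.

    The copy is modified exclusively by the event handler below.
    """

    def __init__(self, bus: InMemoryBus) -> None:
        self._countries: Dict[str, dict] = {}
        bus.subscribe("country", self._on_country_event)

    def _on_country_event(self, event: MasterDataEvent) -> None:
        if event.action == "deleted":
            self._countries.pop(event.entity_id, None)
        else:
            self._countries[event.entity_id] = event.payload

    def country_name(self, code: str) -> Optional[str]:
        country = self._countries.get(code)
        return country["name"] if country else None
```

With a real queue the delivery would be asynchronous, so the local copy would lag behind the owner for a short time, and the handler would also need to cope with duplicate or out-of-order events.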

Pros:

With this approach you would have only one source of truth for your master data. Each of the other micro-services would use only the data, or subset of data, from the master-data-micro-service that it needs, and can simply ignore the rest. Another advantage is that each micro-service can operate on its own, without calling the master-data-micro-service directly to get the data it needs.

Cons:

The drawback is that you need to duplicate data in multiple micro-services. The other problem is that you need to deal with the complexity of a distributed system, but you are already doing this ;)

Some comments on your provided choices:

Replicate the same master data in each microservice: Pros: when cached in the application it is fast and needs no lookups; each application acts as its own source of truth. Cons: any update to the master data in one service can leave the copies inconsistent, which causes serious problems when services communicate with each other using this data.

My suggestion above already partly covers this approach, only without the direct calls. My assumption was that you would use a queue. Even if you don't use a queue, you could notify the micro-services that use the master-data-micro-service through some notification system, and then, and only then, let them call the master-data-micro-service to get the latest data. Making a call on every operation inside a micro-service that requires master data would be very inefficient.
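As a rough illustration of the "notify first, then fetch" variant, the sketch below keeps a local snapshot and calls the master-data-micro-service only after a change notification has arrived. The class name `CachedMasterDataClient` and the `fetch_all` callable are hypothetical; `fetch_all` stands in for whatever call your master-data-micro-service actually exposes.

```python
from typing import Callable, Dict, Optional


class CachedMasterDataClient:
    """Local snapshot of master data, refreshed only after a notification."""

    def __init__(self, fetch_all: Callable[[], Dict[str, dict]]) -> None:
        self._fetch_all = fetch_all      # assumed remote call to the owning service
        self._cache: Dict[str, dict] = {}
        self._stale = True               # nothing loaded yet
        self.fetch_count = 0             # only here to make the behaviour visible

    def on_change_notification(self) -> None:
        """Called by the notification system when master data has changed."""
        self._stale = True

    def get(self, key: str) -> Optional[dict]:
        # Go over the wire only when the snapshot is known to be out of date.
        if self._stale:
            self._cache = self._fetch_all()
            self.fetch_count += 1
            self._stale = False
        return self._cache.get(key)
```

Lookups between notifications are served from memory, so the remote call happens once per change instead of once per operation.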

Host the master data as a separate microservice: Pros: single source of master data. Cons: a hit on performance, since every lookup is a service call over the wire.

The approach I suggest above combines this option with your first point about replicating data in each micro-service.

Create a distributed cache and expose it to multiple microservices: this would break the "Single Source of Truth" principle of data for microservices, but as a write-through implementation it could ensure performance and consistency.

I would not recommend doing this. There are many reasons why not, some of which you already mentioned. One more thing to consider is that the cache becomes a single shared point of failure for multiple micro-services. That goes against one of the main principles of micro-services.



Source: https://stackoverflow.com/questions/57738802/master-data-management-strategies-in-microservices-paradigm
