Currently I have a setup where my clients (web apps, an iOS app, etc.) talk to my backend API, a .NET web app (Nancy), via REST calls. Nothing special.
I now have a requirement to split parts of this backend out into separate microservice APIs that need to communicate in real time; queues and pub/sub seemed ruled out to me because I need two-way request/response rather than one-way messaging.
First - clarify your distinctions between "real-time", "synchronous/asynchronous" and "one-way/two-way". The things you rule out (queues and pub/sub) can certainly be used for two-way request/response, but they are asynchronous.
Second - clarify "efficiency" - efficiency on what metric? Bandwidth? Latency? Development time? Client support?
Third - realize that (one of) the costs of microservices is latency. If that's an issue for you at your first integration, you're likely in for a long road.
What are the different ways I could communicate between my main API and the other microservice APIs? What are the pros/cons of each approach?
Off the top of my head: direct REST/HTTP calls between the services, an aggregating proxy/gateway, message queues, and pub/sub topics.
You'll note that this is the same list we get when we tie multiple applications together...because that's what you're doing. Just because you made the applications smaller doesn't really change much - it just makes your system even more distributed. Expect to solve all the same problems "normal" distributed systems have, and then a few extra ones related to deployment and versioning.
Consider an idempotent GET request from a user like "Get me question 1". That client expects a JSON response of question 1. Simple. In my expected architecture, the client would hit api.myapp.com, which would then proxy a call via REST to question-api.myapp.com (microservice) to get the data, then return to user. How could we use pub/sub here? Who is the publisher, who is the subscriber? There's no event here to raise. My understanding of queues: one publisher, one consumer. Pub/sub topic: one publisher, many consumers. Who is who here?
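My real backend is Nancy/.NET, but sketched in TypeScript just for illustration, this is roughly what I pictured (the handler itself is made up; the hostnames are from my setup):

```typescript
import express from "express";

// Public API (api.myapp.com): proxies the REST call through to the
// question microservice and hands its JSON straight back to the client.
const app = express();

app.get("/questions/:id", async (req, res) => {
  const upstream = await fetch(
    `https://question-api.myapp.com/questions/${req.params.id}`
  );
  res.status(upstream.status).json(await upstream.json());
});

app.listen(80);
```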
Ok - so first, if we're talking about microservices and latency, we're going to need a more representative example. Let's say our client is the Netflix mobile app, and to display the opening screen it needs five different pieces of information, each provided by a different microservice (we'll call them M1-M5).
Each call from client -> datacenter has 100ms expected latency; calls between services have 20ms latency.
Let's compare some approaches:
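#1 - Keep everything in one monolithic service: the client makes a single ~100ms call and the server assembles all five pieces of data in-process. Sketched very loosely (the loader functions below are invented stand-ins, not anyone's real code):

```typescript
// Stand-ins for whatever in-process lookups / local DB calls the monolith would make.
async function loadPiece1(userId: string) { return { userId, piece: 1 }; }
async function loadPiece2(userId: string) { return { userId, piece: 2 }; }
async function loadPiece3(userId: string) { return { userId, piece: 3 }; }
async function buildSummaries(a: unknown, b: unknown) { return { summaryOf: [a, b] }; }
async function buildPiece5(c: unknown) { return { basedOn: c }; }

// One ~100ms round trip from the client; no extra network hops on the server side.
async function handleOpeningScreen(userId: string) {
  const p1 = await loadPiece1(userId);
  const p2 = await loadPiece2(userId);
  const p3 = await loadPiece3(userId);
  const summaries = await buildSummaries(p1, p2); // the "M4" work, done locally
  const p5 = await buildPiece5(p3);               // the "M5" work, done locally
  return { p1, p2, p3, summaries, p5 };
}
```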
As expected, that's the lowest latency option - but requires everything in a monolithic service, which we've decided we don't want because of operational concerns.
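#2 - Have the client call each microservice directly, one request at a time, waiting for each answer before issuing the next. Roughly like this (hostnames, paths and response shapes are all placeholders):

```typescript
// Hypothetical helper: one HTTPS round trip from the phone to the datacenter (~100ms each).
async function getJson(url: string): Promise<any> {
  const res = await fetch(url);
  return res.json();
}

async function loadOpeningScreenSequentially(userId: string) {
  const m1 = await getJson(`https://m1.example.com/data/${userId}`);                      // ~100ms
  const m2 = await getJson(`https://m2.example.com/data/${userId}`);                      // ~100ms
  const m3 = await getJson(`https://m3.example.com/data/${userId}`);                      // ~100ms
  const m4 = await getJson(`https://m4.example.com/summaries?basedOn=${m1.id},${m2.id}`); // ~100ms
  const m5 = await getJson(`https://m5.example.com/data?basedOn=${m3.id}`);               // ~100ms
  return { m1, m2, m3, m4, m5 }; // five serial round trips before we can render
}
```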
That's 500ms. Using a proxy w/this isn't going to help - it'll just add 20ms of latency to each request (making it 600ms). We do have dependencies - M4 needs M1 + M2's results, and M5 needs M3's - but the rest can be done asynchronously. Let's see how that helps.
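#3 - The client still calls the microservices itself, but fires the independent requests in parallel and only serializes where the data dependencies force it to. Something like this (same placeholder caveats as above):

```typescript
async function getJson(url: string): Promise<any> {
  const res = await fetch(url);
  return res.json();
}

async function loadOpeningScreenAsync(userId: string) {
  // Round 1: M1, M2 and M3 don't depend on anything, so request them together (~100ms).
  const [m1, m2, m3] = await Promise.all([
    getJson(`https://m1.example.com/data/${userId}`),
    getJson(`https://m2.example.com/data/${userId}`),
    getJson(`https://m3.example.com/data/${userId}`),
  ]);

  // Round 2: M4 needs M1 + M2's results, M5 needs M3's (~100ms more).
  const [m4, m5] = await Promise.all([
    getJson(`https://m4.example.com/summaries?basedOn=${m1.id},${m2.id}`),
    getJson(`https://m5.example.com/data?basedOn=${m3.id}`),
  ]);

  return { m1, m2, m3, m4, m5 }; // two round trips, but the client has to know the topology
}
```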
We're down to 200ms; not bad - but our client needs to know about our microservice architecture. If we abstract that with our proxy, then we have:
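#4 - The client makes a single call to a gateway/proxy, and the gateway does the same two-round fan-out over the faster internal network. The gateway's aggregation logic might look roughly like this (internal hostnames are invented):

```typescript
async function getJson(url: string): Promise<any> {
  const res = await fetch(url);
  return res.json();
}

// Runs inside the datacenter, so each hop is ~20ms instead of ~100ms.
async function handleOpeningScreenRequest(userId: string) {
  const [m1, m2, m3] = await Promise.all([
    getJson(`http://m1.internal/data/${userId}`),   // ~20ms
    getJson(`http://m2.internal/data/${userId}`),
    getJson(`http://m3.internal/data/${userId}`),
  ]);

  const [m4, m5] = await Promise.all([
    getJson(`http://m4.internal/summaries?basedOn=${m1.id},${m2.id}`), // ~20ms more
    getJson(`http://m5.internal/data?basedOn=${m3.id}`),
  ]);

  return { m1, m2, m3, m4, m5 }; // 100ms (client -> gateway) + 20ms + 20ms
}
```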
Down to 140ms, since we're leveraging the decreased intra-service latency.
Great - when things are working smoothly, we've only increased latency by 40% compared to monolithic (#1).
But, as with any distributed system, we also have to worry about when things aren't going smoothly.
What happens when M4's latency increases to 200ms? Well, on the client -> async microservice route (#3), we have partial page results in 100ms (the first batch of requests), everything except the summaries by 200ms, and the summaries themselves at 400ms. In the proxy case (#4), we have nothing until 340ms. Similar considerations apply if a microservice is completely unavailable.
Queues are a way of decoupling producers and consumers in space and time. Let's see what happens if we introduce one:
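#5 - Publish the work to a topic and let the interested services react to it. Here's a sketch using an in-memory stand-in for the broker; the topic names (apart from P2, which the client subscribes to), message shapes and correlation logic are all invented for illustration:

```typescript
type Handler = (msg: any) => void;

// In-memory stand-in for a real broker (RabbitMQ, Kafka, SNS/SQS, ...).
class Bus {
  private topics = new Map<string, Handler[]>();
  subscribe(topic: string, handler: Handler) {
    this.topics.set(topic, [...(this.topics.get(topic) ?? []), handler]);
  }
  publish(topic: string, msg: any) {
    for (const h of this.topics.get(topic) ?? []) h(msg);
  }
}

const bus = new Bus();

// M1, M2 and M3 each react to the incoming screen request, publishing their
// piece both for the client (P2) and for the downstream services that build on it.
function makeLeafService(name: "M1" | "M2" | "M3") {
  bus.subscribe("screen.request", (req) => {
    const result = { userId: req.userId, from: name, data: `${name} data` };
    bus.publish(`${name}.result`, result); // consumed by M4 / M5
    bus.publish("P2", result);             // partial result straight to the client
  });
}
makeLeafService("M1");
makeLeafService("M2");
makeLeafService("M3");

// M4 (summaries) can only respond once it has BOTH M1's and M2's results.
const m4Pending = new Map<string, { M1?: string; M2?: string }>();
function m4Collect(r: { userId: string; from: "M1" | "M2"; data: string }) {
  const partial = m4Pending.get(r.userId) ?? {};
  partial[r.from] = r.data;
  m4Pending.set(r.userId, partial);
  if (partial.M1 && partial.M2) {
    bus.publish("P2", { userId: r.userId, from: "M4", summary: [partial.M1, partial.M2] });
  }
}
bus.subscribe("M1.result", m4Collect);
bus.subscribe("M2.result", m4Collect);

// M5 only depends on M3's result.
bus.subscribe("M3.result", (r) =>
  bus.publish("P2", { userId: r.userId, from: "M5", data: r.data }));

// The client makes one request and renders results from P2 as they trickle in.
bus.subscribe("P2", (msg) => console.log("render partial result:", msg));
bus.publish("screen.request", { userId: "u123" });
```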
Our client, which is subscribed to P2, receives partial results w/a single request and is abstracted away from the workflow between M1 + M2 and M4, and between M3 and M5. Our best-case latency is 140ms, same as #4, and our worst case is similar to the direct client route (#3), w/partial results.
We have a much more complicated internal routing system involved, but have gained flexibility w/microservices while minimizing the inevitable latency. Our client code is also more complex - since it has to deal with partial results - but is similar to the async microservice route. Our microservices are generally independent of each other - they can be scaled independently, and there is no central coordinating authority (like in the proxy case). We can add new services as needed by simply subscribing to the appropriate channels, and having the client know what to do with the response we generate (if we generate one for client consumption of course).
You could do a variation of this using a gateway to aggregate responses while still using queues internally. It would look a lot like #4 externally, but #5 internally. The addition of a queue (and yes, I've been using queue, pub/sub, topic, etc. interchangeably) still decouples the gateway from the individual microservices, but it hides the partial-result problem from the client (though that also hides its benefits).
The addition of a gateway, though, does allow you to handle the partial result problem centrally - useful if it's complex, ever changing, and/or reimplemented across multiple platforms.
For instance, let's say that, in the event that M4 (the summary service) is unavailable, we have an M4b that operates on cached data (so, e.g., the star rating is out of date). M4b can answer R4a and R4b immediately, and our gateway can then determine whether it should wait for M4 to answer or just go w/M4b, based on a timeout.
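A rough sketch of how the gateway might do that (the 150ms threshold, internal hostnames and the Summary shape are all invented):

```typescript
interface Summary { titleId: string; stars: number; }

async function fromM4(userId: string): Promise<Summary[]> {
  // Live summary service: fresh data, but may be slow or down.
  const res = await fetch(`http://m4.internal/summaries/${userId}`);
  return res.json();
}

async function fromM4b(userId: string): Promise<Summary[]> {
  // Cache-backed fallback: always fast, but e.g. the star rating may be stale.
  const res = await fetch(`http://m4b.internal/summaries/${userId}`);
  return res.json();
}

async function getSummaries(userId: string): Promise<Summary[]> {
  const timeout = new Promise<"timeout">((resolve) =>
    setTimeout(() => resolve("timeout"), 150));

  // Race the live service against the timeout; fall back to the cache if it loses.
  const result = await Promise.race([fromM4(userId), timeout]);
  return result === "timeout" ? fromM4b(userId) : result;
}
```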
For further info on how Netflix actually solved this problem, take a look at the following resources: