Currently I'm using socket.io without RethinkDB like this:

Clients emit events to socket.io, which receives the events, emits them to various other clients, and saves them to the database.
First, let's clarify the relationship between socket.io and RethinkDB changefeeds. Socket.io is intended for realtime communication between the client (the browser) and the server (Node.js). RethinkDB changefeeds are a way for your server (Node.js) to listen to changes in the database. The client can't communicate with RethinkDB directly.
A very typical architecture for a realtime app is to have RethinkDB changefeeds subscribe to changes in the database and then use socket.io to pass those changes to the client. The client usually also emits messages which can get written to your database, depending on your application logic.
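That typical architecture can be sketched in a few lines of Node.js. This is a minimal sketch under some assumptions, not a drop-in implementation: it assumes the official `rethinkdb` and `socket.io` npm packages (socket.io 2.x-style setup), a local RethinkDB instance, and a `messages` table whose name is purely illustrative.

```javascript
const r = require('rethinkdb');
const io = require('socket.io')(3000);

r.connect({ host: 'localhost', port: 28015 }).then(conn => {
  // 1. The server subscribes to database changes via a changefeed.
  r.table('messages').changes().run(conn).then(cursor => {
    cursor.each((err, change) => {
      if (err) throw err;
      // 2. Each committed change is pushed out to every connected browser.
      io.emit('message', change.new_val);
    });
  });

  // 3. Clients emit messages, which the server writes to the database.
  //    The write itself fires the changefeed above, which fans it back out.
  io.on('connection', socket => {
    socket.on('message', msg => {
      r.table('messages').insert(msg).run(conn);
    });
  });
});
```

Note that clients only ever hear about a message after it has gone through the database, which is exactly the "single source of truth" property discussed below.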
Yes, you could just emit all messages through socket.io, pass them along to all clients, and then write them to the database for persistence. It's also true that this approach is faster, but it has a number of disadvantages.
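For contrast, the socket.io-only approach looks roughly like this. This is a sketch, and `saveToDatabaseEventually` is a hypothetical stand-in for whatever persistence call you'd use:

```javascript
const io = require('socket.io')(3000);
const messages = [];  // in-memory "source of truth"

io.on('connection', socket => {
  socket.on('message', msg => {
    messages.push(msg);                     // acknowledge from memory...
    socket.broadcast.emit('message', msg);  // ...fan out immediately...
    saveToDatabaseEventually(msg);          // ...persist later (or never, if the process crashes first)
  });
});
```

The ordering here is the crux of the problems below: clients see data before the database does.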
1. Database as single source of truth
The easiest problem to spot is the following: if your Node.js process crashes (or simply restarts for a deploy) after it has emitted a message to clients but before the database write completes, that message is gone for good. Similarly, if a write fails or is delayed, clients have already seen data that the database doesn't have, so your app and your DB now disagree.

These are just some quick examples in which, because of your architecture, you will lose or have out-of-sync data. And just to reiterate this, you WILL lose data, because your main source of truth is in-memory. You might also have discrepancies between the data in your Node.js app and your DB.
The point is that the database should always be your single source of truth and you should only acknowledge data when it's written to disk. I'm not sure how anyone would be able to sleep at night otherwise.
2. Advanced Queries
If you just pass all new messages from all clients to all clients through socket.io, you now need some fairly complex logic in your client to pick out the data that's actually important. Consider also that you're pushing a lot of useless data through the network that the client will never use.
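To see how much filtering burden lands on the client, here is a sketch of what every browser would have to do if the server broadcast everything to everyone. The fields (`room`, `recipients`) are made up for illustration:

```javascript
// Each client must reimplement query logic like this by hand:
function relevantToMe(msg, me) {
  return msg.room === me.room &&           // only my chat room
         msg.recipients.includes(me.id);   // only messages addressed to me
}

// Every client receives the full firehose and discards most of it:
const firehose = [
  { room: 'ca', recipients: ['alice'], text: 'hi' },
  { room: 'ny', recipients: ['alice'], text: 'wrong room' },
  { room: 'ca', recipients: ['bob'],   text: 'not for me' },
];
const me = { id: 'alice', room: 'ca' };
const mine = firehose.filter(msg => relevantToMe(msg, me));
// mine contains only the first message
```

And this is the simple case; every new query your UI needs means more ad-hoc filtering code shipped to the browser, while all three messages still cross the network.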
The alternative is writing a pub/sub system in which clients subscribe to certain channels (or something along those lines) so that each client only receives the data that's actually relevant to it.
RethinkDB solves this by providing its very own query language that you can attach to changefeeds. If the client, for example, needs all the users in my users table between the ages of 20 and 30, who live in the state of California, within 10 miles of San Francisco, and who have bought a book within the last 6 months, this can be expressed in ReQL (RethinkDB's query language) and a changefeed can be set up for that query, so that the client only gets notified when relevant changes occur. This is much harder to do with just Socket.io and Node.js.
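Here is a hedged sketch of attaching a changefeed to a filtered ReQL query. The `users` table and its fields (`age`, `state`) are assumptions for illustration; the geospatial and purchase-history conditions from the example would need additional ReQL (geospatial terms, joins against an orders table) and are omitted for brevity:

```javascript
const r = require('rethinkdb');

r.connect({ host: 'localhost', port: 28015 }).then(conn => {
  r.table('users')
    .filter(r.row('age').ge(20).and(r.row('age').le(30)))
    .filter({ state: 'California' })
    .changes()          // the database only notifies us of relevant changes
    .run(conn)
    .then(cursor => {
      cursor.each((err, change) => {
        if (err) throw err;
        // Forward just this pre-filtered change to the interested client,
        // e.g. socket.emit('user-update', change.new_val);
        console.log(change);
      });
    });
});
```

The filtering happens inside the database, so neither your Node.js process nor the browser ever sees rows that don't match the query.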
3. Scalability
The last problem that RethinkDB solves is scalability: it's a much more scalable solution than just storing everything in memory (through Socket.io and Node.js). Because RethinkDB is built from the ground up to be distributed, you can have a cluster of 20+ RethinkDB nodes with shards and replicas. Every RethinkDB query you write is distributed by default. You can then have 20+ Node.js nodes that are stateless and all listening to changefeeds. Because the database is the central source of truth, this is not a problem.
The alternative would be to limit yourself to one server, bolt on some other pub/sub system (built on something like Redis, for example), or poll a single database... There are probably more examples, but you can see where I'm going with this.
I'd love to hear whether this answered your question and whether I'm getting where you're coming from. It's a little hard at first to figure out how to structure your applications, but this really is an elegant solution for most realtime architectures.