I am eager to know about the architecture of different real-time messenger apps. Are they using any generic protocol/architecture?
The WhatsApp Architecture Facebook Bought For $19 Billion explains the architecture involved in design of whatsapp.
Here is the general explanation from the link
WhatsApp server is almost completely implemented in Erlang.
Server systems that do the backend message routing are done in Erlang.
Great achievement is that the number of active users is managed with a really small server footprint. Team consensus is that it is largely because of Erlang.
Interesting to note Facebook Chat was written in Erlang in 2009, but they went away from it because it was hard to find qualified programmers.
WhatsApp server has started from ejabberd
Ejabberd is a famous open source Jabber server written in Erlang.
Originally chosen because its open, had great reviews by developers, ease of start and the promise of Erlang’s long term suitability for large communication system.
The next few years were spent re-writing and modifying quite a few parts of ejabberd, including switching from XMPP to internally developed protocol, restructuring the code base and redesigning some core components, and making lots of important modifications to Erlang VM to optimize server performance.
To handle 50 billion messages a day the focus is on making a reliable system that works. Monetization is something to look at later, it’s far far down the road.
A primary gauge of system health is message queue length. The message queue length of all the processes on a node is constantly monitored and an alert is sent out if they accumulate backlog beyond a preset threshold. If one or more processes falls behind that is alerted on, which gives a pointer to the next bottleneck to attack.
Multimedia messages are sent by uploading the image, audio or video to be sent to an HTTP server and then sending a link to the content along with its Base64 encoded thumbnail (if applicable).
Some code is usually pushed every day. Often, it’s multiple times a day, though in general peak traffic times are avoided. Erlang helps being aggressive in getting fixes and features into production. Hot-loading means updates can be pushed without restarts or traffic shifting. Mistakes can usually be undone very quickly, again by hot-loading. Systems tend to be much more loosely-coupled which makes it very easy to roll changes out incrementally.
What protocol is used in Whatsapp app? SSL socket to the WhatsApp server pools. All messages are queued on the server until the client reconnects to retrieve the messages. The successful retrieval of a message is sent back to the whatsapp server which forwards this status back to the original sender (which will see that as a "checkmark" icon next to the message). Messages are wiped from the server memory as soon as the client has accepted the message
How does the registration process work internally in Whatsapp? WhatsApp used to create a username/password based on the phone IMEI number. This was changed recently. WhatsApp now uses a general request from the app to send a unique 5 digit PIN. WhatsApp will then send a SMS to the indicated phone number (this means the WhatsApp client no longer needs to run on the same phone). Based on the pin number the app then request a unique key from WhatsApp. This key is used as "password" for all future calls. (this "permanent" key is stored on the device). This also means that registering a new device will invalidate the key on the old device.
WhatsApp has chosen Erlang a language built for writing scalable applications that are designed to withstand errors. Erlang uses an abstraction called the Actor model for it's concurrency - http://en.wikipedia.org/wiki/Actor_(programming_language) Instead of the more traditional shared memory approach, actors communicate by sending each other messages. Actors unlike threads are designed to be lightweight. Actors could be on the same machine or on different machines and the message passing abstractions works for both. A simple implementation of WhatsApp could be: Each user/device is represented as an actor. This actor is responsible for handling the inbox of the user, how it gets serialized to disk, the messages that the user sends and the messages that the user receives. Let's assume that Alice and Bob are friends on WhatsApp. So there is an an Alice actor and a Bob actor.
Let's trace a series of messages flowing back and forth:
Alice decides to message Bob. Alice's phone establishes a connection to the WhatsApp server and it is established that this connection is definitely from Alice's phone. Alice now sends via TCP the following message: "For Bob: A giant monster is attacking the Golden Gate Bridge". One of the WhatsApp front end server deserializes this message and delivers this message to the actor called Alice.
Alice the actor decides to serialize this and store it in a file called "Alice's Sent Messages", stored on a replicated file system to prevent data loss due to unpredictable monster rampage. Alice the actor then decides to forward this message to Bob the actor by passing it a message "Msg1 from Alice: A giant monster is attacking the Golden Gate Bridge". Alice the actor can retry with exponential back-off till Bob the actor acknowledges receiving the message.
Bob the actor eventually receives the message from (2) and decides to store this message in a file called "Bob's Inbox". Once it has stored this message durably Bob the actor will acknowledge receiving the message by sending Alice the actor a message of it's own saying "I received Msg1". Alice the actor can now stop it's retry efforts. Bob the actor then checks to see if Bob's phone has an active connection to the server. It does and so Bob the actor streams this message to the device via TCP.
Bob sees this message and replies with "For Alice: Let's create giant robots to fight them". This is now received by Bob the actor as outlined in Step 1. Bob the actor then repeats Step 2 and 3 to make sure Alice eventually receives the idea that will save mankind.
WhatsApp actually uses the XMPP protocol instead of the vastly superior protocol that I outlined above, but you get the point.
To my knowledge, Ejabberd (http://www.ejabberd.im/) is the parent, this is XMPP server which provide quite good features of open source, Whatsapp uses some modified version of this, facebook messaging also uses a modified version of this. Some more chat applications likes Samsung's ChatOn, Nimbuzz messenger all use ejabberd based ones and Erlang solutions also have modified version of this ejabberd which they claim to be highly scalable and well tested with more performance improvements and renamed as MongooseIM.
Ejabberd is the server which has most of the featured implemented when compared to other. Since it is build in Erlang it is highly scalable horizontally.
来源:https://stackoverflow.com/questions/19640703/what-is-the-technology-behind-wechat-whatsapp-and-other-messenger-apps