I have created a messaging system for users; it allows them to send a message to another user. If it is the first time they have spoken, a new conversation is initiated.
I think you do not need to create a userconversation table.
If a user can have only one conversation with any given user, the unique id for that thread is the concatenation of userId and friendId, so I would move the friendId column into the usermessages table. The ordering problem (friendId-userId is the same thread as userId-friendId) can be solved like this:
SELECT CONCAT(GREATEST(userId,FriendId),"_",LEAST(userId,FriendId)) AS threadId
Now there is the problem of fetching the last message after a GROUP BY threadId.
I think a good solution is to concatenate the date and the message and then take the MAX of that field.
For simplicity, I assume the date column is a DATETIME field ('YYYY-mm-dd H:i:s'), but that is not required, because a Unix timestamp can be converted with the FROM_UNIXTIME function.
So the final query is:
SELECT
CONCAT(GREATEST(userId,FriendId),"_",LEAST(userId,FriendId)) AS threadId,
friendId, MAX(date) AS last_date,
MAX(CONCAT(date,"|",message)) AS last_date_and_message
FROM usermessages
WHERE userId = :userId OR friendId = :userId
GROUP BY threadId ORDER BY last_date DESC
The value of the last_date_and_message field looks something like this:
2012-05-18 00:18:54|Hi my friend this is my last message
It can easily be parsed in your server-side code.
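To make the parsing step concrete, here is a minimal sketch in Python using an in-memory SQLite database in place of MySQL. The sample rows are invented, and because SQLite has no GREATEST, LEAST or CONCAT, the two-argument max()/min() functions and the || operator are used as stand-ins; treat it as an illustration of the idea rather than the exact query above.

```python
import sqlite3

# Hypothetical in-memory table mirroring the columns used in the query above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usermessages (userId INTEGER, friendId INTEGER, date TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO usermessages VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2012-05-17 10:00:00", "Hello"),
        (2, 1, "2012-05-18 00:18:54", "Hi my friend this is my last message"),
        (1, 3, "2012-05-16 09:30:00", "Are you there?"),
    ],
)

# SQLite stand-ins: max(a, b)/min(a, b) for GREATEST/LEAST, || for CONCAT.
rows = conn.execute(
    """
    SELECT max(userId, friendId) || '_' || min(userId, friendId) AS threadId,
           MAX(date || '|' || message) AS last_date_and_message
    FROM usermessages
    WHERE userId = :userId OR friendId = :userId
    GROUP BY threadId
    ORDER BY last_date_and_message DESC
    """,
    {"userId": 1},
).fetchall()

# Split on the first '|' only, so a '|' inside the message text is preserved.
threads = [(thread_id, *combined.split("|", 1)) for thread_id, combined in rows]
for thread in threads:
    print(thread)
```

Splitting with a maximum of one split matters: the date never contains a '|', but the message might.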
Extending the answer suggested by Watcher.
You should consider dropping the "conversation" concept to simplify further.
+----+---------+------+------------------+--------+----------+
| id | message | read | time | toUser | fromUser |
+----+---------+------+------------------+--------+----------+
| 1 | test 1 | 0 | (some timestamp) | 3 | 4 |
| 2 | test 2 | 0 | (some timestamp) | 4 | 3 |
+----+---------+------+------------------+--------+----------+
List of all conversations for user 123:
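-- Latest message of each conversation: pick the highest id per user pair,
-- treating (toUser, fromUser) and (fromUser, toUser) as the same pair.
SELECT m.id, m.message, m.toUser, m.fromUser
FROM userMessages AS m
INNER JOIN (
SELECT MAX(id) AS id
FROM userMessages
WHERE toUser = 123 OR fromUser = 123
GROUP BY LEAST(toUser, fromUser), GREATEST(toUser, fromUser)
) AS latest ON latest.id = m.id
ORDER BY m.id DESC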
List entire conversation between user 123 and user 456:
SELECT *
FROM userMessages
WHERE (toUser = 123 OR fromUser = 123)
AND (toUser = 456 OR fromUser = 456)
ORDER BY time DESC
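The single-table layout can be tried out quickly. The sketch below is an assumption-laden illustration: it uses Python with an in-memory SQLite database instead of MySQL, the sample rows and timestamps are invented, and the "latest message per conversation" query groups on the unordered user pair (SQLite's two-argument min()/max()) so that both directions count as one conversation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE userMessages (
        id INTEGER PRIMARY KEY, message TEXT, "read" INTEGER,
        time TEXT, toUser INTEGER, fromUser INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO userMessages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "test 1", 0, "2012-05-14 19:00:00", 123, 456),
        (2, "test 2", 0, "2012-05-14 19:05:00", 456, 123),
        (3, "test 3", 0, "2012-05-14 19:10:00", 123, 789),
    ],
)

# Latest message per conversation partner of user 123.
latest = conn.execute(
    """
    SELECT m.id, m.message, m.toUser, m.fromUser
    FROM userMessages AS m
    JOIN (
        SELECT MAX(id) AS id
        FROM userMessages
        WHERE toUser = :me OR fromUser = :me
        GROUP BY min(toUser, fromUser), max(toUser, fromUser)
    ) AS newest ON newest.id = m.id
    ORDER BY m.id DESC
    """,
    {"me": 123},
).fetchall()

# Entire conversation between users 123 and 456, newest first.
conversation = conn.execute(
    """
    SELECT id, message FROM userMessages
    WHERE (toUser = :me OR fromUser = :me)
      AND (toUser = :other OR fromUser = :other)
    ORDER BY time DESC
    """,
    {"me": 123, "other": 456},
).fetchall()

print(latest)
print(conversation)
```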
Hmm, maybe I'm not understanding your problem correctly, but to me the solution is quite simple:
SELECT c.*, MAX(m.time) as latest_post
FROM conversations as c
INNER JOIN messages as m ON c.id = m.conversation_id
WHERE c.userId = 222 OR c.friendId = 222
GROUP BY c.id
ORDER BY latest_post DESC
here's my test data:
Conversations:
id  userId  friendId
1   222     333
2   222     444
Messages:
id  message  time (Desc)          conversation_id
14  rty      2012-05-14 19:59:55  2
13  cvb      2012-05-14 19:59:51  1
12  dfg      2012-05-14 19:59:46  2
11  ert      2012-05-14 19:59:42  1
1   foo      2012-05-14 19:22:57  2
2   bar      2012-05-14 19:22:57  2
3   foo      2012-05-14 19:14:13  1
8   wer      2012-05-13 19:59:37  2
9   sdf      2012-05-13 19:59:24  1
10  xcv      2012-05-11 19:59:32  2
4   bar      2012-05-10 19:58:06  1
6   zxc      2012-05-08 19:59:17  2
5   asd      2012-05-08 19:58:56  1
7   qwe      2012-05-04 19:59:20  1
Query result:
id  userId  friendId  latest_post
2   222     444       2012-05-14 19:59:55
1   222     333       2012-05-14 19:59:51
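The query can be checked against a subset of the test data. This is a small sketch in Python with an in-memory SQLite database standing in for MySQL; only four of the message rows are loaded, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE conversations (id INTEGER PRIMARY KEY, userId INTEGER, friendId INTEGER);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, time TEXT,
                           conversation_id INTEGER);
    INSERT INTO conversations VALUES (1, 222, 333), (2, 222, 444);
    INSERT INTO messages VALUES
        (14, 'rty', '2012-05-14 19:59:55', 2),
        (13, 'cvb', '2012-05-14 19:59:51', 1),
        (12, 'dfg', '2012-05-14 19:59:46', 2),
        (7,  'qwe', '2012-05-04 19:59:20', 1);
    """
)

# Same shape as the query above: newest message time per conversation.
rows = conn.execute(
    """
    SELECT c.id, c.userId, c.friendId, MAX(m.time) AS latest_post
    FROM conversations AS c
    INNER JOIN messages AS m ON c.id = m.conversation_id
    WHERE c.userId = ? OR c.friendId = ?
    GROUP BY c.id
    ORDER BY latest_post DESC
    """,
    (222, 222),
).fetchall()

for row in rows:
    print(row)
```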
If that's not it... just ignore my answer :P
Hope this helps
Since a given pair of users can have at most one conversation, there is no need to "invent" a separate key just to identify conversations. Also, the wording of your question seems to suggest that a message is always sent to a single user, so I'd probably go with something like this:
Now, there are several things to note about this model:
The secondary index I1 is relatively expensive. There are ways to work around that, but the resulting complications are probably not worth it.
With this data model, it becomes rather easy to sort the "conversations" (identified by user pairs) by the latest message. For example (replace 1 with the desired user's USER_ID):
SELECT *
FROM (
SELECT USER1_ID, USER2_ID, MAX(SEND_TIME) NEWEST
FROM MESSAGE
WHERE (USER1_ID = 1 OR USER2_ID = 1)
GROUP BY USER1_ID, USER2_ID
) Q
ORDER BY NEWEST DESC;
(OR USER2_ID = 1 is the reason for the secondary index I1.)
If you want not just latest times, but also latest messages, you can do something like this:
SELECT * FROM MESSAGE T1
WHERE
(USER1_ID = 1 OR USER2_ID = 1)
AND SEND_TIME = (
SELECT MAX(SEND_TIME)
FROM MESSAGE T2
WHERE
T1.USER1_ID = T2.USER1_ID
AND T1.USER2_ID = T2.USER2_ID
)
ORDER BY SEND_TIME DESC;
You can play with it in the SQL Fiddle.
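For readers without access to the fiddle, here is a rough sketch of how such a table might behave, written in Python against an in-memory SQLite database. The table definition is an assumption: the column list, the CHECK constraint, the SENDER_ID and MESSAGE_TEXT columns, and the send helper are all hypothetical and only serve to exercise the "latest message per user pair" query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: the columns, the CHECK constraint and the index definition
# below are illustrative guesses, not the answer's actual model.
conn.executescript(
    """
    CREATE TABLE MESSAGE (
        USER1_ID INTEGER NOT NULL,
        USER2_ID INTEGER NOT NULL,
        SEND_TIME TEXT NOT NULL,
        SENDER_ID INTEGER NOT NULL,   -- hypothetical: records who sent the message
        MESSAGE_TEXT TEXT NOT NULL,   -- hypothetical column name
        PRIMARY KEY (USER1_ID, USER2_ID, SEND_TIME),
        CHECK (USER1_ID < USER2_ID)   -- one canonical ordering per user pair
    );
    CREATE INDEX I1 ON MESSAGE (USER2_ID);
    """
)

def send(sender, receiver, send_time, text):
    """Store a message under the canonical (smaller id, larger id) pair."""
    user1, user2 = sorted((sender, receiver))
    conn.execute(
        "INSERT INTO MESSAGE VALUES (?, ?, ?, ?, ?)",
        (user1, user2, send_time, sender, text),
    )

send(1, 2, "2012-05-14 19:00:00", "hi")
send(2, 1, "2012-05-14 19:05:00", "hello back")
send(3, 1, "2012-05-13 08:00:00", "ping")

# Latest message of every conversation that involves user 1.
latest = conn.execute(
    """
    SELECT USER1_ID, USER2_ID, SEND_TIME, MESSAGE_TEXT
    FROM MESSAGE T1
    WHERE (USER1_ID = :uid OR USER2_ID = :uid)
      AND SEND_TIME = (
          SELECT MAX(SEND_TIME) FROM MESSAGE T2
          WHERE T1.USER1_ID = T2.USER1_ID AND T1.USER2_ID = T2.USER2_ID
      )
    ORDER BY SEND_TIME DESC
    """,
    {"uid": 1},
).fetchall()

for row in latest:
    print(row)
```

Because every pair is stored in one canonical order, grouping or correlating on (USER1_ID, USER2_ID) is enough to identify a conversation regardless of who sent which message.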
1 If that's not the case, you can use a monotonically incrementing INT instead, but you'll have to SELECT MAX(...) yourself, since auto-increment doesn't work on a PK subset; or simply make it the PK alone and have secondary indexes on both USER1_ID and USER2_ID (fortunately, they would be slimmer since the PK is slimmer).
Why are you breaking up the data into conversations?
If it were me, I would use one table called 'usermessages' with the following format:
+----+--------+----------+-------------+------------+--------+
| id | userto | userfrom | timecreated | timeviewed | message|
+----+--------+----------+-------------+------------+--------+
A conversation is identified by the combination of the 'userto' and 'userfrom' columns. So, when you want to select all of a conversation:
SELECT * FROM usermessages
WHERE (userto = :userto OR userto = :userfrom)
AND (userfrom = :userfrom OR userfrom = :userto)
ORDER BY timecreated DESC
LIMIT 10
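As a rough illustration of how the placeholders in that query might be bound from application code, here is a sketch in Python with an in-memory SQLite database in place of MySQL. The fetch_conversation helper, the column types and the sample rows are all hypothetical.

```python
import sqlite3

def fetch_conversation(conn, userto, userfrom, limit=10):
    """Return the newest `limit` messages exchanged between two users."""
    return conn.execute(
        """
        SELECT id, userto, userfrom, timecreated, message
        FROM usermessages
        WHERE (userto = :userto OR userto = :userfrom)
          AND (userfrom = :userfrom OR userfrom = :userto)
        ORDER BY timecreated DESC
        LIMIT :limit
        """,
        {"userto": userto, "userfrom": userfrom, "limit": limit},
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usermessages (id INTEGER PRIMARY KEY, userto INTEGER, userfrom INTEGER, "
    "timecreated TEXT, timeviewed TEXT, message TEXT)"
)
insert = "INSERT INTO usermessages (userto, userfrom, timecreated, message) VALUES (?, ?, ?, ?)"
# Twelve messages alternating in direction between users 1 and 2.
for i in range(12):
    to, frm = (1, 2) if i % 2 == 0 else (2, 1)
    conn.execute(insert, (to, frm, f"2012-05-14 19:{i:02d}:00", f"message {i}"))
# A message involving a third user must not leak into the result.
conn.execute(insert, (3, 1, "2012-05-14 20:00:00", "other"))

page = fetch_conversation(conn, 1, 2)
for row in page:
    print(row)
```

Note that the WHERE clause as written would also match a message a user sends to themselves; whether that matters depends on whether the application allows it.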
I would set it up like this
conversations (#id, last_message_id)
participation (#uid1, #uid2, conversation_id)
messages (#conversation_id, #id, uid, contents, read, *time)
conversations
This table will be used mainly to generate a new identifier for each conversation, together with a calculated field of the last update (for optimization). The two users have been disconnected from this table and moved into participation.
participation
This table records the conversations between two users in both directions; to explain why, take a look at the following key:
ALTER TABLE `table` ADD PRIMARY KEY (uid1, uid2);
While this is good for both enforcing the uniqueness and simple lookups, you should be aware of the following behavior:
SELECT * FROM table WHERE uid1=1 AND uid2=2
SELECT * FROM table WHERE uid1=1
SELECT * FROM table WHERE uid1=1 AND uid2>5
SELECT * FROM table WHERE uid2=2
The first two queries perform very well: MySQL can also use the index for equality lookups on just the first column of the key. The third also performs well, because the second column of the key can be used for range queries once the first is fixed. The last query performs poorly: the index can only be used from its leftmost column, so filtering on uid2 alone results in a full table scan.
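The leftmost-column behavior is not specific to MySQL, so it can be observed with any engine that exposes its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN from Python as a stand-in; the plan wording is SQLite's own and will not match MySQL's EXPLAIN output, and the plan helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participation (uid1 INTEGER, uid2 INTEGER, conversation_id INTEGER, "
    "PRIMARY KEY (uid1, uid2))"
)

def plan(where_clause):
    """Return SQLite's query-plan description for a lookup on participation."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM participation WHERE {where_clause}"
    ).fetchall()
    return rows[0][3]  # the fourth column holds the human-readable plan step

# The first three filters start at uid1 and can use the composite key;
# the last one skips uid1 and falls back to scanning the table.
for clause in ("uid1=1 AND uid2=2", "uid1=1", "uid1=1 AND uid2>5", "uid2=2"):
    print(f"{clause:<20} {plan(clause)}")
```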
messages
This table stores the actual sent messages, comprising the conversation identifier, sender id, contents, read flag and the time it was sent.
sending messages
To determine whether a conversation between two users has already been established, you can simply query the participation table:
SELECT conversation_id FROM participation WHERE uid1=:sender_id AND uid2=:receiver_id
If it does not yet exist, you create both records:
INSERT INTO conversations (last_message_id) VALUES (NULL);
# fetch last insert id here
INSERT INTO participation VALUES (:sender_id, :receiver_id, :conversation_id), (:receiver_id, :sender_id, :conversation_id);
INSERT INTO messages VALUES (:conversation_id, 0, :sender_id, :message_contents, 0, NOW());
UPDATE conversations SET last_message_id=LAST_INSERT_ID() WHERE id = :conversation_id
If the conversation is already set up:
INSERT INTO messages VALUES (:conversation_id, 0, :sender_id, :message_contents, 0, NOW());
UPDATE conversations SET last_message_id=LAST_INSERT_ID() WHERE id = :conversation_id
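Note: the UPDATE statement can be scheduled as LOW_PRIORITY, because last_message_id does not always have to be 100% up to date. (In MySQL, LOW_PRIORITY only has an effect on storage engines that use table-level locking, such as MyISAM.)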
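Put together, the sending flow might look roughly like the following sketch. It is written in Python against an in-memory SQLite database rather than MySQL, so it makes some assumptions: messages.id is the sole primary key so that SQLite can assign it, lastrowid replaces LAST_INSERT_ID(), and the send_message helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE conversations (id INTEGER PRIMARY KEY, last_message_id INTEGER);
    CREATE TABLE participation (
        uid1 INTEGER, uid2 INTEGER, conversation_id INTEGER,
        PRIMARY KEY (uid1, uid2)
    );
    -- Simplification: id alone is the key here so SQLite can auto-assign it.
    CREATE TABLE messages (
        id INTEGER PRIMARY KEY, conversation_id INTEGER, uid INTEGER,
        contents TEXT, "read" INTEGER DEFAULT 0, time TEXT DEFAULT CURRENT_TIMESTAMP
    );
    """
)

def send_message(sender_id, receiver_id, contents):
    """Insert a message, creating the conversation on first contact."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT conversation_id FROM participation WHERE uid1 = ? AND uid2 = ?",
            (sender_id, receiver_id),
        ).fetchone()
        if row is None:
            conversation_id = conn.execute(
                "INSERT INTO conversations (last_message_id) VALUES (NULL)"
            ).lastrowid
            # One row per direction, so either user can find it via uid1.
            conn.executemany(
                "INSERT INTO participation VALUES (?, ?, ?)",
                [(sender_id, receiver_id, conversation_id),
                 (receiver_id, sender_id, conversation_id)],
            )
        else:
            conversation_id = row[0]
        message_id = conn.execute(
            "INSERT INTO messages (conversation_id, uid, contents) VALUES (?, ?, ?)",
            (conversation_id, sender_id, contents),
        ).lastrowid
        conn.execute(
            "UPDATE conversations SET last_message_id = ? WHERE id = ?",
            (message_id, conversation_id),
        )
    return conversation_id

first = send_message(1, 2, "hello")
second = send_message(2, 1, "hi back")
print(first, second)
```

Wrapping the steps in one transaction is a design choice of this sketch; it keeps the two participation rows and the conversation row from getting out of step if an insert fails.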
conversation overview
This has become a simpler query:
SELECT other_user.name, m.contents, m.read, c.id
FROM participation AS p
INNER JOIN user AS other_user ON other_user.id = p.uid2
INNER JOIN conversations AS c ON c.id = p.conversation_id
INNER JOIN messages AS m ON m.id = c.last_message_id
WHERE p.uid1 = :user_id
ORDER BY m.time DESC
LIMIT 50
Disclaimer: I have not tested this, but the write-up should make sense to you.
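A quick way to sanity-check the overview query is to run it on a toy data set. The sketch below does that in Python with an in-memory SQLite database instead of MySQL; the user names, ids and timestamps are invented, and the conversations table name follows the schema listed at the top of this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE conversations (id INTEGER PRIMARY KEY, last_message_id INTEGER);
    CREATE TABLE participation (uid1 INTEGER, uid2 INTEGER, conversation_id INTEGER,
                                PRIMARY KEY (uid1, uid2));
    CREATE TABLE messages (id INTEGER PRIMARY KEY, conversation_id INTEGER, uid INTEGER,
                           contents TEXT, "read" INTEGER, time TEXT);

    INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO conversations VALUES (10, 101), (11, 102);
    INSERT INTO participation VALUES (1, 2, 10), (2, 1, 10), (1, 3, 11), (3, 1, 11);
    INSERT INTO messages VALUES
        (100, 10, 1, 'hi bob',       1, '2012-05-14 19:00:00'),
        (101, 10, 2, 'hi alice',     0, '2012-05-14 19:05:00'),
        (102, 11, 3, 'lunch later?', 0, '2012-05-14 20:00:00');
    """
)

# One row per conversation of the given user, newest conversation first.
overview = conn.execute(
    """
    SELECT other_user.name, m.contents, m."read", c.id
    FROM participation AS p
    INNER JOIN user AS other_user ON other_user.id = p.uid2
    INNER JOIN conversations AS c ON c.id = p.conversation_id
    INNER JOIN messages AS m ON m.id = c.last_message_id
    WHERE p.uid1 = :user_id
    ORDER BY m.time DESC
    LIMIT 50
    """,
    {"user_id": 1},
).fetchall()

for row in overview:
    print(row)
```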
Another reason why it's good to have a two-way table is so that it's prepared for sharding, a method in which you push related data into another database (on a different machine); based on certain rules you would determine where to fetch the information from.
You could move the data in these ways:
- Split the participation table up based on the uid1 field.
- Split the messages table up based on the conversation_id field.
The messages overview will get more complicated, as you are likely to be forced to make two queries; this can be mitigated somewhat with caches (and, in extreme cases, document databases).
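To give a feel for what "based on certain rules" could mean, here is a deliberately simple sketch in Python. Everything in it is an assumption: the shard count, the modulo rule and the function names are hypothetical, and real deployments often use lookup tables or consistent hashing instead.

```python
NUM_SHARDS = 4  # hypothetical shard count

def participation_shard(uid1: int) -> int:
    """Pick the shard holding all participation rows whose uid1 is this user."""
    return uid1 % NUM_SHARDS

def messages_shard(conversation_id: int) -> int:
    """Pick the shard holding every message of one conversation."""
    return conversation_id % NUM_SHARDS

def shards_for_overview(user_id, conversation_ids):
    """List the shards an overview touches: participation first, then messages."""
    first_query = participation_shard(user_id)
    second_query = sorted({messages_shard(cid) for cid in conversation_ids})
    return first_query, second_query

# User 6 has three conversations; the overview needs one participation shard
# and then every shard that stores one of those conversations.
print(shards_for_overview(6, [10, 11, 14]))
```

This also shows why the overview needs two steps: the conversation ids are only known after the participation shard has been queried.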
Hope this gives you some ideas on future planning :)