Could someone review an accepted answered I gave on Stack Overflow?
The use-case is as follows:
Given a messaging system where a user can receive a message from a single user and send messages to one or more users, return the most recent communication (sent or received) between a passed userId and the individual(s) that user communicated with.
For the example, I have three tables:
Users
id user_name 1 Walker 2 John 3 Kate
Messages
id senderid body time 1 1 ignored 1 2010年04月01日 00:00:00.000 2 1 ignored 2 2010年04月02日 00:00:00.000 3 3 ignored 3 2010年04月03日 00:00:00.000 4 1 msg A to john and kate 2010年04月10日 00:00:00.000 5 3 msg b from kate to walker and john 2010年04月11日 00:00:00.000
messages_recipients
id messageid userid 1 1 2 2 1 3 3 2 2 4 3 1 5 4 2 6 4 3 7 5 1 8 5 2
The data is tailored in such a way that I want a list of communications between user Walker and the people Walker has spoken with.
You can see a list of these messages by running the following SQL statement:
SELECT
u2.user_name AS Sender,
u1.user_name AS Receiver,
m.body,
m.time
FROM
messages m
JOIN
messages_recipients mr ON m.id = mr.messageid
JOIN
users u1 ON mr.userid = u1.id
JOIN
users u2 ON m.senderid = u2.id
ORDER BY
time DESC
Now that we have the test scenario, the part I want reviewed: returning the most recently communicated message between Walker, John, and Kate.
BEGIN
DECLARE @UserId INT = 1
--A. Main Query
SELECT
CASE
WHEN mtemp.senderid = 1 --@UserId
THEN
CONCAT('Message To: ', receivers.user_name)
ELSE
CONCAT('Message From: ' , senders.user_name)
END AS MessageType,
mtemp.body,
mtemp.time
FROM
messages mtemp
INNER JOIN users senders ON
mtemp.senderid = senders.id
INNER JOIN
(
--B. Inner Query determining most recent message (based on time)
-- between @UserID and the person @UserID
-- Communicated with (either as sender or receiver)
select userid,max(maxtime) as maxmaxtime from
(
--C.1. First part of Union Query Aggregating sent/received messages on passed @UserId
SELECT
m2.body,
kk.*
FROM
`messages` m2 INNER JOIN
(
SELECT DISTINCT
userid,
MAX(m.time) AS MaxTime
FROM
messages m INNER JOIN
messages_recipients mr ON m.id = mr.messageid AND
m.senderid = 1 --@UserId
GROUP BY
mr.userid
) kk on m2.time = kk.MaxTime and m2.senderid = 1 --@UserId
UNION
--C.2. Second part of Union Query Aggregating sent/received messages on passed @UserId
SELECT
m1.body,
jj.*
FROM
`messages` m1 INNER JOIN
----C.2a. Inner most query of users users who sent message to userid
(SELECT DISTINCT
senderid as userid,
MAX(m.time) AS MaxTime
FROM
messages m INNER JOIN
messages_recipients mr ON m.id = mr.messageid AND
mr.userid = 1 --@UserId
GROUP BY
m.senderid) jj on m1.time = jj.MaxTime and m1.senderid = jj.userid
) MaximumUserTime
group by
MaximumUserTime.userid
) AggregatedData on mtemp.time = AggregatedData.maxmaxtime
INNER JOIN users receivers on AggregatedData.userid = receivers.id
ORDER BY `time` DESC
END
To test in phpMyAdmin
, you'll have to remove the comments and the BEGIN
/END
DECLARE
statements as well. I just wanted to post this as if it would look in a procedure.
When I run this query I get the following results:
MessageType body time Message From: Kate msg b from kate to walker and john 2010年04月11日 00:00:00.000 Message To: John msg A to john and kate 2010年04月10日 00:00:00.000
That's the most recent communications concerning Walker among all those users who have communicated with Walker.
Is there a better way to run this query?
1 Answer 1
My solution has a similar complexity to yours (14 steps in EXPLAIN
), assuming MySQL's query optimizer is smart enough. However, in my opinion, this formulation will be much easier to understand.
SELECT IF(recipientid,
CONCAT('Message To: ', recipient.user_name),
CONCAT('Message From: ', sender.user_name)) AS MessageType,
body,
time
FROM
( -- Join messages with recipients, relabeling userids in terms of interlocutor and self
SELECT messageid, time, body, NULL AS senderid, userid AS recipientid, userid AS interlocutor, senderid AS self
FROM messages_recipients
INNER JOIN messages
ON messages.id = messageid
UNION
SELECT messages.id, time, body, senderid, NULL, senderid, userid
FROM messages_recipients
INNER JOIN messages
ON messages.id = messageid
) AS thread_latest
LEFT OUTER JOIN users AS recipient
ON recipient.id = recipientid
LEFT OUTER JOIN users AS sender
ON sender.id = senderid
WHERE
-- Discard all but the latest message in each thread
NOT EXISTS (
SELECT messageid
FROM
(
SELECT messageid, time, userid AS interlocutor, senderid AS self
FROM messages_recipients
INNER JOIN messages
ON messages.id = messageid
UNION
SELECT messages.id, time, senderid, userid
FROM messages_recipients
INNER JOIN messages
ON messages.id = messageid
) AS thread_later
WHERE
thread_later.self = thread_latest.self AND
thread_later.interlocutor = thread_latest.interlocutor AND
thread_later.time > thread_latest.time
) AND
self = 1 --@UserId
ORDER BY time DESC;
The main insight is that once you relabel senders and recipients in terms of interlocutor and self, it's just a simple matter of filtering out the results. Retain only those messages where self
is the user in question. Then, every row that has the same interlocutor
conceptually constitutes a thread.
Notice that there is a subquery that appears twice. We can make it clearer by creating a view.
CREATE VIEW threads AS
-- Messages I sent
SELECT messageid, time, body, NULL AS senderid, userid AS recipientid, userid AS interlocutor, senderid AS self
FROM messages_recipients
INNER JOIN messages
ON messages.id = messageid
UNION
-- Messages I received
SELECT messages.id, time, body, senderid, NULL, senderid, userid
FROM messages_recipients
INNER JOIN messages
ON messages.id = messageid;
SELECT IF(recipientid,
CONCAT('Message To: ', recipient.user_name),
CONCAT('Message From: ', sender.user_name)) AS MessageType,
body,
time
FROM
threads AS thread_latest
LEFT OUTER JOIN users AS recipient
ON recipient.id = recipientid
LEFT OUTER JOIN users AS sender
ON sender.id = senderid
WHERE
NOT EXISTS (
SELECT messageid
FROM threads AS thread_later
WHERE
thread_later.self = thread_latest.self AND
thread_later.interlocutor = thread_latest.interlocutor AND
thread_later.time > thread_latest.time
) AND
self = 1 --@UserId
ORDER BY time DESC;
I'll take this opportunity to point out that this query is where PostgreSQL really shines. Two features in PostgreSQL (since version 8.4) make it easy. The WITH
clause lets you define a helper view in the query itself. More importantly,
window functions let you partition the threads by interlocutor, which is precisely the tricky part about this problem.
WITH threads(messageid, time, body, senderid, recipientid, interlocutor, self) AS (
-- Messages I sent
SELECT messageid, time, body, NULL, userid, userid, senderid
FROM messages_recipients
INNER JOIN messages
ON messages.id = messageid
UNION
-- Messages I received
SELECT messages.id, time, body, senderid, NULL, senderid, userid
FROM messages_recipients
INNER JOIN messages
ON messages.id = messageid
)
SELECT CASE WHEN recipientid IS NOT NULL
THEN 'Message To: ' || recipient.user_name
ELSE 'Message From: ' || sender.user_name
END AS MessageType,
body,
time
FROM (
SELECT *,
RANK() OVER (PARTITION BY interlocutor ORDER BY time DESC) AS thread_pos
FROM threads
WHERE self = 1 --@UserId
) AS my_threads
LEFT OUTER JOIN users AS recipient
ON recipient.id = recipientid
LEFT OUTER JOIN users AS sender
ON sender.id = senderid
WHERE thread_pos = 1 -- Only the latest message per thread
ORDER BY time DESC;
-
\$\begingroup\$ Thanks for taking the time to give a detailed answer to this 2+ year old question. \$\endgroup\$ray023– ray0232013年08月16日 15:32:16 +00:00Commented Aug 16, 2013 at 15:32