3

I'm building a messaging system, and trying to get a list of messages and their replies; and also join the participants of the message threads.

So I'm shooting for messages = [{id, subject, body, replies: [..], users: [..]}]

SELECT messages.*, 
 to_json(array_agg(users)) users, 
 to_json(array_agg(replies)) replies
FROM messages
LEFT JOIN (SELECT id, fullname, email, company FROM users) users 
 ON users.id = messages.user_id OR users.id = messages.to
LEFT JOIN (SELECT * FROM messages ORDER BY created_at) replies 
 ON replies.message_id = messages.id
WHERE messages.to = :to OR (messages.to IS NOT NULL AND messages.user_id = :to)
GROUP BY messages.id;

I've tried various other queries, this gets me the closest. The problem is that for each message, all the replies are repeated once. I.e., if there are 3 replies in a thread, I get 6 (2x each). Any ideas what might be wrong?

I am using Postgres 9.5.

András Váczi
31.8k13 gold badges103 silver badges152 bronze badges
asked Feb 15, 2016 at 3:06
0

1 Answer 1

3

The basic problem is this:

Here is one way to avoid this problem and improve some other details while being at it:

SELECT m.*, u.users, r.replies
FROM messages m
LEFT JOIN LATERAL (
 SELECT json_agg(u) AS users
 FROM (
 SELECT id, fullname, email, company
 FROM users
 WHERE id IN (m.user_id, m.to)
 -- no order by?
 ) u
 ) u ON TRUE
LEFT JOIN LATERAL (
 SELECT json_agg(r) AS replies
 FROM (
 SELECT *
 FROM messages
 WHERE message_id = m.id
 ORDER BY created_at
 ) r
 ) r ON TRUE
WHERE :to IN (m.user_id, m.to)
AND m.to IS NOT NULL;
  • Use json_agg() instead of to_json(array_agg()). Simpler, faster.

  • Aggregate the subquery on users right away in a LATERAL join, thus avoiding the main problem. Related:

  • This:

    WHERE m.to = :to OR
     (m.to IS NOT NULL AND m.user_id = :to)
    

    can be expressed simpler:

    WHERE :to IN (m.user_id, m.to)
    AND m.to IS NOT NULL
    
  • This:

    SELECT json_agg(r) AS replies
    FROM (
     SELECT *
     FROM messages
     WHERE message_id = m.id
     ORDER BY created_at
     ) r
    

    could be shortened to:

    SELECT json_agg(r ORDER BY created_at) AS replies
    FROM messages r
    WHERE message_id = m.id
    

But a single ORDER BY in a subquery is typically faster than per-aggregate ORDER BY.

answered Feb 15, 2016 at 8:57
1
  • Thanks a million, that's the ticket! Noted on including version, will do next time. Commented Feb 16, 2016 at 2:47

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.