5

I have two tables complaints and complaints_reply in my MySQl database. Users can add complaints which are stored in complaints the complaints reply are stored in complaints_reply table. I am trying to JOIN both these table contents on a specific condition. Before I mention what I am trying to get and the problem I faced, I will explain the structure of these two tables first.


NB: The person who adds complaints is complaint owner & person who adds a complaint reply is complaint replier. Complaint owner can also add replies. So he can either be the complaint owner or the complaint replier. The two tables have a one-to-many relationship. A complaint can have more than one complaint reply. member_id in complaint table represents complaint owner & mem_id in complaints_reply represent complaint replier


DESIRED OUTPUT:

Join the two tables and fetch values and show the complaint and complaint’s reply as a single result set. But the condition is kinda tricky. The last added complaint reply from the complaints_reply table should be fetched for the complaint in complaints table in such a way that the complaint owner should not be the complaint replier. I use posted_date & posted_time from complaints_reply table to fetch the last added complaint reply for a complaint & that complaint replier has to be shown in the result set.

So, from the sample data the tables contain now, the output that I should get is:

+------+---------+----------+-------------+-------------------+
| id | title |member_id |last_replier |last_posted_dt |
+------+---------+----------+-------------+-------------------+
| 1 | x | 1000 |2002 | 2015年05月26日10:11:17|
| 2 | y | 1001 |1000 | 2015年05月27日10:06:16|
+------+---------+----------+-------------+-------------------+

But what I got is:

+------+---------+----------+-------------+-------------------+
| id | title |member_id |last_replier |last_posted_dt |
+------+---------+----------+-------------+-------------------+
| 1 | x | 1000 |1001 | 2015年05月26日10:11:17|
| 2 | y | 1001 |2000 | 2015年05月27日10:06:16|
+------+---------+----------+-------------+-------------------+

The date is correct, but the returned complaint replier last_replier is wrong.

This is my query.

SELECT com.id,
 com.title,
 com.member_id,
 last_comp_reply.last_replier,
 last_comp_reply.last_posted_dt
FROM complaints com
LEFT JOIN
 (SELECT c.id AS complaint_id,
 c.member_id AS parent_mem_id,
 cr.mem_id AS last_replier,
 max(cr.posted_dt) AS last_posted_dt
 FROM
 (SELECT cr.complaint_id,cr.mem_id,c.id,c.member_id,(CONCAT(cr.posted_date,cr.posted_time)) AS posted_dt
 FROM complaints_reply cr,
 complaints c
 WHERE cr.complaint_id=c.id
 AND cr.mem_id!=c.member_id
 GROUP BY cr.complaint_id,
 cr.mem_id,
 posted_dt)cr,
 complaints c
 WHERE cr.complaint_id=c.id
 GROUP BY cr.complaint_id,
 c.id,
 c.member_id) AS last_comp_reply ON com.id=last_comp_reply.complaint_id

Table structure for table complaints

CREATE TABLE IF NOT EXISTS `complaints` (
 `id` int(11) NOT NULL,
 `title` varchar(500) NOT NULL,
 `member_id` int(11) NOT NULL,
 `posted_date` date NOT NULL,
 `posted_time` time NOT NULL 
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;

Indexes for table complaints

ALTER TABLE `complaints`
 ADD PRIMARY KEY (`id`);

AUTO_INCREMENT for table complaints

ALTER TABLE `complaints`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=3;

Dumping data for table complaints

INSERT INTO `complaints` (`id`, `title`, `member_id`, `posted_date`, `posted_time`) VALUES
(1, 'x', 1000, '2015-05-05', '02:06:15'),
(2, 'y', 1001, '2015-05-14', '02:08:10');

Table structure for table complaints_reply

CREATE TABLE IF NOT EXISTS `complaints_reply` (
`id` int(11) NOT NULL,
 `complaint_id` int(11) NOT NULL,
 `comments` text NOT NULL,
 `mem_id` int(11) NOT NULL,
 `posted_date` date NOT NULL,
 `posted_time` time NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=10 ;

Indexes for table complaints_reply

ALTER TABLE `complaints_reply`
 ADD PRIMARY KEY (`id`);

AUTO_INCREMENT for table complaints_reply

ALTER TABLE `complaints_reply`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=10;

Dumping data for table complaints_reply

INSERT INTO `complaints_reply` (`id`, `complaint_id`, `comments`, `mem_id`, `posted_date`, `posted_time`) VALUES
(1, 1, 'reply1', 2000, '2015-05-08', '02:07:08'),
(2, 1, 'reply2', 2001, '2015-05-06', '06:05:08'),
(3, 1, 'reply3', 1000, '2015-05-14', '02:12:13'),
(4, 2, 'hola', 1000, '2015-05-27', '10:06:16'),
(5, 2, 'hello', 2000, '2015-05-04', '03:09:09'),
(6, 2, 'gracias', 1001, '2015-05-31', '06:12:18'),
(7, 1, 'reply4', 1001, '2015-01-04', '04:08:12'),
(8, 2, 'puta', 1001, '2015-06-13', '06:12:18'),
(9, 1, 'reply5', 1000, '2015-06-01', '04:08:12'),
(10, 1, 'reply next', 2002, '2015-05-26', '10:11:17');
P.S.

To give an idea about what my query is all about, I'll explain the sub query that is used to combine the tables & give result based on the condition: complaint owner should not be the complaint replier is:

SELECT cr.complaint_id,
 cr.mem_id,
 c.id,
 c.member_id,
 (CONCAT(cr.posted_date,cr.posted_time)) AS posted_dt
FROM complaints_reply cr,
 complaints c
WHERE cr.complaint_id=c.id
 AND cr.mem_id!=c.member_id
GROUP BY cr.complaint_id,
 cr.mem_id,
 posted_dt

And the result for this is:

+--------------+---------+----------+-------------+-------------------+
| complaint_id | mem_id | id |member_id | posted_dt |
+--------------+---------+------- +-------------+-------------------+
| 1 | 1001 | 1 |1000 | 2015年01月04日04:08:12|
| 1 | 2000 | 1 |1000 | 2015年05月08日02:07:08|
| 1 | 2001 | 1 |1000 | 2015年05月06日06:05:08|
| 1 | 2002 | 1 |1000 | 2015年05月26日10:11:17|
| 2 | 1000 | 2 |1001 | 2015年05月27日10:06:16|
| 2 | 2000 | 2 |1001 | 2015年05月04日03:09:09|
+--------------+---------+----------+-------------+-------------------+

member_id here represents complaint owner and mem_id represents complaint replier

The inner query gives the result based on the condition, then everything after this goes haywire. I don't know where I made mistake. The complaint replies added by complaint owner is not fetched in this table. So far so good. Is there any alternative way to get the result from here?

asked May 10, 2015 at 20:19

3 Answers 3

5

The problem is the subquery, where you combine in the select list both an aggregated (max(cr.posted_dt)) and non-aggregated expressions/columns from the tables.

MySQL allows you to run this kind of inconsistent queries - with default settings - which basically means it depends on the developer to write consistent queries and not fire themselves on the foot. You can change the sql mode to ONLY_FULL_GROUP_BY and see what happens if you try to run your query.

Now to solve the issue, it seems you want a [greatest-n-per-group] type of query. There are several ways to do this, all quite complicated in MySQL (because it lacks window functions). Check the relevant tags in this site and in the SO main site.

Here's one way:

SELECT c.id,
 c.title,
 c.member_id,
 cr.mem_id AS last_replier,
 CONCAT(cr.posted_date, 'T', cr.posted_time) AS last_posted_dt
FROM complaints AS c
 LEFT JOIN complaints_reply AS cr
 ON cr.id = 
 ( SELECT crl.id
 FROM complaints_reply AS crl
 WHERE crl.complaint_id = c.id
 AND crl.mem_id <> c.member_id
 ORDER BY posted_date DESC, posted_time DESC
 LIMIT 1
 ) ;

It will give you consistent results and also be quite efficient, if you add an index on (complaint_id, posted_date, posted_time, mem_id)

Tested at SQLfiddle.

answered May 10, 2015 at 21:24
4
  • 1
    Just as a matter of interest, how would you do it in a (decent) database that had window functions? +1 for a nice answer, and +1 to the OP for a very thorough and clear question. Commented May 10, 2015 at 21:33
  • 1
    @Vérace thnx. It's usually done using ROW_NUMBER() (i.e. ROW_NUMBER() OVER (PARTITION BY complaint_id ORDER BY ...) AS rn in a subquery and then WHERE rn = 1 in the external query). In SQL-Server you could also use OUTER APPLY. In fact my query is very similar to OUTER APPLY. In Postgres, one could also use DISTINCT ON for more compact syntax or in 9.3+ versions a LATERAL join (i.e. a correlated join) which is usually even better for performance. Commented May 10, 2015 at 21:38
  • 3
    mysql really needs to up its game in this regard. Commented May 10, 2015 at 23:56
  • Thanks for saving my ass. I will look into the code. @ypercube Commented May 11, 2015 at 12:08
1

This is tricky because MySQL doesn't care about the data consistency in regards with GROUP-BY clause, but it cares only about the grouping data, so after a group-by we can trust only the group-by columns, but not the data associated.
As an example, consider table (id, name, sex, age):

+----+-------+-----+-----+
| id | name | sex | age |
+----+-------+-----+-----+
| 1 | Minel | 1 | 32 |
| 2 | Ginel | 1 | 34 |
| 3 | Sinel | 1 | 42 |
| 4 | Ana | 2 | 29 |
| 5 | Mara | 2 | 36 |
| 6 | Tara | 2 | 39 |
+----+-------+-----+-----+

and we want to retrieve the oldest person for each sex.
The query

select id,Nume,Sex, MAX(Varsta)
from tabel
group by Sex;

returns

+----+-------+-----+-----+
| id | name | sex | age |
+----+-------+-----+-----+
| 1 | Minel | 1 | 42 |
| 4 | Ana | 2 | 39 |

which is wrong, because "Minel" is only 32 years old!
Actually, MySQL retrieves the FIRST record matching the group-by clause, so we can't use it as a safe filtering clause.
Therefore the correct query is:

select x.*
from table x, (select Sex, max(age) Max from table yy group by Sex) y
where x.Sex=y.Sex and x.age=y.age;

which looks fine, but it have limits: how about more persons with the same highest age? We might want them in the result too, right?
Also, taking into consideration that MySQL retrieves the first record matching the group-by clause, we can (as a trick!) first order the dataset:

select x.*
from (select * from table order by sex, age desc) x
group by sex;

with the correct result.
Of course, some might find more fun working with session variables:

SELECT @rn := CASE WHEN @prior <> sex THEN 1 ELSE @rn+1 END AS rn,
 @prior :=sex rd,
 id,name,age,sex
FROM table,(SELECT @rn := 0) r,(SELECT @prior := 0) n
HAVING rn=1
ORDER BY sex,age DESC;

which is great, but not properly understood by most MySQL users.

For the OP's situation, in regards with the above, a simple solution might be this:

SELECT c.id,c.title,c.member_id, mem_id as last_replier, 
 MAX(concat(r.posted_date,' ',r.posted_time)) as last_posted_dt
FROM
 (select * 
 from complaints_reply 
 order by complaint_id, concat(posted_date,' ',posted_time) desc
 ) as r
 JOIN complaints c ON c.id=r.complaint_id and c.member_id <> r.mem_id
group by complaint_id;

with the result:

 +----+-------+-----------+--------------+---------------------+
 | id | title | member_id | last_replier | last_posted_dt |
 +----+-------+-----------+--------------+---------------------+
 | 1 | x | 1000 | 2002 | 2015年05月26日 10:11:17 |
 | 2 | y | 1001 | 1000 | 2015年05月27日 10:06:16 |
 +----+-------+-----------+--------------+---------------------+

So always separate the grouping clause from filtering clause to get safe results.

LE: I agree with ypercube about indexes. Also note that his answer have better use of indexed columns. It is best to test either solutions on a large dataset.

answered May 11, 2015 at 9:04
8
  • 1
    The "first ORDER BY, then GROUP BY" method does not work in all MySQL versions/variants (one example being MariaDB, in versions 5.3+) and is not guaranteed to work in future versions (It is not guaranteed to work in any version, for all I know). Commented May 11, 2015 at 9:09
  • 1
    And the last query needs fixing. It doesn't return the expected: sqlfiddle.com/#!2/77063/6 Commented May 11, 2015 at 9:13
  • The issue was tested in Oracle's MySQL 5.1 and MySQL 5.5 and, while I totally agree with being carefully with different variants of MySQL, it is hard to believe that MySQL will pull back support for ordering or grouping in sub-queries. At least, the OP only mention that is about MySQL. Commented May 11, 2015 at 9:17
  • Corrected the final query, it was messed up while formating. Commented May 11, 2015 at 9:20
  • Order by (without LIMIT) in a subquery is totally useless and eliminating it (during optimization, not forbidding it) is one of the possible query transformation than an optimizer can have. That's what MariaDB does (and many other DBMS). I'd say it's probable it will be added in some future version of the mainstream MySQL. Commented May 11, 2015 at 9:22
1

@ypercube YperCube's answer did the job. Easy. Effective. A special thanks to him.

Anyway, after some research I have came up with an answer of my own. This is an alternate way, Just thought of sharing it.

SELECT com.id AS complaint_id,
 com.member_id AS parent_mem_id,
 crep.mem_id AS last_replier,
 crl.last_posted_dt
FROM complaints com
LEFT JOIN complaints_reply crep 
 ON com.id = crep.complaint_id
JOIN
 (SELECT cr.complaint_id,
 max(CONCAT(cr.posted_date,'_',cr.posted_time)) AS last_posted_dt
 FROM complaints_reply cr,
 complaints c
 WHERE cr.complaint_id = c.id
 AND cr.mem_id != c.member_id
 GROUP BY cr.complaint_id
 ) crl 
 ON CONCAT(crep.posted_date,'_',crep.posted_time) = crl.last_posted_dt
 AND crep.complaint_id = crl.complaint_id ;
ypercubeTM
99.7k13 gold badges217 silver badges306 bronze badges
answered May 11, 2015 at 17:26

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.