I have two tables complaints
and complaints_reply
in my MySQl
database. Users can add complaints which are stored in complaints
the complaints reply are stored in complaints_reply
table. I am trying to JOIN both these table contents on a specific condition. Before I mention what I am trying to get and the problem I faced, I will explain the structure of these two tables first.
NB: The person who adds complaints is complaint owner & person who adds a complaint reply is complaint replier. Complaint owner can also add replies. So he can either be the complaint owner or the complaint replier. The two tables have a one-to-many relationship. A complaint can have more than one complaint reply. member_id
in complaint
table represents complaint owner & mem_id
in complaints_reply
represent complaint replier
DESIRED OUTPUT:
Join the two tables and fetch values and show the complaint and complaint’s reply as a single result set. But the condition is kinda tricky. The last added complaint reply from the complaints_reply
table should be fetched for the complaint in complaints
table in such a way that the complaint owner should not be the complaint replier. I use posted_date
& posted_time
from complaints_reply
table to fetch the last added complaint reply for a complaint & that complaint replier has to be shown in the result set.
So, from the sample data the tables contain now, the output that I should get is:
+------+---------+----------+-------------+-------------------+
| id | title |member_id |last_replier |last_posted_dt |
+------+---------+----------+-------------+-------------------+
| 1 | x | 1000 |2002 | 2015年05月26日10:11:17|
| 2 | y | 1001 |1000 | 2015年05月27日10:06:16|
+------+---------+----------+-------------+-------------------+
But what I got is:
+------+---------+----------+-------------+-------------------+
| id | title |member_id |last_replier |last_posted_dt |
+------+---------+----------+-------------+-------------------+
| 1 | x | 1000 |1001 | 2015年05月26日10:11:17|
| 2 | y | 1001 |2000 | 2015年05月27日10:06:16|
+------+---------+----------+-------------+-------------------+
The date is correct, but the returned complaint replier last_replier
is wrong.
This is my query.
SELECT com.id,
com.title,
com.member_id,
last_comp_reply.last_replier,
last_comp_reply.last_posted_dt
FROM complaints com
LEFT JOIN
(SELECT c.id AS complaint_id,
c.member_id AS parent_mem_id,
cr.mem_id AS last_replier,
max(cr.posted_dt) AS last_posted_dt
FROM
(SELECT cr.complaint_id,cr.mem_id,c.id,c.member_id,(CONCAT(cr.posted_date,cr.posted_time)) AS posted_dt
FROM complaints_reply cr,
complaints c
WHERE cr.complaint_id=c.id
AND cr.mem_id!=c.member_id
GROUP BY cr.complaint_id,
cr.mem_id,
posted_dt)cr,
complaints c
WHERE cr.complaint_id=c.id
GROUP BY cr.complaint_id,
c.id,
c.member_id) AS last_comp_reply ON com.id=last_comp_reply.complaint_id
Table structure for table complaints
CREATE TABLE IF NOT EXISTS `complaints` (
`id` int(11) NOT NULL,
`title` varchar(500) NOT NULL,
`member_id` int(11) NOT NULL,
`posted_date` date NOT NULL,
`posted_time` time NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
Indexes for table complaints
ALTER TABLE `complaints`
ADD PRIMARY KEY (`id`);
AUTO_INCREMENT for table complaints
ALTER TABLE `complaints`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=3;
Dumping data for table complaints
INSERT INTO `complaints` (`id`, `title`, `member_id`, `posted_date`, `posted_time`) VALUES
(1, 'x', 1000, '2015-05-05', '02:06:15'),
(2, 'y', 1001, '2015-05-14', '02:08:10');
Table structure for table complaints_reply
CREATE TABLE IF NOT EXISTS `complaints_reply` (
`id` int(11) NOT NULL,
`complaint_id` int(11) NOT NULL,
`comments` text NOT NULL,
`mem_id` int(11) NOT NULL,
`posted_date` date NOT NULL,
`posted_time` time NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=10 ;
Indexes for table complaints_reply
ALTER TABLE `complaints_reply`
ADD PRIMARY KEY (`id`);
AUTO_INCREMENT for table complaints_reply
ALTER TABLE `complaints_reply`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=10;
Dumping data for table complaints_reply
INSERT INTO `complaints_reply` (`id`, `complaint_id`, `comments`, `mem_id`, `posted_date`, `posted_time`) VALUES
(1, 1, 'reply1', 2000, '2015-05-08', '02:07:08'),
(2, 1, 'reply2', 2001, '2015-05-06', '06:05:08'),
(3, 1, 'reply3', 1000, '2015-05-14', '02:12:13'),
(4, 2, 'hola', 1000, '2015-05-27', '10:06:16'),
(5, 2, 'hello', 2000, '2015-05-04', '03:09:09'),
(6, 2, 'gracias', 1001, '2015-05-31', '06:12:18'),
(7, 1, 'reply4', 1001, '2015-01-04', '04:08:12'),
(8, 2, 'puta', 1001, '2015-06-13', '06:12:18'),
(9, 1, 'reply5', 1000, '2015-06-01', '04:08:12'),
(10, 1, 'reply next', 2002, '2015-05-26', '10:11:17');
P.S.
To give an idea about what my query is all about, I'll explain the sub query that is used to combine the tables & give result based on the condition: complaint owner should not be the complaint replier is:
SELECT cr.complaint_id,
cr.mem_id,
c.id,
c.member_id,
(CONCAT(cr.posted_date,cr.posted_time)) AS posted_dt
FROM complaints_reply cr,
complaints c
WHERE cr.complaint_id=c.id
AND cr.mem_id!=c.member_id
GROUP BY cr.complaint_id,
cr.mem_id,
posted_dt
And the result for this is:
+--------------+---------+----------+-------------+-------------------+
| complaint_id | mem_id | id |member_id | posted_dt |
+--------------+---------+------- +-------------+-------------------+
| 1 | 1001 | 1 |1000 | 2015年01月04日04:08:12|
| 1 | 2000 | 1 |1000 | 2015年05月08日02:07:08|
| 1 | 2001 | 1 |1000 | 2015年05月06日06:05:08|
| 1 | 2002 | 1 |1000 | 2015年05月26日10:11:17|
| 2 | 1000 | 2 |1001 | 2015年05月27日10:06:16|
| 2 | 2000 | 2 |1001 | 2015年05月04日03:09:09|
+--------------+---------+----------+-------------+-------------------+
member_id
here represents complaint owner and mem_id
represents complaint replier
The inner query gives the result based on the condition, then everything after this goes haywire. I don't know where I made mistake. The complaint replies added by complaint owner is not fetched in this table. So far so good. Is there any alternative way to get the result from here?
3 Answers 3
The problem is the subquery, where you combine in the select list both an aggregated (max(cr.posted_dt)
) and non-aggregated expressions/columns from the tables.
MySQL allows you to run this kind of inconsistent queries - with default settings - which basically means it depends on the developer to write consistent queries and not fire themselves on the foot. You can change the sql mode to ONLY_FULL_GROUP_BY
and see what happens if you try to run your query.
Now to solve the issue, it seems you want a [greatest-n-per-group]
type of query. There are several ways to do this, all quite complicated in MySQL (because it lacks window functions). Check the relevant tags in this site and in the SO main site.
Here's one way:
SELECT c.id,
c.title,
c.member_id,
cr.mem_id AS last_replier,
CONCAT(cr.posted_date, 'T', cr.posted_time) AS last_posted_dt
FROM complaints AS c
LEFT JOIN complaints_reply AS cr
ON cr.id =
( SELECT crl.id
FROM complaints_reply AS crl
WHERE crl.complaint_id = c.id
AND crl.mem_id <> c.member_id
ORDER BY posted_date DESC, posted_time DESC
LIMIT 1
) ;
It will give you consistent results and also be quite efficient, if you add an index on (complaint_id, posted_date, posted_time, mem_id)
Tested at SQLfiddle.
-
1Just as a matter of interest, how would you do it in a (decent) database that had window functions? +1 for a nice answer, and +1 to the OP for a very thorough and clear question.Vérace– Vérace2015年05月10日 21:33:58 +00:00Commented May 10, 2015 at 21:33
-
1@Vérace thnx. It's usually done using
ROW_NUMBER()
(i.e.ROW_NUMBER() OVER (PARTITION BY complaint_id ORDER BY ...) AS rn
in a subquery and thenWHERE rn = 1
in the external query). In SQL-Server you could also useOUTER APPLY
. In fact my query is very similar to OUTER APPLY. In Postgres, one could also useDISTINCT ON
for more compact syntax or in 9.3+ versions aLATERAL
join (i.e. a correlated join) which is usually even better for performance.ypercubeᵀᴹ– ypercubeᵀᴹ2015年05月10日 21:38:56 +00:00Commented May 10, 2015 at 21:38 -
3mysql really needs to up its game in this regard.Philᵀᴹ– Philᵀᴹ2015年05月10日 23:56:46 +00:00Commented May 10, 2015 at 23:56
-
Thanks for saving my ass. I will look into the code. @ypercubeSajeev C– Sajeev C2015年05月11日 12:08:04 +00:00Commented May 11, 2015 at 12:08
This is tricky because MySQL doesn't care about the data consistency in regards with GROUP-BY clause, but it cares only about the grouping data, so after a group-by we can trust only the group-by columns, but not the data associated.
As an example, consider table
(id, name, sex, age):
+----+-------+-----+-----+
| id | name | sex | age |
+----+-------+-----+-----+
| 1 | Minel | 1 | 32 |
| 2 | Ginel | 1 | 34 |
| 3 | Sinel | 1 | 42 |
| 4 | Ana | 2 | 29 |
| 5 | Mara | 2 | 36 |
| 6 | Tara | 2 | 39 |
+----+-------+-----+-----+
and we want to retrieve the oldest person for each sex.
The query
select id,Nume,Sex, MAX(Varsta)
from tabel
group by Sex;
returns
+----+-------+-----+-----+
| id | name | sex | age |
+----+-------+-----+-----+
| 1 | Minel | 1 | 42 |
| 4 | Ana | 2 | 39 |
which is wrong, because "Minel" is only 32 years old!
Actually, MySQL retrieves the FIRST record matching the group-by clause, so we can't use it as a safe filtering clause.
Therefore the correct query is:
select x.*
from table x, (select Sex, max(age) Max from table yy group by Sex) y
where x.Sex=y.Sex and x.age=y.age;
which looks fine, but it have limits: how about more persons with the same highest age? We might want them in the result too, right?
Also, taking into consideration that MySQL retrieves the first record matching the group-by clause, we can (as a trick!) first order the dataset:
select x.*
from (select * from table order by sex, age desc) x
group by sex;
with the correct result.
Of course, some might find more fun working with session variables:
SELECT @rn := CASE WHEN @prior <> sex THEN 1 ELSE @rn+1 END AS rn,
@prior :=sex rd,
id,name,age,sex
FROM table,(SELECT @rn := 0) r,(SELECT @prior := 0) n
HAVING rn=1
ORDER BY sex,age DESC;
which is great, but not properly understood by most MySQL users.
For the OP's situation, in regards with the above, a simple solution might be this:
SELECT c.id,c.title,c.member_id, mem_id as last_replier,
MAX(concat(r.posted_date,' ',r.posted_time)) as last_posted_dt
FROM
(select *
from complaints_reply
order by complaint_id, concat(posted_date,' ',posted_time) desc
) as r
JOIN complaints c ON c.id=r.complaint_id and c.member_id <> r.mem_id
group by complaint_id;
with the result:
+----+-------+-----------+--------------+---------------------+
| id | title | member_id | last_replier | last_posted_dt |
+----+-------+-----------+--------------+---------------------+
| 1 | x | 1000 | 2002 | 2015年05月26日 10:11:17 |
| 2 | y | 1001 | 1000 | 2015年05月27日 10:06:16 |
+----+-------+-----------+--------------+---------------------+
So always separate the grouping clause from filtering clause to get safe results.
LE: I agree with ypercube about indexes. Also note that his answer have better use of indexed columns. It is best to test either solutions on a large dataset.
-
1The "first
ORDER BY
, thenGROUP BY
" method does not work in all MySQL versions/variants (one example being MariaDB, in versions 5.3+) and is not guaranteed to work in future versions (It is not guaranteed to work in any version, for all I know).ypercubeᵀᴹ– ypercubeᵀᴹ2015年05月11日 09:09:59 +00:00Commented May 11, 2015 at 9:09 -
1And the last query needs fixing. It doesn't return the expected: sqlfiddle.com/#!2/77063/6ypercubeᵀᴹ– ypercubeᵀᴹ2015年05月11日 09:13:09 +00:00Commented May 11, 2015 at 9:13
-
The issue was tested in Oracle's MySQL 5.1 and MySQL 5.5 and, while I totally agree with being carefully with different variants of MySQL, it is hard to believe that MySQL will pull back support for ordering or grouping in sub-queries. At least, the OP only mention that is about MySQL.Tinel Barb– Tinel Barb2015年05月11日 09:17:18 +00:00Commented May 11, 2015 at 9:17
-
Corrected the final query, it was messed up while formating.Tinel Barb– Tinel Barb2015年05月11日 09:20:58 +00:00Commented May 11, 2015 at 9:20
-
Order by
(withoutLIMIT
) in a subquery is totally useless and eliminating it (during optimization, not forbidding it) is one of the possible query transformation than an optimizer can have. That's what MariaDB does (and many other DBMS). I'd say it's probable it will be added in some future version of the mainstream MySQL.ypercubeᵀᴹ– ypercubeᵀᴹ2015年05月11日 09:22:02 +00:00Commented May 11, 2015 at 9:22
@ypercube YperCube's answer did the job. Easy. Effective. A special thanks to him.
Anyway, after some research I have came up with an answer of my own. This is an alternate way, Just thought of sharing it.
SELECT com.id AS complaint_id,
com.member_id AS parent_mem_id,
crep.mem_id AS last_replier,
crl.last_posted_dt
FROM complaints com
LEFT JOIN complaints_reply crep
ON com.id = crep.complaint_id
JOIN
(SELECT cr.complaint_id,
max(CONCAT(cr.posted_date,'_',cr.posted_time)) AS last_posted_dt
FROM complaints_reply cr,
complaints c
WHERE cr.complaint_id = c.id
AND cr.mem_id != c.member_id
GROUP BY cr.complaint_id
) crl
ON CONCAT(crep.posted_date,'_',crep.posted_time) = crl.last_posted_dt
AND crep.complaint_id = crl.complaint_id ;