I have three tables:
hgn_modern_sources
+----+--------------+
| id | source       |
+----+--------------+
| 17 | Something... |
+----+--------------+
junc_modern_source_has_reference
+----+------------------+------------------------+
| id | modern_source_id | location_within_source |
+----+------------------+------------------------+
| 52 | 17               | 1, 1                   |
| 89 | 17               | 1, 1                   |
| 99 | 17               | 1, 1                   |
+----+------------------+------------------------+
hgn_picture_has_reference
+----+---------------------+--------------------+
| id | modern_reference_id | picture_link       |
+----+---------------------+--------------------+
| 45 | 89                  | /images/image1.png |
| 75 | 99                  | /images/image2.png |
+----+---------------------+--------------------+
My question is: how do I delete the duplicates from junc_modern_source_has_reference (id = 89 and id = 99), keep the one row with the lowest ID (id = 52), and replace the foreign keys in the hgn_picture_has_reference table (column modern_reference_id) with the remaining one (id = 52)?
The solution can be MySQL only, or PHP + MySQL (which I prefer).
-
What version of MySQL do you use? – NikitaSerbskiy, commented Apr 15, 2020 at 16:14
-
Create a dbfiddle.uk/?rdbms=mysql_8.0 or similar. It is a trivial task, but I don't feel like creating the test data and tables to verify. – Lennart - Slava Ukraini, commented Apr 15, 2020 at 16:27
-
I use MySQL 8.0. – Boris J., commented Apr 15, 2020 at 16:47
-
@Lennart I got an error with that fiddle: "Run failed". I will try from my home PC. – Boris J., commented Apr 15, 2020 at 16:57
-
Yes, I noticed. I'll post an untested solution that should work even for earlier versions of MySQL. – Lennart - Slava Ukraini, commented Apr 15, 2020 at 17:12
2 Answers
Please provide either a fiddle or table definitions and sample data, as below, to attract more attention to your question:
create table junc_modern_source_has_reference
( id int not null primary key
, modern_source_id int not null
);
insert into junc_modern_source_has_reference
values (52,17),(89,17),(99,17);
create table hgn_picture_has_reference
( id int not null primary key
, modern_reference_id int not null
);
insert into hgn_picture_has_reference
values (45,89),(75,99);
The first step is to update all referencing rows so that they point at the lowest id:
update hgn_picture_has_reference x
set modern_reference_id = (select min(y.id)
    from junc_modern_source_has_reference y
    join junc_modern_source_has_reference z
        on y.modern_source_id = z.modern_source_id
    where x.modern_reference_id = z.id )
-- only touch rows whose reference actually exists; an aggregate
-- without GROUP BY always returns a row, so test with "select 1"
where exists (
    select 1
    from junc_modern_source_has_reference z
    where x.modern_reference_id = z.id
);
Now we can delete all but the lowest id. This multi-table form is non-standard SQL, but MySQL used to have problems referencing the table under modification in a subquery:
delete x.*
from junc_modern_source_has_reference x
join junc_modern_source_has_reference y
on x.modern_source_id = y.modern_source_id
and y.id < x.id
;
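A standard version would be something like the following. Note that MySQL may reject it, because it does not allow selecting from the delete target in a subquery:
delete from junc_modern_source_has_reference x
where exists (
    select 1 from junc_modern_source_has_reference y
    where x.modern_source_id = y.modern_source_id
    and y.id < x.id
);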
I.e., delete a row if there is another row with a lower id for the same modern_source_id.
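The sample definitions above omit location_within_source. If duplicates in the real table are defined by both modern_source_id and location_within_source (an assumption based on the sample rows in the question), the same two steps might be sketched as follows. The transaction wrapper assumes InnoDB tables, and the aliases p, r, g and the column alias min_id are illustrative names only:

```sql
start transaction;

-- Step 1: repoint picture rows at the lowest id of their duplicate group.
update hgn_picture_has_reference p
join junc_modern_source_has_reference r
    on r.id = p.modern_reference_id
join (
    select modern_source_id, location_within_source, min(id) as min_id
    from junc_modern_source_has_reference
    group by modern_source_id, location_within_source
) g
    on g.modern_source_id = r.modern_source_id
   and g.location_within_source = r.location_within_source
set p.modern_reference_id = g.min_id
where p.modern_reference_id <> g.min_id;

-- Step 2: delete every row that has a lower-id twin in the same group.
delete x
from junc_modern_source_has_reference x
join junc_modern_source_has_reference y
    on x.modern_source_id = y.modern_source_id
   and x.location_within_source = y.location_within_source
   and y.id < x.id;

commit;
```

If location_within_source can be NULL, the equality comparisons will not match those rows, so they would need separate handling. Replacing commit with rollback makes it possible to try the statements without keeping the changes.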
-
Thank you @Lennart for your answer and explanation. I understood it quite well and I will adapt your solution to my case. – Boris J., commented Apr 16, 2020 at 7:52
DROP TEMPORARY TABLE IF EXISTS tempTable;
CREATE TEMPORARY TABLE tempTable
(id int
,minId int
,PRIMARY KEY (id)
);
-- fill the temporary table with the list of duplicate ids
-- and the corresponding minimal id within each group of duplicates
-- (in MySQL the WITH clause goes between INSERT and SELECT)
INSERT INTO tempTable (id, minId)
WITH CTE AS (
SELECT id
,MIN(id) OVER (PARTITION BY modern_source_id, location_within_source) AS minId
,ROW_NUMBER() OVER (PARTITION BY modern_source_id, location_within_source ORDER BY id) AS rn
FROM junc_modern_source_has_reference
)
SELECT id, minId FROM CTE WHERE rn > 1;
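-- tempTable.id holds ids of duplicate reference rows,
-- so match it against the foreign key column
UPDATE hgn_picture_has_reference AS t1
JOIN tempTable AS t2 ON t1.modern_reference_id = t2.id
SET t1.modern_reference_id = t2.minId;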
DELETE t1
FROM junc_modern_source_has_reference AS t1
JOIN tempTable AS t2 ON t1.id = t2.id;
DROP TEMPORARY TABLE tempTable;
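To see what the CTE yields before anything is modified, its body can be run as a plain SELECT. This is only an illustrative preview, assuming MySQL 8.0 and the three sample rows from the question:

```sql
-- Read-only preview of the duplicate groups; nothing is changed.
SELECT id
      ,MIN(id) OVER (PARTITION BY modern_source_id, location_within_source) AS minId
      ,ROW_NUMBER() OVER (PARTITION BY modern_source_id, location_within_source ORDER BY id) AS rn
FROM junc_modern_source_has_reference
ORDER BY id;

-- With the sample rows (52, 89, 99 all sharing source 17 and location '1, 1'):
-- id | minId | rn
-- 52 |    52 |  1
-- 89 |    52 |  2
-- 99 |    52 |  3
```

Rows with rn > 1 (here 89 and 99) are the ones that end up in tempTable and are later deleted.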
If you use a MySQL version older than 8.0, window functions and CTEs are not available, so the statement that fills the temporary table needs to be rewritten this way:
INSERT INTO tempTable (id, minId)
SELECT id, minId
FROM
(SELECT t1.id
,t2.minId
FROM junc_modern_source_has_reference AS t1
JOIN (
SELECT MIN(id) AS minId
,modern_source_id
,location_within_source
FROM junc_modern_source_has_reference
GROUP BY modern_source_id, location_within_source
HAVING COUNT(1) > 1
) AS t2 ON t1.modern_source_id = t2.modern_source_id
AND t1.location_within_source = t2.location_within_source
AND t1.id <> t2.minId
) AS CTE;
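After the update and delete have run, two read-only checks can help confirm the result. This is a minimal sketch that assumes duplicates are defined by modern_source_id plus location_within_source, as in the statements above; both queries are expected to return no rows:

```sql
-- Check 1: duplicate groups that still exist.
SELECT modern_source_id, location_within_source, COUNT(*) AS cnt
FROM junc_modern_source_has_reference
GROUP BY modern_source_id, location_within_source
HAVING COUNT(*) > 1;

-- Check 2: picture rows that point at a reference row that no longer exists.
SELECT p.id, p.modern_reference_id
FROM hgn_picture_has_reference AS p
LEFT JOIN junc_modern_source_has_reference AS r
       ON r.id = p.modern_reference_id
WHERE r.id IS NULL;
```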
-
Thank you for the detailed answer. Can you explain how this works? I have never used PARTITION or a CTE before. – Boris J., commented Apr 15, 2020 at 17:09
-
@BorisJ. CTE is just a common table expression (an alias for a subquery; you can use any name). "PARTITION BY" is part of the "OVER" clause. I used window functions to get the minimal id within each group specified in "PARTITION BY" and to get the row number of each id within its group. If a group contains more than one row, we want to delete all rows with a row number greater than 1. – NikitaSerbskiy, commented Apr 15, 2020 at 17:18
-
Thank you for the explanation. Your answer is good, but Lennart's is much simpler for me to understand. – Boris J., commented Apr 16, 2020 at 7:51
-
@BorisJ. Lennart's answer is good too, but I suppose it will perform more reads, and if you have large tables that will affect performance. – NikitaSerbskiy, commented Apr 16, 2020 at 8:41
-
Thank you for pointing that out. That is very important. I will do some tests to see how it works. – Boris J., commented Apr 16, 2020 at 19:48