I've got some duplicated rows in a table called ja_jobs.

To find the duplicated rows, I'm running this query:

select * from ja_jobs WHERE clientid = 33731 AND creatortype = 'legacyrec' AND deleted = false AND time_job IS NOT NULL AND (time_job,recurrenceid) IN (
select time_job,recurrenceid FROM ja_jobs WHERE clientid = 33731 GROUP BY time_job,recurrenceid HAVING count(*) > 1
)

The query finds duplicated rows by time_job and recurrenceid.

In the following example (screenshot not reproduced here), you can see that the jobs are duplicated, but there are three versions of the same job (just look at the modified_date column).

I need to delete the newer jobs and keep only the OLDEST one. For the rows in the example, that would be:

DELETE from ja_jobs WHERE id IN (14754912,14792799);

How can I do that for all duplicates? How can I select all the newer jobs and delete just them?

Here's what I've got so far:

select min(id) over (partition by time_job,recurrenceid,time_arrival order by created_date) as min_id into junk.test_table FROM ja_jobs
WHERE clientid = 33731 AND creatortype = 'legacyrec' AND deleted = false AND (time_job,recurrenceid) IN (
select time_job,recurrenceid FROM ja_jobs WHERE clientid = 33731 GROUP BY time_job,recurrenceid HAVING count(*) > 1
)

But in junk.test_table I get duplicated min_id values.

asked Apr 20, 2016 at 2:18

1 Answer

You're mixing grouping criteria in two places while creating junk.test_table: first in the GROUP BY subselect, which has fewer conditions in its WHERE clause than the outer query, and then in the PARTITION BY, which has one extra partitioning field (time_arrival).
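To illustrate the point about consistent criteria, here is a minimal sketch that states the filter and the duplicate key (time_job, recurrenceid) exactly once. Note also that a window function returns one row per input row, so min(id) OVER (...) is repeated for every row of its partition, which by itself produces repeated min_id values. The column name rn is made up for this example, and ordering by created_date assumes that column reflects the age of a job.

```sql
-- Number the rows inside each (time_job, recurrenceid) group, oldest first.
-- rn = 1 is the row to keep; rn > 1 marks the newer duplicates.
SELECT id, time_job, recurrenceid, created_date,
       row_number() OVER (PARTITION BY time_job, recurrenceid
                          ORDER BY created_date, id) AS rn
FROM ja_jobs
WHERE clientid = 33731
  AND creatortype = 'legacyrec'
  AND deleted = false
  AND time_job IS NOT NULL;
```

Groups without duplicates simply contain a single row with rn = 1, so no separate HAVING count(*) > 1 subselect is needed in this form.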

If you can assume that lower ids belong to older jobs, then you can identify your duplicates by joining the grouped result back to the table itself, like this:

SELECT jd.dup_group_no, j.id = jd.id AS to_keep, j.id INTO junk.test_table
FROM (
  -- one row per duplicated group; MIN(id) is the row to keep
  SELECT time_job, recurrenceid, clientid, creatortype, deleted, MIN(id) AS id, row_number() OVER () AS dup_group_no
  FROM ja_jobs
  WHERE clientid = 33731 AND creatortype = 'legacyrec' AND deleted = false
  GROUP BY time_job, recurrenceid, clientid, creatortype, deleted
  HAVING count(*) > 1
) jd
-- join back to ja_jobs to list every row of each duplicated group
JOIN ja_jobs j USING (time_job, recurrenceid, clientid, creatortype, deleted);
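Before deleting anything, it may be worth checking that every duplicate group in junk.test_table has exactly one row flagged to_keep. The following is only a sketch of such a check; the output column names rows_in_group and rows_kept are made up for this example.

```sql
-- List any duplicate group that does not have exactly one row marked to_keep.
-- An empty result means the flags are consistent.
SELECT dup_group_no,
       count(*) AS rows_in_group,
       sum(CASE WHEN to_keep THEN 1 ELSE 0 END) AS rows_kept
FROM junk.test_table
GROUP BY dup_group_no
HAVING sum(CASE WHEN to_keep THEN 1 ELSE 0 END) <> 1;
```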

If there is no guaranteed correlation between lower ids and earlier creation times, the query is trickier:

SELECT jm.dup_group_no, j.id = jm.id AS to_keep, j.id INTO junk.test_table
FROM (
  -- keep the first row per group, ordered by created_date and then id
  -- (dup_group_no is unique per group but not necessarily consecutive)
  SELECT DISTINCT ON (time_job, recurrenceid, clientid, creatortype, deleted)
         time_job, recurrenceid, clientid, creatortype, deleted, jo.id, row_number() OVER () AS dup_group_no
  FROM (
    -- the duplicated groups
    SELECT time_job, recurrenceid, clientid, creatortype, deleted
    FROM ja_jobs
    WHERE clientid = 33731 AND creatortype = 'legacyrec' AND deleted = false
    GROUP BY time_job, recurrenceid, clientid, creatortype, deleted
    HAVING count(*) > 1
  ) jd
  JOIN ja_jobs jo USING (time_job, recurrenceid, clientid, creatortype, deleted)
  ORDER BY time_job, recurrenceid, clientid, creatortype, deleted, jo.created_date, jo.id
) jm
-- join back to ja_jobs to list every row of each duplicated group
JOIN ja_jobs j USING (time_job, recurrenceid, clientid, creatortype, deleted);
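Whichever query fills junk.test_table, the newer duplicates are then the rows with to_keep = false. A minimal sketch of the final delete follows, assuming the table has the columns id and to_keep as selected above; it is wrapped in a transaction so the affected row count can be inspected first.

```sql
BEGIN;

-- Remove every row flagged as a newer duplicate;
-- the row with to_keep = true in each group stays.
DELETE FROM ja_jobs j
USING junk.test_table t
WHERE j.id = t.id
  AND NOT t.to_keep;

-- Replace ROLLBACK with COMMIT once the reported row count looks right.
ROLLBACK;
```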
answered Apr 20, 2016 at 3:10