Find a longest sequence that doesn't have duplicates inside

Question 1

I have database with Id - autoincrement, and Text- varchar.

My database is simple:

+---+--------+----+----+
| Id| Text |Col1|Col2|
+---+--------+----+----+
| 1 | adf |1 |0 |
| 2 | qwer |5 |1 |
+---+--------+----+----+

This harder than I thought. What I want to do is to cut the table at some record and delete the tail.

Reason for this is that I had unique column that at some point in time I started receiving duplicates (because i changed key UNIQUE(Text) into UNIQUE(Text, Id))

Rob Farley's answer was what I asked originally in question but it wasn't what i really needed. I tried a revised version based on Rob's answer but results are still wrong:

SELECT MIN(Id) AS SmallestId, Text FROM table 
GROUP BY Text HAVING COUNT(*) <2 
ORDER BY SmallestId DESC

Problem with Rob's answer is that if first record (Id=1) is duplicate that lowest Id will equal 1 and i'd have to cut whole table.

Problem with my way is that it reports last (highest) occurrence of duplicate so I would have to leave some duplicates inside after the cut.

I think that what I want in human language is:

Find a longest sequence that doesn't have duplicates inside, starting from Id 1 and ordered by Id ASC.

It's a side project now so there is no pressure on answering, reason might be to learn more SQL. I'm the asker of original question but I had to edit as another because I accidentally created two accounts.

Question 2

You want the smallest id for each Text that appears multiple times.

SELECT MIN(id) AS SmallestId, Text 
FROM Table 
GROUP BY Text
HAVING COUNT(*) > 1;

Edit: Now that you've changed your question...

What about deleting later instances of the same text?

DELETE t
FROM Table t
WHERE EXISTS (SELECT *
 FROM Table t2
 WHERE t2.Text = t.Text
 AND t2.Id < t.Id);

This should leave you with just the smallest Id for each text. It doesn't answer what you're asking, but is hopefully closer to what you need.

If you have anything referring to this table, you will need to keep a copy of the rows you delete.

Question 3

I thought of simple ` SELECT DISTINCT` but wasn't sure which row would be selected (and i wanted to get only the earliest). So your second solution is best for this task. Thank you.

Question 4

The sequence you are looking for will be the one that ends as soon as the first duplicate is encountered. Based on this, the first step I would take would be to get rows for which duplicates with lower IDs exist:

SELECT
 *
FROM
 atable AS this
WHERE
 EXISTS
 (
 SELECT
 *
 FROM
 atable AS earlier
 WHERE
 this.Text = earlier.Text
 AND this.Id > earlier.Id
 )
ORDER BY
 this.Id ASC
;

The first row in the above query's result set will be the cutting point, i.e. the one where the sought-for sequence ends. Therefore, you just need to select the rows before the cutting point:

SELECT
 *
FROM
 atable
WHERE
 Id < (
 SELECT
 MIN(this.Id)
 FROM
 atable AS this
 WHERE
 EXISTS
 (
 SELECT
 *
 FROM
 atable AS earlier
 WHERE
 this.Text = earlier.Text
 AND this.Id > earlier.Id
 )
 )
;

or remove those starting at the cutting point:

DELETE FROM
 atable
WHERE
 Id >= (
 SELECT
 MIN(this.Id)
 FROM
 atable AS this
 WHERE
 EXISTS
 (
 SELECT
 *
 FROM
 atable AS earlier
 WHERE
 this.Text = earlier.Text
 AND this.Id > earlier.Id
 )
 )
;

Please note that the omitted (deleted) rows might include entries with unique values added after you changed the constraint. In light of this, Rob Farley's suggestion may be a better option for you, as it would only remove duplicates and keep all your distinct values.

Question 5

The Rob's answer was indeed a better solution but thank you for providing your solution.

Rob Farley Rob Farley 16.3k2 gold badges39 silver badges61 bronze badges · Accepted Answer · 2016-03-26 23:15:50Z

You want the smallest id for each Text that appears multiple times.

SELECT MIN(id) AS SmallestId, Text 
FROM Table 
GROUP BY Text
HAVING COUNT(*) > 1;

Edit: Now that you've changed your question...

What about deleting later instances of the same text?

DELETE t
FROM Table t
WHERE EXISTS (SELECT *
 FROM Table t2
 WHERE t2.Text = t.Text
 AND t2.Id < t.Id);

This should leave you with just the smallest Id for each text. It doesn't answer what you're asking, but is hopefully closer to what you need.

If you have anything referring to this table, you will need to keep a copy of the rows you delete.

I thought of simple ` SELECT DISTINCT` but wasn't sure which row would be selected (and i wanted to get only the earliest). So your second solution is best for this task. Thank you.

Stack Exchange Network

Find a longest sequence that doesn't have duplicates inside

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Find a longest sequence that doesn't have duplicates inside

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions