I have database with Id - autoincrement, and Text- varchar.
My database is simple:
+---+--------+----+----+
| Id| Text |Col1|Col2|
+---+--------+----+----+
| 1 | adf |1 |0 |
| 2 | qwer |5 |1 |
+---+--------+----+----+
This harder than I thought. What I want to do is to cut the table at some record and delete the tail.
Reason for this is that I had unique column that at some point in time I started receiving duplicates (because i changed key UNIQUE(Text)
into UNIQUE(Text, Id)
)
Rob Farley's answer was what I asked originally in question but it wasn't what i really needed. I tried a revised version based on Rob's answer but results are still wrong:
SELECT MIN(Id) AS SmallestId, Text FROM table
GROUP BY Text HAVING COUNT(*) <2
ORDER BY SmallestId DESC
Problem with Rob's answer is that if first record (Id=1) is duplicate that lowest Id will equal 1 and i'd have to cut whole table.
Problem with my way is that it reports last (highest) occurrence of duplicate so I would have to leave some duplicates inside after the cut.
I think that what I want in human language is:
- Find a longest sequence that doesn't have duplicates inside, starting from Id 1 and ordered by Id ASC.
It's a side project now so there is no pressure on answering, reason might be to learn more SQL. I'm the asker of original question but I had to edit as another because I accidentally created two accounts.
2 Answers 2
You want the smallest id
for each Text
that appears multiple times.
SELECT MIN(id) AS SmallestId, Text
FROM Table
GROUP BY Text
HAVING COUNT(*) > 1;
Edit: Now that you've changed your question...
What about deleting later instances of the same text?
DELETE t
FROM Table t
WHERE EXISTS (SELECT *
FROM Table t2
WHERE t2.Text = t.Text
AND t2.Id < t.Id);
This should leave you with just the smallest Id for each text. It doesn't answer what you're asking, but is hopefully closer to what you need.
If you have anything referring to this table, you will need to keep a copy of the rows you delete.
-
I thought of simple ` SELECT DISTINCT` but wasn't sure which row would be selected (and i wanted to get only the earliest). So your second solution is best for this task. Thank you.Jhon– Jhon2016年03月28日 16:20:19 +00:00Commented Mar 28, 2016 at 16:20
The sequence you are looking for will be the one that ends as soon as the first duplicate is encountered. Based on this, the first step I would take would be to get rows for which duplicates with lower IDs exist:
SELECT
*
FROM
atable AS this
WHERE
EXISTS
(
SELECT
*
FROM
atable AS earlier
WHERE
this.Text = earlier.Text
AND this.Id > earlier.Id
)
ORDER BY
this.Id ASC
;
The first row in the above query's result set will be the cutting point, i.e. the one where the sought-for sequence ends. Therefore, you just need to select the rows before the cutting point:
SELECT
*
FROM
atable
WHERE
Id < (
SELECT
MIN(this.Id)
FROM
atable AS this
WHERE
EXISTS
(
SELECT
*
FROM
atable AS earlier
WHERE
this.Text = earlier.Text
AND this.Id > earlier.Id
)
)
;
or remove those starting at the cutting point:
DELETE FROM
atable
WHERE
Id >= (
SELECT
MIN(this.Id)
FROM
atable AS this
WHERE
EXISTS
(
SELECT
*
FROM
atable AS earlier
WHERE
this.Text = earlier.Text
AND this.Id > earlier.Id
)
)
;
Please note that the omitted (deleted) rows might include entries with unique values added after you changed the constraint. In light of this, Rob Farley's suggestion may be a better option for you, as it would only remove duplicates and keep all your distinct values.
-
The Rob's answer was indeed a better solution but thank you for providing your solution.Jhon– Jhon2016年03月28日 16:16:37 +00:00Commented Mar 28, 2016 at 16:16