I am reading a tutorial about TOP keyword in T-SQL. But I don't really understand the examples given in the tutorial.
Example 1:
INSERT TOP n INTO table_name1 (Col1, Col2)
SELECT Col3, Col4 FROM table_name2;
The tutorial mentions that the TOP here doesn't really achieve anything since the database doesn't have a concept of order, which I kind of get.
And is it correct to assume that the n
rows that are chosen from table_name2 are completely random records?
Example 2:
The tutorial gives another example that improves the first statement.
INSERT INTO table_name1 (Col1, Col2)
SELECT TOP n Col3, Col4
FROM table_name2
Order By Col1;
Here is the confusing part. Putting the TOP keyword inside the SELECT statement is supposed to give us a better control over what rows get chosen and inserted through the ORDER BY clause. So I would assume the ORDER BY clause is part of the SELECT statement in the second example. But Col1 isn't part of table_name2.
This confused the hell out of me. Thanks for any insight.
2 Answers 2
And is it correct to assume that the n rows that are chosen from table_name2 are completely random records?
When an order isn't specified, it will usually use the sort fields and directions from the design of the primary key of table_name2
, but is not guaranteed to. As Aaron Bertrand said
It doesn't really matter what the clustered index is. SQL Server may still return the order based on some other index based on a wide variety of factors. Creating a primary key on some column DOES NOT guarantee that selects with no order by will suddenly come back ordered by that column always. Might you observe that most of the time? Sure. But that is not the same as a guarantee. I've never seen a polar bear on my street but there is no polar bear forcefield that prevents it from happening. – Aaron Bertrand♦ Sep 14 '11 at 20:41
It certainly wouldn't hurt to add your order clause in, even if you want them to return in the index order of the PK.
From the tutorial example
INSERT INTO table_name1 (Col1, Col2)
SELECT TOP n Col3, Col4
FROM table_name2
Order By Col1;
Gaaaah. That's horrible coding. That will order by Col1 ASC
of table_name2, even though your SELECT
clause isn't explicitly returning a Col1 from the second table. It is NOT ordered by Col3
of the 2nd table aka Col1
of the 1st table. The MS docs for INSERT INTO
do not list an ORDER BY
clause - the ORDER BY
is part of the inner SELECT clause.
It's always better to explicitly specify which table you are inserting into and selecting from. E.g. this would be a slightly better example.
INSERT INTO
table_name1 (Col1, Col2)
SELECT TOP 3
t2.Col3
,t2.Col4
FROM
table_name2 AS t2
ORDER BY
t2.Col1 ASC, t2.Col3 ASC, t2.Col4 ASC
After this alteration there's no confusion about what Col1
means (nor the sort order of it). I'd still recommend throwing in a sort order of Col3
and Col4
if Col1 isn't unique for all entries in table_name2
, as demonstrated above. You cannot alias the table that records are inserted into (e.g. table_name1 AS t1 (t1.Col1, t1.Col2)
as that is not legal syntax for existing versions of SQL Server.
Opinion-based recommendation: avoid the tutorial website that you found.
-
There is never ambiguity if only one table in involved. Even it it is the select of the insert.paparazzo– paparazzo2016年07月01日 20:25:15 +00:00Commented Jul 1, 2016 at 20:25
-
2the ambiguity to which I was referring is the ambiguity of interpretation by the reader, not the ambiguity of the DB engine. Perhaps I should call it "confusion" instead :)mpag– mpag2016年07月01日 20:31:20 +00:00Commented Jul 1, 2016 at 20:31
If the is what the book says then it is misleading at best
At least in MSSQL 2012
The insert (col1, col2) is just index based
The first item in the select goes into col1
The data type must be the same
The select stands on it own and is not aware of column names in the insert
Example1 would only be valid for just need some data and don't care
Just need some data for test
This would only be valid is there is a col1 in table_name2
If that is the case it is a terrible example as many people would assume it means table_name1.Col1
INSERT INTO table_name1 (Col1, Col2)
SELECT TOP n Col3, Col4
FROM table_name2
Order By Col1;
A better example is
INSERT INTO table_name1 (Col1, Col2)
SELECT TOP (n) Col3, Col4
FROM table_name2
Order By Col3;
Let's say Col1 is clustered index you should sort on col3 even if inserting all
This is not a name conflict
that order by is table_name2
INSERT INTO table_name1 (Col3, Col4)
SELECT TOP (n) Col3, Col4
FROM table_name2
Order By Col3;
Here is a situation you might use top without a sort but it is a delete
You would do it to keep the transaction log from filling up
select 1
while (@@rowcount > 0)
begin
delete top(10000) from table1
end
an insert that is fill up a transaction log
select 1
while (@@rowcount > 0)
begin
INSERT INTO table_name1 (Col3, Col4)
SELECT TOP (n) T2.Col3, T2.Col4
FROM table_name2 T2
LEFT JOIN table_name1 T1
on T1.Col3 = T2.Col3
WHERE T1.Col3 is null
Order By Col3;
end