Using TOP in INSERT INTO statement

Question 1

I am reading a tutorial about TOP keyword in T-SQL. But I don't really understand the examples given in the tutorial.

Example 1:

INSERT TOP n INTO table_name1 (Col1, Col2)
SELECT Col3, Col4 FROM table_name2;

The tutorial mentions that the TOP here doesn't really achieve anything since the database doesn't have a concept of order, which I kind of get.

And is it correct to assume that the n rows that are chosen from table_name2 are completely random records?

Example 2:

The tutorial gives another example that improves the first statement.

INSERT INTO table_name1 (Col1, Col2)
 SELECT TOP n Col3, Col4
 FROM table_name2
Order By Col1;

Here is the confusing part. Putting the TOP keyword inside the SELECT statement is supposed to give us a better control over what rows get chosen and inserted through the ORDER BY clause. So I would assume the ORDER BY clause is part of the SELECT statement in the second example. But Col1 isn't part of table_name2.

This confused the hell out of me. Thanks for any insight.

Question 2

And is it correct to assume that the n rows that are chosen from table_name2 are completely random records?

When an order isn't specified, it will usually use the sort fields and directions from the design of the primary key of table_name2, but is not guaranteed to. As Aaron Bertrand said

It doesn't really matter what the clustered index is. SQL Server may still return the order based on some other index based on a wide variety of factors. Creating a primary key on some column DOES NOT guarantee that selects with no order by will suddenly come back ordered by that column always. Might you observe that most of the time? Sure. But that is not the same as a guarantee. I've never seen a polar bear on my street but there is no polar bear forcefield that prevents it from happening. – Aaron Bertrand♦ Sep 14 '11 at 20:41

It certainly wouldn't hurt to add your order clause in, even if you want them to return in the index order of the PK.

From the tutorial example

INSERT INTO table_name1 (Col1, Col2)
 SELECT TOP n Col3, Col4
 FROM table_name2 
Order By Col1;

Gaaaah. That's horrible coding. That will order by Col1 ASC of table_name2, even though your SELECT clause isn't explicitly returning a Col1 from the second table. It is NOT ordered by Col3 of the 2nd table aka Col1 of the 1st table. The MS docs for INSERT INTO do not list an ORDER BY clause - the ORDER BY is part of the inner SELECT clause.

It's always better to explicitly specify which table you are inserting into and selecting from. E.g. this would be a slightly better example.

INSERT INTO
 table_name1 (Col1, Col2)
 SELECT TOP 3
 t2.Col3
 ,t2.Col4
 FROM 
 table_name2 AS t2
 ORDER BY
 t2.Col1 ASC, t2.Col3 ASC, t2.Col4 ASC

After this alteration there's no confusion about what Col1 means (nor the sort order of it). I'd still recommend throwing in a sort order of Col3 and Col4 if Col1 isn't unique for all entries in table_name2, as demonstrated above. You cannot alias the table that records are inserted into (e.g. table_name1 AS t1 (t1.Col1, t1.Col2) as that is not legal syntax for existing versions of SQL Server.

Opinion-based recommendation: avoid the tutorial website that you found.

Question 3

There is never ambiguity if only one table in involved. Even it it is the select of the insert.

Question 4

the ambiguity to which I was referring is the ambiguity of interpretation by the reader, not the ambiguity of the DB engine. Perhaps I should call it "confusion" instead :)

Question 5

If the is what the book says then it is misleading at best
At least in MSSQL 2012

The insert (col1, col2) is just index based
The first item in the select goes into col1
The data type must be the same
The select stands on it own and is not aware of column names in the insert

Example1 would only be valid for just need some data and don't care
Just need some data for test

This would only be valid is there is a col1 in table_name2
If that is the case it is a terrible example as many people would assume it means table_name1.Col1

INSERT INTO table_name1 (Col1, Col2)
SELECT TOP n Col3, Col4
FROM table_name2
Order By Col1;

A better example is

INSERT INTO table_name1 (Col1, Col2)
SELECT TOP (n) Col3, Col4
FROM table_name2
Order By Col3;

Let's say Col1 is clustered index you should sort on col3 even if inserting all

This is not a name conflict
that order by is table_name2

INSERT INTO table_name1 (Col3, Col4)
SELECT TOP (n) Col3, Col4
FROM table_name2
Order By Col3;

Here is a situation you might use top without a sort but it is a delete
You would do it to keep the transaction log from filling up

select 1 
while (@@rowcount > 0) 
begin 
 delete top(10000) from table1 
end

an insert that is fill up a transaction log

select 1 
while (@@rowcount > 0) 
begin 
 INSERT INTO table_name1 (Col3, Col4)
 SELECT TOP (n) T2.Col3, T2.Col4
 FROM table_name2 T2
 LEFT JOIN table_name1 T1 
 on T1.Col3 = T2.Col3
 WHERE T1.Col3 is null
 Order By Col3;
end

mpag mpag 9787 silver badges23 bronze badges · Answer 1 · 2016-07-01 19:42:48Z

And is it correct to assume that the n rows that are chosen from table_name2 are completely random records?

When an order isn't specified, it will usually use the sort fields and directions from the design of the primary key of table_name2, but is not guaranteed to. As Aaron Bertrand said

It doesn't really matter what the clustered index is. SQL Server may still return the order based on some other index based on a wide variety of factors. Creating a primary key on some column DOES NOT guarantee that selects with no order by will suddenly come back ordered by that column always. Might you observe that most of the time? Sure. But that is not the same as a guarantee. I've never seen a polar bear on my street but there is no polar bear forcefield that prevents it from happening. – Aaron Bertrand♦ Sep 14 '11 at 20:41

It certainly wouldn't hurt to add your order clause in, even if you want them to return in the index order of the PK.

From the tutorial example

INSERT INTO table_name1 (Col1, Col2)
 SELECT TOP n Col3, Col4
 FROM table_name2 
Order By Col1;

Gaaaah. That's horrible coding. That will order by Col1 ASC of table_name2, even though your SELECT clause isn't explicitly returning a Col1 from the second table. It is NOT ordered by Col3 of the 2nd table aka Col1 of the 1st table. The MS docs for INSERT INTO do not list an ORDER BY clause - the ORDER BY is part of the inner SELECT clause.

It's always better to explicitly specify which table you are inserting into and selecting from. E.g. this would be a slightly better example.

INSERT INTO
 table_name1 (Col1, Col2)
 SELECT TOP 3
 t2.Col3
 ,t2.Col4
 FROM 
 table_name2 AS t2
 ORDER BY
 t2.Col1 ASC, t2.Col3 ASC, t2.Col4 ASC

After this alteration there's no confusion about what Col1 means (nor the sort order of it). I'd still recommend throwing in a sort order of Col3 and Col4 if Col1 isn't unique for all entries in table_name2, as demonstrated above. You cannot alias the table that records are inserted into (e.g. table_name1 AS t1 (t1.Col1, t1.Col2) as that is not legal syntax for existing versions of SQL Server.

Opinion-based recommendation: avoid the tutorial website that you found.

There is never ambiguity if only one table in involved. Even it it is the select of the insert.
the ambiguity to which I was referring is the ambiguity of interpretation by the reader, not the ambiguity of the DB engine. Perhaps I should call it "confusion" instead :)

paparazzo paparazzo 5,0481 gold badge19 silver badges32 bronze badges · Answer 2 · 2016-07-01 20:11:39Z

If the is what the book says then it is misleading at best
At least in MSSQL 2012

The insert (col1, col2) is just index based
The first item in the select goes into col1
The data type must be the same
The select stands on it own and is not aware of column names in the insert

Example1 would only be valid for just need some data and don't care
Just need some data for test

This would only be valid is there is a col1 in table_name2
If that is the case it is a terrible example as many people would assume it means table_name1.Col1

INSERT INTO table_name1 (Col1, Col2)
SELECT TOP n Col3, Col4
FROM table_name2
Order By Col1;

A better example is

INSERT INTO table_name1 (Col1, Col2)
SELECT TOP (n) Col3, Col4
FROM table_name2
Order By Col3;

Let's say Col1 is clustered index you should sort on col3 even if inserting all

This is not a name conflict
that order by is table_name2

INSERT INTO table_name1 (Col3, Col4)
SELECT TOP (n) Col3, Col4
FROM table_name2
Order By Col3;

Here is a situation you might use top without a sort but it is a delete
You would do it to keep the transaction log from filling up

select 1 
while (@@rowcount > 0) 
begin 
 delete top(10000) from table1 
end

an insert that is fill up a transaction log

select 1 
while (@@rowcount > 0) 
begin 
 INSERT INTO table_name1 (Col3, Col4)
 SELECT TOP (n) T2.Col3, T2.Col4
 FROM table_name2 T2
 LEFT JOIN table_name1 T1 
 on T1.Col3 = T2.Col3
 WHERE T1.Col3 is null
 Order By Col3;
end

Stack Exchange Network

Using TOP in INSERT INTO statement

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Using TOP in INSERT INTO statement

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions