Difference between TOP and MAX()/MIN() in SQL

Question 1

In my experience SELECT TOP is much faster than SELECT MIN()/MAX() if you have thousands/millions of rows in your database as seen in this answer and what I've answered here and the askers response.

My question is WHY is that, in my understanding TOP looks at the data same as SELECT column FROM table and when you do a ORDER BY column it sorts the results and than gives you only the top row, while the MIN()/MAX() functions should really just look at each row check the sum of that row, then go to the next row and if that row is > or < the previous row than it saves the results in the function depends if it's a MAX() or a MIN(), and this is exactly what the ORDER BY should do.

Please don't answer that in your experience the MAX() is better than TOP as the question is really what's the difference between MAX()/MIN() and TOP? both look at the data on each row and both are probably a log(n).

In response to @J.D:

One simple example is if you use TOP without an ORDER BY clause. Then the top result you'll get back is likely non-deterministic and doesn't always have to be the same row that the MAX() aggregate function would reference.

Absolutely I'm only asking when you are using ORDER BY because the powerfulness TOP has is when sorting it in a specific way.

Regarding the log(n) I would ask it in a different way, if the ORDER BY has a Ω(n) because the minimum data it needs to look on is the n and has probably a O() >= n because it's probably using the best sorting algorithm which O(n).

With the MIN() it's also probably a Ω(n) and O(n) because it will always need to scan all the data to find the answer, and never more than n, so why does it sometimes make a huge difference?

To put it another way:

What's the difference in execution time if I have a table of a million rows that has customers and the amount they've spent to buy product's, if I'm using SELECT TOP 1 amount_column FROM customer_table ORDER BY amount_column DESC and using SELECT MAX(amount_column) FROM customer_table the first need's to look at all the data in amount_column and sort it descending order so it doesn't make a difference if it's already sorted or not because it needs to check all rows if it's > than previous row, and then when it finished executing it returns the maximum amount.

And when I'm doing it using the MAX() function it also needs to scan (in my understanding) all rows and save the first row to the results and then go to the next row and check if it's > than results if yes it saves it to results, and so on for the whole table, and when it finishes it returns the maximum amount.

So in my understanding both are really doing the same thing, or maybe I'm wrong with my understanding the way it executes the function, and that's what I'm asking if I'm wrong? and why?

Question 2

The thing is they're not always the same, hence them being different operators with different purposes. Some of those purposes overlap, yes, but they're not exactly one-to-one.

One simple example is if you use TOP without an ORDER BY clause. Then the top result you'll get back is likely non-deterministic and doesn't always have to be the same row that the MAX() aggregate function would reference.

As mustaccio pointed out in the comments, the execution plans will tell you specifically what operations differed between two queries, one using TOP and the other using MAX(). Without a specific query, there is not a single answer on how the two are similar or different, since it can vary for each specific use case.

Finally, to assume the Big O function of time for both operators is always O(log(n)) is also incorrect and varies depending on how your data is structured and which data points you're applying those operators on. In Table1 with columns (A, B, C) that's indexed on A ascending, the query SELECT TOP 1 A FROM Table1 ORDER BY A will likely be on the order of O(log(n)) (though I'm not sure if it can even possibly be O(1), since the ORDER BY clause matches the index sorting) but a SELECT MAX(B) FROM Table1 could be on the order of O(n) since there's no covering index on column B, and will likely need to scan the entire table.

In regards to your update, the aggregate functions like MIN() and MAX() don't need to always scan all of the data either. My first example was just setup in a way to demonstrate a difference in Big O search time complexity between MAX() and TOP in one specific use case. But using the same table example with the same columns and index covering column A ascending only, then the query SELECT MAX(A) FROM Table1 should yield a time complexity of O(log(n)) too.

Despite even only focusing on queries that use the ORDER BY clause, the similarities or differences between TOP vs MAX() and MIN() will still vary from one use case to the next, directly dependent on the query being ran, and it's execution plan that the query engine produced for that query. If you have a specific query and execution plan you want to discuss, feel free to add it to your question and we can analyze it appropriately.

To your latest update, I believe where you're getting tripped up is on the concept of data structures. You keep assuming every row of data needs to be analyzed in both cases, and that's only true in an unindexed table (a Heap data structure) or if there's no covering index for your query. So in your example, for both operators, the search time complexity would be O(n) if there's no index covering the amount_column. But if there was an index on it, descending, which would respect your statement "...if it's already sorted" then the data is stored in a B-Tree data structure which would allow your query to seek on the index and significantly reduce the runtime down to O(log(n)) because it would no longer need to compare every row. The subset of rows it would need to compare in that is much smaller than the entire table itself.

Yes in your specific example, both queries should see the same search time complexity in theory, but again what that actually is depends on how your data is structured (indexed in this case). Additionally, there's still other factors that could result in two separate query plans, where in actuality the runtimes do differ between the two queries and the reasoning for that can only be discussed when comparing two actual execution plans. It's not possible to discuss theoretically, because it could vary for a multitude of reasons based on specifically what's in the execution plan itself.

Question 3

It is worth pointing out that there's another and pretty important a difference: null behaviour is not the same for min()/max() vs top(). In such a case, it's comparison of apples and oranges.

Consider:

create table T1(id int, content nvarchar(64));
insert T1 values 
 (1,'smallest'), 
 (3,'largest'), 
 (2,'middle'), 
 (-1,'minus one'),
 (null, 'null');
select top(1) id from T1 order by id; -- returns null
select top(1) id from T1 order by id desc; -- returns 3
select 'max id' = max(id), 'min id' = min(id) from T1; -- returns 3 and -1

Question 4

In my experience SELECT TOP is much faster than SELECT MIN()/MAX()

Yes, it is, because they do rather different things.

Both start with the same resultset, constructed from your "from", "join" and "where" clauses.

Min and Max are Aggregate Functions.
They trawl through every row in that resultset and work out the smallest/largest value of the stated column.

Top starts reading through the resultset and simply stops after the specified number of rows. As others have said, using "Top n" without an "Order By" can make for some interesting debugging sessions!

Question 5

min and max do not always trawl through every row. If there is a single-column index and the column is not nullable then they can read the lowest or highest value directly. You can see this by looking at the query plan. (This is a case where min and max are just as fast as top.)

J.D. J.D. 41.1k12 gold badges63 silver badges145 bronze badges · Accepted Answer · 2021-11-11 16:23:37Z