
The "Top Ten" Problem
By Craig S. Mullins
A commonly occurring problem for application developers is the desire to
retrieve a limited number of qualifying rows from the database. For example,
maybe you need to list the top ten best selling items from inventory, or a
list of the ten employees with the highest salary. There are several ways to
accomplish this using SQL.
The first reaction is to simply use the WHERE clause to eliminate
non-qualifying rows. But this is simplistic, and often is not sufficient to
produce the results desired in an optimal manner.
For example, what if the program only requires that the top ten results be
returned? This can be a somewhat difficult request to formulate using SQL
alone. Consider, for example, an application that needs to retrieve only the
top ten highest priced book titles from the pubs database. You
could simply issue a SQL request that retrieves all of the books in order by
price, but only use the first ten retrieved. That is easy, for example:
SELECT
TITLE, PRICE
FROM
TITLES
ORDER
BY PRICE DESC
Be sure to specify the ORDER BY clause with the DESC key word. This sorts
the results into descending order, instead of the default, which is ascending.
Without the DESC key word, the "top ten" would be at the very end of
the results set.
But that does not satisfy the requirements of retrieving only the top ten.
It merely sorts the results into descending sequence. So the results would
still be all pubs in the table, but in the correct order so you can view the
"top ten" prices very easily. But once again, this is not a complete
solution. The ideal solution should return only the top ten book titles with
the highest price and not merely a sorted list of all titles. There are
several Transact-SQL solutions that can be used to produce the correct
solution.
Using SET ROWCOUNT
The first Transact-SQL solution is to use the SET ROWCOUNT command. This
command is used to limit the number of rows returned by a query. It is a
separate command that must be issued before the query is run. By specifying a
numeric to ROWCOUNT, SQL Server will limit the results set for subsequent
queries to that number of rows.
So, we could extend the previous example as follows:
SET ROWCOUNT 10
SELECT
TITLE, PRICE
FROM
TITLES
ORDER
BY PRICE DESC
SET ROWCOUNT 0
Be sure to set the ROWCOUNT both before and after the query. Setting the
ROWCOUNT to zero indicates that all rows should be returned. If you do not
reset the ROWCOUNT after issuing the query, subsequent queries will also limit
the number of rows returned according to the value of ROWCOUNT.
Using the TOP Key Word
Using ROWCOUNT, however, probably is not the best solution. It requires
multiple commands and might be confusing. Instead, you should consider using
another Transact-SQL extension, the TOP key word. The TOP key word is
specified right in the SELECT statement. Its primary benefit is that it is
including in the SQL code of the query and will not impact any other
subsequent queries.
The TOP Transact-SQL feature can be used as follows:
SELECT
TOP 10 TITLE, PRICE
FROM
TITLES
ORDER
BY PRICE DESC
Of course, the TOP value can be any number, not just 10. For example, to
return the only the top five highest-priced titles simply change the query as
follows:
SELECT
TOP 5 TITLE, PRICE
FROM
TITLES
ORDER
BY PRICE DESC
This is probably the simplest and most elegant solution for limiting the
number of rows returned by a SQL Server query.
Avoiding Transact-SQL Extensions
But, you might want to avoid using Transact-SQL extensions altogether. This
might be important if you want a solution that is portable across multiple
RDBMS products. The following SQL will also return the top ten titles by
price:
SELECT
TITLES, PRICE
FROM
TITLES A
WHERE
10 > (SELECT COUNT(*)
FROM TITLES B
WHERE A.PRICE < B.PRICE)
AND
PRICE IS NOT NULL
ORDER
BY PRICE DESC
For
the query, no Transact-SQL extensions are used, so the SQL is portable from
SQL Server to other database servers, such as DB2 and Oracle. And, of course,
you can change the constant 10 to any number you wish, thereby retrieving the
top 20, or top 5, as deemed necessary by the needs of your application. Since
the PRICE column is nullable in the pubs database, you must
remove the nulls from the results set. And the ORDER BY is required to sort
the results in the right order. If it is removed from the query, the results
will still contain the top ten, but they will be in no particular order.
There is one difference between this query and the others we have examined,
and that is the way "ties" are handled. A tie occurs when more than
one row contains the same value. This query may return more than 10 rows if
there are multiple rows with the same value for price within the top ten. The
previous two queries that used Transact-SQL extensions always will limit the
number of rows returned to ten, even if there are other rows with the same
value for price as the number ten row in the results set. The needs of your
application will dictate whether ties are to be ignored or included in the
result set. If ties should not be included in the results set, do not use the
last SQL because it will always include ties.
Want to Return the Bottom Ten?
Any of these queries can be modified to return the bottom ten instead of
the top ten. With the Transact-SQL versions, simply remove the DESC from the
ORDER BY clause. This will cause the rows to be sorting into ascending
sequence, which is the default. Then the ROWCOUNT or TOP formulations will
return the bottom ten results.
For the final query, using standard SQL alone, simply reverse the less than
sign (>) to a greater than sign (<) in the subquery, and remove the DESC
from the ORDER BY clause, as follows:
SELECT
TITLES, PRICE
FROM
TITLES A
WHERE
10 > (SELECT COUNT(*)
FROM TITLES B
WHERE A.PRICE > B.PRICE)
AND
PRICE IS NOT NULL
ORDER
BY PRICE
Bottom
Line
SQL is a very flexible language, and Transact-SQL provides many extensions
to make the job of querying SQL Server databases using nothing but SQL simple
and efficient.
The "top ten" request is a common application requirement. Any
application that needs to return an ordered subset of a given entity can take
advantage of the "top ten" query. Consider using SQL to return only
the results you need instead of writing an application program that reads
query results to limit the results set. "SQL only" solutions can be
more efficient and easier to use than bulky application programs.
From SQL Server Update
(Xephon) September 2000.
© 2000 Craig S. Mullins, All rights reserved.
Home.

|