Wednesday, July 14, 2010

T-SQL Intersect and Except

During the SQL Server 2008 Database Development Boot Camp in February, several T-SQL features introduced in SQL Server 2005 and 2008 were covered. I wanted to blog about the one that most caught my eye. SQL Server now has both Intersect and Except which follow the below syntax.

{ <query_specification> | ( <query_expression> ) }
{ EXCEPT | INTERSECT }
{ <query_specification> | ( <query_expression> ) }

Examples from the AdventureWorks database for SQL Server 2008 follow:

SELECT COUNT(*) FROM Sales.Customer
(Returns count of 19185)

SELECT CustomerID FROM Sales.Customer
INTERSECT
SELECT CustomerID FROM Sales.SalesOrderHeader
(Returns 19119 distinct CustomerIDs. They are the distinct list of CustomerIDs that are found in BOTH Sales.Customer and Sales.SalesOrderHeader)

SELECT CustomerID FROM Sales.Customer
EXCEPT
SELECT CustomerID FROM Sales.SalesOrderHeader
(Returns 66 distinct CustomerIDs. They are the distinct list of CustomerIDs that are found in Sales.Customer that are not found in Sales.SalesOrderHeader)

I've heard debate as to whether or not INTERSECT and/or EXCEPT are faster than other options. Thus, I'll leave it up to you to try them out in your own systems to determine their potential value. I will mention, though, I have heard of a real-word case where an INTERSECT query similar to the one in the above example took seconds or minutes instead of hours. In that specific scenario the table on the left side of the INTERSECT had just a few thousand records and the table on the right side of the INTERSECT had millions if not billions of records.

No comments: