Outer joins and placement of predicates :
Outer joins are such a great tool but are also much misunderstood and abused. Some people seem to like them so much that they throw one into almost every query, regardless of whether or not it is needed!
The key to correct use of outer joins is an understanding of the logical steps required to process an outer join in a query. Here are the relevant steps from the query processing phases:
1.A cross join (Cartesian product) is formed for the two input tables in the FROM clause. The result of the Cartesian product is every possible combination of a row from the first table and a row from the second table.
2.The ON clause predicates are applied filtering only rows satisfying the predicate logic.
3.Any Outer rows filtered out by the predicates in step 2 are added back. Rows from the preserved table are added with their actual attribute values (column values), and the attributes (columns) from the non preserved table are set to NULL.
4.The WHERE clause predicates are applied.
An outer join query can produce completely different results depending on how you write it, and where predicates are placed in that query. Let's look at one example, based on the following two tables, Customers and Orders:
Customers table:
customer_nbr customer_name
------------ --------------
1 Jim Brown
2 Jeff Gordon
3 Peter Green
4 Julie Peters
Orders table:
order_nbr order_date customer_nbr order_amt
----------- ---------- ------------ ----------
1 2008-10-01 1 15.50
2 2008-12-15 2 25.00
3 2009-01-02 1 18.00
4 2009-02-20 3 10.25
5 2009-03-05 1 30.00
Our task is to retrieve a list of all customers, and the total amount they have spent on orders, since the beginning of year 2009. Instinctively, one may write the following query:
SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101'
GROUP BY C.customer_name;
But the results do not look good:
customer_name total_2009
-------------- ------------
Jim Brown 48.00
Peter Green 10.25
Customers Jeff and Julie are missing from the result set. Where is the problem? In order to understand what went wrong, let’s play back this query one step at a time following the logical processing order. The first step is a cross join between the two input tables:
SELECT C.customer_name, O.order_amt
FROM Customers AS C
CROSS JOIN Orders AS O;
This results in every possible combination of rows from both input tables:
customer_name order_amt order_date
---------------- ---------- ----------
Jim Brown 15.50 2008-10-01
Jim Brown 25.00 2008-12-15
Jim Brown 18.00 2009-01-02
Jim Brown 10.25 2009-02-20
Jim Brown 30.00 2009-03-05
Jeff Gordon 15.50 2008-10-01
Jeff Gordon 25.00 2008-12-15
Jeff Gordon 18.00 2009-01-02
Jeff Gordon 10.25 2009-02-20
Jeff Gordon 30.00 2009-03-05
Peter Green 15.50 2008-10-01
Peter Green 25.00 2008-12-15
Peter Green 18.00 2009-01-02
Peter Green 10.25 2009-02-20
Peter Green 30.00 2009-03-05
Julie Peters 15.50 2008-10-01
Julie Peters 25.00 2008-12-15
Julie Peters 18.00 2009-01-02
Julie Peters 10.25 2009-02-20
Julie Peters 30.00 2009-03-05
The next step is applying the ON predicates of the JOIN clause:
SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
INNER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr;
The result of this query includes only customers with orders. Since customer Julie does not have any orders it is excluded from the result set:
customer_name order_amt order_date
-------------- ---------- ----------
Jim Brown 15.50 2008-10-01
Jeff Gordon 25.00 2008-12-15
Jim Brown 18.00 2009-01-02
Peter Green 10.25 2009-02-20
Jim Brown 30.00 2009-03-05
The third step of the logical processing order is adding back the outer rows. These rows were excluded in the prior step because they did not satisfy the join predicates.
SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr;
Now customer Julie is added back in the result set. Notice the added outer rows from the preserved table (Customers) have values for the selected attributes (customer_name) and the non-preserved table (Orders) rows have NULL for their attributes (order_amt and order_date):
customer_name order_amt order_date
-------------- ---------- ----------
Jim Brown 15.50 2008-10-01
Jim Brown 18.00 2009-01-02
Jim Brown 30.00 2009-03-05
Jeff Gordon 25.00 2008-12-15
Peter Green 10.25 2009-02-20
Julie Peters NULL NULL
The last step is applying the WHERE clause predicates:
SELECT C.customer_name, O.order_amt, O.order_date
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
WHERE O.order_date >= '20090101';
Now the picture is clear! The culprit is the WHERE clause predicate. Customer Jeff is filtered out from the result set because he does not have orders past January 1, 2009, and customer Julie is filtered out because she has no orders at all (since the outer row added for Julie has NULL for the order_date column). In effect, in this case, the predicate in the WHERE clause turns the outer join into an inner join.
To correct our initial query, it is sufficient to move the WHERE predicate into the join condition.
SELECT C.customer_name, SUM(COALESCE(O.order_amt, 0)) AS total_2009
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
AND O.order_date >= '20090101'
GROUP BY C.customer_name;
Now, the query returns correct results because Jeff and Julie are filtered out in the join predicates, but then added back when the outer rows are added.
customer_name total_2009
-------------- ------------
Jeff Gordon 0.00
Jim Brown 48.00
Julie Peters 0.00
Peter Green 10.25
In a more complex example, with multiple joins, the incorrect filtering may happen on a subsequent table operator (like join to another table) instead in the WHERE clause. For example, say we have an OrderDetails table containing product SKU and quantity, and the request is to retrieve a list of all customers, with order amount and quantity, for selected product SKUs. The following query may seem correct:
SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
INNER JOIN OrderDetails AS D
ON D.order_nbr = O.order_nbr
AND D.sku = 101;
However, here the INNER join with the OrderDetails table plays the exact same role as the predicate in the WHERE clause in our previous example, in effect turning the LEFT OUTER join to INNER join. The correct query to satisfy this request needs to use a LEFT OUTER join to join to the OrderDetails table:
SELECT C.customer_name, O.order_amt, D.qty
FROM Customers AS C
LEFT OUTER JOIN Orders AS O
ON C.customer_nbr = O.customer_nbr
LEFT JOIN OrderDetails AS D
ON D.order_nbr = O.order_nbr
AND D.sku = 101;
Captions
- Asp.Net (9)
- ASP.Net Tips (1)
- C# (7)
- Computer (1)
- Internet Explorer (5)
- Java Script (11)
- Shortcut Keys (4)
- SQL Programming - Common Mistakes (11)
- Sql Server (22)
- Sql Server Definitions (8)
- Sql Server Test (2)
- SSRS (2)
About Me

- Vasanth.S
- I'm just normal guy who like to travel, addicted to TV and Computer, Playing Computer Games and Watching Movies and many more. Have lot of idea and quote in life
My Blog List
-
-
Pass data from one page to another using post15 years ago
Blog Archive
-
▼
2009
(63)
-
▼
November
(25)
- Display Database Table, Stored Procedure and Funct...
- Sql Server Restore
- Sql server backup
- Delete VS Truncate
- Date Methods
- Math Properties and Methods
- Basics of Sql Server
- Regular Expression Syntax
- trim(), startsWith() and endsWith() in JavaScript
- Differences bw temp tables, table variables and CTE
- Table Variables
- Global Temporary Tables
- Local Temporary Tables
- Removing Duplicates from a Table in SQL
- Common SQL Programming Mistakes - 10
- Common SQL Programming Mistakes - 3
- Common SQL Programming Mistakes - 2
- Common SQL Programming Mistakes - 9
- Common SQL Programming Mistakes - 8
- Common SQL Programming Mistakes - 7
- Common SQL Programming Mistakes - 6
- Common SQL Programming Mistakes - 5
- Common SQL Programming Mistakes - 4
- Common SQL Programming Mistakes - 1
- Ten Common SQL Programming Mistakes
-
▼
November
(25)