Monday, October 14, 2013

We gather together to join these two tables...

Joining tables using SQL is a pretty straightforward operation.  Usually we join tables where the values in one column equal the values in a second column, and the join returns only those rows where the columns values in both tables match.  Let's start with some data:
EMPLOYEES TABLE
---------------

ID      NAME               DEPT_ID
--      ----------------   -------
1       Fred Flintstone    3
2       Barney Rubble      3
3       Wilma Flintstone   -
4       Betty Rubble       -
5       Mister Slate       1

DEPARTMENT TABLE
----------------

ID      NAME
--      ------------
2       Sales
1       Front Office
3       Gravel Pit
Here's a simple query with the results.  The query answers the question "display all the employees with their departments".
select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id;

DEPT_NAME       EMPLOYEE
------------    ----------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Notice that not very row in our source tables appears in the select results. Wilma and Betty are not employees, and the Sales department has no employees. Depending on our application, this might be what we want, but in some cases, we want to see the rows where we didn't get matches. For example, if the question is "display all departments and their employees", then we would expect to see the Sales department without any employees.

For many programmers, the natural response is to write a program in a high-level language using two loops.  The outer loop opens a cursor on the department table, fetches data, and for each row fetched starts an inner loop.  The inner loop opens a cursor on the employee table using the department id as a predicate..  In SQL parlance, we have constructed an OUTER JOIN.

Proponents of the "solve the problem with SQL" philosophy are surely gnashing their teeth at this solution.  There are several ways of using SQL to write this query.   We'll take a look at the most common way of handling it now, and we'll step into the way-back machine to take a look at two older ways of solving this problem.  I don't advocate using the older methods, but if you work in a shop that adopted DB2 or Oracle early, you will see them and need to understand them.

The easiest and best method is to use SQL's OUTER JOIN keywords. In this example, we'll use a LEFT OUTER JOIN. A left outer join will return the rows from the table on the left side of the join, even if those rows don't satisfy the join condition. This query shows all departments and their employees:
select d.name dept_name,
       e.name employee
from department d 
left outer join employees e
  on d.id = e.dept_id;

DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           - 
Now the report shows the Sales department, even though the Sales department has no employees.

The early SQL databases that I worked with, SQL/DS and early versions of DB2, did not support the OUTER JOIN syntax.  To construct a LEFT OUTER JOIN, we had to construct the inner join and UNION those results with missing rows from the left hand table.  In the second part of the query, notice the correlated sub-query between the deparment and employees tables. This query yields the same results as the last query:
select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id 

union all 

select d.name dept_name,
       null
from department d
where not exists ( select 1
                   from employees e
                   where d.id = e.dept_id 
                );

DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           - 
That's a lot of code, and not everyone is comfortable with the correlated sub-query, but it is preferable to coding loops and constructing joins in a high-level language.

Oracle used the "(+)", the outer join operator, to construct an outer join without needing to union the missing rows.   Oracle has deprecated this syntax and recommends that programmers use the OUTER JOIN keywords construction instead.   The deprecated syntax seems a bit clumsy until you get accustomed to it, and if you're working in a shop that has a long history with Oracle, you will see a lot of it. 

Here's the query using Oracle's "(+)" outer join operator. Notice that the "(+)" is applied to the columns of the employee table if we want all the rows from the department table. The syntax and semantics don't correlate well; it is certainly one reason why Oracle deprecated this construction.
select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id(+)
 
DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           - 
We do get the correct results, though, and programmers must have seen this as a big advantage over union-ing two queries together as in our example above.

The left outer join is just the start, we can also construct right outer joins and full outer joins using any of these methods. The preferred way is to SQL's RIGHT OUTER JOIN or FULL OUTER JOIN keywords for any new query. But, if you find yourself maintaining old queries, it's a good idea to understand the other methods of constructing outer queries, too.


No comments:

Post a Comment