Showing posts with label IBM. Show all posts
Showing posts with label IBM. Show all posts

Tuesday, October 22, 2013

Recursive SQL

Our shop recently upgraded to Oracle 11g Release 2.  Having used DB2 for many years, two new 11gR2 features caught my eye.   First, Oracle 11gR2 supports aliasing the names of common table expressions.  Second, Oracle supports SQL-99 syntax for recursion using common table expressions.

Let's start with some data first.  In the Oracle editor of your choice, run the following script. The script creates some test data for us in the system catalog, and we'll use this data to solve a practical problem.
create table t1 ( v1 varchar2(1));

create view v1 as select * from t1;

create view v2 as select * from v1;

create view v3 as select * from v2;

create view v4 as select * from v1; 
The script creates five new objects in the system catalog. The DBA_DEPENDENDCIES view shows the dependency of one object on another. In our example, view V1 depends on table T1. View V2 depends on view V1. View V3 depends on view V2.  View V4 depends on view V1. Every object, whether it is a table, view, package, or object of any type, has a row in DBA_DEPENDENCIES if the object depends on another object, or if the object has other objects depending on it.

The USER_DEPENDENCIES view is a subset of DBA_DEPENDENCIES; DBA_DEPENDENCIES may be not be available to everyone, so these examples will use USER_DEPENDENCIES. The two columns of interest to us are the NAME and REFERENCED_NAME columns. A row in this view means that object NAME depends on object REFERENCED_NAME. In the following query, we see every dependency that we described in the previous paragraph.

select name, referenced_name
from user_dependencies
where referenced_name in ('V1','T1','V2','V3') 
order by 1;

NAME    REFERENCED_NAME
-----   ---------------
V1      T1
V2      V1
V3      V2
V4      V1
The USER_DEPENDENCIES view shows us all the direct dependents of an object. The USER_DEPENDENCIES view shows that view V4 depends on view V1.  USER_DEPENDENCIES does not show that view V4 indirectly depends on table T1.  If you don't believe that, drop table T1 and then run a query against view V4.  Not so good!

A shop with many tables, views, packages, procedures, functions, and triggers, will have an extensive hierarchy of dependencies. So, while this view shows the direct dependents, it does not show all the dependents. Does view V4 depend on table T1? Before we drop table T1, we need to know the answer to that question!

This problem is easily solved with recursive SQL queries. We'll take a look at two ways of handling this. First, we'll examine how to answer this question using recursion with common table expressions, à la DB2 or Oracle 11gR2. Here's the query:
with recurs (name, rname, lvl ) as (
   select name, referenced_name, 1
   from user_dependencies
   where referenced_name = 'T1'

   union all

   select a.name, a.referenced_name, r.lvl + 1
   from user_dependencies a,
        recurs  r
   where r.name = a.referenced_name
)
select name, rname, lvl
from recurs; 

NAME    RNAME   LVL
----    -----   ---
V1      T1      1
V2      V1      2
V4      V1      2
V3      V2      3
The first new feature in 11gR2 is aliasing the common table expression columns. We have declared RECURS as a common table expression with two columns, NAME and RNAME. Oracle 11gR1 supported common table expressions, but 11gR1 did not support aliasing the column names.
The next new feature is the recursive query. We have two sub-queries unioned together. The first query is our starting point: REFERENCED_NAME = 'T1'. The second query is the recursion. We select rows from a join of the USER_DEPENDENCIES table and the RECURS common table expression. The rows are linked by joining the REFERENCED_NAME of USER_DEPENDENCIES to the NAME of RECURS. In other words, we join the child row from USER_DEPENDENCIES to the parent row from RECURS. The LVL column shows the depth of the hierarchy.

DB2 LUW has supported common table expressions and recursion for at least 10 years. Oracle's pre-11g2 databases supported recursive queries in a limited way using the START WITH ... CONNECT BY PRIOR syntax.   Run the following query, and we get the same results, although not in the same order. LEVEL is an Oracle pre-defined column showing the depth of the hierarchy. In the previous query, we found this by keeping track of the level (LVL) ourselves.


select name, referenced_name, level
from user_dependencies
start with referenced_name = 'T1'
connect by prior name = referenced_name;

NAME    REFERENCED_NAME  LEVEL
----    ---------------  -----
V1      T1               1
V2      V1               2
V3      V2               3
V4      V1               2
Like Oracle's outer join operator "(+)" from the last post, START WITH ... CONNECT BY PRIOR is very common in Oracle shops, so it's important to understand it. Also, it's very concise, always a benefit to those among us who have ten thumbs on the keyboard.

Recursive queries are another good tool to have in our toolbox.  Trying to sort out the object dependencies is just one use.  Another common problem is sorting out the system privileges.  Many applications define their own authorization schemes, granting access to roles or groups, then including those roles/groups in other roles/groups.  Recursive queries can answer the question "what users have authorization to which tables", and these queries can show the underlying hierarchy.

But before we go, for our example query to be really useful, we need to account for the rest of the columns in DBA_DEPENDENCIES. In addition to the object's name, DBA_DEPENDENCIES includes the object's owner and the object's type. If we don't account for any object's type and owner, we quickly end up with a recursive loop, especially if our objects include package specs and package bodies. Here's the improved query:
with recurs (name, owner, type, rname, rowner, rtype,  lvl ) as (
   select name, owner, type, referenced_name, referenced_owner, referenced_type, 1
   from dba_dependencies
   where referenced_name = 'T1'

   union all

   select a.name, a.owner, a.type, a.referenced_name, a.referenced_owner, a.referenced_type, r.lvl + 1
   from dba_dependencies a,
        recurs  r
   where r.name = a.referenced_name
     and r.owner = a.referenced_owner
     and r.type = a.referenced_type 
) cycle name, owner, type set cycle_detected to 1 default 0
   
select * 
from recurs; 
Now when we join a parent row to a child row, we're joining on all the appropriate columns, and a recursive loop is less likely. Another improvement in this query is the use of the CYCLE keyword. CYCLE instructs the database to detect when our query has previously visited the row defined by name, owner, and type. When the database detects this condition -- a recursive loop -- the database sets the CYCLE_DETECTED column to 1 and does not descend further. Now we have a really useful query.

More Reading

For more information, the vendors' publications have some good examples:

Also, see my blog entry discussing recursion to implement aggregation in SQL Server.

Monday, October 14, 2013

We gather together to join these two tables...

Joining tables using SQL is a pretty straightforward operation.  Usually we join tables where the values in one column equal the values in a second column, and the join returns only those rows where the columns values in both tables match.  Let's start with some data:
EMPLOYEES TABLE
---------------

ID      NAME               DEPT_ID
--      ----------------   -------
1       Fred Flintstone    3
2       Barney Rubble      3
3       Wilma Flintstone   -
4       Betty Rubble       -
5       Mister Slate       1

DEPARTMENT TABLE
----------------

ID      NAME
--      ------------
2       Sales
1       Front Office
3       Gravel Pit
Here's a simple query with the results.  The query answers the question "display all the employees with their departments".
select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id;

DEPT_NAME       EMPLOYEE
------------    ----------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Notice that not very row in our source tables appears in the select results. Wilma and Betty are not employees, and the Sales department has no employees. Depending on our application, this might be what we want, but in some cases, we want to see the rows where we didn't get matches. For example, if the question is "display all departments and their employees", then we would expect to see the Sales department without any employees.

For many programmers, the natural response is to write a program in a high-level language using two loops.  The outer loop opens a cursor on the department table, fetches data, and for each row fetched starts an inner loop.  The inner loop opens a cursor on the employee table using the department id as a predicate..  In SQL parlance, we have constructed an OUTER JOIN.

Proponents of the "solve the problem with SQL" philosophy are surely gnashing their teeth at this solution.  There are several ways of using SQL to write this query.   We'll take a look at the most common way of handling it now, and we'll step into the way-back machine to take a look at two older ways of solving this problem.  I don't advocate using the older methods, but if you work in a shop that adopted DB2 or Oracle early, you will see them and need to understand them.

The easiest and best method is to use SQL's OUTER JOIN keywords. In this example, we'll use a LEFT OUTER JOIN. A left outer join will return the rows from the table on the left side of the join, even if those rows don't satisfy the join condition. This query shows all departments and their employees:
select d.name dept_name,
       e.name employee
from department d 
left outer join employees e
  on d.id = e.dept_id;

DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           - 
Now the report shows the Sales department, even though the Sales department has no employees.

The early SQL databases that I worked with, SQL/DS and early versions of DB2, did not support the OUTER JOIN syntax.  To construct a LEFT OUTER JOIN, we had to construct the inner join and UNION those results with missing rows from the left hand table.  In the second part of the query, notice the correlated sub-query between the deparment and employees tables. This query yields the same results as the last query:
select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id 

union all 

select d.name dept_name,
       null
from department d
where not exists ( select 1
                   from employees e
                   where d.id = e.dept_id 
                );

DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           - 
That's a lot of code, and not everyone is comfortable with the correlated sub-query, but it is preferable to coding loops and constructing joins in a high-level language.

Oracle used the "(+)", the outer join operator, to construct an outer join without needing to union the missing rows.   Oracle has deprecated this syntax and recommends that programmers use the OUTER JOIN keywords construction instead.   The deprecated syntax seems a bit clumsy until you get accustomed to it, and if you're working in a shop that has a long history with Oracle, you will see a lot of it. 

Here's the query using Oracle's "(+)" outer join operator. Notice that the "(+)" is applied to the columns of the employee table if we want all the rows from the department table. The syntax and semantics don't correlate well; it is certainly one reason why Oracle deprecated this construction.
select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id(+)
 
DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           - 
We do get the correct results, though, and programmers must have seen this as a big advantage over union-ing two queries together as in our example above.

The left outer join is just the start, we can also construct right outer joins and full outer joins using any of these methods. The preferred way is to SQL's RIGHT OUTER JOIN or FULL OUTER JOIN keywords for any new query. But, if you find yourself maintaining old queries, it's a good idea to understand the other methods of constructing outer queries, too.