Lost Learnings: Oracle

Saturday, December 14, 2013

Regular Expressions: Analyzing PL/SQL Source

We're going to examine a few themes that we've touched on in earlier posts. In Get Regular (Expressions) we took a first look at regular expressions, and in Recursive SQL we used the Oracle DBA_DEPENDENCIES view for some sample data. We'll take another look at a couple of views in the Oracle 11g database, and we'll use regular expressions to dig a little deeper into these views.

We have an old package that's been running for over 15 years. It's gotten rather long in the tooth, and we need to replace it. First, we need to find all the other objects that use this package. Oracle's DBA_DEPENDENCIES view will show every object that directly references a package.

describe dba_dependencies
Name                 Null     Type          
-------------------- -------- ------------- 
OWNER                NOT NULL VARCHAR2(30)  
NAME                 NOT NULL VARCHAR2(30)  
TYPE                          VARCHAR2(18)  
REFERENCED_OWNER              VARCHAR2(30)  
REFERENCED_NAME               VARCHAR2(64)  
REFERENCED_TYPE               VARCHAR2(18)  
REFERENCED_LINK_NAME          VARCHAR2(128) 
DEPENDENCY_TYPE               VARCHAR2(4)

We plug our schema name and package name in as the referenced_owner and referenced_name, run the query, and the database displays the information we're looking for.

select owner, name, type  
from dba_dependencies
where referenced_owner = 'MY_SCHEMA'
  and referenced_name = 'MY_PACKAGE';

And we see that three other objects use our package:

OWNER       NAME      TYPE             
-------     ------    ------------    
SCHEMA1     PCK1      PACKAGE BODY           
SCHEMA4     PROC1     PROCEDURE        
SCHEMA5     FUNC1     FUNCTION

But we need to dig a little deeper. Our package has many procedures and functions defined within it. Some of the package functions are obsolete, and some of the package functions need to be rewritten in a new package. Not only do we want to know what other objects are dependent on our package, we want to know which of the package's functions and procedures are called. The DBA_SOURCE view has all the PL/SQL code compiled into the database.

describe dba_source
Name  Null Type           
----- ---- -------------- 
OWNER      VARCHAR2(30)   
NAME       VARCHAR2(30)   
TYPE       VARCHAR2(12)   
LINE       NUMBER         
TEXT       VARCHAR2(4000)

We will use two regular expression functions to examine the source. We will use REGEXP_LIKE to identify the source text that has one of the functions. REGEXP_LIKE returns true if the pattern is found in TEXT and false otherwise. The lower-case "i" is the match parameter indicating that we want a case-insensitive search. PL/SQL source is not case sensitive, so the case-insensitive search allows us to find text that is upper, lower, or camel-case.

We will use REGEXP_SUBSTR to identify the function or procedure. We'll start searching at the first position of the TEXT, and we'll return the first occurrence we find. Again, we will use a case-insensitive search. Using these two functions, we can perform the same sort of analysis that we could do by grepping through a code tree on disk. In this example, we're looking to see if specific functions or procedures are used. Here's our query to perform this analysis against the code compiled into the database:

select distinct owner, name, type, 
       regexp_substr(text,
                    'my_package\.(my_func1|my_func2|my_proc1|my_proc2)',1,1,'i') calls
from dba_source
where regexp_like(text,
                  'my_package\.(my_func1|my_func2|my_proc1|my_proc2)','i') 
 ;

The query returns every package, procedure, or function that uses the package, along with the functions and procedures that each calling object uses:

OWNER       NAME      TYPE             CALLS
-------     ------    ------------     -------------------
SCHEMA1     PCK1      PACKAGE BODY     my_package.my_func1
SCHEMA1     PCK1      PACKAGE BODY     MY_PACKAGE.MY_FUNC1
SCHEMA4     PROC1     PROCEDURE        my_package.my_proc2
SCHEMA5     FUNC1     FUNCTION         MY_PACKAGE.MY_FUNC2

This query is a useful summary of what's calling my_package and which procedures in my_package are used. We can get more detail from dba_source if we need it. For example, dba_source includes the line number, so the query

select distinct owner, name, type, line, 
       regexp_substr(text,
                    'my_package\.(my_func1|my_func2|my_proc1|my_proc2)',1,1,'i') calls
from dba_source
where regexp_like(text,
                  'my_package\.(my_func1|my_func2|my_proc1|my_proc2)','i') 
 ;

would return

OWNER       NAME      TYPE             LINE      CALLS
-------     ------    ------------     ----      -------------------
SCHEMA1     PCK1      PACKAGE BODY     330       my_package.my_func1
SCHEMA1     PCK1      PACKAGE BODY     621       my_package.my_func1
SCHEMA1     PCK1      PACKAGE BODY     721       MY_PACKAGE.MY_FUNC1
SCHEMA4     PROC1     PROCEDURE        45        my_package.my_proc2
SCHEMA5     FUNC1     FUNCTION         87        MY_PACKAGE.MY_FUNC2

This query shows the line number where the calling object uses our package. This is useful if we need some context to see why the calling object uses our package and what the calling package is doing with the results.
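To read that context without leaving the database, one option is to pull the neighboring lines from DBA_SOURCE. This is only a sketch: the owner, name, type, and line 330 come from the sample output above, and the five-line window on either side is an arbitrary choice.

```sql
-- Show the source around line 330 of SCHEMA1.PCK1 (values taken from the sample output).
-- The window of five lines before and after is an assumption; widen it as needed.
select line, text
from dba_source
where owner = 'SCHEMA1'
  and name = 'PCK1'
  and type = 'PACKAGE BODY'
  and line between 330 - 5 and 330 + 5
order by line;
```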

The above examples show us where specific procedures and functions are used in other objects. This is useful if we want a narrow search, but often we want to see every reference to a function or procedure or even a publicly-defined variable in our package. The regular expression for this search is even simpler: PL/SQL names consist of letters, numbers, and a few special characters, so when the regular expression can't match any more characters in the expression [a-z0-9_$#]+, then we have the whole name. Here's the query:

select distinct owner, name, type, 
       regexp_substr(text,
                    'my_package\.[a-z0-9_$#]+',1,1,'i') calls
from dba_source
where regexp_like(text,
                  'my_package\.[a-z0-9_$#]+','i') 
 ;

DBA_DEPENDENCIES and DBA_SOURCE are good repositories to help us understand what objects are using our code and how the objects are using our code. And, the regular expression functions make searching through these views much easier. Anytime we can get the database to do the work for us, "that's a good thing".
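Before pointing a pattern at DBA_SOURCE, it can help to try it against a single literal string. The sketch below runs the general pattern from this post against a made-up line of PL/SQL; the sample text is invented for illustration and is not taken from any real package.

```sql
-- The sample text is invented; the pattern and the 'i' match parameter
-- are the ones used against DBA_SOURCE in the queries above.
select regexp_substr('  v_total := My_Package.My_Func1(p_id) + 1;',
                     'my_package\.[a-z0-9_$#]+', 1, 1, 'i') as calls
from dual;
```

This should return My_Package.My_Func1: the match keeps the case of the source text and stops at the opening parenthesis, which is not in the character class. Note that nothing anchors the left side of the pattern, so a name such as other_my_package.foo would also match from my_package onward.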

Wednesday, December 4, 2013

A Simple Web Client

Cloud computing and software-as-a-service providers have lots of capacity and attractive rates, so many organizations are taking their applications to the cloud. After an application becomes cloud-based, many users need to pull some of the data to feed to other systems. In this post, we'll look at a simple PL/SQL web client that connects to a web service and downloads data.

For our data, we'll use the Data Download Program at the Federal Reserve Bank. There's a lot of statistical data on the Federal Reserve site, and we will use that data as the source for our simple web client. The data files are small and free to download, so the Bank gives us a good source of data to use.

From the Data Download Program we click on the Foreign Exchange Rates (G.5 / H.10) link, then we get a page that lets us choose which package we would like. The H10 radio button is selected by default, so clicking the Download button takes us to the download page. We could download a CSV into our browser and save the result as an OpenOffice or Excel spreadsheet. But for production purposes, following these links and downloading data through a web browser is a tedious chore. Notice the Direct Download for Automated Systems link? We'll copy that link and use it for our demonstration.

create or replace procedure simple_web_client as
    
  timeout number := 60;  -- timeout in seconds

  /* Below is the URL from the Direct Download for Automated Systems.  Just copy the link
     and paste the text.
  */
  v_url varchar2(1024) :=
  'http://www.federalreserve.gov/datadownload/Output.aspx?rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package';
   
  v_line_count number := 0;
  v_line_in varchar2(1024);
  v_header varchar2(1024);

  /* UTL_HTTP.REQ is set by the BEGIN_REQUEST function.  REQ includes 
     the URL, method, and http_version.
  */
  req   UTL_HTTP.REQ;

  /* UTL_HTTP.RESP is set by the GET_RESPONSE function.  RESP includes   
     the status, a brief message, and the http version.
  */
  resp  UTL_HTTP.RESP;
    
begin
    
  utl_http.set_transfer_timeout(timeout);         -- set the timeout before opening the request
  req := utl_http.begin_request(v_url);           -- Pass our URL, get a req object back
  
  utl_http.set_header(req, 'User-Agent', 'Mozilla/4.0');

  resp := utl_http.get_response(req);             -- Pass our req object, get a resp object back
  utl_http.read_line(resp, v_header, true);       -- Read line of resp into v_header

  dbms_output.put_line('CSV Header = ' || v_header);
  v_line_count := v_line_count + 1;

  loop

    utl_http.read_line(resp, v_line_in, true);    -- csv data
    v_line_count := v_line_count + 1;
    dbms_output.put_line('CSV Data = ' || v_line_in);

  end loop;

exception
  when utl_http.end_of_body then                  -- end of the data
      dbms_output.put_line('Lines read, including header = ' || v_line_count );
      utl_http.end_response(resp);               -- close the response object

  when others then
    dbms_output.put_line('Unhandled exception, lines = ' || v_line_count  );
    dbms_output.put_line(sqlerrm);
    dbms_output.put_line(dbms_utility.format_error_backtrace);
 
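    -- Do not leave anything open.  Close the response; if that fails
    -- (for example, we never got a response), close the request instead.
    -- Ignore errors from either call.
    begin
       utl_http.end_response(resp);
    exception
       when others then
          begin
             utl_http.end_request(req);
          exception
             when others then null;
          end;
    end;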
    
end simple_web_client;

Now we execute our client using SQLPLUS or SQL/Developer. If we connect properly, our output will look something like this:

anonymous block completed
CSV Header = "Series Description","Nominal Broad Dollar Index ","Nominal Major Currencies Dollar Index ","Nominal Other Important Trading Partners Dollar Index "
CSV Data = "Unit:","Index:_1973_Mar_100","Index:_1973_Mar_100","Index:_1997_Jan_100"
CSV Data = "Multiplier:","1","1","1"
CSV Data = "Currency:","NA","NA","NA"
CSV Data = "Unique Identifier: ","H10/H10/JRXWTFB_N.B","H10/H10/JRXWTFN_N.B","H10/H10/JRXWTFO_N.B"
CSV Data = "Time Period","JRXWTFB_N.B","JRXWTFN_N.B","JRXWTFO_N.B"
CSV Data = 2013-10-14,ND,ND,ND
CSV Data = 2013-10-15,101.1463,75.7635,128.2686
CSV Data = 2013-10-16,101.0694,75.7839,128.0691
CSV Data = 2013-10-17,100.4167,74.9717,127.6647
CSV Data = 2013-10-18,100.3842,74.8882,127.7015
CSV Data = 2013-10-21,100.6396,75.0142,128.1114
CSV Data = 2013-10-22,100.3595,74.7093,127.8816
CSV Data = 2013-10-23,100.4550,74.8150,127.9575
CSV Data = 2013-10-24,100.5485,74.8527,128.1189
CSV Data = 2013-10-25,100.5134,74.9733,127.8807
Lines read, including header = 16

The Oracle utl_http package makes reading data from a web server as simple as reading data from a file. With a simple web client, we can read any data that's available from a web server. Our web service example used CSV-formatted data, but we could download XML data as easily. And for a real hands-free operation, schedule your client using DBMS_SCHEDULER or DBMS_JOB.
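As a rough sketch of the scheduling step, the block below uses DBMS_SCHEDULER.CREATE_JOB to run the procedure once a day. The job name, the 6:00 AM schedule, and the comment text are assumptions chosen for illustration, and the schema needs the CREATE JOB privilege.

```sql
-- Job name and schedule are illustrative assumptions;
-- SIMPLE_WEB_CLIENT is the procedure defined above.
begin
  dbms_scheduler.create_job(
    job_name        => 'SIMPLE_WEB_CLIENT_JOB',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'SIMPLE_WEB_CLIENT',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY; BYHOUR=6; BYMINUTE=0; BYSECOND=0',
    enabled         => true,
    comments        => 'Download the H.10 exchange-rate CSV once a day');
end;
/
```

A scheduled job has no console, so the dbms_output calls in the demo procedure would go unseen; a production version would more likely write each line to a table.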

There's just one caveat when using the utl_http functions: depending on your DBAs and your installation, you may not have access to use the network functions. If you get an error like

ORA-24247: network access denied by access control list (ACL)

then it's time to visit the DBAs and request that they add your schema or your user account to the network ACL.
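For reference, this is roughly what the DBA would run on an 11g database, using the DBMS_NETWORK_ACL_ADMIN package. The ACL file name and the MY_SCHEMA principal are placeholders, and later Oracle releases provide a different interface, so check the documentation for your version.

```sql
-- Run as a DBA on 11g. The ACL name and the MY_SCHEMA principal are placeholders.
begin
  -- Create an ACL that grants the "connect" privilege to one schema
  dbms_network_acl_admin.create_acl(
    acl         => 'federal_reserve.xml',
    description => 'Allow HTTP access to the Federal Reserve download site',
    principal   => 'MY_SCHEMA',
    is_grant    => true,
    privilege   => 'connect');

  -- Attach the ACL to the one host and port that the client needs
  dbms_network_acl_admin.assign_acl(
    acl        => 'federal_reserve.xml',
    host       => 'www.federalreserve.gov',
    lower_port => 80,
    upper_port => 80);

  commit;
end;
/
```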

Tuesday, November 5, 2013

Retirement

Cloud computing is the rage these days. As we migrate old systems to the cloud, we need to retire the old system that we left behind. This is usually a multi-step process: we start with the easiest step to implement and undo, and we finish with the most difficult step to undo.

Assuming that the retired application has its own schema, the easiest first step is to simply revoke all the privileges on the underlying database schema. We leave the data and the programs in place, and we simply deny all users other than the owner access to the database objects. If we're retiring a large system, there will be hundreds of grants to revoke. We would like an easy way to find all the grants and revoke them. Also, we would like an easy way to undo these changes!

I work in an Oracle shop, so we'll use the Oracle system catalog as an example. Using two views in the Oracle catalog, we can find all the information we need to revoke every grant on the retired schema's objects. The USER_TAB_PRIVS view shows all the objects where the schema is object owner, grantor, or grantee. We want to revoke every grant where the schema is the grantor and owner of the object. From USER_TAB_PRIVS we get the object name and the privilege. The USER_OBJECTS view shows us all the objects that belong to the schema. From USER_OBJECTS we get the object type, and we use the object type to skip PACKAGE BODYs and TYPE BODYs, because the grants belong to the PACKAGE or TYPE, not to the PACKAGE BODY or TYPE BODY.

Here's our script. We open a cursor from the join of the two views, we concatenate a string to revoke the privilege, and then we execute the string. The SPOOL command writes the results to an output file. SQL/Developer does not support the spool command, so log in to the retired schema and run this script using sqlplus. If you're a bit nervous about running the script under the wrong user or schema, change the USER function to the quoted name of the schema instead.

set serveroutput on
spool 'Revoked_Grants.txt' 
begin

    for c in (

        select 'REVOKE ' || a.privilege ||
                  ' ON ' || a.table_name ||
                  ' FROM ' || a.grantee as revoke_command
        from user_tab_privs a
        inner join user_objects b
          on a.table_name = b.object_name
        where b.object_type not in ('PACKAGE BODY', 'TYPE BODY')
          and a.owner  =  user
          and a.grantor = user) loop
    
        dbms_output.put_line(c.revoke_command);

        begin
           execute immediate c.revoke_command;
        exception
           when others then dbms_output.put_line(' *** ' || sqlerrm);
        end;
   
    end loop;
   
end;
/
spool off

When we run this script, the spool command writes a file listing every grant that we revoked. In addition to logging the results for audit purposes, we can use this file to reverse the changes. Simply edit the file, remove any lines that are not REVOKE statements, change the REVOKEs to GRANTs, change the FROMs to TOs, add a semicolon to the end of each statement, save the file, and process it through sqlplus. Now we have an easy way to revoke all the privileges and an easy way to restore them if necessary.
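An alternative that avoids the hand-editing is to generate the GRANT statements up front. The sketch below reuses the join from the revoke script and must be run before the grants are revoked, because afterwards USER_TAB_PRIVS no longer lists them. The output file name and the handling of the GRANTABLE column are additions for illustration, not part of the original script.

```sql
-- Run this BEFORE the revoke script. The file name is an assumption.
set pagesize 0
set linesize 200
set feedback off
spool 'Restore_Grants.sql'

select 'GRANT ' || a.privilege ||
       ' ON ' || a.table_name ||
       ' TO ' || a.grantee ||
       -- keep WITH GRANT OPTION where the original grant had it
       case when a.grantable = 'YES' then ' WITH GRANT OPTION' end ||
       ';' as grant_command
from user_tab_privs a
inner join user_objects b
  on a.table_name = b.object_name
where b.object_type not in ('PACKAGE BODY', 'TYPE BODY')
  and a.owner = user
  and a.grantor = user;

spool off
```

The spooled file can then be run through sqlplus as-is if the retirement has to be rolled back.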

Tuesday, October 29, 2013

How to make license plates

Database sequences

So, you're going to the big house. You're going up the river. You're doing time. You're going to be making license plates, and you want to impress the warden?¹

Back in the late 1970s, the license plate on my car read "CAR 999". I always liked that my car's plate read C-A-R. Creating the numbers from 001 through 999 is pretty easy, but creating the letters from AAA through ZZZ is a little trickier. In this post, we'll discuss how to create unique identification numbers, and we'll virtually re-create the late 1970s State of Maryland license plates with three letters followed by three numbers.

The license plate problem is one example of generating unique identification numbers. We go to school, we get an id. We go to work, we get an id. We have an id for the computer, we have an id on our driver's license, we have a Social Security number, and our insurance policies have id numbers. Sometimes the ids are just numbers, but more often the id is a mix of letters and numbers. Using letters instead of or in addition to numbers greatly expands the number of ids. On a car's license plate, the letters and numbers take the same amount of space. But, the letters from AAA to ZZZ give us 17,576 possible combinations, while the numbers from 000 to 999 give us only 1000 combinations. Using only numbers, we quickly run out of identification numbers.

When we create ids, we cannot reuse a given sequence of letters and numbers. We need to know what we used previously. There are a few ways of tackling this problem. In our license plate example, we could pre-generate all the possible plate numbers, store them in a table, and mark them "in-use" as we use them. Or, we could store the last plate in a table, generate the next plate based on the last plate, and update the table with the latest plate. But, the easiest solution is to use database sequences. A database sequence is an object defined in the database that automatically increments each time we consume a number. We don't have to pre-generate plate numbers, we don't have to update tables or keep track of anything. The database does the work for us. And that's a good thing.

We'll start by creating two sequences, one for the letters and one for the numbers. More about why the letters start with 676 and finish with 17575 later:

create sequence numbers_seq 
   start with 1 maxvalue 999 cycle nocache;

create sequence letters_seq 
   start with 676 minvalue 676 maxvalue 17575 nocache cycle;

We can use the numbers_seq directly, but we must convert the letters_seq to three characters. If we think about the letters of the alphabet as a base 26 counting system, then we need to convert base 10 digits to base 26 digits. We learned to do this in Computer Science 101, when we converted base 10 numbers to base 16 numbers and back again. The solution is the same, only the base is different. This function accepts a base 10 number and returns a three-character base 26 representation:

create or replace function seq2letters(p_seq in number)
    return varchar2 is
    
  v_letters varchar2(3) default null;
  v_n integer := p_seq;
  v_r integer;
  digits varchar2(26) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  base integer  := length(digits);
begin
   while ( v_n > 0) loop
      v_r := mod(v_n,base);
      v_letters := substr(digits,v_r + 1, 1) || v_letters;
      v_n := floor( v_n / base );
      end loop;
   return v_letters;
end seq2letters;

For debugging purposes, we'll create a function to do the opposite conversion, too. When we decide on the start and maxvalue values for the letters_seq, this is useful for finding out what the base 10 numbers are for BAA and ZZZ.

create or replace function letters2num(  p_letters in varchar2)
    return number
is
  v_length integer := length(p_letters);
  v_num number := 0;
  v_i integer;
  digits varchar2(26) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  base integer  := length(digits);
  v_digit varchar2(1);  
  v_digit_value integer;
  v_power integer := 0;
begin
  for v_i in reverse 1 .. v_length loop
    v_digit := substr(p_letters,v_i,1);
    v_digit_value := instr(digits, v_digit);
    v_num := v_num + (v_digit_value - 1) * base ** v_power;
    v_power := v_power + 1;
  end loop;
  return v_num;
end letters2num;

Let's test our two new functions. We want to find the base 10 equivalents of BAA and ZZZ using letters2num. Then, we'll test seq2letters to make sure we get our starting letters back.

select letters2num('BAA') first_num, letters2num('ZZZ') last_num
from dual;

FIRST_NUM   LAST_NUM
---------   --------
      676      17575

select seq2letters(676) first_plate, seq2letters(17575) last_plate
from dual;

FIRST_PLATE     LAST_PLATE
-----------     ----------
BAA             ZZZ

Perfect! We have the tools to translate numbers between base 10 and base 26.
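The reason the letters sequence starts at 676 can be seen by feeding seq2letters some smaller numbers. In this scheme A is the zero digit, so leading A's are dropped just as leading zeros are dropped from a decimal number, and any value below 676 comes back with fewer than three letters. The arithmetic is 676 = 1 x 26 x 26 + 0 x 26 + 0, which is BAA, and 17575 = 25 x 676 + 25 x 26 + 25, which is ZZZ. A quick check using the function defined above:

```sql
select seq2letters(25)  as one_letter,     -- Z
       seq2letters(675) as two_letters,    -- ZZ
       seq2letters(676) as three_letters   -- BAA
from dual;
```

One consequence of this design is that the plates AAA through AZZ are never issued.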

Next, we'll create two functions that return the next numbers and the next letters. The numbers function returns the next value from numbers_seq. The letters function is a little more complicated. Whenever the plate number is 1, we need to get the next letters. If the plate number is not 1, then we reuse the current letters. In either case, the letters function uses seq2letters to convert the sequence value to three letters. Here's the code:

create or replace function get_numbers return number is
    
begin
    
    return numbers_seq.nextval;

end get_numbers;

create or replace function get_letters ( p_num number ) return varchar is
    
    v_letters_10 number;
    v_letters_26 varchar2(3);
begin
    
    if p_num = 1 then  
       
        v_letters_10 := letters_seq.nextval;
       
    else

       select last_number - 1
       into v_letters_10
       from user_sequences
       where sequence_name = 'LETTERS_SEQ';


    end if;
        
    v_letters_26 := seq2letters(v_letters_10);
        
    return v_letters_26;
       
end get_letters;

Now we have all the code we need to generate a license plate. First we call the numbers function, then we call the letters function. In this demo, hosted on Oracle's APEX demonstration website, we'll create a 1970s Maryland license plate. And what does a vintage late 1970s Maryland plate look like? Check eBay: it's more than an auction site, it's a museum as well! The old plates were pretty simple: large red letters and numbers with a red border around the whole plate. Try out the demonstration: every click of the Make a Plate button makes another license plate. Try running the demonstration in two or more windows; you'll never make the same plate twice. Well, at least not until you get to plate ZZZ 999 and the sequences roll over.
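The two calls described above could be wrapped in a single function along these lines. MAKE_PLATE is a hypothetical name and is not part of the original demo; it only combines get_numbers and get_letters in the order the post describes.

```sql
-- MAKE_PLATE is a hypothetical wrapper, not part of the original demo.
create or replace function make_plate return varchar2 is
    v_num number := get_numbers;    -- numbers first, as described above
begin
    -- FM000 pads the number with leading zeros: 7 becomes 007
    return get_letters(v_num) || ' ' || to_char(v_num, 'FM000');
end make_plate;
/
```

With that in place, each execution of "select make_plate from dual;" should return the next plate, starting with BAA 001 on freshly created sequences.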

1. American euphemisms for going to prison.

Tuesday, October 22, 2013

Recursive SQL

Our shop recently upgraded to Oracle 11g Release 2. After using DB2 for many years, I noticed two new 11gR2 features right away. First, Oracle 11gR2 supports aliasing the column names of common table expressions. Second, Oracle supports SQL-99 syntax for recursion using common table expressions.

Let's start with some data first. In the Oracle editor of your choice, run the following script. The script creates some test data for us in the system catalog, and we'll use this data to solve a practical problem.

create table t1 ( v1 varchar2(1));

create view v1 as select * from t1;

create view v2 as select * from v1;

create view v3 as select * from v2;

create view v4 as select * from v1;

The script creates five new objects in the system catalog. The DBA_DEPENDENCIES view shows the dependency of one object on another. In our example, view V1 depends on table T1. View V2 depends on view V1. View V3 depends on view V2. View V4 depends on view V1. Every object, whether it is a table, view, package, or object of any type, has a row in DBA_DEPENDENCIES if the object depends on another object, or if the object has other objects depending on it.

The USER_DEPENDENCIES view is a subset of DBA_DEPENDENCIES; DBA_DEPENDENCIES may not be available to everyone, so these examples will use USER_DEPENDENCIES. The two columns of interest to us are the NAME and REFERENCED_NAME columns. A row in this view means that object NAME depends on object REFERENCED_NAME. In the following query, we see every dependency that we described in the previous paragraph.

select name, referenced_name
from user_dependencies
where referenced_name in ('V1','T1','V2','V3') 
order by 1;

NAME    REFERENCED_NAME
-----   ---------------
V1      T1
V2      V1
V3      V2
V4      V1

The USER_DEPENDENCIES view shows us all the direct dependents of an object. The USER_DEPENDENCIES view shows that view V4 depends on view V1. USER_DEPENDENCIES does not show that view V4 indirectly depends on table T1. If you don't believe that, drop table T1 and then run a query against view V4. Not so good!

A shop with many tables, views, packages, procedures, functions, and triggers, will have an extensive hierarchy of dependencies. So, while this view shows the direct dependents, it does not show all the dependents. Does view V4 depend on table T1? Before we drop table T1, we need to know the answer to that question!

This problem is easily solved with recursive SQL queries. We'll take a look at two ways of handling this. First, we'll examine how to answer this question using recursion with common table expressions, à la DB2 or Oracle 11gR2. Here's the query:

with recurs (name, rname, lvl ) as (
   select name, referenced_name, 1
   from user_dependencies
   where referenced_name = 'T1'

   union all

   select a.name, a.referenced_name, r.lvl + 1
   from user_dependencies a,
        recurs  r
   where r.name = a.referenced_name
)
select name, rname, lvl
from recurs; 

NAME    RNAME   LVL
----    -----   ---
V1      T1      1
V2      V1      2
V4      V1      2
V3      V2      3

The first new feature in 11gR2 is aliasing the common table expression columns. We have declared RECURS as a common table expression with three columns: NAME, RNAME, and LVL. Oracle 11gR1 supported common table expressions, but 11gR1 did not support aliasing the column names.

The next new feature is the recursive query. We have two sub-queries unioned together. The first query is our starting point: REFERENCED_NAME = 'T1'. The second query is the recursion. We select rows from a join of the USER_DEPENDENCIES view and the RECURS common table expression. The rows are linked by joining the REFERENCED_NAME of USER_DEPENDENCIES to the NAME of RECURS. In other words, we join the child row from USER_DEPENDENCIES to the parent row from RECURS. The LVL column shows the depth of the hierarchy.

DB2 LUW has supported common table expressions and recursion for at least 10 years. Oracle's pre-11gR2 databases supported recursive queries in a limited way using the START WITH ... CONNECT BY PRIOR syntax. Run the following query, and we get the same results, although not in the same order. LEVEL is an Oracle pre-defined column showing the depth of the hierarchy. In the previous query, we found this by keeping track of the level (LVL) ourselves.

select name, referenced_name, level
from user_dependencies
start with referenced_name = 'T1'
connect by prior name = referenced_name;

NAME    REFERENCED_NAME  LEVEL
----    ---------------  -----
V1      T1               1
V2      V1               2
V3      V2               3
V4      V1               2

Like Oracle's outer join operator "(+)" from the last post, START WITH ... CONNECT BY PRIOR is very common in Oracle shops, so it's important to understand it. Also, it's very concise, always a benefit to those among us who have ten thumbs on the keyboard.

Recursive queries are another good tool to have in our toolbox. Trying to sort out the object dependencies is just one use. Another common problem is sorting out the system privileges. Many applications define their own authorization schemes, granting access to roles or groups, then including those roles/groups in other roles/groups. Recursive queries can answer the question "what users have authorization to which tables", and these queries can show the underlying hierarchy.

But before we go, for our example query to be really useful, we need to account for the rest of the columns in DBA_DEPENDENCIES. In addition to the object's name, DBA_DEPENDENCIES includes the object's owner and the object's type. If we don't account for each object's type and owner, we quickly end up with a recursive loop, especially if our objects include package specs and package bodies. Here's the improved query:

with recurs (name, owner, type, rname, rowner, rtype,  lvl ) as (
   select name, owner, type, referenced_name, referenced_owner, referenced_type, 1
   from dba_dependencies
   where referenced_name = 'T1'

   union all

   select a.name, a.owner, a.type, a.referenced_name, a.referenced_owner, a.referenced_type, r.lvl + 1
   from dba_dependencies a,
        recurs  r
   where r.name = a.referenced_name
     and r.owner = a.referenced_owner
     and r.type = a.referenced_type 
) cycle name, owner, type set cycle_detected to 1 default 0
   
select * 
from recurs;

Now when we join a parent row to a child row, we're joining on all the appropriate columns, and a recursive loop is less likely. Another improvement in this query is the use of the CYCLE keyword. CYCLE instructs the database to detect when our query has previously visited the row defined by name, owner, and type. When the database detects this condition -- a recursive loop -- the database sets the CYCLE_DETECTED column to 1 and does not descend further. Now we have a really useful query.
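The same pattern carries over to the privilege question raised earlier. As one sketch, Oracle's own DBA_ROLE_PRIVS view records which roles are granted to which users and roles, so a recursive query can walk from a user down through nested roles. SOME_USER is a placeholder, and an application-defined authorization scheme would substitute its own tables for DBA_ROLE_PRIVS.

```sql
-- SOME_USER is a placeholder; DBA_ROLE_PRIVS requires DBA-level access.
with role_tree (grantee, granted_role, lvl) as (
   -- starting point: roles granted directly to the user
   select grantee, granted_role, 1
   from dba_role_privs
   where grantee = 'SOME_USER'
   union all
   -- recursion: roles granted to the roles we have already found
   select p.grantee, p.granted_role, t.lvl + 1
   from dba_role_privs p,
        role_tree t
   where t.granted_role = p.grantee
)
select grantee, granted_role, lvl
from role_tree;
```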

More Reading

For more information, the vendors' publications have some good examples:

DB2 Info Center
Oracle 11gR2 Don't be put off by the term "subquery factoring". It's just recursion.

Also, see my blog entry discussing recursion to implement aggregation in SQL Server.

Monday, October 14, 2013

We gather together to join these two tables...

Joining tables using SQL is a pretty straightforward operation. Usually we join tables where the values in one column equal the values in a second column, and the join returns only those rows where the column values in both tables match. Let's start with some data:

EMPLOYEES TABLE
---------------

ID      NAME               DEPT_ID
--      ----------------   -------
1       Fred Flintstone    3
2       Barney Rubble      3
3       Wilma Flintstone   -
4       Betty Rubble       -
5       Mister Slate       1

DEPARTMENT TABLE
----------------

ID      NAME
--      ------------
2       Sales
1       Front Office
3       Gravel Pit
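To follow along, the two tables can be created with a short script. The column types and constraints below are assumptions, since the listing shows only the data; the dashes in the DEPT_ID column are taken to mean NULL.

```sql
-- Column types and constraints are assumptions; only the data comes from the listing above.
create table department (
    id    number primary key,
    name  varchar2(30)
);

create table employees (
    id       number primary key,
    name     varchar2(30),
    dept_id  number references department (id)
);

insert into department (id, name) values (2, 'Sales');
insert into department (id, name) values (1, 'Front Office');
insert into department (id, name) values (3, 'Gravel Pit');

insert into employees (id, name, dept_id) values (1, 'Fred Flintstone', 3);
insert into employees (id, name, dept_id) values (2, 'Barney Rubble', 3);
insert into employees (id, name, dept_id) values (3, 'Wilma Flintstone', null);
insert into employees (id, name, dept_id) values (4, 'Betty Rubble', null);
insert into employees (id, name, dept_id) values (5, 'Mister Slate', 1);

commit;
```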

Here's a simple query with the results. The query answers the question "display all the employees with their departments".

select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id;

DEPT_NAME       EMPLOYEE
------------    ----------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate

Notice that not every row in our source tables appears in the select results. Wilma and Betty are not assigned to a department, and the Sales department has no employees. Depending on our application, this might be what we want, but in some cases, we want to see the rows where we didn't get matches. For example, if the question is "display all departments and their employees", then we would expect to see the Sales department without any employees.

For many programmers, the natural response is to write a program in a high-level language using two loops. The outer loop opens a cursor on the department table, fetches data, and for each row fetched starts an inner loop. The inner loop opens a cursor on the employee table using the department id as a predicate. In SQL parlance, we have constructed an OUTER JOIN.

Proponents of the "solve the problem with SQL" philosophy are surely gnashing their teeth at this solution. There are several ways of using SQL to write this query. We'll take a look at the most common way of handling it now, and we'll step into the way-back machine to take a look at two older ways of solving this problem. I don't advocate using the older methods, but if you work in a shop that adopted DB2 or Oracle early, you will see them and need to understand them.

The easiest and best method is to use SQL's OUTER JOIN keywords. In this example, we'll use a LEFT OUTER JOIN. A left outer join will return the rows from the table on the left side of the join, even if those rows don't satisfy the join condition. This query shows all departments and their employees:

select d.name dept_name,
       e.name employee
from department d 
left outer join employees e
  on d.id = e.dept_id;

DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           -

Now the report shows the Sales department, even though the Sales department has no employees.

The early SQL databases that I worked with, SQL/DS and early versions of DB2, did not support the OUTER JOIN syntax. To construct a LEFT OUTER JOIN, we had to construct the inner join and UNION those results with missing rows from the left-hand table. In the second part of the query, notice the correlated sub-query between the department and employees tables. This query yields the same results as the last query:

select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id 

union all 

select d.name dept_name,
       null
from department d
where not exists ( select 1
                   from employees e
                   where d.id = e.dept_id 
                );

DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           -

That's a lot of code, and not everyone is comfortable with the correlated sub-query, but it is preferable to coding loops and constructing joins in a high-level language.

Oracle introduced the "(+)" outer join operator to construct an outer join without needing to union the missing rows. Oracle now recommends that programmers use the OUTER JOIN keywords instead. The older syntax seems a bit clumsy until you get accustomed to it, and if you're working in a shop that has a long history with Oracle, you will see a lot of it.

Here's the query using Oracle's "(+)" outer join operator. Notice that the "(+)" is applied to the columns of the employee table if we want all the rows from the department table. The syntax and semantics don't correlate well, which is surely one reason why Oracle now steers programmers away from this construction.

select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id(+);
 
DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           -

We do get the correct results, though, and programmers must have seen this as a big advantage over union-ing two queries together as in our example above.

The left outer join is just the start; we can also construct right outer joins and full outer joins. The preferred way is to use SQL's RIGHT OUTER JOIN or FULL OUTER JOIN keywords for any new query. But, if you find yourself maintaining old queries, it's a good idea to understand the other methods of constructing outer joins, too.
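As a sketch of the other two join types, here are the same department and employees tables queried with the RIGHT OUTER JOIN and FULL OUTER JOIN keywords. Results are not shown because they depend on the data in your tables.

```sql
-- Right outer join: keeps every row from the table on the RIGHT side.
-- With the tables swapped, this asks the same question as the
-- left outer join above: all departments, with or without employees.
select d.name dept_name,
       e.name employee
from employees e
right outer join department d
  on d.id = e.dept_id;

-- Full outer join: keeps unmatched rows from BOTH tables, so it also
-- returns any employees whose dept_id matches no department.
select d.name dept_name,
       e.name employee
from department d
full outer join employees e
  on d.id = e.dept_id;
```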

Thursday, September 12, 2013

Alice's Adventures in SQL-NULL-Land

Programming in a relational database environment is usually a very predictable affair. Everyone understands the concept of rows and columns in tables. SQL has only four operations to learn (five if you count MERGE). And thanks to products like Excel, business users know the concept, too, and we all have a common language of tables, rows, and columns.

But, for programmers, there is one thing that remains tricky, and it's a little thing if it was a thing at all, and if we're not careful, we're soon tumbling down Alice's rabbit hole. That one little thing is the NULL. And the NULL is at its trickiest when we use it in logic.

Here's a sample Oracle PL/SQL anonymous block. Before you cut and paste this snippet of code into SQL*Plus or another SQL tool of your choice, answer this question: what will it print?

set serveroutput on
declare

   the_letter varchar2(1) := null;

begin

   if the_letter not in ( 'A', 'B', 'C') then
      dbms_output.put_line('if true');
   else
      dbms_output.put_line('else false');
   end if;

end;
/

This bit of code cuts to the point right away. Execute it, and the output is

else false

There's no mistake here; "else false" is the code-correct answer. Sure, the value in the_letter is not an A or a B or a C, and perhaps you expected it to print "if true". But comparing a NULL to anything yields NULL, so the NOT IN test evaluates to NULL rather than TRUE, and an IF statement runs its THEN branch only when the condition is TRUE. This code always prints "else false", and in SQL-NULL-Land, that is the correct answer. Here's another code snippet:

set serveroutput on
declare

   alice boolean;

begin

   alice := 'RABBIT' = NULL;

   if ( alice ) then
      dbms_output.put_line('The rabbit is null');
   end if;

   if NOT (alice) then
      dbms_output.put_line('The rabbit is not null');
   end if;

end;
/

Execute that, and what do you see? It's like listening to John Cage's 4'33": Nothing. And in that nothing is everything.

We are taught that Boolean variables have two states: TRUE or FALSE. And that's true, or not false, except when handling NULLs. Our Boolean variable ALICE actually has three states: TRUE, FALSE, or NULL. As programmers, we need to properly code for the three possible states of a Boolean variable.

There are several ways of handling this. Here are a few suggestions:

Use the COALESCE or NVL function (VALUE, for DB2 programmers) to assign a default value to any variable that might be null.
Leave an extra ELSE on all IF/THEN/ELSE constructs.
Explicitly test for NULL values. Notice that we explicitly test for NULL using the keyword "is" and not the equal "=" sign.
Properly initialize all variables when we declare them.

Here's a code snippet showing the first three suggestions.

declare
   alice boolean;
begin
   alice := 'RABBIT' = NULL;

   -- Default a NULL result to FALSE before testing it
   if NOT( coalesce(alice,false) ) then
      dbms_output.put_line('The rabbit is not null');
   end if;

   if (alice) then
      dbms_output.put_line('The rabbit is null');
   elsif NOT(alice) then
      dbms_output.put_line('The rabbit is not null');
   else -- Catch the third state
      dbms_output.put_line('Alice is undefined, neither true nor false');
   end if;

   if alice is null then -- Explicitly test for null
      dbms_output.put_line('Alice is undefined, neither true nor false');
   end if;

end;
/

Yes, the examples are a bit contrived, and of course, you wouldn't really code something like

alice := 'RABBIT' = NULL;

But null values can occur in odd ways. Perhaps a function returning a value behaves badly and returns a null? Or a SQL select doesn't satisfy the where clause? Consider this snippet, based on a real problem:

declare

   the_user varchar2(32);

   is_bedrock_citizen boolean;

begin

   -- Function returning who is using this routine
   the_user := get_citizen_name();  

   -- See if they live in Bedrock
   is_bedrock_citizen := the_user in ('FRED','WILMA','BARNEY','BETTY'); 

   if NOT is_bedrock_citizen then
       raise_application_error(-20001,
                               'You are not a Bedrock citizen.');
   else
       dbms_output.put_line('You are a Bedrock citizen.');
       -- More code below...
    
   end if;

end;

When the get_citizen_name() function works properly and returns someone's name, the program works as its designers intended. The name is compared to the list of Bedrock citizens, and the result, TRUE or FALSE, is assigned to the is_bedrock_citizen variable. Folks who live outside of Bedrock get the raise_application_error message and the program stops; Wilma, Fred, Barney, and Betty get the "You are a Bedrock citizen." message and the program continues.

But what happens when get_citizen_name() misbehaves and returns a NULL value? Now we have a logical comparison using a NULL value, and the result is neither TRUE nor FALSE; the logical value is NULL. NOT applied to NULL is still NULL, so the test of NOT is_bedrock_citizen is not TRUE and the else block executes. So, whether this user was a citizen or not, if get_citizen_name() misbehaves, then everyone is a citizen!
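One way to close this hole, sketched below, is to test for the NULL explicitly and to put the positive test first, so that a NULL can only ever fall into the branch that rejects the user. The local get_citizen_name function here is a hypothetical stand-in that returns NULL on purpose, and the -20002 error number is my own choice.

```sql
set serveroutput on
declare

   the_user varchar2(32);

   -- Hypothetical stand-in for the real get_citizen_name();
   -- it misbehaves on purpose and returns NULL.
   function get_citizen_name return varchar2 is
   begin
      return null;
   end get_citizen_name;

begin

   the_user := get_citizen_name();

   if the_user is null then
      -- Explicitly test for NULL before doing any comparison
      raise_application_error(-20002,
                              'Could not determine who you are.');
   elsif the_user in ('FRED','WILMA','BARNEY','BETTY') then
      -- Only a TRUE result reaches this branch
      dbms_output.put_line('You are a Bedrock citizen.');
   else
      raise_application_error(-20001,
                              'You are not a Bedrock citizen.');
   end if;

end;
/
```

Run as written, the block stops with the -20002 error. Even without the first test, a NULL comparison would fall through to the final ELSE and be rejected, so the logic now fails closed instead of open.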

Good luck, and don't go down this rabbit hole! Initialize your variables, and be prepared to handle NULLs.

Thursday, September 5, 2013

Who are you? The Sequel

In an earlier post, I described some of the information available to us when we deliver a web page from an Oracle database and web server. Before we deliver a page, we can query the remote address of the party asking for the page. For example,

select owa_util.get_cgi_env('REMOTE_ADDR') remote_addr
from dual; 

REMOTE_ADDR
----------------- 
24.91.24.118

1 row selected.

But what does that number really mean? Where is it coming from? In this post, we'll take a closer look at the IP address, and we'll get a little more information about the address.

The address by itself may not mean very much to us. Most of us in the U.S. use IPv4 addressing, which looks something like 192.168.1.10: four numbers separated by three periods. If you're familiar with IP addresses, you'll recognize that an address beginning with 192.168 is a private address, one that is within your firewall. But there's more information to be gleaned from these addresses than just inside the firewall or outside the firewall. If we could see the name that the address belongs to, then we'd have a better idea of who owns the address.

Behind every name is an IP address. If we start a command window or terminal window and ping Google, we see that ping returns www.google.com's IP address:

tom@linux-dv8000:~$ ping www.google.com 
PING www.google.com (208.117.233.54) 56(84) bytes of data.

We want to do the opposite, a reverse name lookup. This is the same problem you face when you get those mysterious 877- phone calls and you wonder who is calling you. Fortunately, our computer system's name resolver can resolve names into IP addresses, and the name resolver can do the reverse: return a name from an IP address.

The Oracle database includes a package that interfaces with the computer's name resolver: utl_inaddr. The utl_inaddr package includes two functions. GET_HOST_NAME takes an IP address and returns a domain name. GET_HOST_ADDRESS takes a host name and returns an IP address. Here are a couple of examples. First, we call GET_HOST_ADDRESS to resolve a host name into an IP address; then we do the reverse, we call GET_HOST_NAME to find a host name given an IP address:

select utl_inaddr.get_host_address('www.msln.net') from dual;

UTL_INADDR.GET_HOST_ADDRESS('WWW.MSLN.NET')
---------------------------------------------------
169.244.19.130

1 row selected.

select utl_inaddr.get_host_name('169.244.19.130') from dual;

UTL_INADDR.GET_HOST_NAME('169.244.19.130')
---------------------------------------------------
scrane.msln.net

1 row selected.

So, 169.244.19.130 belongs to msln.net. Point our browser at http://msln.net, and we get the home page of the Maine School and Library Network. Point our browser at 169.244.19.130, and we get 404 Not Found. Knowing the host name tells us much more than just knowing the IP address.

There's just one caveat when using these functions: depending on your DBAs and your installation, you may not have access to use the network functions. If you get an error like

ORA-24247: network access denied by access control list (ACL)

then it's time to visit the DBAs and request that they add your schema or your user account to the network ACL.
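For reference, this is roughly what the DBA would run on Oracle 11g, using the DBMS_NETWORK_ACL_ADMIN package. Treat it as a sketch: the ACL file name resolve_hosts.xml and the MY_SCHEMA principal are placeholders I made up, and your DBAs will likely want a narrower host assignment than '*'.

```sql
-- Run as a privileged (DBA) user.
begin
   -- Create an ACL that grants the "resolve" privilege,
   -- the privilege utl_inaddr needs for name lookups.
   dbms_network_acl_admin.create_acl(
      acl         => 'resolve_hosts.xml',   -- placeholder name
      description => 'Allow host name lookups through utl_inaddr',
      principal   => 'MY_SCHEMA',           -- placeholder schema
      is_grant    => true,
      privilege   => 'resolve');

   -- Attach the ACL to the hosts it should cover
   dbms_network_acl_admin.assign_acl(
      acl  => 'resolve_hosts.xml',
      host => '*');

   commit;
end;
/
```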

Finally, here's an Oracle PL/SQL function that accepts either a host name or an IP address and returns either an IP address or a host name. The function starts by using a regular expression to identify the argument as either a name or an address, and then the function calls utl_inaddr to do the translation. Notice at the end we have an exception block, to make sure we always return something to the caller.

create or replace
function resolve_host( host varchar2) return varchar2 is

  v_return    varchar2(256);

begin

  -- Anchor the pattern so only a complete dotted address matches
  if regexp_like(host,'^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$') then

    -- It's an IP address
    v_return := utl_inaddr.get_host_name(host);

  else

    -- It's a hostname
    v_return := utl_inaddr.get_host_address(host);

  end if;

  return v_return;

exception
    -- If the lookup fails for any reason, hand back the original argument
    when others then
      return host;

end resolve_host;
/
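Here is a short usage sketch, assuming resolve_host has been compiled as a standalone function in your schema. The column aliases are my own.

```sql
select resolve_host('www.msln.net')   as address,
       resolve_host('169.244.19.130') as host_name
from dual;
```

Because of the exception block, a failed lookup (or a missing network ACL) does not raise an error; the function simply echoes its argument back, so compare the result with the input if you need to know whether the lookup succeeded.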

Monday, August 26, 2013

Translate and Replace and Replace and Replace and ...

One task many programmers face is examining a string of characters and replacing specific characters with different ones. Every COBOL programmer is familiar with the INSPECT verb, and if they've really been programming COBOL for a while, they're familiar with the now-obsolete EXAMINE verb, too. We're not going to step that far back; instead we'll look at a couple ways of handling this problem in Oracle's PL/SQL programming language.

Oracle's PL/SQL function library includes several functions that replace single characters or multiple characters with a new character or string of characters:

translate
replace
regexp_replace

Regexp_replace and its siblings regexp_like and regexp_substr are worthy of their own post, so we'll focus on the translate and replace functions.

Translate is the simplest function. Translate performs single-character substitution. There are three arguments: the input string we want to examine, a string of characters that we want to replace, and a new string of characters that we want to use as replacements. Here's a simple example:

select 'abcdefgh' input,
       translate('abcdefgh','bdfh','BDFH') output
from dual;

INPUT       OUTPUT
--------    --------
abcdefgh    aBcDeFgH

We see that every occurrence of b, d, f, or h was replaced by the upper case equivalent. Every character in the second argument is replaced by its equivalent character in the third argument.
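One more translate behavior is worth a quick sketch: when the second argument is longer than the third, the characters without a counterpart are removed from the result. The phone number below is made-up sample data. Every digit maps to itself, and the hyphen, which has no counterpart, is dropped.

```sql
select '555-123-4567' input,
       translate('555-123-4567', '0123456789-', '0123456789') output
from dual;

-- INPUT           OUTPUT
-- ------------    ----------
-- 555-123-4567    5551234567
```

The digits are listed in both arguments because the third argument cannot be empty; Oracle treats an empty string as NULL, and translate returns NULL when any argument is NULL.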

This is a really easy way to do one-to-one character translations. Sometimes, we want to replace a single character by a string of characters, or perhaps replace a string of characters by a single character. The replace function handles this nicely. Here's an example; anyone needing to translate multi-byte characters to a single-byte character is familiar with this problem:

select '10 ≥ 5' input,
       replace('10 ≥ 5','≥','>=') output
from dual;

INPUT       OUTPUT
--------    --------
10 ≥ 5      10 >= 5

This is pretty handy. We translated the single character greater than or equal to symbol into a two-character string that we can represent in a 7-bit ASCII codeset.

The replace function can convert strings into single characters, too. Here's another example:

select '(c)2013' input, 
       replace('(c)2013','(c)','©') output 
from dual;

INPUT       OUTPUT
--------    --------
(c)2013     ©2013

This is pretty handy, too. We can take a string encoded in 7-bit ASCII and replace character strings like (c) and >= with their single-character equivalents in a multi-byte character set.

There's just one thing missing: Translate can change many different characters for us in one call, but it does not handle character string replacements. Replace can change character strings, but only one string at a time. If we want to change both the © and the ≥, we need to do something like this:

select '© ≥' input,
       replace(replace('© ≥','©','(c)'),'≥','>=') output
from dual;

INPUT    OUTPUT
-----    ------
© ≥      (c) >=

This works fine for a couple of translations, but for more than a few translations, this is pretty cumbersome. Why not let the computer do it for us? We'll define a new function, called MREPLACE, that will handle multiple replace strings. The first argument will be the string we want to examine. The following arguments are pairs of strings: first the old string, then the new string. Here's an example of MREPLACE in use:

select '© ≥' input,
       mreplace('© ≥',
                '©','(c)',
                '≥','>=') output
from dual;

INPUT    OUTPUT
-----    ------
© ≥      (c) >=

Perfect! Now, with just one function call, we have an easy way to replace multi-character or single-character strings with multi-character or single-character strings. Here's the source:

create or replace
function mreplace (
    p_string in varchar2, 
    p_from1 in varchar2 default null, p_to1 in varchar2 default null,
    p_from2 in varchar2 default null, p_to2 in varchar2 default null,
    p_from3 in varchar2 default null, p_to3 in varchar2 default null,
    p_from4 in varchar2 default null, p_to4 in varchar2 default null,
    p_from5 in varchar2 default null, p_to5 in varchar2 default null,
    p_from6 in varchar2 default null, p_to6 in varchar2 default null,
    p_from7 in varchar2 default null, p_to7 in varchar2 default null,
    p_from8 in varchar2 default null, p_to8 in varchar2 default null 
                  ) return varchar2 as

begin 

  if ( p_from1 is null ) then return p_string;
  else
     return mreplace(replace(p_string,p_from1, p_to1),
                     p_from2 , p_to2 ,
                     p_from3 , p_to3 ,
                     p_from4 , p_to4 ,
                     p_from5 , p_to5 ,
                     p_from6 , p_to6 ,
                     p_from7 , p_to7 ,
                     p_from8 , p_to8 );
  end if;           

end mreplace;
/

How does it work? MREPLACE uses recursion, repeatedly calling itself. MREPLACE keeps calling itself until the p_from1 argument is null. The test to stop the recursion is important if you don't want your DBAs and sysadmins darkening your office door! Also, notice that MREPLACE accepts one string to examine and eight from/to pairs. When we invoke it recursively, we call it with one string to examine that includes our string and the first from/to pair:

          replace(p_string, p_from1, p_to1)

but only seven from/to pairs:

                     p_from2 , p_to2 ,
                     p_from3 , p_to3 ,
                     p_from4 , p_to4 ,
                     p_from5 , p_to5 ,
                     p_from6 , p_to6 ,
                     p_from7 , p_to7 ,
                     p_from8 , p_to8 )

So, each time MREPLACE calls itself, it shifts all the arguments to the left. The eighth from/to pair is not specified and defaults to null. When p_from1 is null, then all the arguments are used, all the substitutions are completed, and MREPLACE returns the answer.

For the purposes of illustration, MREPLACE accepts up to 8 from and to pairs, but in practice we would set it to accept more, maybe up to 32 from/to pairs, or 64 from/to pairs or even more if required. Remember, MREPLACE stops the recursion when it finds the first null p_from1 argument, so you should define it to accept more from/to substitution pairs than you expect to use.
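If a fixed number of parameters feels limiting, one alternative is to pass the pairs as a collection and loop over them. This is only a sketch of a different technique, not the recursive method described above: the name mreplace_list is my own, and it borrows Oracle's built-in sys.odcivarchar2list collection type to carry the from/to strings.

```sql
create or replace
function mreplace_list (
    p_string in varchar2,
    p_pairs  in sys.odcivarchar2list   -- from1, to1, from2, to2, ...
                       ) return varchar2 as

  v_result varchar2(32767) := p_string;
  i        pls_integer     := 1;

begin

  -- No list at all: nothing to replace
  if p_pairs is null then
     return p_string;
  end if;

  -- Walk the list two entries at a time; a dangling odd entry is ignored
  while i < p_pairs.count loop
     v_result := replace(v_result, p_pairs(i), p_pairs(i + 1));
     i := i + 2;
  end loop;

  return v_result;

end mreplace_list;
/

select '© ≥' input,
       mreplace_list('© ≥',
                     sys.odcivarchar2list('©','(c)',
                                          '≥','>=')) output
from dual;
```

The trade-off is a slightly noisier call, since the caller has to wrap the pairs in a collection constructor, in exchange for not having to guess the maximum number of pairs in advance.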

Happy computing!
