Lost Learnings: October 2013

Tuesday, October 29, 2013

How to make license plates

Database sequences So, you're going to the big house. You're going up the river. You're doing time. You're going to be making license plates, and you want to impress the warden?¹

Back in the late 1970's, the license plate on my car read "CAR 999". I always liked that my car's plate read C-A-R. Creating the numbers from 001 though 999 is pretty easy, but creating the letters from AAA through ZZZ is a little trickier. In this post, we'll discuss how to create unique identification numbers, and we'll virtually re-create the late1970's State of Maryland license plates with three letters followed by three numbers.

The license plate problem is one example of generating unique identification numbers. We go to school, we get an id. We go to work, we get an id. We have an id for the computer, we have an id on our driver's license, we have a Social Security number, and our insurance policies have id numbers. Sometimes the ids are just numbers, but more often the id is a mix of letters and numbers. Using letters instead of or in addition to numbers greatly expands the number of ids. On a car's license plate, the letters and numbers take the same amount of space. But, the letters from AAA to ZZZ give us 17,576 possible combinations, while the numbers from 000 to 999 give us only 1000 combinations. Using only numbers, we quickly run out of identification numbers.

When we create ids, we can not reuse a given sequence of letters and numbers. We need to know what we used previously. There are a few ways of tackling this problem. In our license plate example, we could pre-generate all the possible plate numbers, store them in a table, and mark them "in-use" as we use them. Or, we could store the last plate in a table, generate the next plate based on the last plate, and update the table with the latest plate. But, the easiest solution is to use database sequences. A database sequence is an object defined in the database that automatically increments each time we consume a number. We don't have to pre-generate plate numbers, we don't have to update tables or keep track of anything. The database does the work for us. And that's a good thing.

We'll start by creating two sequences, one for the letters and one for the numbers. More about why the letters start with 676 and finish with 17575 later:

create sequence numbers_seq 
   start with 1 maxvalue 999 cycle nocache;

create sequence letters_seq 
   start with 676 maxvalue 17575 nocache cycle;

We can use the number_seq directly, but we must convert the letter_seq to three characters. If we think about the letters of the alphabet as a base 26 counting system, then we need to convert base 10 digits to base 26 digits. We learned to do this in Computer Science 101, when we converted base 10 numbers to base 16 numbers and back again. The solution is the same, only the base is different. This function accepts a base 10 number and returns a three-character base 26 representation:

create or replace function seq2letters(p_seq in number)
    return varchar2 is
    
  v_letters varchar2(3) default null;
  v_n integer := p_seq;
  v_r integer;
  digits varchar2(26) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  base integer  := length(digits);
begin
   while ( v_n > 0) loop
      v_r := mod(v_n,base);
      v_letters := substr(digits,v_r + 1, 1) || v_letters;
      v_n := floor( v_n / base );
      end loop;
   return v_letters;
end seq2letters;

For debugging purposes, we'll create the a function to do the opposite conversion, too. When we decide on the start and maxvalue values for the letter_seq, this is useful for finding out what the base 10 numbers are for BAA and ZZZ.

create or replace function letters2num(  p_letters in varchar2)
    return number
is
  v_length integer := length(p_letters);
  v_num number := 0;
  v_i integer;
  digits varchar2(26) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  base integer  := length(digits);
  v_digit varchar2(1);  
  v_digit_value integer;
  v_power integer := 0;
begin
  for v_i in reverse 1 .. v_length loop
    v_digit := substr(p_letters,v_i,1);
    v_digit_value := instr(digits, v_digit);
    v_num := v_num + (v_digit_value - 1) * base ** v_power;
    v_power := v_power + 1;
  end loop;
  return v_num;
end letters2num;

Let's test our two new functions. We want to find the base 10 equivalents of BAA and ZZZ using letters2num. Then, we'll test seq2letters to make sure we get our starting letters back.

select letters2num('BAA') start, letters2num('ZZZ') finish
from dual;

START      FINISH
-----      ------
  676       15757

select seq2letters(676) first_plate, seq2letters(15757) last_plate
from dual;

FIRST_PLATE     LAST_PLATE
-----------     ----------
BAA             ZZZ

Perfect! We have the tools to translate numbers between base 10 and base 26.

Next, we'll create two functions that return the next letters and next numbers. Our function will get the next number seq and return a 3-digit number. The letters function is a little more complicated. Whenever the plate number is 1, we need to get the next letters. If the plate number is not 1, then we reuse the current letters. In either case, the letter function will use the base 10 converter function to convert the sequence to three letters. Here's the code

create or replace function get_numbers return number is
    
begin
    
    return numbers_seq.nextval;

end get_numbers;

create or replace function get_letters ( p_num number ) return varchar is
    
    v_letters_10 number;
    v_letters_26 varchar2(3);
begin
    
    if p_num = 1 then  
       
        v_letters_10 := letters_seq.nextval;
       
    else

       select last_number - 1
       into v_letters_10
       from user_sequences
       where sequence_name = 'LETTERS_SEQ';


    end if;
        
    v_letters_26 := seq2letters(v_letters_10);
        
    return v_letters_26;
       
end get_letters;

Now we have all the code we need to generate a license plate. First we call the numbers function, then we call the letters function. In this demo, hosted on Oracle's APEX demonstration website, we'll create a 1970's Maryland license plate. And what does a vintage late 1970s Maryland plate look like? Check eBay: it's more than an auction site, it's a museum as well! The old plates were pretty simple: large red letters and numbers with a red border around the whole plate. Try out the demonstration, every click of the Make a Plate button makes another license plate. Try running the demonstration in two or more windows, you'll never make the same plate twice. Well, at least not until you get to plate ZZZ 999 and the sequence roll over.

1. American euphemisms for going to prison.

Tuesday, October 22, 2013

Recursive SQL

Our shop recently upgraded to Oracle 11g Release 2. Having used DB2 for many years, two new 11gR2 features caught my eye. First, Oracle 11gR2 supports aliasing the names of common table expressions. Second, Oracle supports SQL-99 syntax for recursion using common table expressions.

Let's start with some data first. In the Oracle editor of your choice, run the following script. The script creates some test data for us in the system catalog, and we'll use this data to solve a practical problem.

create table t1 ( v1 varchar2(1));

create view v1 as select * from t1;

create view v2 as select * from v1;

create view v3 as select * from v2;

create view v4 as select * from v1;

The script creates five new objects in the system catalog. The DBA_DEPENDENDCIES view shows the dependency of one object on another. In our example, view V1 depends on table T1. View V2 depends on view V1. View V3 depends on view V2. View V4 depends on view V1. Every object, whether it is a table, view, package, or object of any type, has a row in DBA_DEPENDENCIES if the object depends on another object, or if the object has other objects depending on it.

The USER_DEPENDENCIES view is a subset of DBA_DEPENDENCIES; DBA_DEPENDENCIES may be not be available to everyone, so these examples will use USER_DEPENDENCIES. The two columns of interest to us are the NAME and REFERENCED_NAME columns. A row in this view means that object NAME depends on object REFERENCED_NAME. In the following query, we see every dependency that we described in the previous paragraph.

select name, referenced_name
from user_dependencies
where referenced_name in ('V1','T1','V2','V3') 
order by 1;

NAME    REFERENCED_NAME
-----   ---------------
V1      T1
V2      V1
V3      V2
V4      V1

The USER_DEPENDENCIES view shows us all the direct dependents of an object. The USER_DEPENDENCIES view shows that view V4 depends on view V1. USER_DEPENDENCIES does not show that view V4 indirectly depends on table T1. If you don't believe that, drop table T1 and then run a query against view V4. Not so good!

A shop with many tables, views, packages, procedures, functions, and triggers, will have an extensive hierarchy of dependencies. So, while this view shows the direct dependents, it does not show all the dependents. Does view V4 depend on table T1? Before we drop table T1, we need to know the answer to that question!

This problem is easily solved with recursive SQL queries. We'll take a look at two ways of handling this. First, we'll examine how to answer this question using recursion with common table expressions, à la DB2 or Oracle 11gR2. Here's the query:

with recurs (name, rname, lvl ) as (
   select name, referenced_name, 1
   from user_dependencies
   where referenced_name = 'T1'

   union all

   select a.name, a.referenced_name, r.lvl + 1
   from user_dependencies a,
        recurs  r
   where r.name = a.referenced_name
)
select name, rname, lvl
from recurs; 

NAME    RNAME   LVL
----    -----   ---
V1      T1      1
V2      V1      2
V4      V1      2
V3      V2      3

The first new feature in 11gR2 is aliasing the common table expression columns. We have declared RECURS as a common table expression with two columns, NAME and RNAME. Oracle 11gR1 supported common table expressions, but 11gR1 did not support aliasing the column names.
The next new feature is the recursive query. We have two sub-queries unioned together. The first query is our starting point: REFERENCED_NAME = 'T1'. The second query is the recursion. We select rows from a join of the USER_DEPENDENCIES table and the RECURS common table expression. The rows are linked by joining the REFERENCED_NAME of USER_DEPENDENCIES to the NAME of RECURS. In other words, we join the child row from USER_DEPENDENCIES to the parent row from RECURS. The LVL column shows the depth of the hierarchy.

DB2 LUW has supported common table expressions and recursion for at least 10 years. Oracle's pre-11g2 databases supported recursive queries in a limited way using the START WITH ... CONNECT BY PRIOR syntax. Run the following query, and we get the same results, although not in the same order. LEVEL is an Oracle pre-defined column showing the depth of the hierarchy. In the previous query, we found this by keeping track of the level (LVL) ourselves.

select name, referenced_name, level
from user_dependencies
start with referenced_name = 'T1'
connect by prior name = referenced_name;

NAME    REFERENCED_NAME  LEVEL
----    ---------------  -----
V1      T1               1
V2      V1               2
V3      V2               3
V4      V1               2

Like Oracle's outer join operator "(+)" from the last post, START WITH ... CONNECT BY PRIOR is very common in Oracle shops, so it's important to understand it. Also, it's very concise, always a benefit to those among us who have ten thumbs on the keyboard.

Recursive queries are another good tool to have in our toolbox. Trying to sort out the object dependencies is just one use. Another common problem is sorting out the system privileges. Many applications define their own authorization schemes, granting access to roles or groups, then including those roles/groups in other roles/groups. Recursive queries can answer the question "what users have authorization to which tables", and these queries can show the underlying hierarchy.

But before we go, for our example query to be really useful, we need to account for the rest of the columns in DBA_DEPENDENCIES. In addition to the object's name, DBA_DEPENDENCIES includes the object's owner and the object's type. If we don't account for any object's type and owner, we quickly end up with a recursive loop, especially if our objects include package specs and package bodies. Here's the improved query:

with recurs (name, owner, type, rname, rowner, rtype,  lvl ) as (
   select name, owner, type, referenced_name, referenced_owner, referenced_type, 1
   from dba_dependencies
   where referenced_name = 'T1'

   union all

   select a.name, a.owner, a.type, a.referenced_name, a.referenced_owner, a.referenced_type, r.lvl + 1
   from dba_dependencies a,
        recurs  r
   where r.name = a.referenced_name
     and r.owner = a.referenced_owner
     and r.type = a.referenced_type 
) cycle name, owner, type set cycle_detected to 1 default 0
   
select * 
from recurs;

Now when we join a parent row to a child row, we're joining on all the appropriate columns, and a recursive loop is less likely. Another improvement in this query is the use of the CYCLE keyword. CYCLE instructs the database to detect when our query has previously visited the row defined by name, owner, and type. When the database detects this condition -- a recursive loop -- the database sets the CYCLE_DETECTED column to 1 and does not descend further. Now we have a really useful query.

More Reading

For more information, the vendors' publications have some good examples:

DB2 Info Center
Oracle 11gR2 Don't be put off by the term "subquery factoring". It's just recursion.

Also, see my blog entry discussing recursion to implement aggregation in SQL Server.

Monday, October 14, 2013

We gather together to join these two tables...

Joining tables using SQL is a pretty straightforward operation. Usually we join tables where the values in one column equal the values in a second column, and the join returns only those rows where the columns values in both tables match. Let's start with some data:

EMPLOYEES TABLE
---------------

ID      NAME               DEPT_ID
--      ----------------   -------
1       Fred Flintstone    3
2       Barney Rubble      3
3       Wilma Flintstone   -
4       Betty Rubble       -
5       Mister Slate       1

DEPARTMENT TABLE
----------------

ID      NAME
--      ------------
2       Sales
1       Front Office
3       Gravel Pit

Here's a simple query with the results. The query answers the question "display all the employees with their departments".

select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id;

DEPT_NAME       EMPLOYEE
------------    ----------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate

Notice that not very row in our source tables appears in the select results. Wilma and Betty are not employees, and the Sales department has no employees. Depending on our application, this might be what we want, but in some cases, we want to see the rows where we didn't get matches. For example, if the question is "display all departments and their employees", then we would expect to see the Sales department without any employees.

For many programmers, the natural response is to write a program in a high-level language using two loops. The outer loop opens a cursor on the department table, fetches data, and for each row fetched starts an inner loop. The inner loop opens a cursor on the employee table using the department id as a predicate.. In SQL parlance, we have constructed an OUTER JOIN.

Proponents of the "solve the problem with SQL" philosophy are surely gnashing their teeth at this solution. There are several ways of using SQL to write this query. We'll take a look at the most common way of handling it now, and we'll step into the way-back machine to take a look at two older ways of solving this problem. I don't advocate using the older methods, but if you work in a shop that adopted DB2 or Oracle early, you will see them and need to understand them.

The easiest and best method is to use SQL's OUTER JOIN keywords. In this example, we'll use a LEFT OUTER JOIN. A left outer join will return the rows from the table on the left side of the join, even if those rows don't satisfy the join condition. This query shows all departments and their employees:

select d.name dept_name,
       e.name employee
from department d 
left outer join employees e
  on d.id = e.dept_id;

DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           -

Now the report shows the Sales department, even though the Sales department has no employees.

The early SQL databases that I worked with, SQL/DS and early versions of DB2, did not support the OUTER JOIN syntax. To construct a LEFT OUTER JOIN, we had to construct the inner join and UNION those results with missing rows from the left hand table. In the second part of the query, notice the correlated sub-query between the deparment and employees tables. This query yields the same results as the last query:

select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id 

union all 

select d.name dept_name,
       null
from department d
where not exists ( select 1
                   from employees e
                   where d.id = e.dept_id 
                );

DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           -

That's a lot of code, and not everyone is comfortable with the correlated sub-query, but it is preferable to coding loops and constructing joins in a high-level language.

Oracle used the "(+)", the outer join operator, to construct an outer join without needing to union the missing rows. Oracle has deprecated this syntax and recommends that programmers use the OUTER JOIN keywords construction instead. The deprecated syntax seems a bit clumsy until you get accustomed to it, and if you're working in a shop that has a long history with Oracle, you will see a lot of it.

Here's the query using Oracle's "(+)" outer join operator. Notice that the "(+)" is applied to the columns of the employee table if we want all the rows from the department table. The syntax and semantics don't correlate well; it is certainly one reason why Oracle deprecated this construction.

select d.name dept_name,
       e.name employee
from department d,
     employees e
where d.id = e.dept_id(+)
 
DEPT_NAME       EMPLOYEE
-------------   ---------------
Gravel Pit      Fred Flintstone
Gravel Pit      Barney Rubble
Front Office    Mister Slate
Sales           -

We do get the correct results, though, and programmers must have seen this as a big advantage over union-ing two queries together as in our example above.

The left outer join is just the start, we can also construct right outer joins and full outer joins using any of these methods. The preferred way is to SQL's RIGHT OUTER JOIN or FULL OUTER JOIN keywords for any new query. But, if you find yourself maintaining old queries, it's a good idea to understand the other methods of constructing outer queries, too.

Sunday, October 6, 2013

Rolling your own locks: Everything old is new again

No matter what language we program, no matter what operating system we use, or no matter what database we use, we frequently face the same problems. Yesterday's CICS/COBOL programmers and today's web developers both face a common problem: how to handle database locking. Fortunately, there is a solution common to both environments. We will use an old CICS strategy to solve an interactive web site problem.

The pseudo-conversational nature of CICS has bedeviled many a COBOL programmer. Processing transactions in a batch program is fairly simple:

Declare a cursor, including a for update of column_name1, column_name2 clause
Open the cursor
Fetch rows
Update where current of cursor
Close cursor
Commit the updates

The for update of clause instructs the database to take locks; these locks are released at the commit step or when the program terminates. The locking mechanism and database transactions insure that we process data updates while maintaining data integrity.   A loop like this can process tens of thousands of records in just a few minutes.

But in the online world of CICS, the program terminates every time a screen is displayed. Any locks taken are released when the program terminates. So, how do you write an online CICS program that maintains data integrity?

Web programmers have the same problem when programming interactive web pages. In the CICS world, we called it pseudo-conversational. In the HTML world, we call it stateless. It's really the same problem: in both environments, after displaying the page or the screen, we do not have a way to hold a lock. And in fact, we do not want to hold a lock. Remember the last time you went online and made a purchase? It took several minutes to complete that purchase. Imagine holding a lock on a table row or a table page for several minutes?   Such a system would quickly grind to a halt waiting for resources to unlock.

So, we adopt a strategy called optimistic locking.   We'll assume that the data we displayed in an online screen, whether it's a web page or a CICS screen, does not go stale. Rather than hold a lock after displaying a screen and while waiting for a user to continue, we simply test to see if the data went stale when we're ready to update our data. The logic looks something like this:

Select data from database
Display data on the screen; program terminates and waits for user input
User updates the data on the screen
Verify the original data has not changed, and apply the user's updates to the database

Let's start with some data. The Bedrock Bank ACCOUNT table has four columns: id (a surrogate key), name, checking, savings.

ID    NAME            CHECKING   SAVINGS
--    --------------  --------   -------
1     Flintstone           100       100
2     Rubble               100       100

Suppose Fred and Wilma go to the Bedrock Bank ATM to withdraw 70 rocks from savings. They each go to the ATM, display their account information, and attempt to make the withdrawal. According to each display, there should be enough rocks in the savings account to withdrawal 70 rocks. First, Wilma withdrawals 70 rocks, and all is good. Then Fred attempts to withdrawal 70 rocks. Should the Flintstone account go 40 rocks in the red? Of course not! The problem is Fred made his withdrawal based on stale information. His display says there are 100 rocks in savings, but that's old information now. There are only 30 rocks in savings. The ATM needs to implement controls to insure data integrity.

In the Oracle Apex Oracle Apex world, automatic row processing takes care of this behind the scenes. We can write web-based programs as if we didn't need to worry about locking. But in some cases our interactions with the database will be complicated enough that we need to implement the locking. Fortunately, it's not too difficult; we just need a way to know if the data went stale before we update it.

For our first solution, we'll take a page from CICS methodology. We'll add a last update timestamp to the ACCOUNT table. Notice that we're adding a complete timestamp, not just a date field. A date field is not fine-grained enough for our purpose.

ID    NAME            CHECKING   SAVINGS    LAST_UPDATE_TS
--    --------------  --------    ------    -----------------------
1     Flintstone           100       100    2013-09-15.12.02.57.000321
2     Rubble               100       100    2013-08-18.15.21.01.000123

We make two assumptions: (1) every insert or update always updates the LAST_UPDATE_TS column. We can do this automatically with a trigger, or we can require every application to do this. (2) Every transaction reads the LAST_UPDATE_TS on a select, and every update/delete transaction includes the LAST_UPDATE_TS as a predicate in the where clause.

Now, imagine Wilma and Fred at the ATM machine, each about to withdrawal 70 rocks:

Fred slides his card, the ATM displays 100 rocks and saves the LAST_UPDATE_TS.
Wilma slides her card, the ATM displays 100 rocks and saves the LAST_UPDATE_TS.

Wilma withdraws 70 rocks, the ATM executes the following:

update account
set savings = savings - 70,
    last_update_ts = current timestamp
where id = 1
  and last_update_ts = 2013-09-15.12.02.57.000321

Fred attempts to withdraw 70 rocks, the ATM executes the following:

update account
set savings = savings - 70,
    last_update_ts = current timestamp
where id = 1
  and last_update_ts = 2013-09-15.12.02.57.000321

Fred's attempt fails. Wilma's transaction changed both the SAVINGS column and the LAST_UPDATE_TS column. The ATM displays a message telling Fred his account balance has changed and re-displays the Flintstone account balance, now 30 rocks.

Not every shop timestamps rows of data, so in the Oracle Apex world, there is another way to detect stale data. The wwv_flow_item.md5 function accepts up to 50 arguments and returns a MD5 checksum. We can substitute the MD5 checksum for the LAST_UPDATE_TS in the above example. We don't save the MD5 checksum in the table; we just compute it on the fly. When Fred and Wilma slide their cards, the ATM executes the following SQL:

select name,
       checking, 
       saving,
       wwv_flow_item.md5(name, checking, saving)
into  :v_name,
      :v_checking
      :v_saving
      :v_hidden_md5
from account
where id = 1

And the update would look like:

update account
set saving = saving - 70
where id = 1
  and wwv_flow_item.md5(name, checking, saving) = :v_hidden_md5

Notice that we compute the MD5 checksum from the current data in both SQL statements. When Wilma completes her transaction before Fred, the MD5 checksum changes, and Fred's transactions fails, just as it did before.

Now we have tow tools to handle online web transactions in the stateless world web programming. We can use either row timestamps or MD5 checksums to identify stale data. Just remember to rollback the whole transaction if any part of the transaction fails.

More Reading

If you're rolling your own HTML pages outside of Apex and you need to compute checksums, the Oracle dbms_obfuscation_toolkit package includes an md5 function. The md5 function only accepts one argument, so just concatenate the values together. PL/SQL happily concatenates numbers to strings to numbers, so we don't need to worry about dissimilar types:

select name,
       checking, 
       saving,
       dbms_obfuscation_toolkit.md5(input_string => name || checking || saving)
into  :v_name,
      :v_checking
      :v_saving
      :v_hidden_md5
from account
where id = 1

Lost Learnings