Sunday, October 6, 2013

Rolling your own locks: Everything old is new again

No matter what language we program, no matter what operating system we use, or no matter what database we use,  we frequently face the same problems.  Yesterday's CICS/COBOL programmers and today's web developers both face a common problem:  how to handle database locking.  Fortunately, there is a solution common to both environments.  We will use an old CICS strategy to solve an interactive web site problem.

The pseudo-conversational nature of CICS has bedeviled many a COBOL programmer.  Processing transactions in a batch program is fairly simple:
  1. Declare a cursor, including a for update of column_name1, column_name2 clause
  2. Open the cursor
  3. Fetch rows
  4. Update where current of cursor
  5. Close cursor
  6. Commit the updates
The for update of clause instructs the database to take locks; these locks are released at the commit step or when the program terminates. The locking mechanism and database transactions insure that we process data updates while maintaining data integrity.   A loop like this can process tens of thousands of records in just a few minutes.

But in the online world of CICS, the program terminates every time a screen is displayed.  Any locks taken are released when the program terminates.  So, how do you write an online CICS program that maintains data integrity?

Web programmers have the same problem when programming interactive web pages.  In the CICS world, we called it pseudo-conversational.  In the HTML world, we call it stateless.  It's really the same problem:  in both environments, after displaying the page or the screen, we do not have a way to hold a lock.  And in fact, we do not want to hold a lock.  Remember the last time you went online and made a purchase?  It took several minutes to complete that purchase.  Imagine holding a lock on a table row or a table page for several minutes?   Such a system would quickly grind to a halt waiting for resources to unlock.

So, we adopt a strategy called optimistic locking.   We'll assume that the data we displayed in an online screen, whether it's a web page or a CICS screen, does not go stale. Rather than hold a lock after displaying a screen and while waiting for a user to continue, we simply test to see if the data went stale when we're ready to update our data.  The logic looks something like this:
  1. Select data from database
  2. Display data on the screen; program terminates and waits for user input
  3. User updates the data on the screen
  4. Verify the original data has not changed, and apply the user's updates to the database
Let's start with some data.  The Bedrock Bank ACCOUNT table has four columns:  id (a surrogate key), name, checking, savings.

ID    NAME            CHECKING   SAVINGS
--    --------------  --------   -------
1     Flintstone           100       100
2     Rubble               100       100

Suppose Fred and Wilma go to the Bedrock Bank ATM to withdraw 70 rocks from savings.  They each go to the ATM, display their account information, and attempt to make the withdrawal.  According to each display, there should be enough rocks in the savings account to withdrawal 70 rocks.  First, Wilma withdrawals 70 rocks, and all is good.  Then Fred attempts to withdrawal 70 rocks. Should the Flintstone account go 40 rocks in the red?  Of course not!  The problem is Fred made his withdrawal based on stale information.  His display says there are 100 rocks in savings, but that's old information now.  There are only 30 rocks in savings.  The ATM needs to implement controls to insure data integrity.

In the Oracle Apex Oracle Apex world, automatic row processing takes care of this behind the scenes.  We can write web-based programs as if we didn't need to worry about locking.  But in some cases our interactions with the database will be complicated enough that we need to implement the locking.  Fortunately, it's not too difficult; we just need a way to know if the data went stale before we update it.

For our first solution, we'll take a page from CICS methodology.  We'll add a last update timestamp to the ACCOUNT table.  Notice that we're adding a complete timestamp, not just a date field. A date field is not fine-grained enough for our purpose.
ID    NAME            CHECKING   SAVINGS    LAST_UPDATE_TS
--    --------------  --------    ------    -----------------------
1     Flintstone           100       100    2013-09-15.12.02.57.000321
2     Rubble               100       100    2013-08-18.15.21.01.000123

We make two assumptions: (1)  every insert or update always updates the LAST_UPDATE_TS column.  We can do this automatically with a trigger, or we can require every application to do this.  (2) Every transaction reads the LAST_UPDATE_TS on a select, and every update/delete transaction includes the LAST_UPDATE_TS as a predicate in the where clause.

Now, imagine Wilma and Fred at the ATM machine, each about to withdrawal 70 rocks:
  1. Fred slides his card, the ATM displays 100 rocks and saves the LAST_UPDATE_TS.
  2. Wilma slides her card, the ATM displays 100 rocks and saves the LAST_UPDATE_TS.
  3. Wilma withdraws 70 rocks, the ATM executes the following:
    update account
    set savings = savings - 70,
        last_update_ts = current timestamp
    where id = 1
      and last_update_ts = 2013-09-15.12.02.57.000321
    
  4. Fred attempts to withdraw 70 rocks, the ATM executes the following:  
    update account
    set savings = savings - 70,
        last_update_ts = current timestamp
    where id = 1
      and last_update_ts = 2013-09-15.12.02.57.000321
    
  5. Fred's attempt fails.  Wilma's transaction changed both the SAVINGS column and the LAST_UPDATE_TS column. The ATM displays a message telling Fred his account balance has changed and re-displays the Flintstone account balance, now 30 rocks.
Not every shop timestamps rows of data, so in the Oracle Apex world, there is another way to detect stale data.  The wwv_flow_item.md5 function accepts up to 50 arguments and returns a MD5 checksum.  We can substitute the MD5 checksum for the LAST_UPDATE_TS in the above example.   We don't save the MD5 checksum in the table; we just compute it on the fly.  When Fred and Wilma slide their cards, the ATM executes the following SQL:
select name,
       checking, 
       saving,
       wwv_flow_item.md5(name, checking, saving)
into  :v_name,
      :v_checking
      :v_saving
      :v_hidden_md5
from account
where id = 1

And the update would look like:
update account
set saving = saving - 70
where id = 1
  and wwv_flow_item.md5(name, checking, saving) = :v_hidden_md5

Notice that we compute the MD5 checksum from the current data in both SQL statements.  When Wilma completes her transaction before Fred, the MD5 checksum changes, and Fred's transactions fails, just as it did before.

Now we have tow tools to handle online web transactions in the stateless world web programming. We can use either row timestamps or MD5 checksums to identify stale data.  Just remember to rollback the whole transaction if any part of the transaction fails.

More Reading

If you're rolling your own HTML pages outside of Apex and you need to compute checksums, the Oracle dbms_obfuscation_toolkit package includes an md5 function.  The md5 function only accepts one argument, so just concatenate the values together.  PL/SQL happily concatenates numbers to strings to numbers, so we don't need to worry about dissimilar types:
select name,
       checking, 
       saving,
       dbms_obfuscation_toolkit.md5(input_string => name || checking || saving)
into  :v_name,
      :v_checking
      :v_saving
      :v_hidden_md5
from account
where id = 1



No comments:

Post a Comment