Lost Learnings: August 2013

Monday, August 26, 2013

Translate and Replace and Replace and Replace and ...

One task many programmers face is examining a string of characters and replacing specific characters with different ones. Every COBOL programmer is familiar with the INSPECT verb, and if they've really been programming COBOL for a while, they're familiar with the now-obsolete EXAMINE verb, too. We're not going to step that far back; instead we'll look at a couple ways of handling this problem in Oracle's PL/SQL programming language.

Oracle's PL/SQL function library includes several functions that replace single characters or multiple characters with a new characters or string of characters:

translate translate
replace replace
regexp_replace regexp_replace

Regexp_replace and its siblings regexp_like and regexp_substr are worthy of their own post, so we'll focus on the translate and replace functions.

Translate is the simplest function. Translate performs single characters substitution. There are three arguments: the input string we want to examine, a string of characters that we want to replace, and a new string of characters that we want to use as replacements. Here's a simple example:

select 'abcdefgh' input,
       translate('abcdefgh','bdfh','BDFH') output
from dual;

INPUT       OUTPUT
--------    --------
abcdefgh    aBcDeFgH

We see that every occurrence of b, d, f, or h was replaced by the upper case equivalent. Every character in the second argument is replaced by its equivalent character in the third argument.

This is a really easy way to do one-to-one character translations. Sometimes, we want to replace a single character by a string of characters, or perhaps replace a string of characters by a single character. The replace function handles this nicely. Here's an example; anyone needing to translate multi-byte characters to a single-byte character is familiar with this problem:

select '10 ≥ 5' input,
       replace('10 ≥ 5','≥ ','>=') output
from dual;

INPUT       OUTPUT
--------    --------
10  ≥  5     10  >= 5

This is pretty handy. We translated the single character greater than or equal to symbol into a two-character string that we can represent in a 7-bit ASCII codeset.

The replace function can convert strings into single characters, too. Here's another example:

select '(c)2013' input, 
       replace('(c)2013','(c)','©') output 
from dual;

INPUT       OUTPUT
--------    --------
(c)2013     ©2013

This is pretty handy, too. We can take a string encoded in 7-bit ASCII and replace character strings like (c) and >= with their single-character equivalents in a multi-byte character set.

There's just one thing missing: Translate can change many different characters for us in one call, but it does not handle character string replacements. Replace can change characters strings, but only one string at a time. If we want to change both the © and the ≥, we need to do something like this:

select '© ≥' input,
       replace(replace('© ≥','©','(c)),'≥','>=') output
from dual;

INPUT    OUTPUT
-----    ------
© ≥      (C) >=

This works fine for a couple of translations, but for more than a few translations, this is pretty cumbersome. Why not let the computer do it for us? We'll define a new function, called MREPLACE, that will handle multiple replace strings. The first argument will be the string we want to examine. The following arguments are pairs of strings: first the old string, then the new string. Here's an example of MREPLACE in use:

select '© ≥' input,
       mreplace('© ≥',
                '©','(c)',
                '≥','>=') output
from dual;

INPUT OUTPUT                                                                     
----- ------
© ≥  (c) >=

Perfect! Now, with just one function call, we have an easy way to replace multi-character or single-character strings with multi-character or single-character strings in one function call. Here's the source:

create or replace
function mreplace (
    p_string in varchar2, 
    p_from1 in varchar2 default null, p_to1 in varchar2 default null,
    p_from2 in varchar2 default null, p_to2 in varchar2 default null,
    p_from3 in varchar2 default null, p_to3 in varchar2 default null,
    p_from4 in varchar2 default null, p_to4 in varchar2 default null,
    p_from5 in varchar2 default null, p_to5 in varchar2 default null,
    p_from6 in varchar2 default null, p_to6 in varchar2 default null,
    p_from7 in varchar2 default null, p_to7 in varchar2 default null,
    p_from8 in varchar2 default null, p_to8 in varchar2 default null 
                  ) return varchar2 as

begin 

  if ( p_from1 is null ) then return p_string;
  else
     return mreplace(replace(p_string,p_from1, p_to1),
                     p_from2 , p_to2 ,
                     p_from3 , p_to3 ,
                     p_from4 , p_to4 ,
                     p_from5 , p_to5 ,
                     p_from6 , p_to6 ,
                     p_from7 , p_to7 ,
                     p_from8 , p_to8 );
  end if;           

end mreplace;

How does it work? MREPLACE uses recursion, repeatedly calling itself. MREPLACE keeps calling itself until the p_from1 argument is null. The test to stop the recursion is important if you don't want your DBAs and sysadmins darkening your office door! Also, notice that MREPLACE accepts one string to examine and eight from/to pairs. When we invoke it recursively, we call it with one string to examine that includes our string and the first from/to pair:

          replace(p_string, p_from1, p_to1)

but only seven from/to pairs:

                     p_from2 , p_to2 ,
                     p_from3 , p_to3 ,
                     p_from4 , p_to4 ,
                     p_from5 , p_to5 ,
                     p_from6 , p_to6 ,
                     p_from7 , p_to7 ,
                     p_from8 , p_to8 )

So, each time MREPLACE calls itself, it shifts all the arguments to the left. The eigth from/to pair are not specified and defaults to null. When p_from1 is null, then all the arguments are used, all the substitutions are completed, and MREPLACE returns the answer.

For the purposes of illustration, MREPLACE accepts up to 8 from and to pairs, but in practice we would set it to accept more, maybe up to 32 from/to pairs, or 64 from/to pairs or even more if required. Remember, MREPLACE stops the recursion when it finds the first null p_from1 argument, so you should define it to accept more from/to substitution pairs than you expect to use.

Happy computing!

Tuesday, August 20, 2013

How to start your car

Last weekend, I volunteered to drive classic cars across the auction block at the Owls Head Transportation Museum's 36 Annual Auto Auction. Seeing so many classic cars in one venue is a real treat. Some of these old cars, like the Jaguar E-type displayed at the Museum of Modern Art, are great works of art. And others, well, not so much...

And the Museum is a real treasure, too. Almost every old car in the collection drives, and almost every old plane   flies. During weekend events, the museum's volunteers will drive the Ford Model Ts and take visitors for rides, and the museum's pilots will fly aircraft. Watching World War I era aircraft fly around the Museum while modern aircraft take off and land at nearby Knox County airport is a lesson in aviation history.

So, how do you start your car?   This is where the fun begins. Most of us grew up starting cars with three-position ignition switches: Off, Run, Start. Lately, a few upscale manufacturers are installing a separate Start button, often in conjunction with a keyless entry system.   It's an old twist to a new problem: Austin-Healey roadsters from the early 50's through the early 60's used the same arrangement: There's a two-position ignition switch and a separate starting button labelled "S".

Newer (say, post 1950) cars use a lightweight switch that operates a heavy-duty relay.   The large current required to start a engine mandates the use of a heavy-duty switch between the battery and the starter motor. So, cars with small Start button on the dash, or cars that start using the ignition switch, all have relays.

On older cars of the 20s, 30s, and 40s, there is no relay, just a heavy-duty switch, usually located on the floor. The switch springs are so strong that it requires foot pressure to operate them comfortably. Starting these cars usually requires a little tap-dance to find the button, but once you've found it, you're all set.

So, sitting in a 1952 Buick Special, I was puzzled because after looking for the usual suspects, I couldn't find the starter switch.

"Push the accelerator to the floor" someone wiser than me said.

"Huh?"

"Push the accelerator to the floor. That will start it."

So I did, and the engine turned over and fired up. Buick engineers craftily linked the starter switch to the accelerator pedal. I don't know why; perhaps to make it easier to restart if you stalled?

The really old stuff, pre-1920, have hand-cranks. Hand-cranking is handy when the battery is weak, you need more exercise, or you just want to impress people. It's also a great way to injury yourself if an engine backfires: your arm and shoulder strength vs. a twenty-horsepower engine. Do the math: you're no match for a cantankerous engine.

Finally, you drive off. More fun now, especially with large American sedans of the 50s and 60s. It's really amazing how much you have to turn the steering wheel before the car actually begins to change direction! And the power brakes that are so sensitive you could stop the car with your little finger on the pedal. But, most of these cars drive pretty much like a modern car. You will see three-speed manual transmissions, or reverse hidden in odd places, or push-button automatics, or manual transmissions with shifters on the steering column. If the car is English, then the driver sits in the right-hand seat. But, at least the clutch is on the left, the brake pedal in the middle, and the accelerator on the right.

Except for the Ford Model T: transmission pedal is on the left, reverse pedal is in the middle, and the brake pedal is on the right. The accelerator (better called the throttle control) is the right paddle on the steering column. Ford started building Model Ts around 1908 when there was no standard layout, and Ford built 15 million of them, so the T really makes a standard. In fact, 15 million cars makes the Model T the second-most popular car of all time. Only VW made more cars of one model: 20 million Beetles make the Bug the most popular car ever.

Happy motoring!

Friday, August 16, 2013

Oracle Apex Tabular Forms

The Tabular Forms feature of Oracle's Apex web development tool allows users to update multiple rows in a table using only declarative programming. All of the Javascript and PL/SQL is done behind the scenes, making tabular forms a relatively easy to use and powerful tool.

After logging into your Apex environment (Oracle offers a free demo site), and navigating to your page, it's simple to create a tabular form: Create Region -> Form -> Tabular Form . Then, the tabular form wizard will guide you through the process of selecting the table and columns, choosing a primary key, choosing columns to update, and adding buttons and branching.

Run the page, and you find that the tabular form lets you scroll thru every row in the table. Pretty cool, but maybe you only want a subset of the rows? So your page has an ID field, and you modify the tabular form with the WHERE clause

select rowid,            /* Apex generates the select */
       table_key, 
       table_col1, 
       table_col2, 
       table_colx  
 where table_key = :id   /* You add the where clause */

Run the page again, and nothing shows up. No rows, no data, nothing. If the ID field of your page is blank, that's the correct response. Nothing in the table satisfies the where clause, so nothing is displayed. If you enter an ID value that's not in the table, again, nothing is displayed. If you enter an ID value that is in the table, only the matching rows are displayed. Perfect... almost.

There is an Add Row button to add new rows, and using the Add Row button is the way the tabular form is designed to work. But in some cases, we would like an empty row of cells to display if there isn't any data. The end-user may find it odd that nothing is displayed on the screen, and the blank row gives them a place to start.

Depending on what version of Apex you're using, you may be able to modify the tabular query by using a UNION to select a row of null values from dual. But this doesn't work with every release of Apex, so we'll take a different approach.

If we examine the Add Row button, we find that it fires a bit of Javascript:

javascript:apex.widget.tabular.addRow();

So, we will execute the JavaScript addRow function when the page loads by using a dynamic action. Here's how to do it: create a new dynamic action and name it Add_1_Row. Click Next. The Event should be Page Load, don't choose a Condition, and click Next. The Action should be Execute JavaScript Code, the code should be javascript:apex.widget.tabular.addRow(); , and click Next. Then click Create Dynamic Action.

Now, your end-user will always have a blank row to enter data. And if they need another blank row, the Add Row button is ready for them to use. Best of all, you don't need to worry about how to process this row of blanks because the Apex multi-row processes auto-magically handle all the inserting and updating for you. Now it's perfect!

For more information about how APEX handles commits in tabular forms, see my post discussing commits in APEX.

Friday, August 9, 2013

D'où êtes-vous?

D'où êtes-vous?

I was stumped. We were finishing breakfast at La Brioche Lyonnais in the Latin Quarter of Montreal, QC. I managed, thanks to a friendly and very patient waitress, to order my breakfast in French. When she asked what I wanted on my sandwich, and what salad dressing I wanted, I was able to answer her questions. I explained to my wife how to ask for some more water. I was actually conversing passably in French.

But now I was stumped. After 4 years of French in high school and college, I should have been able to answer her question. "Nous habitons a Nouveau Hampshire" would have been a great response. But it's been a long time since I studied French, and processing language at the speed of normal conversion is much trickier that reading it or slowly replying to questions.

This is one of those experiences that made me think about this blog's topic: things we've learned but lost over time.

The experience also reminded me that we loose knowledge over generations. My French great-grandmother would easily have answered the question that stumped me.

Another example: friends who are taking up permaculture are re-learning skills that their grandparents and great-grandparents took for granted: how to raise chickens, how to keep bees, how to grow vegetables and fend off the deer, how to run the tractor, how to build fences and farm buildings.

And another: a local auto repair shop that specializes in vintage cars was working on a 1966 Ford Mustang with dealer tags. The young mechanics at the Ford dealership, all trained to work on modern computer-controlled cars, couldn't figure out why the Mustang wouldn't start. The problem: bad points, a six-dollar tune-up part and the first thing most old-school mechanics would check.

Ending on an upbeat note: if you visit Montreal, I recommend La Brioche Lyonnaise. The smoked salmon was great, and the crêpes were everything you'd want a crêpe to be. Bon appétit!

btw... on this side of the border, Cafe Papillon in Rehobeth Beach, Delaware is a good bet.

Monday, August 5, 2013

Get Regular (Expressions)

There's been a lot of debate about Unix since it was first introduced by Bell Labs. Computer geeks engaged in their own Ford vs. Chevy debate: Unix vs. VMS, multiple Unix minicomputers vs.an IBM mainframe, System V vs. BSD, and so on. For every proponent of Unix's case-sensitive commands and file system, there were an equal number of opponents who just found it an annoying nuisance. Ah, the Holy Wars of Computerdom. It was fun stuff.

But whether your roots are in Unix, or VMS, or MVS, or any of the PC/Mac operating systems, I think most of us have found regular expressions to be one of the most useful things to come out of the Unix/Bell Labs world.

We all use simple pattern matching of one form or another. The MS-DOS DIR command uses an asterisk (*) to match any number of characters and a question mark (?) to match a single character. The SQL LIKE operator is another example. The percent sign (%) matches any number of characters, and the underscore (_) matched a single character. That's really great stuff, and it's good enough 75% of the time.

But there are some problems that simple pattern matching won't solve. Suppose I want to select rows of data from a table that had valid email_addresses. My query might look like this:

select email_address
from email_address_table
where email_address is not null;

Well, that query gets all rows that have data in email_address column. But are the data any good? Who knows? Let's try the LIKE operator:

select email_address
from email_address_table
where email_address like '%@%';

That's a bit better. At least we'll see data that looks like username@hostname. But a regular expression can do even better.

select email_address
from email_address_table
where regexp_like( email_address, 
           '^[a-zA-Z0-9\._-]+@[a-zA-Z0-9\.]+\.[a-zA-Z]+$');

OK, so the example with LIKE is simpler. But, after we pull all the data using LIKE, we need to do some post-processing ( in C, C++, Java, PL/SQL) to get the data we really wanted. This regular expression accepts that data that

Has a username before the "@" symbol. The username can be any combination of letters, numbers, periods, underscore, or dash.
Has an @ symbol separating the username from the hostname.
Has a hostname consisting of letters, numbers, and periods, and ends with a top-level domain consisting of just letters.

This is a really big improvement. Now we have a tool that works for 99% of the data.

The magic is in the regular expression metacharacters. Metacharacters are part of the regular expression grammar; this list is just a start to describe the example:

^ -- matches the beginning of the string
. -- period matches any character
[] -- describes a set of characters to match. In our example, the set of characters includes the letters from a to z and A to Z, period, underscore, and dash
+ -- count operator, specifies that the string must have one or more of the bracketed characters.
\ -- escape character. If we want to match on a metacharacter, we must escape it. The period is a metacharacter, and if we want period to match a period and nothing else, we escape it.
$ -- matches the end of the string.

And anything that's not a metacharacter matches itself. For example, the "@" symbol in our expression is not a metacharacter, so it only matches itself.

Let's look at some sample email addresses. One of these is a good email address, the others are not:

Pebbles 
Wilma_Flintstone@Bedrock
Fred_Flinstone@SlateConstruction.com

Now, lets see how the three queries do.

The first query, not null, accepts all the data. 'Pebbles' clearly does not have an email address, and Wilma's address is not complete.

The second query, using LIKE, does not accept Pebbles. Wilma's address is still a problem.

The third query, using a regular expression, accepts only Fred's address at SlateConstruction.com.

Here's a test page at Email and Phone Test for phone numbers and email addresses. The test page splits the phone number into an area code and exchange, and the page splits the email address into an username and hostname. How the page does that is another day's topic.

This is a topic that's really worth a deep dive. Mastering Regular Expressions is really deep dive into the topic. Another tip: try using a regular expression whenever you get a chance. It's a bit like playing the piano: the more you try regular expressions, the better you'll understand them.

Unfortunately, there are several versions and many implementations of regular expressions, and some implementations are better than others. The GNU/Linux egrep command and the PERL language have excellent regular expression implementations. Both are good, but they differ, so take time to read the documentation that comes with your favorite editor or text processor.

Lost Learnings