Monday, August 26, 2013

Translate and Replace and Replace and Replace and ...

One task many programmers face is examining a string of characters and replacing specific characters with different ones.  Every COBOL programmer is familiar with the INSPECT verb, and if they've really been programming COBOL for a while, they're familiar with the now-obsolete EXAMINE verb, too. We're not going to step that far back; instead we'll look at a couple ways of handling this problem in Oracle's PL/SQL programming language.

Oracle's PL/SQL function library includes several functions that replace single characters or multiple characters with a new characters or string of characters:
Regexp_replace and its siblings regexp_like and regexp_substr are worthy of their own post, so we'll focus on the translate and replace functions.

Translate is the simplest function. Translate performs single characters substitution.  There are three arguments:  the input string we want to examine, a string of characters that we want to replace, and a new string of characters that we want to use as replacements.  Here's a simple example:

select 'abcdefgh' input,
       translate('abcdefgh','bdfh','BDFH') output
from dual;

INPUT       OUTPUT
--------    --------
abcdefgh    aBcDeFgH


We see that every occurrence of b, d, f, or h was replaced by the upper case equivalent.  Every character in the second argument is replaced by its equivalent character in the third argument.

This is a really easy way to do one-to-one character translations.  Sometimes, we want to replace a single character by a string of characters, or perhaps replace a string of characters by a single character.  The replace function handles this nicely.   Here's an example; anyone needing to translate multi-byte characters to a single-byte character is familiar with this problem:

select '10 ≥ 5' input,
       replace('10 ≥ 5','','>=') output
from dual;

INPUT       OUTPUT
--------    --------
10   5     10  >= 5

This is pretty handy.  We translated the single character greater than or equal to symbol into a two-character string that we can represent in a 7-bit ASCII codeset.

The replace function can convert strings into single characters, too.  Here's another example:

select '(c)2013' input, 
       replace('(c)2013','(c)','©') output 
from dual;

INPUT       OUTPUT
--------    --------
(c)2013     ©2013 

This is pretty handy, too.  We can take a string encoded in 7-bit ASCII and replace character strings like (c) and >= with their single-character equivalents in a multi-byte character set.  

 There's just one thing missing:  Translate can change many different characters for us in one call, but it does not handle character string replacements.  Replace can change characters strings, but only one string at a time.  If we want to change both the © and the  ≥, we need to do something like this:

select '© ≥' input,
       replace(replace('© ≥','©','(c)),'≥','>=') output
from dual;

INPUT    OUTPUT
-----    ------
© ≥      (C) >=

This works fine for a couple of translations, but for more than a few translations, this is pretty cumbersome. Why not let the computer do it for us? We'll define a new function, called MREPLACE, that will handle multiple replace strings. The first argument will be the string we want to examine. The following arguments are pairs of strings: first the old string, then the new string. Here's an example of MREPLACE in use:

select '© ≥' input,
       mreplace('© ≥',
                '©','(c)',
                '≥','>=') output
from dual;

INPUT OUTPUT                                                                     
----- ------
© ≥  (c) >= 

Perfect! Now, with just one function call, we have an easy way to replace multi-character or single-character strings with multi-character or single-character strings in one function call. Here's the source:

create or replace
function mreplace (
    p_string in varchar2, 
    p_from1 in varchar2 default null, p_to1 in varchar2 default null,
    p_from2 in varchar2 default null, p_to2 in varchar2 default null,
    p_from3 in varchar2 default null, p_to3 in varchar2 default null,
    p_from4 in varchar2 default null, p_to4 in varchar2 default null,
    p_from5 in varchar2 default null, p_to5 in varchar2 default null,
    p_from6 in varchar2 default null, p_to6 in varchar2 default null,
    p_from7 in varchar2 default null, p_to7 in varchar2 default null,
    p_from8 in varchar2 default null, p_to8 in varchar2 default null 
                  ) return varchar2 as

begin 

  if ( p_from1 is null ) then return p_string;
  else
     return mreplace(replace(p_string,p_from1, p_to1),
                     p_from2 , p_to2 ,
                     p_from3 , p_to3 ,
                     p_from4 , p_to4 ,
                     p_from5 , p_to5 ,
                     p_from6 , p_to6 ,
                     p_from7 , p_to7 ,
                     p_from8 , p_to8 );
  end if;           

end mreplace;

How does it work? MREPLACE uses recursion, repeatedly calling itself. MREPLACE keeps calling itself until the p_from1 argument is null. The test to stop the recursion is important if you don't want your DBAs and sysadmins darkening your office door! Also, notice that MREPLACE accepts one string to examine and eight from/to pairs. When we invoke it recursively, we call it with one string to examine that includes our string and the first from/to pair:

          replace(p_string, p_from1, p_to1)

but only seven from/to pairs:

                     p_from2 , p_to2 ,
                     p_from3 , p_to3 ,
                     p_from4 , p_to4 ,
                     p_from5 , p_to5 ,
                     p_from6 , p_to6 ,
                     p_from7 , p_to7 ,
                     p_from8 , p_to8 )

So, each time MREPLACE calls itself, it shifts all the arguments to the left. The eigth from/to pair are not specified and defaults to null. When p_from1 is null, then all the arguments are used, all the substitutions are completed, and MREPLACE returns the answer.

For the purposes of illustration, MREPLACE accepts up to 8 from and to pairs, but in practice we would set it to accept more, maybe up to 32 from/to pairs, or 64 from/to pairs or even more if required.  Remember, MREPLACE stops the recursion when it finds the first null p_from1 argument, so you should define it to accept more from/to substitution pairs than you expect to use.

Happy computing!

No comments:

Post a Comment