Thursday, November 20, 2014

Directory Lookups

Every organization has an employee directory, and building search functions to find someone in the directory is a common problem.  How do you look someone up?  By their last name? By their first name?  And what do we mean by last and first names?  In some countries (China, Hungary), the family name precedes the given name. Or by their full name, including a middle initial?

If an organization is small, it's easy to load all the names into a select list and let the user pick one.  For large organizations, searching the company directory is more challenging.   Is Marguerite in the directory as Marguerite, or Margaret, or maybe even Gretchen, or maybe just Marge?  And how many John Smiths and Jane Does do we have in the directory?

In fact, searching text is a challenging problem.  Consider product searches.   Go to www.ebay.com, enter a product, and you will likely get tens, perhaps hundreds of good matches.  In my experience, the eBay search engine works very well.  Other vendors do not do as well -- a search that turns up hundreds of poor matches is not very useful. 

In general, we want a search strategy that returns lots of good matches and very few poor matches.   In this post, we'll use the company directory as an example of how to solve this problem.

So, suppose someone sits at a web page, types in a name, and presses Enter.  How do we select a list of likely names to present?   Most organizations have tried a few approaches.  Soundex looks promising, but in practice tends to return many false positives.  Many organizations home-grow a solution:  only look at the first few characters of each name, perhaps drop all the vowels and squish everything together, or maybe swap first and last names.

Most home-grown solutions suffer from not returning enough likely matches and returning too many bad matches  --- the worst of all situations.  Worse, many organizations invest much time and effort home-growing the poor-performing solution. There are a lot of solutions out there, but rather than re-invent the wheel, let's examine a solution using Oracle's text indexes and fuzzy matching.  We can get good results by using the out-of-box tools provided in the Oracle database.

Let's start with a simple table and some data.  First we'll create a table to hold the names, split into a first name, last name, nickname, and finally all the names concatenated together. More about the all_names column later.

create table names (
    first_name varchar2(50),
    last_name varchar2(50),
    nick_name varchar2(50),
    all_names varchar2(160)
);

The first_name (John), last_name (Smith), and nick_name (Johnny) are entered as part of normal table maintenance.  The all_names columns is a concatenation of the three names, and we'll use a trigger to maintain the all_names column:

create or replace trigger names_before_iu
before insert or update on names
for each row
begin
   :new.all_names := :new.first_name || ' ' ||
                     :new.last_name || ' ' ||
                     :new.nick_name ;
end;

Next, let's load some test data. We'll use the system catalog to get some "names", with the object's name acting as a first_name and the object's type acting as a last_name. In addition, we'll use the Apex demo_customers table as a source of names:

insert into names (first_name, last_name)
select object_name, object_type
from user_objects;

insert into names (first_name, last_name)
select cust_first_name, cust_last_name
from demo_customers;

Our next step is creating an index on the table. We will index the all_names column using a ctxsys.context type of index:

create index names_ctx
    on names ( all_names)
    indextype is ctxsys.context
    parameters('sync(on commit)');

We will use this context index to do fuzzy matching. Our queries will use the contains function in the where clause, and we need to construct a fuzzy argument to pass to the contains function.  The fuzzy function expands the search to include similarly spelled words.  The fuzzy argument takes three parameters:
  • a search string that is at least 3 characters long
  • a number between 1 and  80 specifying the lowest similarity score we will accept.  A lower number returns more results, a higher number returns better matches.
  • a number specifying how many variations of the search string to use. The more variations we request, the more results we will see.
  • a string specifying whether to weight or noweight the results
These numbers will become clear after a few examples. Sometimes things become clear when you poke them with a stick and see what happens, and this is one of those times.  The fuzzy match argument is a bit tricky to type, so let's create a function to make our typing easier.

create or replace
function mk_fuzzy(p_name in varchar2,
                  p_similarity in number default 50,
                  p_n_variations in number default 10)
    return varchar2 is
    
begin

   return 'fuzzy(' || p_name ||  ',' ||
           p_similarity || ',' ||
           p_n_variations || ',W)';

end mk_fuzzy;

Now for our query.

select score(1), a.first_name, a.last_name  
from names a
where contains(all_names, mk_fuzzy('viw',1,3), 1)>= 0
order by 1 desc;

Let's start at the top.  First, the score function takes one argument, in this case the number 1.   The argument refers to the third argument in the contains function, the number 1.  These two numbers should match, and the score function returns the score from the contains function for each row returned by the query.  Below we will use the contains function twice, and the score index ties the score to a specific contains query.

Next, the contains function takes three parameters:
  • the column to search on 
  • a fuzzy argument.  In this case, we're looking for the word "view" misspelled as "viw".  The similarity score argument of 1 specifies to return any match, and the variation argument specifies to use three variations of the "viw" argument.
  • an index for the score function
 Let's run the query and examine our results:

SCORE(1) FIRST_NAME LAST_NAME
48V1VIEW
48V2VIEW
48V3VIEW
48V4VIEW
48V_EMPVIEW
8DEMO_ORDERS_BIUTRIGGER
8DEMO_ORDER_ITEMS_BITRIGGER
8DEMO_ORDER_ITEMS_BIU_GET_PRICETRIGGER
8DEMO_PRODUCT_INFO_BIUTRIGGER
8DEMO_CUSTOMERS_BIUTRIGGER
8DEMO_TAGS_BIUTRIGGER
0WEB_CLIENT2PROCEDURE
0WEB_CLIENTPROCEDURE
0TESTTABLE_SEQSEQUENCE


The scores range from 0 (no match) to 48 (poor match). 48 is not a great score, but the query does find all of our views.

The all_names column includes both first and last names.  After some trial and error against a real-life table of several thousand names, the following query produces a good combination of  relevant results:

select score(1), a.first_name,       
       score(2), a.last_name,       
       score(1) + score(2)
from names a
where contains(all_names, mk_fuzzy('viw',50,50), 1) >  0
  and contains(all_names, mk_fuzzy('emp',50,50),2) > 0 
order by 5 desc;

This query returns the each name, the score for the each name, and the sum of both scores.  Notice the score indexes:  score(1) returns the score from the first use of contains, and score(2) returns the score from the second use of contains.  Ordering the results by the sum of the scores gives us the best matches first, and it's agnostic as to whether the family names follow given names or not. And because our all_names column includes nick names, our query works well with an argument consisting of a nick name and a family name, too.

Against our sample table, the above query returns the following results:

SCORE(1)FIRST_NAMESCORE(2)LAST_NAMETOTAL_SCORE
52V_EMP76VIEW128

With both our name arguments misspelled ( "emp" instead of V_EMP, "viw" instead of VIEW), the query still returns the correct answer, and the query doesn't return any bad matches.   This is exactly what we want!

The Oracle fuzzy matching tools deliver high quality search results with a minimum of new development and much less future maintenance.  In addition, tweaking the similarity score and the term expansion in the fuzzy matching function makes the tool very flexible.   If we don't get enough results, or if the results are poor, tweaking the arguments will solve the problem, and it's much easier to tweak the two arguments than re-writing home-grown code.


Sunday, October 5, 2014

Replace Or Repair?

Welcome to the new look!  It's fall in New England, the trees are turning many colors, and we have some of the prettiest scenery of the year.   It seems fitting to share this view of Mount Kearsarge from Kezar Lake.  In a few months, the lake will be iced over and the hills will be bare.

Today's topic is a common problem: we're surrounded by appliances (everything from a mobile phone to a refrigerator to an automobile is just an appliance), and sooner or later, the appliance isn't working or isn't working well enough. Do we replace it, or do we repair it?

Case 1: HTC Eris droid phone. This is a nice little android phone, now about 5 years old and still working, using a better set of widget than most newer phones, but only capable of running Android version 2.2. Unfortunately many of my favorite apps no longer work, and app vendors are not releasing updated versions for Android 2.2 phones.   My wife just bought a new Android phone, and I will replace my HTC Eris with her Razr Maxx.

Case 2:  Kodak C875 camera.  The C875 was Kodak's top-of-the-line point and shoot digital camera seven years ago.   It suffered a drop on the driveway when it was two years old, but it continued to work for another five years.  But the camera quit working two days before vacation, and there was no time to get it repaired.  I purchased a replacement camera with better zoom, a faster processor, and image stabilization for about half the cost of the Kodak.

What to do with the old camera?  On one hand, it was a top-of-the-line camera.  On the other hand, cameras have gotten much better and less expensive. So, the camera sat forgotten in a desk drawer for several months, until I figured it was time to repair it or get rid of it.  I quickly learned that the large service centers were no longer repairing this particular model.  But I found one on ebay,  Gerald's Camera Service, that would repair the camera for $39 with a 30-day warrantee.  For $39, I will have a better camera if I get the Kodak fixed than if I buy a new camera.

$39 later, I have my old camera back, working perfectly, just in time to take the panoramic view of Mount Kearsarge.   The new camera doesn't take panoramic shots, a nice feature of the old Kodak.

In fact, there are many things to admire about old appliances.  Our other camera is an old film Pentax K-1000.  The K-1000 is a beautifully simple camera.  There are only four controls:  set the film speed, set the shutter speed, set the f-stop, and focus on the subject.   The controls are easy to learn, and the photographer can set them more quickly than the PASM/C controls on a digital camera.

Simplicity is a big advantage of older appliances.  Last week I had my Austin-Healey out late.  Fans of old British cars know the Lucas moniker "A gentleman does not drive after dark".  But I was out late attending a Linux group meeting, and I found that my headlight switch would not work.  In a modern car, the stranded motorist would call AAA for a tow to the dealership.  In my 50-year old car,  I connected the headlights through the front-panel switch, and I was on the road in 20 minutes.   In a modern car, the rescued motorist would face a thousand-dollar repair bill to replace the combination switch.  In my 50-year old car, I will disassemble the switch and fix it for no cost.  If I can't fix the switch, I will replace it for $40.

Sometimes it's time to say good-bye to the old appliance.  The HTC Eris will sit in desk drawer, like a spare tire, in case I drop or lose the Razr Maxx.  Sometimes, the repair cost is reasonable and the appliance is worth repairing.  I expect to get another 5 years of use from the Kodak camera. And sometimes the simplicity of an older appliance makes them easier to use, and easier and less expensive to repair.  

Update

I spent half an hour removing the headlight switch, disassembling it, lubricating it, re-assembling it, and re-installing it in my car. I expect the switch will function for another 50 years.


Friday, June 6, 2014

National Donut Day

The best programmers are not fueled by intelligence, nor inquisitiveness, nor a desire to learn new skills.   The best programmers are fueled by two things:  coffee and donuts.

Today, June 6, is National Donut Day.  With me today are two things that have fueled my career: a cup of coffee and a hot donut.  The coffee is a cup of Green Mountain's Double Diamond, an intensely-flavored and somewhat bitter roast that's regrettably only available in a K-cup.  The donut is fresh from Lou's in Hanover, NH. Still warm from the oven, it's dripping with that most quintessential of New England toppings, a maple sugar glaze.

Are you drooling yet?

As warm and tasty as this donut is, my best recommendation is to take a trip up Route 1 in Maine, and visit Frosty's on Main Street in Brunswick, Maine.  I believe these donuts to be the best on the planet.  Reviewers on yelp agree; the only complaint is the demand is high, and they run out of donuts early. 

There's lots to see and do on the Maine coast, and a good donut just adds to the pleasure.  Bon appétit!





Wednesday, April 2, 2014

Windows XP: The Sky Is Not Falling

On April 8, 2014, Microsoft officially brings the Windows XP era to a close.  After that date, Microsoft will no longer offer support and security fixes for its 14 year-old operating system.

Reactions to Microsoft's announcement have been mixed.  Predictably, there are lots of Chicken Little reactions:  Windows XP is not secure, security updates won't be available, Windows XP is a risk to the whole organization, the sky is falling!

Well, the good news is that the sky is not falling.   Many users already bought new computers with Windows 7 or Windows 8.  But, if you want to continue to use Windows XP, there are a few things that you should do.

Let's start with Windows security model.  The problem is not the Windows XP security model, the problem is the way we choose to use Windows XP.    Following on the heels of Windows 95/98/ME, many of us use Windows XP as if it was Windows 95.  We take advantage of the convenient features, and we choose convenience over security.  

Windows 95 and its successors were really just graphical user interfaces sitting on top of a single-user operating system.  Microsoft built Windows XP on top of the Windows NT, so the Windows XP security model shares similarities with secure mainframe and workstation operating systems.   If we use Windows XP more like a workstation and less like Windows 95, then we can use Windows XP securely.  Our computer may not be as convenient to use, but it will be more secure.

Use passwords:  Passwords and log in screens were available in Windows 95, but most users didn't bother using a password:  you turned on the computer, and you were automatically logged on.  Windows XP offered the same convenience of automatic log in.  Don't do it!   Always use a password, and pick a secure password.

Use the NTFS file system: Windows XP inherited both the FAT32 file system from Windows 95 and the NTFS file system from Windows NT.  Early Windows XP computers were usually shipped with the FAT32 file system; conversion to NTFS was an option.  The  FAT32 file system has no security built in; any user can read or write or delete files anywhere on the disk.  The NTFS file system can be secured, and a user must have authorization to read, write, or delete files.   In the FAT32 world, it's easy to corrupt the /Windows directory (or any other directory); in the NTFS world, an unauthorized user can not corrupt the /windows directory.  If you're still running a FAT32 file system, convert it to NTFS.

Use the Limited User and Computer Administrator account types. This is another very important security feature that Windows XP inherited  from Windows NT.  Mainframe and Unix workstation users are familiar with the idea of granting certain users permission to update the operating system and restricting access to other users.  Windows XP has this capability, too.  Windows XP Home users are either Computer Administrators or Limited Users.   An Administrator can install programs, and an administrator has permission to read/write to any directory in an NTFS file system.  A Limited user can not install software, and the NTFS file systems limits the disk access.

This is a secure way to manage a computer,  yet most XP users do not take advantage of it.  Instead, we always log in as an Administrator, and we use our XP system as if it was Windows 95.  Do not do this!   The only good reason to log in as an Administrator is to install software, run the defragmenter, or perform other system administration tasks. For browsing the web, word-processing, working with spreadsheets,  etc, create a Limited user account, and use the Limited user account for daily work.  Never do your day to day work as a Computer Administrator.

Unfortunately, some software vendors insisted on writing applications that saved data to a sub-directory of the \Windows directory.  This was a bad practice, but it was not uncommon.  If you have an application that does this, you may be able to use the application's options to change where it saves files.  If that does not work, you can run the application as an Administrator.  That does defeat the security, but sometimes it's the only way to run an older application.

Disable the guest account.  The Unix workstations we used back in the 1980s and 1990s usually had a "guest" account.  Guest accounts were intended for use by friendly guests who could log in with out a password.  It was a convenience feature that quickly caused us more work than it saved us.  Guest accounts became back doors to our systems for unfriendly hackers.  Windows XP ships with a guest account; it's an unlocked back door to your computer, so be sure this feature is disabled.

Update Windows: As of this writing, there are a few days left to run Windows update.  Microsoft tightened up the XP security in XP releases SP2 and SP3.  Be sure you are running SP3 with the latest updates.  Run Windows update, upgrade to SP3 if you need to, and get the last few updates.  Although Microsoft won't be updating Windows after April 8th, the good news is that after fourteen years, the hackers are unlikely to discover a serious bug.

To the cloud and beyond: Early XP computers were usually shipped with 40GB or perhaps 60GB disks.  After years of use, many of us are running out of disk space.  Or the applications that came with our old computer have become dated.  This is a good opportunity to investigate cloud services. With a Microsoft live.com account or a google account, we have access to many GB of free storage and a suite of applications for word processing, email, spreadsheets,  blogging, etc, all delivered via a web browser.  We get up-to-date applications delivered via the web, we get free disk storage, and the cloud service does all the maintenance.  What's not to like? Let's be honest:  when was the last time you backed up your computer?

Use a modern browser:  That brings us to our next recommendation:  upgrade your browser.   Install the latest FireFox or Chrome or Internet Explorer browser, and let these applications keep themselves updated.  And remember to install the software using your Computer Administrator account.

Security Suite: Another good practise is to run some security software.  In addition to free security suites such as AVG, some Internet service providers offer free subscriptions to popular security suites. Comcast offers Norton anti-virus free to its customers.  Also, check the security software license at your place of business.  With many office workers telecommuting from home, businesses have invested in licenses that allow employees to install the same security suite on their home computers that they use at work.  Another option is Microsoft Security Essentials, available through Windows Update.  Run Windows update manually, click the Custom button (not the Express button), scroll through the list of optional updates, check the Microsoft Security Essentials, and finally click on Review and install updates.

Summary

Windows XP is not an insecure operating system; however, many of us use XP insecurely.  By taking advantage of XP security features, we can continue to safely use our XP computer.  
  1. Create passwords for your users.
  2. Convert to NTFS.  FAT32 is not secure.
  3. Use the Computer Administrator account only for computer administration tasks.
  4. Use a Limited user account for everything else. 
  5. Disable the guest account.
  6. If you share a computer at home, everyone using it should have a unique Limited user account
  7. You have a few days left; be sure you have all the available XP updates applied.
  8. Upgrade to a modern browser. Chrome and FireFox will update themselves.
  9. Add a security suite. 


Wednesday, March 12, 2014

SQL Server 2014 Install Experience

I recently starting working on a new project using Microsoft SQL Server.  I'm not new to relational databases, but I am new to SQL Server.   So, I'm going to do what any geek would do:  install it myself, poke with it a stick, and see what it does.  Let's get started.

First, google "microsoft download sql server".  I suppose I should use bing, but "bing" is not a verb, at least not yet.  From the google search results, select the SQL Server Express Edition - Microsoft link.

The Microsoft Express Edition page includes a download link for 2012 Express and a link to try SQL Server 2014.  There's a lot of experience on the Internet installing SQL Server 2012 or 2008, so let's try the 2104 version.  Click on Try SQL Server 2014 Express .  (If you're still running Windows XP, you will need to upgrade your Windows operating system or install SQL Server 2008.)

The next page describes five downloads in detail:
  • LocalDB (SqlLocalDB)
  • Express (SQLEXPR)
  • Express with Tools (SQLEXPRWT)
  • SQL Server Management Studio Express (SQLManagementStudio)
  • Express with Advanced Services (SQLEXPRADV)
We will use the last one -- it includes the database, the tools, and SQL Server Management Studio.

Click the Get Started Now button.

Sign in with you live.com or outlook.com id.  Complete the registration page, select Express with Advanced Services radio button, select 32/64 bit,  and select your language.

The download starts automagically.  If you don't have the Akamai Netsession installer, you will be prompted to install it. Click download to install it; after the installer is done, the download continues.

After the download is complete, find the file and execute it.  In my case, I'm running SQLEXPRADV_x64_ENU.exe.  The file is over 1GB in size, so it will take a moment to load.  Accept the default directory to load files, and get a coffee.  After a few minutes, the SS Installation Center opens.  Select the first option (assuming you don't have an older version), then accept the license agreement.




The setup continues, finally stopping at the Feature Selection screen.  Accept the defaults, which includes everything but the LocalDB.   Click Next. 




At the Instance Configuration, again take the defaults, noting the names: SQLExpress is the Named instance and SQLEXPRESS is the Instance ID.  Notice that the SQL Server directory and Reporting Services directory are both versioned 12, not 14.  These cannot be changed, but what's in a name?  Just click Next to continue.




At the Server Configuration Screen, set up the accounts.   Again, note and take the defaults.  Click next:




The Database Engine has four tabs: Server Configuration, Data Directories, User Instances, and FILESTREAM.   My server is just for learning purposes, so I'll use the default Windows authentication mode. 

The Reporting Services Configuration window has radio buttons Install and configure, or Install only.  Again, take the default Install and configure, and click next.

Now the Installation Progress window displays while the install completes.  This will take a while, grab a coffee and a donut. 


Finally, installation is complete.  On my older computer, installation took about half an hour.  But, the installer does all the work and the installation is easy. The Complete install screen lists some additional resources.  Note that unlike earlier SQL Server 2005 and 2008, the Microsoft SQL Server books are now online:
Now I'm ready to take a dive into SQL Server with even more resources at the SQL Server 2014 forums on MSDN, the training courses at www.lynda.com, or perhaps even a paper book.