
Saturday, July 28, 2018

Simple Web Client, the Sequel

In an earlier post, A Simple Web Client, I demonstrated a simple program for pulling data from a download website into an Oracle table using the utl_http package. A few years later, I needed to do a similar data pull into a SQL Server table. In this post, I'll demonstrate a simple way of pulling data from a URL into a SQL Server table.

In an ideal world, languages and packages would be standard enough that one could readily port a stored procedure from one vendor's database to another and reuse the code without too much trouble. While most SQL is fairly portable, the stored procedure languages are highly proprietary and do not port at all from one vendor to another (one exception: IBM's support of Oracle's PL/SQL language).

The SQL Server solution takes a different approach from the Oracle PL/SQL solution. In the latter, an Oracle PL/SQL procedure used built-in Oracle packages to open and manage an HTTP connection, to read data from the HTTP socket, and to write the data to the console. In practice, instead of writing the data to the console, we might have processed the data directly or loaded it into a staging table for later processing. The SQL Server solution is different: rather than use T-SQL to manage the HTTP connection, it uses an external program to read the data from the web into a file on the Windows server, then bulk imports the file into a SQL Server staging table. Once the data is in the staging table, we can process it as needed by our application.

Let's look at the SQL Server solution. First, we'll use the Federal Reserve data from the previous post again. Put the URL https://www.federalreserve.gov/datadownload/Output.aspx?rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package into your browser. When your browser opens the URL, you'll get a CSV file like the following:

Downloaded data opened in a spreadsheet

We have a four-column spreadsheet, with some metadata in rows 1 through 5, the column headers in row 6, and the data starting in row 7. We'll create a four-column table to store the results. In a SQL Server Management Studio (SSMS) window, create the following table:

use MyDatabase
go

create table stageFedRates (
    timePeriod date,
    jrxwtfb_nb dec(15,8),
    jrxwtfn_nb dec(15,8),
    jrxwtfo_nb dec(15,8)
);

go

We have a couple of choices of programs to pull data from the web into an external file: one option is wget, another is curl. Curl is already installed on my Windows 10 computer and my Windows 2008 Server, so we'll use curl. The syntax is pretty simple: just type curl and the URL enclosed in double quotes. Don't forget the double quotes -- the URL is full of special characters that the command shell will try to interpret if they aren't quoted. The output goes to STDOUT by default. Let's try it in a command shell window:

C:\> curl "https://www.federalreserve.gov/datadownload/Output.aspx?rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package"

"Series Description","Nominal Broad Dollar Index ","Nominal Major Currencies Dollar Index ","Nominal Other Important Trading Partners Dollar Index "
"Unit:","Index:_1973_Mar_100","Index:_1973_Mar_100","Index:_1997_Jan_100"
"Multiplier:","1","1","1"
"Currency:","NA","NA","NA"
"Unique Identifier: ","H10/H10/JRXWTFB_N.B","H10/H10/JRXWTFN_N.B","H10/H10/JRXWTFO_N.B"
"Time Period","JRXWTFB_N.B","JRXWTFN_N.B","JRXWTFO_N.B"
2018-07-09,123.4946,89.6613,160.2890
2018-07-10,123.4637,89.7541,160.0923
2018-07-11,123.8276,89.8616,160.7776
2018-07-12,123.7232,90.1620,160.1318
2018-07-13,124.0710,90.2872,160.7554
2018-07-16,123.8317,90.0232,160.5674
2018-07-17,124.1553,90.4107,160.7806
2018-07-18,124.3573,90.5535,161.0480
2018-07-19,125.0530,90.9072,162.1562
2018-07-20,125.0484,90.9072,162.1456
c:\>

This is the same data we saw in the Excel spreadsheet above. Two useful options of the curl command are --stderr and --output, which specify files to capture the progress messages and the downloaded data, respectively:

C:\>curl --stderr curlErrors.txt --output curlData.csv "https://www.federalreserve.gov/datadownload/Output.aspx?rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package"

C:\>dir curl*.*
 Volume in drive C has no label.
 Volume Serial Number is 1836-25CF

 Directory of C:\

2018-07-28  03:38 PM               804 curlData.csv
2018-07-28  03:38 PM               399 curlErrors.txt
               2 File(s)          1,203 bytes
               0 Dir(s)  352,925,888,512 bytes free

C:\> type curlErrors.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   804    0   804    0     0    804      0 --:--:-- --:--:-- --:--:--  2233



Now that we can pull data from our source, let's look at loading it into a SQL Server table. From a new SSMS window, use the Bulk Insert command to load the table:

use MyDatabase
go

Bulk Insert stageFedRates
   from 'C:\curlData.csv'
   with ( fieldTerminator = ',',
          rowTerminator = '\n',
          firstRow = 7,
          maxErrors = 999999999 )

The fieldTerminator and rowTerminator options specify that the input is a CSV, with rows delimited by newline characters and columns delimited by commas. We know there is some metadata at the top of the download, so we specify firstRow = 7 to start loading at the first data row. Finally, maxErrors specifies how many errors the import will tolerate before aborting the load. The right value depends on the data: if the data is very clean, specify a small number, but if not, specify a large number so all the data loads into the table for post-load cleanup.
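
If you expect dirty rows, Bulk Insert can also set the rejected rows aside for inspection with the errorFile option. Here's a minimal sketch (the reject file path is made up, and the load will fail if the error files are left over from a previous run):

Bulk Insert stageFedRates
   from 'C:\curlData.csv'
   with ( fieldTerminator = ',',
          rowTerminator = '\n',
          firstRow = 7,
          maxErrors = 999999999,
          errorFile = 'C:\curlRejects.txt' )  -- rejected rows land here, plus a companion .Error.Txt control file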

In an SSMS window, select all the data to see what we imported into the SQL Server table:
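
Any simple query against the staging table will do, for example:

select *
from stageFedRates
order by timePeriod;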

Data loaded into SQL Server


Now we're able to pull data from the web into a file and load that data into a table. Next, we'll create a stored procedure to automate the process. We've demonstrated all the commands we need, but there's one more hurdle to leap: working within the Windows file system permissions. The directories that we can read and write are not necessarily the directories that the SQL Server service can read and write.
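
We'll use xp_cmdshell (shown in the procedure below) to run operating system commands from T-SQL. A quick way to see which Windows account those commands run under, and therefore whose file permissions apply, is the standard Windows whoami command:

exec xp_cmdshell 'whoami';
-- Typically prints the SQL Server service account
-- (or the configured proxy account for non-sysadmin callers).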

To work around this, we will query SQL Server's TMP environment variable and use that directory to write to and read from. The following stored procedure does the trick; the comments explain how it works along the way. It's worth leaving the PRINT statements in the code for debugging, since getting all the quotes and double quotes correct can be quite tricky.

use MyDatabase
go

create procedure [dbo].[loadFedReserve] as
begin
    -- Get SQL Server's TMP directory.
    declare @destinationDirectory varchar(256);
    declare @cmd varchar(256) = 'echo %TMP%';

    -- Create a temporary table to hold the output of the shell command.
    create table #tmpResult (
        envVar varchar(256)
    );

    -- Store the result of executing "echo %TMP%" on the Windows server
    -- into the temporary table.
    insert into #tmpResult exec xp_cmdshell @cmd;

    -- Now get the TMP directory from the table.
    set @destinationDirectory = (select top 1 envVar from #tmpResult);

    -- Build the command line, including the output and stderr file locations.
    -- We will reuse the @destination variable when we import the file to SQL
    -- Server, so keep the paths themselves unquoted and add the double quotes
    -- only while building the command line.  Be sure to double-quote the URL.
    declare @curlCommand varchar(512) = 'curl ';
    declare @destination varchar(128) = @destinationDirectory + '\curlData.csv';
    declare @errors varchar(128) = @destinationDirectory + '\curlErrors.txt';
    set @curlCommand = @curlCommand + '--stderr "' + @errors + '" ';
    set @curlCommand = @curlCommand + '--output "' + @destination + '" ';
    set @curlCommand = @curlCommand + '"https://www.federalreserve.gov/datadownload/Output.aspx?' +
        'rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10' +
        '&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package"';

    -- A little bit of debugging, then execute the command.
    print @curlCommand;
    exec xp_cmdshell @curlCommand;

    -- The table is a staging table that we reuse, so delete any data in it.
    delete from stageFedRates;

    -- Now build a SQL statement with the bulk import command and the file
    -- location that we determined above.
    declare @sqlStmt nvarchar(256) = 'Bulk Insert stageFedRates from ''' +
        @destination + ''' with ( fieldTerminator = '','', rowTerminator = ''\n'', firstRow = 7, maxErrors = 999999999 )';

    -- A little more debugging, then execute the load.  Building the SQL command and
    -- executing it with sp_executeSQL lets us change the imported file programmatically.
    print @sqlStmt;
    exec sp_executeSQL @sqlStmt;

    select count(*) as RowsImported from stageFedRates;
end

GO

Depending on how your SQL Server instance is configured, you may need to enable the use of xp_cmdshell. I can execute the following on the instance of SQL Server on my PC; the help of a DBA is probably required for an enterprise server.

exec sp_configure 'show advanced options', 1;  
go  
-- Refresh the configuration 
reconfigure ;
go 
  
-- Enable xp_cmdshell   
exec sp_configure 'xp_cmdshell', 1;  
go
  
-- Refresh the configuration 
reconfigure ;
go 
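
To confirm the change took effect, query sys.configurations; value_in_use should be 1 for both settings:

select name, value_in_use
from sys.configurations
where name in ('show advanced options', 'xp_cmdshell');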
  

After compiling the stored procedure, open a new SSMS window, execute the stored procedure, and select the results of the load:

exec loadFedReserve

select * 
from stageFedRates
 

There should be 10 rows, like below, and it will take SQL Server less than a second to pull the data and load the table:

Data returned from a SQL Select in SQL Server Management Studio

Using the curl command and a bit of code in a stored procedure, we can pull external data from the web and load it into SQL Server. After loading the data into the staging table, we can manipulate it as needed by our business requirements. The load is fast enough to be useful for interactive processing of small amounts of data, or we can use SQL Server Agent to batch load large amounts of data on a schedule.
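
For a hands-free nightly run, here's a minimal sketch of a SQL Server Agent job (the job, step, and schedule names are made up; run it against msdb and adjust to taste):

use msdb
go

exec sp_add_job @job_name = 'LoadFedReserveNightly';
exec sp_add_jobstep @job_name = 'LoadFedReserveNightly',
     @step_name = 'Run loadFedReserve',
     @subsystem = 'TSQL',
     @database_name = 'MyDatabase',
     @command = 'exec dbo.loadFedReserve;';
exec sp_add_schedule @schedule_name = 'Daily0600',
     @freq_type = 4,               -- daily
     @freq_interval = 1,           -- every day
     @active_start_time = 060000;  -- 6:00 AM
exec sp_attach_schedule @job_name = 'LoadFedReserveNightly',
     @schedule_name = 'Daily0600';
exec sp_add_jobserver @job_name = 'LoadFedReserveNightly';  -- run on the local server
go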



Wednesday, December 4, 2013

A Simple Web Client

Cloud computing and software-as-a-service providers have lots of capacity and attractive rates, so many organizations are taking their applications to the cloud. After an application becomes cloud-based, many users need to pull some of that data to feed other systems. In this post, we'll look at a simple PL/SQL web client that connects to a web service and downloads data.

For our data,  we'll use the Data Download Program at the Federal Reserve Bank.  There's a lot of statistical data on the Federal Reserve site, and we will use that data as the source for our simple web client.   The data files are small and free to download, so the Bank gives us a good source of data to use.

From the Data Download Program we click on the Foreign Exchange Rates (G.5 / H.10) link, then we get a page that lets us choose which package we would like.  The H10 radio button is selected by default, so clicking the Download button takes us to the download page.  We could download a CSV into our browser and save the result as an OpenOffice or Excel spreadsheet.  But for production purposes, following these links and downloading data through a web browser is a tedious chore. Notice the Direct Download for Automated Systems link?   We'll copy that link and use it for our demonstration. 

create or replace procedure simple_web_client as
    
  timeout number := 60;  -- timeout in seconds

  /* Below is the URL from the Direct Download for Automated Systems.  Just copy the link
     and paste the text.
  */
  v_url varchar2(1024) :=
  'http://www.federalreserve.gov/datadownload/Output.aspx?rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package';
   
  v_line_count number := 0;
  v_line_in varchar2(1024);
  v_header varchar2(1024);

  /* UTL_HTTP.REQ is set by the BEGIN_REQUEST function.  REQ includes 
     the URL, method, and http_version.
  */
  req   UTL_HTTP.REQ;

  /* UTL_HTTP.RESP is set by the BEGIN_RESPONSE function.  RESP includes   
     the status, a brief message, and the http version.
  */
  resp  UTL_HTTP.RESP;
    
begin
    
  utl_http.set_transfer_timeout(timeout);         -- set the timeout before opening the request
  req := utl_http.begin_request(v_url);           -- Pass our URL, get a req object back
  
  utl_http.set_header(req, 'User-Agent', 'Mozilla/4.0');

  resp := utl_http.get_response(req);             -- Pass our req object, get a resp object back
  utl_http.read_line(resp, v_header, true);       -- Read line of resp into v_header

  dbms_output.put_line('CSV Header = ' || v_header);
  v_line_count := v_line_count + 1;

  loop

    utl_http.read_line(resp, v_line_in, true);    -- csv data
    v_line_count := v_line_count + 1;
    dbms_output.put_line('CSV Data = ' || v_line_in);

  end loop;

exception
  when utl_http.end_of_body then                  -- end of the data
      dbms_output.put_line('Lines read, including header = ' || v_line_count );
      utl_http.end_response(resp);              -- close the response object

  when others then
    dbms_output.put_line('Unhandled exception, lines = ' || v_line_count  );
    dbms_output.put_line(sqlerrm);
    dbms_output.put_line(dbms_utility.format_error_backtrace);
 
    -- Do not leave anything open.  Close everything and ignore errors.
    begin
       utl_http.end_response(resp);
       utl_http.end_request(req);
    exception
       when others then null;
    end;
    
end simple_web_client;

Now we execute our client using SQL*Plus or SQL Developer. If we connect properly, our output will look something like this:
anonymous block completed
CSV Header = "Series Description","Nominal Broad Dollar Index ","Nominal Major Currencies Dollar Index ","Nominal Other Important Trading Partners Dollar Index "
CSV Data = "Unit:","Index:_1973_Mar_100","Index:_1973_Mar_100","Index:_1997_Jan_100"
CSV Data = "Multiplier:","1","1","1"
CSV Data = "Currency:","NA","NA","NA"
CSV Data = "Unique Identifier: ","H10/H10/JRXWTFB_N.B","H10/H10/JRXWTFN_N.B","H10/H10/JRXWTFO_N.B"
CSV Data = "Time Period","JRXWTFB_N.B","JRXWTFN_N.B","JRXWTFO_N.B"
CSV Data = 2013-10-14,ND,ND,ND
CSV Data = 2013-10-15,101.1463,75.7635,128.2686
CSV Data = 2013-10-16,101.0694,75.7839,128.0691
CSV Data = 2013-10-17,100.4167,74.9717,127.6647
CSV Data = 2013-10-18,100.3842,74.8882,127.7015
CSV Data = 2013-10-21,100.6396,75.0142,128.1114
CSV Data = 2013-10-22,100.3595,74.7093,127.8816
CSV Data = 2013-10-23,100.4550,74.8150,127.9575
CSV Data = 2013-10-24,100.5485,74.8527,128.1189
CSV Data = 2013-10-25,100.5134,74.9733,127.8807
Lines read, including header = 16

The Oracle utl_http package makes reading data from a web server as simple as reading data from a file. With a simple web client, we can read any data that's available from a web server. Our web service example used CSV-formatted data, but we could download XML data just as easily. And for truly hands-free operation, schedule your client using DBMS_SCHEDULER or DBMS_JOB.
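For instance, here's a minimal sketch of a daily run with DBMS_SCHEDULER (the job name and schedule are made up; adjust for your environment):

begin
  dbms_scheduler.create_job(
    job_name        => 'PULL_FED_RATES',        -- hypothetical job name
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'SIMPLE_WEB_CLIENT',     -- the procedure above
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY; BYHOUR=6',  -- run once a day at 6 AM
    enabled         => true);
end;
/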

There's just one caveat when using the utl_http functions: depending on your DBAs and your installation, you may not have access to use the network functions. If you get an error like
ORA-24247: network access denied by access control list (ACL) 
then it's time to visit the DBAs and request that they add your schema or your user account to the network ACL.
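
For reference, here's a minimal sketch of what the DBA might run on Oracle 11g to grant that access (the ACL file name and schema are made up; 12c and later also offer DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE for the same purpose):

begin
  dbms_network_acl_admin.create_acl(
    acl         => 'fed_reserve.xml',          -- hypothetical ACL name
    description => 'Allow HTTP access to the Federal Reserve site',
    principal   => 'MYSCHEMA',                 -- your schema or user account
    is_grant    => true,
    privilege   => 'connect');
  dbms_network_acl_admin.assign_acl(
    acl  => 'fed_reserve.xml',
    host => 'www.federalreserve.gov');
  commit;
end;
/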