In an ideal world, languages and packages would be standard enough that one could readily port a stored procedure from one vendor's database and reuse the code without too much trouble in another vendor's database. While most SQL is fairly portable, the stored procedure languages are highly proprietary and do not port at all from one vendor to another (one exception: IBM's support of Oracle's PL/SQL language).
The SQL Server solution takes a different approach from the Oracle PL/SQL solution. In the latter, an Oracle PL/SQL procedure used built-in Oracle packages to open and manage an HTTP connection, to read data from the HTTP socket, and to write the data to the console. In practice, instead of writing the data to the console, we might have processed the data directly or loaded it into a staging table for later processing. Rather than use T-SQL to manage the HTTP connection, the SQL Server solution uses an external program to read the data from the web into a file on a Windows server, then bulk imports that file into a SQL Server staging table. Once the data is in the staging table, we can process it as needed by our application.
Let's look at the SQL Server solution. First, we'll again use the Federal Reserve data from the previous post. Put the URL https://www.federalreserve.gov/datadownload/Output.aspx?rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package into your browser. When your browser opens the URL, it downloads a CSV file containing the last ten observations of three dollar indexes; we'll see its contents in a moment. Next, create a staging table in SQL Server to hold the data:
use MyDatabase
go

create table stageFedRates (
    timePeriod   date,
    jrxwtfb_nb   dec(15,8),
    jrxwtfn_nb   dec(15,8),
    jrxwtfo_nb   dec(15,8)
);
go
We have a couple of choices of programs to pull data from the web into an external file: one option is wget, another is curl. Curl is already installed on my Windows 10 computer and my Windows Server 2008 machine, so we'll use curl. The syntax is pretty simple: just type curl and the URL enclosed in double quotes. Don't forget the double quotes -- the URL is full of special characters that the command shell will try to interpret if they aren't quoted. The output goes to STDOUT by default. Let's try it in a command shell window:
C:\> curl "https://www.federalreserve.gov/datadownload/Output.aspx?rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package" "Series Description","Nominal Broad Dollar Index ","Nominal Major Currencies Dollar Index ","Nominal Other Important Trading Partners Dollar Index " "Unit:","Index:_1973_Mar_100","Index:_1973_Mar_100","Index:_1997_Jan_100" "Multiplier:","1","1","1" "Currency:","NA","NA","NA" "Unique Identifier: ","H10/H10/JRXWTFB_N.B","H10/H10/JRXWTFN_N.B","H10/H10/JRXWTFO_N.B" "Time Period","JRXWTFB_N.B","JRXWTFN_N.B","JRXWTFO_N.B" 2018-07-09,123.4946,89.6613,160.2890 2018-07-10,123.4637,89.7541,160.0923 2018-07-11,123.8276,89.8616,160.7776 2018-07-12,123.7232,90.1620,160.1318 2018-07-13,124.0710,90.2872,160.7554 2018-07-16,123.8317,90.0232,160.5674 2018-07-17,124.1553,90.4107,160.7806 2018-07-18,124.3573,90.5535,161.0480 2018-07-19,125.0530,90.9072,162.1562 2018-07-20,125.0484,90.9072,162.1456 c:\>
This is the same data we downloaded through the browser earlier. Two useful options of the curl command are --stderr and --output, which name the files that receive STDERR and the downloaded data, respectively:
C:\>curl --stderr curlErrors.txt --output curlData.csv "https://www.federalreserve.gov/datadownload/Output.aspx?rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package"

C:\>dir curl*.*
 Volume in drive C has no label.
 Volume Serial Number is 1836-25CF

 Directory of C:\

2018-07-28  03:38 PM               804 curlData.csv
2018-07-28  03:38 PM               399 curlErrors.txt
               2 File(s)          1,203 bytes
               0 Dir(s)  352,925,888,512 bytes free

C:\>type curlErrors.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   804    0   804    0     0    804      0 --:--:-- --:--:-- --:--:--  2233
Now that we can pull data from our source, let's look at loading it into a SQL Server table. In a new SSMS window, use the BULK INSERT command to load the table:
use MyDatabase
go

Bulk Insert stageFedRates
from 'C:\curlData.csv'
with (
    fieldterminator = ',',
    rowTerminator   = '\n',
    firstRow        = 7,
    maxErrors       = 999999999
)
The fieldTerminator and rowTerminator options specify that the input is a CSV whose rows are delimited by newline characters and whose columns are delimited by commas. The download begins with six rows of metadata, so we specify firstRow = 7 to start loading at the first data row. Finally, maxErrors specifies how many errors the import will tolerate before aborting the load. The right value depends on the data: if the data is very clean, specify a small number, but if not, specify a large number so all the loadable rows make it into the table for post-load cleanup.
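If the feed turns out to be messy, BULK INSERT also accepts an ERRORFILE option that writes the rejected rows to a file you can inspect after the load. Here is a minimal variant of the load above; the .rejects path is just an illustration, and note that the file must not already exist or the load will fail:

use MyDatabase
go

Bulk Insert stageFedRates
from 'C:\curlData.csv'
with (
    fieldterminator = ',',
    rowTerminator   = '\n',
    firstRow        = 7,
    maxErrors       = 999999999,
    -- rows that fail to parse are written here for post-load inspection
    errorFile       = 'C:\curlData.rejects'
)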
In an SSMS window, select all the data to see what we imported into the SQL Server table.
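A quick check like this should return the ten observations; the commented rows are taken from the CSV shown above:

use MyDatabase
go

select * from stageFedRates order by timePeriod;

-- timePeriod  jrxwtfb_nb    jrxwtfn_nb   jrxwtfo_nb
-- 2018-07-09  123.49460000  89.66130000  160.28900000
-- 2018-07-10  123.46370000  89.75410000  160.09230000
-- ...
-- 2018-07-20  125.04840000  90.90720000  162.14560000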
That manual two-step process works, but we want the whole thing to run from a single stored procedure, and a procedure running on the server can't assume it may read and write an arbitrary directory such as C:\. To work around this, we will query SQL Server's TMP environment variable and use that directory to write to and read from. The following stored procedure will do the trick; the comments explain how it works along the way. It's worth leaving the PRINT statements in the code for debugging, as getting all the quotes and double quotes correct can be quite tricky.
use MyDatabase
go

create procedure [dbo].[loadFedReserve]
as
begin
    -- Get SQL Server's TMP directory.
    declare @destinationDirectory varchar(256);
    declare @cmd varchar(256) = 'echo %TMP%';

    -- Create a temporary table to hold the output of xp_cmdshell.
    create table #tmpResult ( envVar varchar(256) );

    -- Store the result of executing "echo %TMP%" on the Windows server
    -- in the temporary table.
    insert into #tmpResult
    exec xp_cmdshell @cmd;

    -- Now get the TMP directory from the table.
    set @destinationDirectory = ( select top 1 envVar from #tmpResult );

    -- Build the command line, including the output and stderr file locations.
    -- We will reuse the @destination variable when we import the file into
    -- SQL Server. Be sure to put double quotes around the URL.
    declare @curlCommand varchar(512) = 'curl ';
    declare @destination varchar(128) = @destinationDirectory + '\curlData.csv';
    declare @errors      varchar(128) = @destinationDirectory + '\curlErrors.txt';

    set @curlCommand = @curlCommand + '--stderr "' + @errors + '" ';
    set @curlCommand = @curlCommand + '--output "' + @destination + '" ';
    set @curlCommand = @curlCommand
        + '"https://www.federalreserve.gov/datadownload/Output.aspx?'
        + 'rel=H10&series=122e3bcb627e8e53f1bf72a1a09cfb81&lastObs=10'
        + '&from=&to=&filetype=csv&label=include&layout=seriescolumn&type=package"';

    -- A little debugging, then execute the command.
    print @curlCommand;
    exec xp_cmdshell @curlCommand;

    -- The table is usually a staging table that we reuse, so delete any data in it.
    delete from stageFedRates;

    -- Now build a SQL statement with the bulk import command and the file
    -- location that we determined before.
    declare @sqlStmt nvarchar(512) = 'Bulk Insert stageFedRates from '''
        + @destination
        + ''' with ( fieldterminator = '','', rowTerminator = ''\n'', firstRow = 7, maxErrors = 999999999 )';

    -- A little more debugging, then execute the load. Building the SQL command and
    -- executing it with sp_executeSQL lets us change the imported file programmatically.
    print @sqlStmt;
    exec sp_executeSQL @sqlStmt;

    select count(*) as RowsImported from stageFedRates;
end
go
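One refinement worth considering: xp_cmdshell returns 0 when the command succeeds and 1 when it fails, so the procedure could check the download and bail out before deleting the staging data. A minimal sketch of the idea follows; the path and URL here are placeholders, not the procedure's computed values:

declare @curlCommand varchar(512) =
    'curl --output "C:\Temp\curlData.csv" "https://example.invalid/data.csv"';
declare @rc int;

-- xp_cmdshell returns 0 on success, 1 on failure
exec @rc = xp_cmdshell @curlCommand;

if @rc <> 0
begin
    raiserror('curl download failed; nothing was loaded.', 16, 1);
    return;
end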
Depending on how your SQL Server instance is configured, you may need to enable the use of xp_cmdshell. I can execute the following on the instance of SQL Server on my PC; the help of a DBA is probably required for an enterprise server.
exec sp_configure 'show advanced options', 1;
go
-- Refresh the configuration
reconfigure;
go
-- Enable xp_cmdshell
exec sp_configure 'xp_cmdshell', 1;
go
-- Refresh the configuration
reconfigure;
go
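To confirm the change took effect, query sys.configurations; value_in_use should be 1 for both settings:

select name, value, value_in_use
from sys.configurations
where name in ('show advanced options', 'xp_cmdshell');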
After compiling the stored procedure, open a new SSMS window, execute the stored procedure, and select the results of the load:
exec loadFedReserve

select * from stageFedRates
There should be 10 rows, matching the CSV we saw earlier, and SQL Server will take less than a second to pull the data and load the table.