Lost Learnings: 2015

Wednesday, April 15, 2015

Maple Syrup - Liquid Gold

Today, I became a New Englander.

I could not claim that before. I am, as the Maine expression says, "from away", a transplanted Marylander rooting for the Baltimore Orioles in the spring and the New England Patriots in the fall. But today, I made my first batch of maple syrup. Starting with two quarts of fresh maple sap from our friends' sugar maple trees and some advice on sugar content, I boiled two quarts of sap down to about two ounces of maple syrup. I feel like I belong.

Maple syruping has a long tradition pre-dating European settlers. It is a pretty simple process that has become more mechanized over the ages, with long lines of tubing replacing traditional taps and buckets and horses and sleds. Anyone living in New England owes themselves a visit to one of the many sugar houses dotting the New England countryside to watch the pros in action. In this post, I'll describe a simple way of making a few ounces of home-made maple syrup.

Four quart pot boiling two quarts of maple sap

Four quart pot to boil two quarts of sap.

I started with two quarts of maple sap in a four-quart pan. Leave plenty of room to prevent the sap from boiling over into the stove.

Piece of paper showing my reduction calculations

Calculating reduction

The syrup is usually boiled to a 30:1 to 40:1 reduction. Our friends and a nearby sugar house both said the sap was very heavy with sugar this year, so I guessed that a 30:1 reduction would work. Professionals use the syrup's boiling point to indicate the syrup is ready. I started with 60mm of syrup in the four-quart pan. A 30:1 reduction leaves should leave about 2mm of syrup in the pan.

Maple sap boiling in a smaller container

Boiling down to a depth of 2mm didn't seem practical, so at 10mm in the four-quart pan I transferred the sap to a smaller container. I had boiled to a 6:1 reduction, now I just needed a 5:1 reduction in the smaller pan. There was 49mm of syrup in the small pot, so a 5:1 reduction should give me about 10mm of syrup.

Small maple syrup bottle with home-made funnel

Maple syrup jug with tin foil funnel

After boiling down to the 10mm mark, it's time to bottle. I have a small syrup container complete with a hand-made aluminum foil funnel and bowl to catch any stray syrup.

Liquid Gold!

Finally, the finished product: a little over two ounces of maple syrup.

Today, I am a New Englander. Tomorrow, I eat pancakes and maple syrup!

Saturday, March 28, 2015

SQL Server Listagg, Part Two

Last week's post examined using common table expressions and recursive SQL to concatenate strings grouped by an id value in SQL Server. This is an easy operation to perform in DB2 and Oracle; both databases support the LISTAGG function. But SQL Server does not support the LISTAGG function, so we needed to take a different approach to getting the same results.

Our query works well, but sometimes our data may have values that repeat within an id. Because our query controls the recursion with the values that we're concatenating, we might not get the result back that we wanted.

Let's start with last week's sample table, and add a row to it:

    insert into sample values(3,'G');

Run the last week's final query, and our results look this:

Id	Concat_value
1	A,B
2	C,D
3	E,F,G
3	E,F,G
4	H

This is probably not the result we hoped for. So, first question: what did we hope to see? If the correct result is four rows, with the value of "E,F,G" for Id = 3 , then we just need to add the distinct function to the final query.

But maybe the correct answer is four rows, and the concat_value for Id = 3 is "E,F,G,G". In that case, we need to do a little work to our query. As originally written, the query uses the VALUE column to control the recursion. If the values repeat, then we need to introduce a surrogate value to the query. Let's examine the query and let the code comments do the talking:

-- Start with a new common table expression(CTE) that creates a unique surrogate
-- value for each value in the VALUE column.  The OLAP row_number() function will 
-- do that easily.
with add_surogate (id, value, surogate_value) as (
    select  id, 
     value,       
            row_number() over (partition by id order by value)     
    from sample
    )

-- Add the surrogate_value to the rq CTE
,rq ( id, value, surogate_value,  concat_value, depth) as (

   select a.id,          
          a.value, 
   a.surogate_value,
          cast( a.value as varchar(256)), 
          1
   from add_surogate a
   where a.surogate_value = (
         select min(surogate_value)   -- Start the recursion using the surrogate value
         from add_surogate b
         where a.id = b.id
         )

   union all 

   select c.id,     
          c.value, 
          c.surogate_value, 
          -- No changes, concatenate values as before 
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from add_surogate c
   inner join rq 
      on c.id = rq.id
         -- And control the progress of the recursion using the
         -- surrogate value
        and c.surogate_value > rq.surogate_value
        and rq.depth < 10
)

,all_rows as (
    select id, value, concat_value, depth,
           max(depth) over (partition by id) max_depth
    from rq
)

select id, concat_value
from all_rows
where depth = max_depth

The changes from last week's query to this week's query are pretty simple:

Add a new common table expression to create a surrogate value
Add the surrogate value to the recursive common table expression
Use the surrogate value to control the recursion
Continue to build the result string as before

With these simple changes, we get the desired results:

ID	Concat_value
1	A,B
2	C,D
3	E,F,G,G
4	H

More Reading:

The vendors' online documentation have good examples:

SQL Server 2008 Recursive Queries Using Common Table Expressions
DB2 Info Center
Oracle 11gR2 Don't be put off by the term "subquery factoring". It's just recursion.

Friday, March 20, 2015

SQL Server ListAgg Function, Part One

Lately, I've been straddling the Oracle database world and the Microsoft SQL Server world, moving data from Oracle tables to SQL Server tables. One of the things I needed to do was concatenate string values grouped by an id value.

This is a pretty easy thing to do in the Oracle world. Let's create a table and add some data:

create table sample (
    id number,
    value varchar2(16)
);
insert into sample values (1, 'A');
insert into sample values (1, 'B');
insert into sample values (2, 'C');
insert into sample values (2, 'D');
insert into sample values (3, 'E');
insert into sample values (3, 'F');
insert into sample values (3, 'G');
insert into sample values (4, 'H');
commit;

Next, using the LISTAGG function available in DB2 and Oracle, our query will return an id and the concatenation of the values for each id:

select id, 
       listagg(value,',') within group (order by value) concat_value
from sample
group by id

ID	CONCAT_VALUE
1	A,B
2	C,D
3	E,F,G
4	H

That was easy on an Oracle database! Now, search the SQL Server documentation, and you won't find a LISTAGG function. Next, google "SQL Server listagg", and you'll find lots of discussion. Stackoverflow contributors offer many ways of solving this problem, usually using stored procedures or functions. In this post, I will suggest a SQL solution to the problem using SQL recursion and common table expressions.

First, grab the create table and the insert SQL above. Change the varchar2 datatype to varchar (or nvarchar) and create the sample table on your SQL Server database. Next, construct the recursive query.

with rq ( id, value, concat_value, depth) as (

   select a.id, 
          a.value, 
          cast( a.value as varchar(256)), 
          1
   from sample a
   where a.value = (
         select min(value) 
         from sample b
         where a.id = b.id
         )

   union all 

   select c.id, 
          c.value,  
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from sample c
   inner join rq 
      on c.id = rq.id
        and c.value > rq.value
        and rq.depth < 10
)

select rq.* 
from rq
order by id, depth

Before we execute the query, let's examine it:

the recursion returns 4 columns: the id, the current value, the result of concatenating the values, and the recursive depth
we CAST the concatenated value to a varchar large enough to hold the result
we use a correlated query to return the smallest value for each id. This gives us a starting point for the recursion.
Tracking the depth is useful while developing a query. The test for rq.depth < 10 will keep the query from spinning away if there's an error.
The condition c.value > rq.value performs the same function as "order by value" in the Oracle query.

Execute the query, and we get the following result table:

ID	Value	Concat_value	depth
1	A	A	1
1	B	A,B	2
2	C	C	1
2	D	C,D	2
3	E	E	1
3	F	E,F	2
3	G	E,G	2
3	G	E,F,G	3
4	H	H	1

The answer we want is in the result table above, we just need to filter out the intermediate rows. The rows we want have have the maximum depth value by ID. We will use the max function by id to find the maximum depth in a second common table expression, and then filter the results in our final query:

with rq ( id, value, concat_value, depth) as (

   select a.id, 
          a.value, 
          cast( a.value as varchar(256)), 
          1
   from sample a
   where a.value = (
         select min(value) 
         from sample b
         where a.id = b.id
         )

   union all 

   select c.id, 
          c.value,  
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from sample c
   inner join rq 
      on c.id = rq.id
        and c.value > rq.value
        and rq.depth < 10
)

,all_rows as (
    select id, value, concat_value, depth,
  max(depth) over (partition by id) max_depth
    from rq
)

select id, concat_value
from all_rows
where depth = max_depth

When we execute this query, we get the same results as our original query using the LISTAGG function in Oracle:

ID	Concat_value
1	A,B
2	C,D
3	E,F,G
4	H

Using common table expressions and recursion, we now have a way to aggregate strings together in SQL Server. But this is just the start; using the method for string aggregation as a model, we can develop other aggregations as well. Next week, we talk about handling cases where values are repeated within an id.

Lost Learnings