Saturday, March 28, 2015

SQL Server Listagg, Part Two

Last week's post examined using common table expressions and recursive SQL to concatenate strings grouped by an id value in SQL Server.  This is an easy operation to perform in DB2 and Oracle; both databases support the LISTAGG function.   But SQL Server does not support the LISTAGG function, so we needed to take a different approach to getting the same results.

Our query works well, but sometimes our data may have values that repeat within an id.   Because our query controls the recursion with the values that we're concatenating, we might not get the result back that we wanted.

Let's start with last week's sample table, and add a row to it:

    insert into sample values(3,'G');

Run the last week's final query, and our results look this:

IdConcat_value
1A,B
2C,D
3E,F,G
3E,F,G
4H


This is probably not the result we hoped for.  So, first question:  what did we hope to see?   If the correct result is four rows, with the value of "E,F,G" for Id = 3 , then we just need to add the distinct function to the final query.

But maybe the correct answer is four rows, and the concat_value for Id = 3 is "E,F,G,G".  In that case, we need to do a little work to our query.   As originally written, the query uses the VALUE column to control the recursion.   If the values repeat, then we need to introduce a surrogate value to the query.  Let's examine the query and let the code comments do the talking:


-- Start with a new common table expression(CTE) that creates a unique surrogate
-- value for each value in the VALUE column.  The OLAP row_number() function will 
-- do that easily.
with add_surogate (id, value, surogate_value) as (
    select  id, 
     value,       
            row_number() over (partition by id order by value)     
    from sample
    )

-- Add the surrogate_value to the rq CTE
,rq ( id, value, surogate_value,  concat_value, depth) as (

   select a.id,          
          a.value, 
   a.surogate_value,
          cast( a.value as varchar(256)), 
          1
   from add_surogate a
   where a.surogate_value = (
         select min(surogate_value)   -- Start the recursion using the surrogate value
         from add_surogate b
         where a.id = b.id
         )

   union all 

   select c.id,     
          c.value, 
          c.surogate_value, 
          -- No changes, concatenate values as before 
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from add_surogate c
   inner join rq 
      on c.id = rq.id
         -- And control the progress of the recursion using the
         -- surrogate value
        and c.surogate_value > rq.surogate_value
        and rq.depth < 10
)

,all_rows as (
    select id, value, concat_value, depth,
           max(depth) over (partition by id) max_depth
    from rq
)

select id, concat_value
from all_rows
where depth = max_depth


The changes from last week's query to this week's query are pretty simple:
  • Add a new common table expression to create a surrogate value
  • Add the surrogate value to the recursive common table expression
  • Use the surrogate value to control the recursion
  • Continue to build the result string as before
With these simple changes, we get the desired results:

ID Concat_value
1 A,B
2 C,D
3 E,F,G,G
4 H


More Reading:

The vendors' online documentation have good examples:

Friday, March 20, 2015

SQL Server ListAgg Function, Part One

Lately, I've been straddling the Oracle database world and the Microsoft SQL Server world, moving data from Oracle tables to SQL Server tables.  One of the things I needed to do was concatenate string values grouped by an id value.

This is a pretty easy thing to do in the Oracle world.   Let's create a table and add some data:

create table sample (
    id number,
    value varchar2(16)
);
insert into sample values (1, 'A');
insert into sample values (1, 'B');
insert into sample values (2, 'C');
insert into sample values (2, 'D');
insert into sample values (3, 'E');
insert into sample values (3, 'F');
insert into sample values (3, 'G');
insert into sample values (4, 'H');
commit; 

Next, using the LISTAGG function available in DB2 and Oracle, our query will return an id and the concatenation of the values for each id:

select id, 
       listagg(value,',') within group (order by value) concat_value
from sample
group by id

ID CONCAT_VALUE
1A,B
2C,D
3E,F,G
4H

That was easy on an Oracle database!  Now, search the SQL Server documentation, and you won't find a LISTAGG function. Next, google "SQL Server listagg", and you'll find lots of discussion. Stackoverflow contributors offer many ways of solving this problem, usually using stored procedures or functions. In this post, I will suggest a SQL solution to the problem using SQL recursion and common table expressions.

First, grab the create table and the insert SQL above.  Change the varchar2 datatype to varchar (or nvarchar) and create the sample table on your SQL Server database.  Next, construct the recursive query.

with rq ( id, value, concat_value, depth) as (

   select a.id, 
          a.value, 
          cast( a.value as varchar(256)), 
          1
   from sample a
   where a.value = (
         select min(value) 
         from sample b
         where a.id = b.id
         )

   union all 

   select c.id, 
          c.value,  
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from sample c
   inner join rq 
      on c.id = rq.id
        and c.value > rq.value
        and rq.depth < 10
)

select rq.* 
from rq
order by id, depth

Before we execute the query, let's examine it:
  • the recursion returns 4 columns:  the id, the current value, the result of concatenating the values, and the recursive depth
  • we CAST the concatenated value to a varchar large enough to hold the result
  • we use a correlated query to return the smallest value for each id.  This gives us a starting point for the recursion. 
  • Tracking the depth is useful while developing a query.  The test for rq.depth < 10 will keep the query from spinning away if there's an error.
  • The condition c.value > rq.value performs the same function as "order by value" in the Oracle query.   
Execute the query, and we get the following result table:

IDValueConcat_valuedepth
1AA 1
1BA,B 2
2CC 1
2DC,D 2
3EE 1
3FE,F 2
3GE,G 2
3GE,F,G 3
4HH 1

The answer we want is in the result table above, we just need to filter out the intermediate rows.   The rows we want have have the maximum depth value by ID.  We will use the max function by id to find the maximum depth in a second common table expression, and then filter the results in our final query:

with rq ( id, value, concat_value, depth) as (

   select a.id, 
          a.value, 
          cast( a.value as varchar(256)), 
          1
   from sample a
   where a.value = (
         select min(value) 
         from sample b
         where a.id = b.id
         )

   union all 

   select c.id, 
          c.value,  
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from sample c
   inner join rq 
      on c.id = rq.id
        and c.value > rq.value
        and rq.depth < 10
)

,all_rows as (
    select id, value, concat_value, depth,
  max(depth) over (partition by id) max_depth
    from rq
)

select id, concat_value
from all_rows
where depth = max_depth


When we execute this query, we get the same results as our original query using the LISTAGG function in Oracle:

ID Concat_value
1 A,B
2 C,D
3 E,F,G
4 H

Using common table expressions and recursion, we now have a way to aggregate strings together in SQL Server.  But this is just the start; using the method for string aggregation as a model, we can develop other aggregations as well.  Next week, we talk about handling cases where values are repeated within an id.