Friday, March 20, 2015

SQL Server ListAgg Function, Part One

Lately, I've been straddling the Oracle database world and the Microsoft SQL Server world, moving data from Oracle tables to SQL Server tables.  One of the things I needed to do was concatenate string values grouped by an id value.

This is a pretty easy thing to do in the Oracle world.   Let's create a table and add some data:

create table sample (
    id number,
    value varchar2(16)
);
insert into sample values (1, 'A');
insert into sample values (1, 'B');
insert into sample values (2, 'C');
insert into sample values (2, 'D');
insert into sample values (3, 'E');
insert into sample values (3, 'F');
insert into sample values (3, 'G');
insert into sample values (4, 'H');
commit; 

Next, using the LISTAGG function available in DB2 and Oracle, our query will return an id and the concatenation of the values for each id:

select id, 
       listagg(value,',') within group (order by value) concat_value
from sample
group by id

ID CONCAT_VALUE
1A,B
2C,D
3E,F,G
4H

That was easy on an Oracle database!  Now, search the SQL Server documentation, and you won't find a LISTAGG function. Next, google "SQL Server listagg", and you'll find lots of discussion. Stackoverflow contributors offer many ways of solving this problem, usually using stored procedures or functions. In this post, I will suggest a SQL solution to the problem using SQL recursion and common table expressions.

First, grab the create table and the insert SQL above.  Change the varchar2 datatype to varchar (or nvarchar) and create the sample table on your SQL Server database.  Next, construct the recursive query.

with rq ( id, value, concat_value, depth) as (

   select a.id, 
          a.value, 
          cast( a.value as varchar(256)), 
          1
   from sample a
   where a.value = (
         select min(value) 
         from sample b
         where a.id = b.id
         )

   union all 

   select c.id, 
          c.value,  
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from sample c
   inner join rq 
      on c.id = rq.id
        and c.value > rq.value
        and rq.depth < 10
)

select rq.* 
from rq
order by id, depth

Before we execute the query, let's examine it:
  • the recursion returns 4 columns:  the id, the current value, the result of concatenating the values, and the recursive depth
  • we CAST the concatenated value to a varchar large enough to hold the result
  • we use a correlated query to return the smallest value for each id.  This gives us a starting point for the recursion. 
  • Tracking the depth is useful while developing a query.  The test for rq.depth < 10 will keep the query from spinning away if there's an error.
  • The condition c.value > rq.value performs the same function as "order by value" in the Oracle query.   
Execute the query, and we get the following result table:

IDValueConcat_valuedepth
1AA 1
1BA,B 2
2CC 1
2DC,D 2
3EE 1
3FE,F 2
3GE,G 2
3GE,F,G 3
4HH 1

The answer we want is in the result table above, we just need to filter out the intermediate rows.   The rows we want have have the maximum depth value by ID.  We will use the max function by id to find the maximum depth in a second common table expression, and then filter the results in our final query:

with rq ( id, value, concat_value, depth) as (

   select a.id, 
          a.value, 
          cast( a.value as varchar(256)), 
          1
   from sample a
   where a.value = (
         select min(value) 
         from sample b
         where a.id = b.id
         )

   union all 

   select c.id, 
          c.value,  
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from sample c
   inner join rq 
      on c.id = rq.id
        and c.value > rq.value
        and rq.depth < 10
)

,all_rows as (
    select id, value, concat_value, depth,
  max(depth) over (partition by id) max_depth
    from rq
)

select id, concat_value
from all_rows
where depth = max_depth


When we execute this query, we get the same results as our original query using the LISTAGG function in Oracle:

ID Concat_value
1 A,B
2 C,D
3 E,F,G
4 H

Using common table expressions and recursion, we now have a way to aggregate strings together in SQL Server.  But this is just the start; using the method for string aggregation as a model, we can develop other aggregations as well.  Next week, we talk about handling cases where values are repeated within an id.



No comments:

Post a Comment