Saturday, March 28, 2015

SQL Server Listagg, Part Two

Last week's post examined using common table expressions and recursive SQL to concatenate strings grouped by an id value in SQL Server.  This is an easy operation to perform in DB2 and Oracle; both databases support the LISTAGG function.   But SQL Server does not support the LISTAGG function, so we needed to take a different approach to getting the same results.

Our query works well, but sometimes our data may have values that repeat within an id.   Because our query controls the recursion with the values that we're concatenating, we might not get the result back that we wanted.

Let's start with last week's sample table, and add a row to it:

    insert into sample values(3,'G');

Run the last week's final query, and our results look this:

IdConcat_value
1A,B
2C,D
3E,F,G
3E,F,G
4H


This is probably not the result we hoped for.  So, first question:  what did we hope to see?   If the correct result is four rows, with the value of "E,F,G" for Id = 3 , then we just need to add the distinct function to the final query.

But maybe the correct answer is four rows, and the concat_value for Id = 3 is "E,F,G,G".  In that case, we need to do a little work to our query.   As originally written, the query uses the VALUE column to control the recursion.   If the values repeat, then we need to introduce a surrogate value to the query.  Let's examine the query and let the code comments do the talking:


-- Start with a new common table expression(CTE) that creates a unique surrogate
-- value for each value in the VALUE column.  The OLAP row_number() function will 
-- do that easily.
with add_surogate (id, value, surogate_value) as (
    select  id, 
     value,       
            row_number() over (partition by id order by value)     
    from sample
    )

-- Add the surrogate_value to the rq CTE
,rq ( id, value, surogate_value,  concat_value, depth) as (

   select a.id,          
          a.value, 
   a.surogate_value,
          cast( a.value as varchar(256)), 
          1
   from add_surogate a
   where a.surogate_value = (
         select min(surogate_value)   -- Start the recursion using the surrogate value
         from add_surogate b
         where a.id = b.id
         )

   union all 

   select c.id,     
          c.value, 
          c.surogate_value, 
          -- No changes, concatenate values as before 
          cast( rq.concat_value + ',' + c.value as varchar(256)), 
          rq.depth + 1
   from add_surogate c
   inner join rq 
      on c.id = rq.id
         -- And control the progress of the recursion using the
         -- surrogate value
        and c.surogate_value > rq.surogate_value
        and rq.depth < 10
)

,all_rows as (
    select id, value, concat_value, depth,
           max(depth) over (partition by id) max_depth
    from rq
)

select id, concat_value
from all_rows
where depth = max_depth


The changes from last week's query to this week's query are pretty simple:
  • Add a new common table expression to create a surrogate value
  • Add the surrogate value to the recursive common table expression
  • Use the surrogate value to control the recursion
  • Continue to build the result string as before
With these simple changes, we get the desired results:

ID Concat_value
1 A,B
2 C,D
3 E,F,G,G
4 H


More Reading:

The vendors' online documentation have good examples:

No comments:

Post a Comment