Sunday, April 7, 2013

Gaps to islands

On an old post, I discussed how island and gaps algorithms work: http://www.anicehumble.com/2012/08/monitoring-perfect-attendance.html

On this post, I will discuss the exact reverse of that.

Given the following data:


CREATE TABLE y
 ("id" int, "val" varchar(1))
;
 
INSERT INTO y
 ("id", "val")
VALUES
 (1, 'a'),
 (4, 'b')
;

CREATE TABLE x
 ("id" int)
;
 
INSERT INTO x
 ("id")
VALUES
 (1),
 (2),
 (3),
 (4),
 (5)
;

with z as
(
select x.id, y.val
from x
left join y on y.id = x.id
)
select  *
from z
order by id ;


| ID |    VAL |
---------------
|  1 |      a |
|  2 | (null) |
|  3 | (null) |
|  4 |      b |
|  5 | (null) |  


Requirement is to cluster those gaps together:
| ID | VAL |
------------
|  1 |   a |
|  2 |   a |
|  3 |   a |
|  4 |   b |
|  5 |   b |


The easiest way to solve that is to use windowing function, i.e. use first_value on each group. First thing first, we must devise a way to group those gaps together. We can do that by counting over the id's order. COUNT doesn't count nulls, hence COUNT will be able to group nulls to a previous non-null, COUNT will maintain the same count as long as it is encountering nulls. To illustrate:


with z as
(
  select       
      x.id, y.val,
     count(y.val) over(order by x.id ) as grp
  from x
  left join y on y.id = x.id
)
select *
from z
order by id ;


Output:
| ID |    VAL | GRP |
---------------------
|  1 |      a |   1 |
|  2 | (null) |   1 |
|  3 | (null) |   1 |
|  4 |      b |   2 |
|  5 | (null) |   2 |  


Now that we designated a grouping number for related data, getting the first value among the group shall just be a simple undertaking, use first_value from the partition of grp

with z as
(
  select       
      x.id, y.val,
     count(y.val) over(order by x.id ) as grp
  from x
  left join y on y.id = x.id
)
select 
  id, val, grp, first_value(val) over(partition by grp order by id)
from z;


Output:

| ID |    VAL | GRP | FIRST_VALUE |
-----------------------------------
|  1 |      a |   1 |           a |
|  2 | (null) |   1 |           a |
|  3 | (null) |   1 |           a |
|  4 |      b |   2 |           b |
|  5 | (null) |   2 |           b |

No comments:

Post a Comment