Sunday, April 7, 2013

Gaps to islands

On an old post, I discussed how island and gaps algorithms work: http://www.anicehumble.com/2012/08/monitoring-perfect-attendance.html

On this post, I will discuss the exact reverse of that.

Given the following data:

```CREATE TABLE y
("id" int, "val" varchar(1))
;

INSERT INTO y
("id", "val")
VALUES
(1, 'a'),
(4, 'b')
;

CREATE TABLE x
("id" int)
;

INSERT INTO x
("id")
VALUES
(1),
(2),
(3),
(4),
(5)
;

with z as
(
select x.id, y.val
from x
left join y on y.id = x.id
)
select  *
from z
order by id ;

```

```| ID |    VAL |
---------------
|  1 |      a |
|  2 | (null) |
|  3 | (null) |
|  4 |      b |
|  5 | (null) |
```

Requirement is to cluster those gaps together:
```| ID | VAL |
------------
|  1 |   a |
|  2 |   a |
|  3 |   a |
|  4 |   b |
|  5 |   b |
```

The easiest way to solve that is to use windowing function, i.e. use first_value on each group. First thing first, we must devise a way to group those gaps together. We can do that by counting over the id's order. COUNT doesn't count nulls, hence COUNT will be able to group nulls to a previous non-null, COUNT will maintain the same count as long as it is encountering nulls. To illustrate:

```with z as
(
select
x.id, y.val,
count(y.val) over(order by x.id ) as grp
from x
left join y on y.id = x.id
)
select *
from z
order by id ;
```

Output:
```| ID |    VAL | GRP |
---------------------
|  1 |      a |   1 |
|  2 | (null) |   1 |
|  3 | (null) |   1 |
|  4 |      b |   2 |
|  5 | (null) |   2 |
```

Now that we designated a grouping number for related data, getting the first value among the group shall just be a simple undertaking, use first_value from the partition of grp

```with z as
(
select
x.id, y.val,
count(y.val) over(order by x.id ) as grp
from x
left join y on y.id = x.id
)
select
id, val, grp, first_value(val) over(partition by grp order by id)
from z;
```

Output:

```| ID |    VAL | GRP | FIRST_VALUE |
-----------------------------------
|  1 |      a |   1 |           a |
|  2 | (null) |   1 |           a |
|  3 | (null) |   1 |           a |
|  4 |      b |   2 |           b |
|  5 | (null) |   2 |           b |
```