Tuesday, May 1, 2012

Find the path between source and target using recursive CTE

Given this sample data:

CREATE TABLE tbl
 (source int, target int, strength int);

INSERT INTO tbl
 (source, target, strength)
VALUES
 (1, 2, 1),
 (1, 3, 1),
 (2, 4, 1),
 (2, 5, 1),
 (3, 6, 1),
 (3, 7, 1),
 (4, 8, 1),
 (4, 9, 1),
 (5, 10, 1),
 (5, 11, 1);


What's the best query to find the path between source = 2 and target = 9?

The output would be this:
SOURCE   TARGET  PATH
2        4       2.4
4        9       2.4.9



If you have to start with source(2), that would entail traversing all its child nodes, only to discard some of them later on, this is inefficient.

With this in mind, we shall construct our query to start at the target then work its way up to find the source.


We shall divide-and-conquer the problem, first, let's do the traversal from bottom to top:

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9
  
    union all
  
    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source 
)
select * from find_parent

Output:
SOURCE TARGET RECENTNESS
4      9      0
2      4      1
1      2      2

Live test for query above: http://www.sqlfiddle.com/#!1/73210/37


The query is working, but it has no terminating condition to stop searching its way back to the source yet.


We just need to add a condition on join clause to facilitate this scenario:
with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9
  
    union all
  
    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source 
         -- despite the name, this target is another one's source
         and i.target <> 2
)
select * from find_parent


Output:
SOURCE TARGET RECENTNESS
4      9      0
2      4      1


Live test for query above: http://www.sqlfiddle.com/#!1/73210/38


Final step, construct the path:

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9
  
    union all
  
    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source 
         -- despite the name, this target is another one's source
         and i.target <> 2
)
,construct_path(source, target, recentness, path) as
(
    select source, target, recentness, source || '.' || target
    from find_parent 
    where recentness = (select max(recentness) from find_parent)
    
    union
    
    select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
    from find_parent dd
    join construct_path cp on dd.recentness = cp.recentness - 1  

)
select source, target, path
from construct_path
order by recentness desc


SOURCE  TARGET   PATH
2       4        2.4
4       9        2.4.9

Live test for query above: http://www.sqlfiddle.com/#!1/73210/40


Basically that's it. First, we traverse the path from child to parent using find_parent, then we construct user-readable path using construct_path


So what will happen, if we didn't remove the and i.target<> on join condition in find_parent?

This will be the output:

SOURCE TARGET PATH
1      2      1.2
2      4      1.2.4
4      9      1.2.4.9


Live test: http://www.sqlfiddle.com/#!1/73210/42


Solution for this problem: https://stackoverflow.com/questions/10392567/postgresql-pass-data-from-recursive-cte-onto-function

No comments:

Post a Comment