Thursday, May 3, 2012

SQL COUNT: A Computer Program Does What You Tell It To Do, Not What You Want It To Do

Given this data:

test=# select * from ios_app order by app_code, date_released;
.ios_app_id | app_code | date_released | price  
------------+----------+---------------+--------
          1 | AB       | 2010-01-01    | 1.0000
          3 | AB       | 2010-01-03    | 3.0000
          4 | AB       | 2010-01-04    | 4.0000
          2 | TR       | 2010-01-02    | 2.0000
          5 | TR       | 2010-01-05    | 5.0000
(5 rows)

And this data:
test=# select * from android_app order by app_code, date_released;
.android_app_id | app_code | date_released |  price  
----------------+----------+---------------+---------
              1 | AB       | 2010-01-06    |  6.0000
              2 | AB       | 2010-01-07    |  7.0000
              7 | MK       | 2010-01-07    |  7.0000
              3 | TR       | 2010-01-08    |  8.0000
              4 | TR       | 2010-01-09    |  9.0000
              5 | TR       | 2010-01-10    | 10.0000
              6 | TR       | 2010-01-11    | 11.0000
(7 rows)

I made this query...

select x.app_code, 
 count(i.date_released) as ios_release_count, 
 count(a.date_released) as android_release_count
from app x
left join ios_app i on i.app_code = x.app_code
left join android_app a on a.app_code = x.app_code
group by x.app_code
order by x.app_code


...and I wanted that to display this...

.app_code | ios_release_count | android_release_count 
----------+-------------------+-----------------------
 AB       |                 3 |                     2
 MK       |                 0 |                     1
 PM       |                 0 |                     0
 TR       |                 2 |                     4
(4 rows)


...but why it displays this?

.app_code | ios_release_count | android_release_count 
----------+-------------------+-----------------------
 AB       |                 6 |                     6
 MK       |                 0 |                     1
 PM       |                 0 |                     0
 TR       |                 8 |                     8
(4 rows)


Why ios_release_count says 6 when it should be 3 instead?

The same with TR, why its ios_release_count displays 8 when it should be 2 instead?

Do you notice something? The incorrect result 6 on your query is a multiplication of ios_release_count of 3 and android_release_count of 2. Likewise with result 8 on your query, it is a multiplication of ios_release_count of 2 and android_release_count of 4.


To visualize, try to remove the grouping and count aggregation on your query...

select x.app_code, i.date_released as ios_release_date, a.date_released as android_release_date
from app x
left join ios_app i on i.app_code = x.app_code
left join android_app a on a.app_code = x.app_code
order by x.app_code, ios_release_date, android_release_date;


This will be the results of your query:


.app_code | ios_release_date | android_release_date 
----------+------------------+----------------------
 AB       | 2010-01-01       | 2010-01-06
 AB       | 2010-01-01       | 2010-01-07
 AB       | 2010-01-03       | 2010-01-06
 AB       | 2010-01-03       | 2010-01-07
 AB       | 2010-01-04       | 2010-01-06
 AB       | 2010-01-04       | 2010-01-07
 MK       |                  | 2010-01-07
 PM       |                  | 
 TR       | 2010-01-02       | 2010-01-08
 TR       | 2010-01-02       | 2010-01-09
 TR       | 2010-01-02       | 2010-01-10
 TR       | 2010-01-02       | 2010-01-11
 TR       | 2010-01-05       | 2010-01-08
 TR       | 2010-01-05       | 2010-01-09
 TR       | 2010-01-05       | 2010-01-10
 TR       | 2010-01-05       | 2010-01-11
(16 rows)


You notice something? the ios_release_date keeps on repeating. The reason why? For every AB row, it get paired with two rows on android. How many rows are there in AB? Three, right? So when you multiply three by two, you get 6, so that's where the count of 6 comes from! Likewise for every TR row in ios, there's four rows on android; there are two TR rows in ios. You multiply two by four, you get 8!


So it's time to correct the query.

With the advent of CTE, flattening the results is way neater and you can give them their own name; afterwards, you can join them to the master table:

with ios_app_release_count_list as
(
 select app_code, count(date_released) as ios_release_count
 from ios_app
 group by app_code
)
,android_release_count_list as
(
 select app_code, count(date_released) as android_release_count 
 from android_app 
 group by app_code  
)
select 
 x.app_code, 
 coalesce(i.ios_release_count,0) as ios_release_count, 
 coalesce(a.android_release_count,0) as android_release_count
from app x
left join ios_app_release_count_list i on i.app_code = x.app_code
left join android_release_count_list a on a.app_code = x.app_code
order by x.app_code;

Here's the output for that:

.app_code | ios_release_count | android_release_count 
----------+-------------------+-----------------------
 AB       |                 3 |                     2
 MK       |                 0 |                     1
 PM       |                 0 |                     0
 TR       |                 2 |                     4
(4 rows)

It's already correct, isn't it?


CTE lets you divide-and-conquer the problem pretty well, make a query of something, make a good name for it, then join it to master table.

So with absence of CTE, this is how your query shall be written like:

select x.app_code, 
 coalesce(i.ios_release_count,0) as ios_release_count, 
 coalesce(a.android_release_count,0) as android_release_count
from app x
left join
(
 select app_code, count(date_released) as ios_release_count
 from ios_app
 group by app_code
) i on i.app_code = x.app_code
left join
(
 select app_code, count(date_released) as android_release_count 
 from android_app 
 group by app_code   
) a on a.app_code = x.app_code
order by x.app_code

It's not as intuitive and neat as the CTE approach. Nevertheless, it works!





No comments:

Post a Comment