Thursday, May 9, 2019

In For The Win!

Someone on stackoverflow observed that IN is faster than EXISTS on MySQL

select * from `replays` 
where id in (
    select replay_id from `players` 
    where `battletag_name` = 'test') 
order by `id` asc 
limit 100;

select * from `replays` 
where exists (
    select * from `players` 
    where `replays`.`id` = `players`.`replay_id` 
      and `battletag_name` = 'test') 
order by `id` asc 
limit 100;

EXISTS took 70 seconds. IN took 0.4 second.


Aside from IN being faster than EXISTS in most cases, readability of IN is a big win. And also with IN, queries can be made modular, e.g.,


with test_players as 
(
    select replay_id 
    from players 
    where battletag_name = 'test'
) 
select * 
from replays 
where id in (select replay_id from test_players)

Cannot do the above when using EXISTS.



Monday, April 29, 2019

Query Error: Error: ER_NOT_SUPPORTED_YET: This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

Does not work on MySQL, works on Postgres

select
    t.player, 
    sum(case when t.eventid = 1 then t.points end) as event1,
    sum(case when t.eventid = 2 then t.points end) as event2,
    sum(case when t.eventid = 3 then t.points end) as event3,
    sum(case when t.eventid = 4 then t.points end) as event4,
    
    sum(
        case when t.points >= any(
            select best3.points 
            from tbl best3 
            where best3.player = t.player
            order by best3.points desc 
            limit 3
        ) then 
            t.points
        end
    )             
from tbl t
group by t.player

Output on MySQL 8.0:
Query Error: Error: ER_NOT_SUPPORTED_YET: This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'


Output on Postgres:
| player | event1 | event2 | event3 | event4 | sum |
| ------ | ------ | ------ | ------ | ------ | --- |
| 1      | 25     | 15     | 20     | 20     | 65  |
| 2      | 20     | 13     | 12     | 10     | 45  |


You can even made the code neater on Postgres by using FILTER:

Live test: https://www.db-fiddle.com/f/haMmw4S4f7XMqcBD8CDV7H/1


select
    t.player, 
    sum(t.points) filter(where t.eventid = 1) as event1,
    sum(t.points) filter(where t.eventid = 2) as event2,
    sum(t.points) filter(where t.eventid = 3) as event3,
    sum(t.points) filter(where t.eventid = 4) as event4,
    
    sum(t.points) filter(where
        t.points >= any(
            select best3.points 
            from tbl best3 
            where best3.player = t.player
            order by best3.points desc 
            limit 3
        )
    ) as best3          
from tbl t
group by t.player          

Solution on MySQL 8.0, works on Postgres too:

Live test: https://www.db-fiddle.com/f/4ufuFAXKf7mi5yefNQqoXM/2

with ranking as
(
    select 
        player,       
        rank() over(partition by player order by points desc) as xrank,
        points
    from tbl
)
,pick3 as
(
    select 
        player, 
        sum(points) as best3
    from ranking 
    where xrank <= 3
    group by player
)
select
    t.player, 
    sum(if(t.eventid = 1, t.points,0)) as event1,
    sum(if(t.eventid = 2, t.points,0)) as event2,
    sum(if(t.eventid = 3, t.points,0)) as event3,  
    sum(if(t.eventid = 4, t.points,0)) as event4,
    p.best3            
from tbl t
join pick3 p on t.player = p.player
group by t.player

MySQL 5.7 solution:

Live test: https://www.db-fiddle.com/f/4ufuFAXKf7mi5yefNQqoXM/15

select
    t.player, 
    sum(case when t.eventid = 1 then t.points end) as event1,
    sum(case when t.eventid = 2 then t.points end) as event2,
    sum(case when t.eventid = 3 then t.points end) as event3,
    sum(case when t.eventid = 4 then t.points end) as event4,
    
    sum(
        case when t.points >= (
            select best3.points 
            from tbl best3 
            where best3.player = t.player
            order by best3.points desc 
            limit 1 offset 2
        ) then 
            t.points
        end
    ) as best3            
from tbl t
group by t.player;

Output:
| player | event1 | event2 | event3 | event4 | best3 |
| ------ | ------ | ------ | ------ | ------ | ----- |
| 1      | 25     | 15     | 20     | 20     | 65    |
| 2      | 20     | 13     | 12     | 10     | 45    |

Friday, April 19, 2019

salary > avg(salary) conundrum

Beginners, and I think even some non-beginners would want this to work:

select *
from employees 
where salary > AVG(salary)

The problem is that avg is applied to the filtered rows. It will be circular how avg(salary) will get its value if avg is part of where clause. So to prevent that confusion, SQL standard disallows using aggregate functions in where clause.

Humans are lazy. Programmers are humans. Programmers are lazy.

I think that it is a missed opportunity that SQL didn't impose aliases when tables are referenced, otherwise they could introduce functionality that would have no ambiguity:

select e.*
from employees e
where e.salary > AVG(employees.salary)

Or perhaps a little OOP:
select e.*
from employees e
where e.salary > employees.AVG(salary)


That would mean, get all the employee (denoted by e) from employees whose salary is greater than the average salary of all employees.

Before you scoff that it would be super-duper hard for the RDBMS developers to parse things like that. Consider that, that sort of brevity can be achieved in C#'s Linq:

Live test: https://dotnetfiddle.net/uXJEZF

using System;
using System.Linq;

class Employee 
{
    public string Name { get; set; }
    public decimal Salary { get; set; }
}

public class Simple 
{
    public static void Main () 
    {
        var employees = new Employee[]
        {
            new Employee { Name = "John", Salary = 10 },
            new Employee { Name = "Paul", Salary = 9 },
            new Employee { Name = "George", Salary = 2 },
            new Employee { Name = "Ringo", Salary = 1 },
        };

        Console.WriteLine ("Average salary: {0}", employees.Average (x => x.Salary));

        var query = 
            from e in employees
            where e.Salary > employees.Average(x => x.Salary)
            select e;

        foreach (var e in query)
        {
            Console.WriteLine ("{0} {1}", e.Name, e.Salary);
        }
    }
}

Output:
Average salary: 5.5
John 10
Paul 9


If RDBMS have OOP and Linq syntax, it can prevent unusual request:



Live test: https://dotnetfiddle.net/PiANMM

Thursday, April 18, 2019

Dynamic unpivoting: SQL Server said, Postgres said

SQL Server version:

declare @tx table(
    id int identity(1, 1) not null,
    data varchar(100),
    column1 int,
    column2 int,
    column3 int
);

insert into
    @tx(data, column1, column2, column3)
values
    ('data1', 1, 2, 3),
    ('data2', 4, 5, 6),
    ('data3', 7, 8, null);

select
    a.id,
    a.data,
    c.item,
    c.value 
from
    @tx a
    cross apply (
        values
            (
                cast(
                    (
                        select
                            a.* for xml raw
                    ) as xml
                )
            )
    ) as b(xmldata)
    cross apply (
        select
            item  = xattr.value('local-name(.)', 'varchar(100)'),
            value = xattr.value('.', 'int')
        from
            b.xmldata.nodes('//@*') as xnode(xattr)
        where
            xnode.xattr.value('local-name(.)', 'varchar(100)') not in 
                ('id', 'data', 'other-columns', 'to-exclude')
    ) c



SQL Server can't include null values though.

Postgres version:

Note: Should run this first before being able to use hstore functionality: create extension hstore

create temporary table tx(
    id int generated by default as identity primary key,
    data text,
    column1 int,
    column2 int,
    column3 int
) on commit drop;

insert into
    tx(data, column1, column2, column3)
values
    ('data1', 1, 2, 3),
    ('data2', 4, 5, 6),
    ('data3', 7, 8, null);
    
with a as (
    select
        id,
        data,
        each(hstore(tx.*) - 'id'::text - 'data'::text) as h
    from
        tx
)
select
    id,
    data,
    (h).key as item,
    (h).value::int as value
from
    a 
-- this would work too:
-- where (h).key not in ('id', 'data', 'other-columns', 'to-exclude')



No problem with Postgres, it include nulls


If the number of columns to unpivot is not so dynamic, can do this in Postgres:

select tx.id, tx.data, x.*
from tx
join lateral (
 values
     ('column1', column1), 
     ('column2', column2), 
     ('column3', column3)
) as x(item, value) on true

Equivalent to SQL Server:
select tx.id, tx.data, x.*
from @tx tx
cross apply (  
    values
     ('column1', column1), 
     ('column2', column2), 
     ('column3', column3)    
) as x(item, value);

Both Postgres and SQL Server include nulls in result

https://stackoverflow.com/questions/55731155/combined-semi-transpose-of-a-data/55731461

Saturday, April 13, 2019

pgAdmin Internal Server Error

Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

If you encountered this error on *nix-based system, just delete the .pgAdmin directory from your directory, i.e.,

$ rm -rf ~/.pgadmin


Note that you will need to re-enter your user postgres password when pgAdmin is launched

Friday, April 12, 2019

SQL Intersect

create table test (
  id integer,
  pid integer,
  name varchar(2),
  val integer
);

insert into test
  (id, pid, name, val)
values
  ('1', '1', 'aa', '10'),
  ('2', '1', 'bb', '20'),
  ('3', '1', 'cc', '30'),
  ('4', '2', 'aa', '10'),
  ('5', '2', 'bb', '20'),
  ('6', '2', 'cc', '30'),
  ('7', '3', 'aa', '10'),
  ('8', '3', 'bb', '20'),
  ('9', '3', 'cc', '999');

select distinct pid from test 
where
pid in (select pid from test where (name,val) = ('aa',10))     
and pid in (select pid from test where (name,val) = ('bb',20))
and pid in (select pid from test where (name,val) = ('cc',30));



-- works on all RDBMS
select pid from test where (name,val) = ('aa',10)
and pid in (
    select pid from test where (name,val) = ('bb',20)
    and pid in (
        select pid from test where (name,val) = ('cc',30)
    )
);


-- works on most RDBMS, MySQL has no INTERSECT
select pid from test where (name,val) = ('aa',10)
intersect
select pid from test where (name,val) = ('bb',20)
intersect
select pid from test where (name,val) = ('cc',30);


-- works on all RDBMS
select a.pid
from   test a, test b, test c
where  (a.name,a.val) = ('aa',10) 
and    (b.name,b.val) = ('bb',20) 
and    (c.name,c.val) = ('cc',30) 
and    (a.pid = b.pid and b.pid = c.pid);

-- same as above. for JOIN purists
select a.pid
from   test a 
cross join test b
cross join test c
where  (a.name,a.val) = ('aa',10) 
and    (b.name,b.val) = ('bb',20) 
and    (c.name,c.val) = ('cc',30) 
and    (a.pid = b.pid and b.pid = c.pid);


-- just count
select t.pid
from test t
where (t.name, t.val) in ( ('aa', '10'), ('bb', '20'), ('cc', '30') )
group by t.pid
having count(*) = 3;                                                     


-- just sum
select t.pid
from test t
group by t.pid
having sum( ((t.name, t.val) in ( ('aa', '10'), ('bb', '20'), ('cc', '30') ))::int ) = 3                                                   


Output:
distinct and IN derived table
| pid |
| --- |
| 2   |
| 1   |


nested IN derived table
| pid |
| --- |
| 1   |
| 2   |


intersect
| pid |
| --- |
| 1   |
| 2   |


cross join using old syntax, table comma table
| pid |
| --- |
| 1   |
| 2   |

cross join
| pid |
| --- |
| 1   |
| 2   |

count
| pid |
| --- |
| 1   |
| 2   |


sum
| pid |
| --- |
| 1   |
| 2   |

All have same result, the first query's result is not ordered though.

Live test: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=3311663df90fd62f1194e50699767b0e

Execution plan: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=925dc71c30c9ef0f27bbc58f28b3bc33

Thursday, April 11, 2019

Not every RDBMS has every

Live test: https://www.db-fiddle.com/f/j1y7WCH2tRcWoE6QMbfRxC/6

Postgres supports the SQL Standard's every.

The query below shows the movie(s) that their genre is exactly Fantasy + Mystery. It won't show the movie if it is Fantasy + Mystery + Drama for example.

Here are the various ways every can be emulated in other RDBMSes.

select m.id, m.title, m.year
from movie m
join movie_genre mg on m.id = mg.movie_id
group by m.id
having 
    count(mg.genre) = 2 -- making sure that the movie has exactly two genres

    -- SQL Standard. Available in Postgres
    and every(mg.genre in ('Fantasy','Mystery'))
    -- Specific to Postgres
    and bool_and(mg.genre in ('Fantasy','Mystery'))

    
    -- MySQL can simulate every/bool_and by using BIT_AND. 
    -- Since MySQL's boolean behaves same as C's int, there's no need to explicitly cast the boolean expression to int. 
    -- Hence MySQL don't need to compare bit_and's result to 1 too.
    -- Following is how it is done in MySQL:
    
    -- and bit_and(mg.genre in ('Fantasy','Mystery'))

    
    
    --  Postgres has bit_and, but it's solely for bit manipulation. 
    --  No Postgres users would use bit_and for every/bool_and functionality, Postgres already has very English-like every/bool_and.
    --  If a user is intent to use bit_and instead of every/bool_and, the user has to cast the boolean expression to integer using ::int. 
    --  And also, bit_and need to be compared with a result:    

    and bit_and((mg.genre in ('Fantasy','Mystery'))::int) = 1


    -- Other RDBMSes
    and count(case when mg.genre in ('Fantasy','Mystery') then mg.genre end) = count(mg.genre)
    and min(case when mg.genre in ('Fantasy','Mystery') then 1 else 0 end) = 1
    and sum(case when mg.genre in ('Fantasy','Mystery') then 0 else 1 end) = 0

Output:
| id  | title                                    | year |
| --- | ---------------------------------------- | ---- |
| 125 | Harry Potter and the Prisoner of Azkaban | 2004 |


DDL:
CREATE TABLE movie (
  id INTEGER primary key,
  title VARCHAR(42),
  year INTEGER
);

INSERT INTO movie
  (id, title, year)
VALUES
  ('308', 'Coraline', '2009'),
  ('125', 'Harry Potter and the Prisoner of Azkaban', '2004'),
  ('204', 'Hugo', '2011'),
  ('42', 'Hitchiker''s guide to galaxy', '2011'),
  ('168', 'Lucky', '2011'),
  ('88', 'Nice', 1969);
  
  

CREATE TABLE movie_genre (
  movie_id INTEGER,
  genre VARCHAR(8)
);

INSERT INTO movie_genre
  (movie_id, genre)
VALUES
  ('308', 'Fantasy'),
  ('308', 'Thriller'),
  ('125', 'Fantasy'),
  ('125', 'Mystery'),
  ('204', 'Fantasy'),
  ('204', 'Mystery'),
  ('204', 'Drama'),
  ('308', 'Fantasy'),
  ('168', 'Fantasy'),
  ('88', 'Mystery'),
  ('88', 'Tour');  


Postgres-specific

Live test: https://www.db-fiddle.com/f/j1y7WCH2tRcWoE6QMbfRxC/2
select m.id, m.title, m.year
from movie m
join movie_genre mg on m.id = mg.movie_id
group by m.id
having 
    count(mg.genre) = 2 -- making sure that the movie has exactly two genres
    and every(mg.genre in ('Fantasy','Mystery'))    

Output:
| id  | title                                    | year |
| --- | ---------------------------------------- | ---- |
| 125 | Harry Potter and the Prisoner of Azkaban | 2004 |


To avoid explicit count (Postgres-specific)

Live test: https://www.db-fiddle.com/f/j1y7WCH2tRcWoE6QMbfRxC/7

select m.id, m.title, m.year
from movie m
join movie_genre mg on m.id = mg.movie_id
group by m.id
having array_agg(mg.genre order by mg.genre) = array['Fantasy', 'Mystery']

Output:
| id  | title                                    | year |
| --- | ---------------------------------------- | ---- |
| 125 | Harry Potter and the Prisoner of Azkaban | 2004 |

Note that we need to use order by mg.genre as array comparison is order-dependent. Without order by mg.genre, Harry Potter will not be shown.

Another good read on every: https://blog.jooq.org/2014/12/18/a-true-sql-gem-you-didnt-know-yet-the-every-aggregate-function/