Re: missing optimization - column <> column

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: missing optimization - column <> column
Date: 2016-12-05 15:18:19
Message-ID: CAFj8pRD-q59wQ-0WvJXik_U-R4LYN9_Caw9yJqAT-vHX0SMwnQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi

I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple optimization

create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;

It does full scan of foo, although it should be replaced by false in
planner time.

Same issue is a expression a = a .. can be replaced by true

Regards

Pavel


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 15:23:52
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Pavel,

* Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
>
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
>
> It does full scan of foo, although it should be replaced by false in
> planner time.

a <> a could go to NULL. Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.

> Same issue is a expression a = a .. can be replaced by true

a = a can't be replaced unless you know that 'a' can't be NULL.

In short, fix the application.

Thanks!

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 15:24:22
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization

> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;

> It does full scan of foo, although it should be replaced by false in
> planner time.

> Same issue is a expression a = a .. can be replaced by true

Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.

If you write something like "1 <> 1", it will be folded.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 15:28:48
Message-ID: CAFj8pRA3pun+SP1jr=Bj9E4t-sLqw7cEXK0vDXBEUFJphAqOjA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

2016-12-05 16:24 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple
> optimization
>
> > create table foo(a int);
> > insert into foo select generate_series(1,100000);
> > analyze foo;
> > explain select * from foo where a <> a;
>
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> > Same issue is a expression a = a .. can be replaced by true
>
> Wrong; those expressions yield NULL for NULL input. You could perhaps
> optimize them slightly into some form of is-null test, but it hardly
> seems worth the planner cycles to check for.
>

understand

>
> If you write something like "1 <> 1", it will be folded.
>

it works, but a <> a not

Regards

Pavel

>
> regards, tom lane
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 15:35:12
Message-ID: CAFj8pRANQEB4rkOu1gEAcMegpqPmZNt5w+iDccVAR4qufzNcKw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost(at)snowman(dot)net>:

> Pavel,
>
> * Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple
> optimization
> >
> > create table foo(a int);
> > insert into foo select generate_series(1,100000);
> > analyze foo;
> > explain select * from foo where a <> a;
> >
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> a <> a could go to NULL. Obviously, that'll be false for such a simple
> case, but it might not work out that way in a more complicated WHERE
> clause.
>

it should be false everywhere

I don't defend a design of the application - it is exactly wrong. But
sometimes, it can be generated by some tool or it can be a human error. And
bad performance can be a big problems on systems, where you cannot to
deploy fix simply.

It is hard to say what should be good design - because these queries are
slow, I know so these queries are wrong, but these queries does significant
IO utilization - and I cannot to fix the application, because I am not a
author and the fix will not be available in next week.

Regards

Pavel

>
> > Same issue is a expression a = a .. can be replaced by true
>
> a = a can't be replaced unless you know that 'a' can't be NULL.
>
> In short, fix the application.
>
> Thanks!
>
> Stephen
>


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 15:41:22
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Pavel,

* Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> 2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost(at)snowman(dot)net>:
> > * Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> > > I found some crazy queries in one customer application. These queries are
> > > stupid, but it was surprise for me so there are not some simple
> > optimization
> > >
> > > create table foo(a int);
> > > insert into foo select generate_series(1,100000);
> > > analyze foo;
> > > explain select * from foo where a <> a;
> > >
> > > It does full scan of foo, although it should be replaced by false in
> > > planner time.
> >
> > a <> a could go to NULL. Obviously, that'll be false for such a simple
> > case, but it might not work out that way in a more complicated WHERE
> > clause.
> >
>
> it should be false everywhere

No, it's NULL, not false, if 'a' is NULL:

=# SELECT 1 WHERE (NULL <> NULL) IS NULL;
?column?
----------
1
(1 row)

=*# SELECT 1 WHERE (FALSE) IS NULL;
?column?
----------
(0 rows)

You can not make the assumption that 'a <> a' is always false.

Thanks!

Stephen


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 15:46:59
Message-ID: CAFj8pRCwSRZOJS-UNr6YmiQtbbPrDjaejvLm31yGVNWqzsUEZQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

2016-12-05 16:41 GMT+01:00 Stephen Frost <sfrost(at)snowman(dot)net>:

> Pavel,
>
> * Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> > 2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost(at)snowman(dot)net>:
> > > * Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> > > > I found some crazy queries in one customer application. These
> queries are
> > > > stupid, but it was surprise for me so there are not some simple
> > > optimization
> > > >
> > > > create table foo(a int);
> > > > insert into foo select generate_series(1,100000);
> > > > analyze foo;
> > > > explain select * from foo where a <> a;
> > > >
> > > > It does full scan of foo, although it should be replaced by false in
> > > > planner time.
> > >
> > > a <> a could go to NULL. Obviously, that'll be false for such a simple
> > > case, but it might not work out that way in a more complicated WHERE
> > > clause.
> > >
> >
> > it should be false everywhere
>
> No, it's NULL, not false, if 'a' is NULL:
>
> =# SELECT 1 WHERE (NULL <> NULL) IS NULL;
> ?column?
> ----------
> 1
> (1 row)
>
> =*# SELECT 1 WHERE (FALSE) IS NULL;
> ?column?
> ----------
> (0 rows)
>
> You can not make the assumption that 'a <> a' is always false.
>

ok - when the expression is tested on NULL, then a <> a should not be
reduced. But when there are not this test, then I can replace this
expression by false.

Regards

Pavel

>
> Thanks!
>
> Stephen
>


From: Pantelis Theodosiou <ypercube(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 16:04:30
Message-ID: CAE3TBxyjpAXLpgLkZTVFt+95J7_e4TjXEKRYfXv1o-8daW+3uQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 5, 2016 at 3:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple
> optimization
>
> > create table foo(a int);
> > insert into foo select generate_series(1,100000);
> > analyze foo;
> > explain select * from foo where a <> a;
>
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> > Same issue is a expression a = a .. can be replaced by true
>
> Wrong; those expressions yield NULL for NULL input. You could perhaps
> optimize them slightly into some form of is-null test, but it hardly
> seems worth the planner cycles to check for.
>
> If you write something like "1 <> 1", it will be folded.
>
> regards, tom lane
>
>
>

Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that a
is (not) nullable or when "a" is more complicated expression?

a <> a : (a IS NULL) AND NULL
a = a : (a IS NOT NULL) OR NULL

Pantelis Theodosiou


From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Pantelis Theodosiou <ypercube(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 19:02:22
Message-ID: CADkLM=ciBNo8Q2adv6VJBAX_2SHGd3pbggKaeFKnsM730mCZDw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

>
>
> Would it be worth replacing the condition with the equivalent?
> I mean would that help optimizing better some queries when it knows that a
> is (not) nullable or when "a" is more complicated expression?
>
> a <> a : (a IS NULL) AND NULL
> a = a : (a IS NOT NULL) OR NULL
>

I think you're looking for

a IS DISTINCT FROM a

And that will work for cases where a might be null.

I have no opinion about whether adding such a test to the planner is worth
it.


From: Serge Rielau <serge(at)rielau(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 20:59:18
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Actually there are lots of things that can be done with this sort of theorem proving.
And NULL is a plenty good answer for a filter, just not for a check constraint.
Amongst them INSERT through UNION ALL for symmetric views which can be handy for FDW partitioned tables.

One such implementation an be found here:
https://www.google.com/patents/US6728952 (apparently expired)

Cheers
Serge

Salesforce.com


> On Dec 5, 2016, at 7:28 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>
>
> 2016-12-05 16:24 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com <mailto:pavel(dot)stehule(at)gmail(dot)com>> writes:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple optimization
>
> > create table foo(a int);
> > insert into foo select generate_series(1,100000);
> > analyze foo;
> > explain select * from foo where a <> a;
>
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> > Same issue is a expression a = a .. can be replaced by true
>
> Wrong; those expressions yield NULL for NULL input. You could perhaps
> optimize them slightly into some form of is-null test, but it hardly
> seems worth the planner cycles to check for.
>
> understand
>
>
> If you write something like "1 <> 1", it will be folded.
>
> it works, but a <> a not
>
> Regards
>
> Pavel
>
> regards, tom lane
>


From: Pantelis Theodosiou <ypercube(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: missing optimization - column <> column
Date: 2016-12-05 20:59:19
Message-ID: CAE3TBxyot9f1yEhmEaTpxN4+9rTBgRR_uX3cqcWegLNEFdvKaQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 5, 2016 at 7:02 PM, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:

>
>> Would it be worth replacing the condition with the equivalent?
>> I mean would that help optimizing better some queries when it knows that
>> a is (not) nullable or when "a" is more complicated expression?
>>
>> a <> a : (a IS NULL) AND NULL
>> a = a : (a IS NOT NULL) OR NULL
>>
>
>
> I think you're looking for
>
> a IS DISTINCT FROM a
>
>
> And that will work for cases where a might be null.
>
> I have no opinion about whether adding such a test to the planner is worth
> it.
>

No, (a IS DISTINCT FROM a) will evaluate to FALSE when a is NULL. The
other conditions (a <> a) , ((a IS NULL) AND NULL) will evaluate to NULL.