Lists: | pgsql-hackers |
---|
From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | missing optimization - column <> column |
Date: | 2016-12-05 15:18:19 |
Message-ID: | CAFj8pRD-q59wQ-0WvJXik_U-R4LYN9_Caw9yJqAT-vHX0SMwnQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi
I found some crazy queries in one customer application. These queries are
stupid, but it was surprise for me so there are not some simple optimization
create table foo(a int);
insert into foo select generate_series(1,100000);
analyze foo;
explain select * from foo where a <> a;
It does full scan of foo, although it should be replaced by false in
planner time.
Same issue is a expression a = a .. can be replaced by true
Regards
Pavel
From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 15:23:52 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Pavel,
* Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
>
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
>
> It does full scan of foo, although it should be replaced by false in
> planner time.
a <> a could go to NULL. Obviously, that'll be false for such a simple
case, but it might not work out that way in a more complicated WHERE
clause.
> Same issue is a expression a = a .. can be replaced by true
a = a can't be replaced unless you know that 'a' can't be NULL.
In short, fix the application.
Thanks!
Stephen
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 15:24:22 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I found some crazy queries in one customer application. These queries are
> stupid, but it was surprise for me so there are not some simple optimization
> create table foo(a int);
> insert into foo select generate_series(1,100000);
> analyze foo;
> explain select * from foo where a <> a;
> It does full scan of foo, although it should be replaced by false in
> planner time.
> Same issue is a expression a = a .. can be replaced by true
Wrong; those expressions yield NULL for NULL input. You could perhaps
optimize them slightly into some form of is-null test, but it hardly
seems worth the planner cycles to check for.
If you write something like "1 <> 1", it will be folded.
regards, tom lane
From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 15:28:48 |
Message-ID: | CAFj8pRA3pun+SP1jr=Bj9E4t-sLqw7cEXK0vDXBEUFJphAqOjA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
2016-12-05 16:24 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple
> optimization
>
> > create table foo(a int);
> > insert into foo select generate_series(1,100000);
> > analyze foo;
> > explain select * from foo where a <> a;
>
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> > Same issue is a expression a = a .. can be replaced by true
>
> Wrong; those expressions yield NULL for NULL input. You could perhaps
> optimize them slightly into some form of is-null test, but it hardly
> seems worth the planner cycles to check for.
>
understand
>
> If you write something like "1 <> 1", it will be folded.
>
it works, but a <> a not
Regards
Pavel
>
> regards, tom lane
>
From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 15:35:12 |
Message-ID: | CAFj8pRANQEB4rkOu1gEAcMegpqPmZNt5w+iDccVAR4qufzNcKw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost(at)snowman(dot)net>:
> Pavel,
>
> * Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple
> optimization
> >
> > create table foo(a int);
> > insert into foo select generate_series(1,100000);
> > analyze foo;
> > explain select * from foo where a <> a;
> >
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> a <> a could go to NULL. Obviously, that'll be false for such a simple
> case, but it might not work out that way in a more complicated WHERE
> clause.
>
it should be false everywhere
I don't defend a design of the application - it is exactly wrong. But
sometimes, it can be generated by some tool or it can be a human error. And
bad performance can be a big problems on systems, where you cannot to
deploy fix simply.
It is hard to say what should be good design - because these queries are
slow, I know so these queries are wrong, but these queries does significant
IO utilization - and I cannot to fix the application, because I am not a
author and the fix will not be available in next week.
Regards
Pavel
>
> > Same issue is a expression a = a .. can be replaced by true
>
> a = a can't be replaced unless you know that 'a' can't be NULL.
>
> In short, fix the application.
>
> Thanks!
>
> Stephen
>
From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 15:41:22 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Pavel,
* Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> 2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost(at)snowman(dot)net>:
> > * Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> > > I found some crazy queries in one customer application. These queries are
> > > stupid, but it was surprise for me so there are not some simple
> > optimization
> > >
> > > create table foo(a int);
> > > insert into foo select generate_series(1,100000);
> > > analyze foo;
> > > explain select * from foo where a <> a;
> > >
> > > It does full scan of foo, although it should be replaced by false in
> > > planner time.
> >
> > a <> a could go to NULL. Obviously, that'll be false for such a simple
> > case, but it might not work out that way in a more complicated WHERE
> > clause.
> >
>
> it should be false everywhere
No, it's NULL, not false, if 'a' is NULL:
=# SELECT 1 WHERE (NULL <> NULL) IS NULL;
?column?
----------
1
(1 row)
=*# SELECT 1 WHERE (FALSE) IS NULL;
?column?
----------
(0 rows)
You can not make the assumption that 'a <> a' is always false.
Thanks!
Stephen
From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 15:46:59 |
Message-ID: | CAFj8pRCwSRZOJS-UNr6YmiQtbbPrDjaejvLm31yGVNWqzsUEZQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
2016-12-05 16:41 GMT+01:00 Stephen Frost <sfrost(at)snowman(dot)net>:
> Pavel,
>
> * Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> > 2016-12-05 16:23 GMT+01:00 Stephen Frost <sfrost(at)snowman(dot)net>:
> > > * Pavel Stehule (pavel(dot)stehule(at)gmail(dot)com) wrote:
> > > > I found some crazy queries in one customer application. These
> queries are
> > > > stupid, but it was surprise for me so there are not some simple
> > > optimization
> > > >
> > > > create table foo(a int);
> > > > insert into foo select generate_series(1,100000);
> > > > analyze foo;
> > > > explain select * from foo where a <> a;
> > > >
> > > > It does full scan of foo, although it should be replaced by false in
> > > > planner time.
> > >
> > > a <> a could go to NULL. Obviously, that'll be false for such a simple
> > > case, but it might not work out that way in a more complicated WHERE
> > > clause.
> > >
> >
> > it should be false everywhere
>
> No, it's NULL, not false, if 'a' is NULL:
>
> =# SELECT 1 WHERE (NULL <> NULL) IS NULL;
> ?column?
> ----------
> 1
> (1 row)
>
> =*# SELECT 1 WHERE (FALSE) IS NULL;
> ?column?
> ----------
> (0 rows)
>
> You can not make the assumption that 'a <> a' is always false.
>
ok - when the expression is tested on NULL, then a <> a should not be
reduced. But when there are not this test, then I can replace this
expression by false.
Regards
Pavel
>
> Thanks!
>
> Stephen
>
From: | Pantelis Theodosiou <ypercube(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 16:04:30 |
Message-ID: | CAE3TBxyjpAXLpgLkZTVFt+95J7_e4TjXEKRYfXv1o-8daW+3uQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Dec 5, 2016 at 3:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple
> optimization
>
> > create table foo(a int);
> > insert into foo select generate_series(1,100000);
> > analyze foo;
> > explain select * from foo where a <> a;
>
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> > Same issue is a expression a = a .. can be replaced by true
>
> Wrong; those expressions yield NULL for NULL input. You could perhaps
> optimize them slightly into some form of is-null test, but it hardly
> seems worth the planner cycles to check for.
>
> If you write something like "1 <> 1", it will be folded.
>
> regards, tom lane
>
>
>
Would it be worth replacing the condition with the equivalent?
I mean would that help optimizing better some queries when it knows that a
is (not) nullable or when "a" is more complicated expression?
a <> a : (a IS NULL) AND NULL
a = a : (a IS NOT NULL) OR NULL
Pantelis Theodosiou
From: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
---|---|
To: | Pantelis Theodosiou <ypercube(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 19:02:22 |
Message-ID: | CADkLM=ciBNo8Q2adv6VJBAX_2SHGd3pbggKaeFKnsM730mCZDw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
>
>
> Would it be worth replacing the condition with the equivalent?
> I mean would that help optimizing better some queries when it knows that a
> is (not) nullable or when "a" is more complicated expression?
>
> a <> a : (a IS NULL) AND NULL
> a = a : (a IS NOT NULL) OR NULL
>
I think you're looking for
a IS DISTINCT FROM a
And that will work for cases where a might be null.
I have no opinion about whether adding such a test to the planner is worth
it.
From: | Serge Rielau <serge(at)rielau(dot)com> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 20:59:18 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Actually there are lots of things that can be done with this sort of theorem proving.
And NULL is a plenty good answer for a filter, just not for a check constraint.
Amongst them INSERT through UNION ALL for symmetric views which can be handy for FDW partitioned tables.
One such implementation an be found here:
https://www.google.com/patents/US6728952 (apparently expired)
Cheers
Serge
> On Dec 5, 2016, at 7:28 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>
>
> 2016-12-05 16:24 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com <mailto:pavel(dot)stehule(at)gmail(dot)com>> writes:
> > I found some crazy queries in one customer application. These queries are
> > stupid, but it was surprise for me so there are not some simple optimization
>
> > create table foo(a int);
> > insert into foo select generate_series(1,100000);
> > analyze foo;
> > explain select * from foo where a <> a;
>
> > It does full scan of foo, although it should be replaced by false in
> > planner time.
>
> > Same issue is a expression a = a .. can be replaced by true
>
> Wrong; those expressions yield NULL for NULL input. You could perhaps
> optimize them slightly into some form of is-null test, but it hardly
> seems worth the planner cycles to check for.
>
> understand
>
>
> If you write something like "1 <> 1", it will be folded.
>
> it works, but a <> a not
>
> Regards
>
> Pavel
>
> regards, tom lane
>
From: | Pantelis Theodosiou <ypercube(at)gmail(dot)com> |
---|---|
To: | Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: missing optimization - column <> column |
Date: | 2016-12-05 20:59:19 |
Message-ID: | CAE3TBxyot9f1yEhmEaTpxN4+9rTBgRR_uX3cqcWegLNEFdvKaQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Mon, Dec 5, 2016 at 7:02 PM, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:
>
>> Would it be worth replacing the condition with the equivalent?
>> I mean would that help optimizing better some queries when it knows that
>> a is (not) nullable or when "a" is more complicated expression?
>>
>> a <> a : (a IS NULL) AND NULL
>> a = a : (a IS NOT NULL) OR NULL
>>
>
>
> I think you're looking for
>
> a IS DISTINCT FROM a
>
>
> And that will work for cases where a might be null.
>
> I have no opinion about whether adding such a test to the planner is worth
> it.
>
No, (a IS DISTINCT FROM a) will evaluate to FALSE when a is NULL. The
other conditions (a <> a) , ((a IS NULL) AND NULL) will evaluate to NULL.