Clarification of nodeToString() use cases

Lists: pgsql-hackers
From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Clarification of nodeToString() use cases
Date: 2018-09-16 11:43:05
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Hi, Hackers!

In the AQO project (Adaptive Query Optimization) [1] the nodeToString()
function is used by the planner to convert an query parse tree into a
string to generate a hash value [2].
In PostgreSQL v.11 call nodeToString(parse) segfaulted.
The reason is: parse tree node for XMLNAMESPACES clause has null pointer
in the case of DEFAULT namespace (the pointer will be initialized at
executor on the first call). Function _outValue() uses value->val.str[0]
[3] without checking of value->val.str.

I want to know, which of next options is correct:
1. Converting a parse tree into string with nodeToString() is illegal
operation. We need to add a comment to the description of nodeToString().
2. We can use nodeToString() for parse tree convertation. In this case
we need to check node variable 'value->val.str' to NULL pointer (Now I
use this approach, see attachment).

[1] https://github.com/postgrespro/aqo
[2] hash.c, line 55.
[3] outfuncs.c, line 3312.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-XML-Bug-fix.patch text/x-patch 826 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Date: 2018-09-16 15:21:58
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
> In the AQO project (Adaptive Query Optimization) [1] the nodeToString()
> function is used by the planner to convert an query parse tree into a
> string to generate a hash value [2].

Hmm. Not sure that is a bright idea; in fact, I'm pretty sure it's
a *bad* idea. This choice will result in the hash randomly varying
depending on whitespace, for instance. However ...

> In PostgreSQL v.11 call nodeToString(parse) segfaulted.

... that seems like a bad thing, because we commonly use nodeToString
(particularly pprint) to dump nodetrees for debugging purposes.

> The reason is: parse tree node for XMLNAMESPACES clause has null pointer
> in the case of DEFAULT namespace (the pointer will be initialized at
> executor on the first call).

Arguably, *that* is a bug. Or at least it requires some suspicion.

> Function _outValue() uses value->val.str[0]
> [3] without checking of value->val.str.

Indeed, because Value nodes aren't supposed to contain null strings.
I doubt this is the only place that would have a problem with that.

> I want to know, which of next options is correct:
> 1. Converting a parse tree into string with nodeToString() is illegal
> operation. We need to add a comment to the description of nodeToString().

Don't like this one too much because of the debug angle.

> 2. We can use nodeToString() for parse tree convertation. In this case
> we need to check node variable 'value->val.str' to NULL pointer (Now I
> use this approach, see attachment).

This patch is unacceptable because it turns outfuncs/readfuncs into
a non-idempotent transformation, ie a Value node would not read in
the same as it wrote out.

My immediate reaction is that somebody made a bad decision about how
to represent XMLNAMESPACES clauses. It's not quite too late to fix
that; but could you offer a concrete example that triggers this,
so it's clear what case we're talking about?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Date: 2018-09-16 18:05:27
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
>> The reason is: parse tree node for XMLNAMESPACES clause has null pointer
>> in the case of DEFAULT namespace (the pointer will be initialized at
>> executor on the first call).

> My immediate reaction is that somebody made a bad decision about how
> to represent XMLNAMESPACES clauses. It's not quite too late to fix
> that; but could you offer a concrete example that triggers this,
> so it's clear what case we're talking about?

I'd thought you were talking about the raw-parsetree representation,
but after poking at it, I see that the issue is with the post-parse-
analysis representation. That makes this not just a minor impediment
to debugging, but an easily reachable server crash, for example

regression=# CREATE VIEW bogus AS
SELECT * FROM XMLTABLE(XMLNAMESPACES(DEFAULT 'http://x.y'),
'/rows/row'
PASSING '<rows xmlns="http://x.y"><row><a>10</a></row></rows>' COLUMNS a int PATH 'a');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Aside from stored rules not working, we'd also have a problem with
passing such parsetrees to parallel workers (though I'm not sure
whether RangeTableFunc is considered parallelizable). And you can
make it crash by turning on debug_print_parse, too.

The reason why we've not heard this reported is doubtless that
DEFAULT isn't really supported: if you get as far as execution,
you get

regression=# SELECT * FROM XMLTABLE(XMLNAMESPACES(DEFAULT 'http://x.y'),
'/rows/row'
PASSING '<rows xmlns="http://x.y"><row><a>10</a></row></rows>'
COLUMNS a int PATH 'a');
ERROR: DEFAULT namespace is not supported

So I think that (a) we ought to fix the parsetree representation,
perhaps as attached, and (b) we ought to backpatch into v10 where this
syntax was introduced. Normally, this would be considered a change
of stored rules, forcing a catversion bump and hence non-backpatchable.
However, since existing releases would crash trying to store a rule
containing this construct, there can't be any stored rules out there
containing it, making the incompatibility moot.

The change the attached patch makes is to represent a DEFAULT namespace
as a NULL list entry, rather than a T_String Value node containing a
null. This approach does work for all backend/nodes/ operations, but
it could be argued that it's still a crash hazard for unsuspecting code.
We could do something else, like use a T_Null Value to represent DEFAULT,
but I'm doubtful that that's really much better. A NULL entry has the
advantage (or at least I'd consider it one) of being a certain crash
rather than a probabilistic crash for any uncorrected code accessing
the list item. Thoughts?

regards, tom lane

Attachment Content-Type Size
fix-default-namespace-representation-1.patch text/x-diff 4.8 KB

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Date: 2018-09-16 19:32:27
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> The change the attached patch makes is to represent a DEFAULT
Tom> namespace as a NULL list entry, rather than a T_String Value node
Tom> containing a null. This approach does work for all backend/nodes/
Tom> operations, but it could be argued that it's still a crash hazard
Tom> for unsuspecting code. We could do something else, like use a
Tom> T_Null Value to represent DEFAULT, but I'm doubtful that that's
Tom> really much better. A NULL entry has the advantage (or at least
Tom> I'd consider it one) of being a certain crash rather than a
Tom> probabilistic crash for any uncorrected code accessing the list
Tom> item. Thoughts?

Seems reasonable to me.

Tom> + Value *ns_node = (Value *) lfirst(lc2);

lfirst_node(Value, lc2) maybe?

--
Andrew (irc:RhodiumToad)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Date: 2018-09-16 20:13:00
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> + Value *ns_node = (Value *) lfirst(lc2);

> lfirst_node(Value, lc2) maybe?

Unfortunately not: the node tag is not T_Value but T_String.

regards, tom lane


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Date: 2018-09-16 21:57:05
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 09/16/2018 02:05 PM, Tom Lane wrote:
> I wrote:
>> Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
>>> The reason is: parse tree node for XMLNAMESPACES clause has null pointer
>>> in the case of DEFAULT namespace (the pointer will be initialized at
>>> executor on the first call).
>> My immediate reaction is that somebody made a bad decision about how
>> to represent XMLNAMESPACES clauses. It's not quite too late to fix
>> that; but could you offer a concrete example that triggers this,
>> so it's clear what case we're talking about?
> I'd thought you were talking about the raw-parsetree representation,
> but after poking at it, I see that the issue is with the post-parse-
> analysis representation. That makes this not just a minor impediment
> to debugging, but an easily reachable server crash, for example
>
> regression=# CREATE VIEW bogus AS
> SELECT * FROM XMLTABLE(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row'
> PASSING '<rows xmlns="http://x.y"><row><a>10</a></row></rows>' COLUMNS a int PATH 'a');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> Aside from stored rules not working, we'd also have a problem with
> passing such parsetrees to parallel workers (though I'm not sure
> whether RangeTableFunc is considered parallelizable). And you can
> make it crash by turning on debug_print_parse, too.
>
> The reason why we've not heard this reported is doubtless that
> DEFAULT isn't really supported: if you get as far as execution,
> you get
>
> regression=# SELECT * FROM XMLTABLE(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row'
> PASSING '<rows xmlns="http://x.y"><row><a>10</a></row></rows>'
> COLUMNS a int PATH 'a');
> ERROR: DEFAULT namespace is not supported
>
> So I think that (a) we ought to fix the parsetree representation,
> perhaps as attached, and (b) we ought to backpatch into v10 where this
> syntax was introduced. Normally, this would be considered a change
> of stored rules, forcing a catversion bump and hence non-backpatchable.
> However, since existing releases would crash trying to store a rule
> containing this construct, there can't be any stored rules out there
> containing it, making the incompatibility moot.
>
> The change the attached patch makes is to represent a DEFAULT namespace
> as a NULL list entry, rather than a T_String Value node containing a
> null. This approach does work for all backend/nodes/ operations, but
> it could be argued that it's still a crash hazard for unsuspecting code.
> We could do something else, like use a T_Null Value to represent DEFAULT,
> but I'm doubtful that that's really much better. A NULL entry has the
> advantage (or at least I'd consider it one) of being a certain crash
> rather than a probabilistic crash for any uncorrected code accessing
> the list item. Thoughts?
>
>

Seems related to this CF item that's been around for a year:
<https://www.postgresql.org/message-id/flat/CAFj8pRB%2BWDyDcZyGmfRdJ0HOoXugeaL-KNFeK9YA5Z10JN9qfA%40mail.gmail.com>
?

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Date: 2018-09-17 03:11:40
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> On 09/16/2018 02:05 PM, Tom Lane wrote:
>> The change the attached patch makes is to represent a DEFAULT namespace
>> as a NULL list entry, rather than a T_String Value node containing a
>> null.

> Seems related to this CF item that's been around for a year:
> <https://www.postgresql.org/message-id/flat/CAFj8pRB%2BWDyDcZyGmfRdJ0HOoXugeaL-KNFeK9YA5Z10JN9qfA%40mail.gmail.com>
> ?

Hm, seems like that is operating at the next level down; it starts with
XmlTableSetNamespace's response to a null "name" argument, whereas what
I'm on about is what happens before we get to that function.

There's quite a bit I don't like about that patch now that I look at it
:-(, but I don't think it's relevant to this thread.

regards, tom lane


From: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Date: 2018-09-17 06:51:48
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

16.09.2018 23:05, Tom Lane writes:
>> Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> writes:
>>> The reason is: parse tree node for XMLNAMESPACES clause has null pointer
>>> in the case of DEFAULT namespace (the pointer will be initialized at
>>> executor on the first call).
>
>> My immediate reaction is that somebody made a bad decision about how
>> to represent XMLNAMESPACES clauses. It's not quite too late to fix
>> that; but could you offer a concrete example that triggers this,
>> so it's clear what case we're talking about?
> The change the attached patch makes is to represent a DEFAULT namespace
> as a NULL list entry, rather than a T_String Value node containing a
> null. This approach does work for all backend/nodes/ operations, but
> it could be argued that it's still a crash hazard for unsuspecting code.
> We could do something else, like use a T_Null Value to represent DEFAULT,
> but I'm doubtful that that's really much better. A NULL entry has the
> advantage (or at least I'd consider it one) of being a certain crash
> rather than a probabilistic crash for any uncorrected code accessing
> the list item. Thoughts?

You correctly defined the problem in more general formulation at your
next thread [1].
Thank you for this patch. May be it is not universal solution for
DEFAULT values, but now it works fine.
Note, however, that if we emphasize comments by DEBUG purpose of
nodeToString(), it can reduce the same mistakes and questions in the future.

[1] More deficiencies in outfuncs/readfuncs processing:
https://www.postgresql.org/message-id/[email protected]

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company


From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Date: 2018-09-17 12:31:30
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-hackers

On 09/16/2018 11:11 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
>> On 09/16/2018 02:05 PM, Tom Lane wrote:
>>> The change the attached patch makes is to represent a DEFAULT namespace
>>> as a NULL list entry, rather than a T_String Value node containing a
>>> null.
>> Seems related to this CF item that's been around for a year:
>> <https://www.postgresql.org/message-id/flat/CAFj8pRB%2BWDyDcZyGmfRdJ0HOoXugeaL-KNFeK9YA5Z10JN9qfA%40mail.gmail.com>
>> ?
> Hm, seems like that is operating at the next level down; it starts with
> XmlTableSetNamespace's response to a null "name" argument, whereas what
> I'm on about is what happens before we get to that function.
>
> There's quite a bit I don't like about that patch now that I look at it
> :-(, but I don't think it's relevant to this thread.
>
>

OK, good. Please do add your comments to the appropriate thread and
change the CF status if required.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services