Quick Links

GiST buffering build, bug in levelStep calculation

Lists:	pgsql-hackers

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	GiST buffering build, bug in levelStep calculation
Date:	2012-05-29 19:13:40
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

I just noticed a little bug in the way the levelStep parameter is
calculated when we switch to buffering build:

set client_min_messages='debug2';

This works:

postgres=# set maintenance_work_mem='1GB';
SET
postgres=# create index i_gisttest on gisttest using gist (t collate
"C") WITH (fillfactor=10);
DEBUG: building index "i_gisttest" on table "gisttest"
DEBUG: switched to buffered GiST build; level step = 2, pagesPerBuffer
= 232

But with higher maintenance_work_mem, it fails to switch to buffering mode:

postgres=# set maintenance_work_mem='2GB';
SET
postgres=# create index i_gisttest on gisttest using gist (t collate
"C") WITH (fillfactor=10);
DEBUG: building index "i_gisttest" on table "gisttest"
DEBUG: failed to switch to buffered GiST build

This is because of this rather complex calculation here:

> while (
> /* subtree must fit in cache (with safety factor of 4) */
> (1 - pow(avgIndexTuplesPerPage, (double) (levelStep + 1))) / (1 - avgIndexTuplesPerPage) < effective_cache_size / 4
> &&
> /* each node in the lowest level of a subtree has one page in memory */
> (pow(maxIndexTuplesPerPage, (double) levelStep) < (maintenance_work_mem * 1024) / BLCKSZ)
> )

What happens here is that the calculation of (maintenance_work_mem *
1024) / BLCKSZ overflows the 32 bit signed integer type used there, if
maintenance_work_mem >= 2GB. I think we should be doing all these
calculations in double.

I'll go fix that..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: GiST buffering build, bug in levelStep calculation
Date:	2012-05-29 19:42:30
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> This is because of this rather complex calculation here:

>> while (
>> /* subtree must fit in cache (with safety factor of 4) */
>> (1 - pow(avgIndexTuplesPerPage, (double) (levelStep + 1))) / (1 - avgIndexTuplesPerPage) < effective_cache_size / 4
>> &&
>> /* each node in the lowest level of a subtree has one page in memory */
>> (pow(maxIndexTuplesPerPage, (double) levelStep) < (maintenance_work_mem * 1024) / BLCKSZ)
>> )

> What happens here is that the calculation of (maintenance_work_mem *
> 1024) / BLCKSZ overflows the 32 bit signed integer type used there, if
> maintenance_work_mem >= 2GB. I think we should be doing all these
> calculations in double.

Given the use of pow(), I concur with changing the whole calculation to
double. Just as a remark, the correct way to code that sort of thing
normally is (maintenance_work_mem * 1024L) / BLCKSZ, which avoids
overflow because guc.c limits MAX_KILOBYTES to ensure it won't overflow
a long. (Given that we are now supporting Win64 where long is narrower
than size_t, we might want to revisit this coding rule eventually, but
it's not high on my priority list.)

While I'm looking at this, is the first test involving
effective_cache_size bulletproof either? In particular, is
avgIndexTuplesPerPage clamped to be strictly greater than 1?

And for that matter, is either test sane from a units standpoint?
It seems to me that maxIndexTuplesPerPage would have units of
1/blocks, which is pretty dubious to be comparing to a block count
even disregarding the power function.

regards, tom lane

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST buffering build, bug in levelStep calculation
Date:	2012-05-29 20:20:50
Message-ID:	CAPpHfds9r4VAu=4te8wCMT4_kuPxDhGfsdJ7rcHqkkrzM0wgXw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, May 29, 2012 at 11:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> While I'm looking at this, is the first test involving
> effective_cache_size bulletproof either? In particular, is
> avgIndexTuplesPerPage clamped to be strictly greater than 1?
>

It's based on collected statistics on already inserted tuple sizes. Since
tuple sizes are measured after possible toasting, I don't see the way
for avgIndexTuplesPerPage to be less than 1.

> And for that matter, is either test sane from a units standpoint?
> It seems to me that maxIndexTuplesPerPage would have units of
> 1/blocks, which is pretty dubious to be comparing to a block count
> even disregarding the power function.
>

In this test we use avgIndexTuplesPerPage as estimate for number of child
index pages of one page. I think we can assume it to have blocks unit.

------
With best regards,
Alexander Korotkov.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST buffering build, bug in levelStep calculation
Date:	2012-05-29 20:25:02
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Tue, May 29, 2012 at 11:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> While I'm looking at this, is the first test involving
>> effective_cache_size bulletproof either? In particular, is
>> avgIndexTuplesPerPage clamped to be strictly greater than 1?

> It's based on collected statistics on already inserted tuple sizes. Since
> tuple sizes are measured after possible toasting, I don't see the way
> for avgIndexTuplesPerPage to be less than 1.

Yeah, but if it could be *equal* to one, you've got a zero-divide there.

regards, tom lane

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST buffering build, bug in levelStep calculation
Date:	2012-05-29 20:46:11
Message-ID:	CAPpHfduK=enE+M2jRob5hvoLUtow=8DQY0dtNmRVrhc-PJEtXQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 30, 2012 at 12:25 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > On Tue, May 29, 2012 at 11:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> While I'm looking at this, is the first test involving
> >> effective_cache_size bulletproof either? In particular, is
> >> avgIndexTuplesPerPage clamped to be strictly greater than 1?
>
> > It's based on collected statistics on already inserted tuple sizes. Since
> > tuple sizes are measured after possible toasting, I don't see the way
> > for avgIndexTuplesPerPage to be less than 1.
>
> Yeah, but if it could be *equal* to one, you've got a zero-divide there.
>

avgIndexTuplesPerPage is calculated as:

avgIndexTuplesPerPage = pageFreeSpace / itupAvgSize;

I think size of each index tuple must be at least few times lower
than pageFreeSpace to let us create any index.

------
With best regards,
Alexander Korotkov.

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST buffering build, bug in levelStep calculation
Date:	2012-05-31 09:36:04
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29.05.2012 23:46, Alexander Korotkov wrote:
> On Wed, May 30, 2012 at 12:25 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Alexander Korotkov<aekorotkov(at)gmail(dot)com> writes:
>>> On Tue, May 29, 2012 at 11:42 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> While I'm looking at this, is the first test involving
>>>> effective_cache_size bulletproof either? In particular, is
>>>> avgIndexTuplesPerPage clamped to be strictly greater than 1?
>>
>>> It's based on collected statistics on already inserted tuple sizes. Since
>>> tuple sizes are measured after possible toasting, I don't see the way
>>> for avgIndexTuplesPerPage to be less than 1.
>>
>> Yeah, but if it could be *equal* to one, you've got a zero-divide there.
>>
>
> avgIndexTuplesPerPage is calculated as:
>
> avgIndexTuplesPerPage = pageFreeSpace / itupAvgSize;
>
> I think size of each index tuple must be at least few times lower
> than pageFreeSpace to let us create any index.

Hmm, in theory, it seems possible that every leaf level index tuple
would completely fill an index page. Not sure how useful such an index
would be, though. On internal pages, at least, you have to fit at least
two tuples on a page or you can't build a tree.

I note that the calculations assume that leaf tuples and internal tuples
have similar sizes. We calculate the average leaf tuple size, and use
that to calculate the fan-out of internal pages. On some GiST opclasses,
the values stored on internal pages might be quite different from the
leaf tuples. I don't think we need to worry about that in practice,
these calculations are not very accurate anyway, but perhaps a comment
would be in order.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST buffering build, bug in levelStep calculation
Date:	2012-05-31 10:34:51
Message-ID:	CAPpHfdvnUuuQoze9NZt6AnThaJiP2a770HO5EozaPH=n3ZGbzQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 31, 2012 at 1:36 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> I note that the calculations assume that leaf tuples and internal tuples
> have similar sizes. We calculate the average leaf tuple size, and use that
> to calculate the fan-out of internal pages. On some GiST opclasses, the
> values stored on internal pages might be quite different from the leaf
> tuples. I don't think we need to worry about that in practice, these
> calculations are not very accurate anyway, but perhaps a comment would be
> in order.
>

Probably we could collect per-level statistics of average tuple size?

------
With best regards,
Alexander Korotkov.