| Lists: | pgsql-hackers |
|---|
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | pg_class.reltuples of brin indexes |
| Date: | 2018-03-27 11:58:11 |
| Message-ID: | CAD21AoAhLJHzFemh7BZ22s8uMDPOm67mp5TpHEVbi1Mi_eG9Zw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Hi,
I found that pg_class.reltuples of brin indexes can be either the
number of index tuples or the number of heap tuples.
=# create table test as select generate_series(1,100000) as c;
=# create index test_brin on test using brin (c);
=# analyze test;
=# select relname, reltuples, relpages from pg_class where relname in
('test', 'test_brin');
relname | reltuples | relpages
-----------+-----------+----------
test | 100000 | 443
test_brin | 100000 | 3
(2 rows)
=# vacuum test;
=# select relname, reltuples, relpages from pg_class where relname in
('test', 'test_brin');
relname | reltuples | relpages
-----------+-----------+----------
test | 100000 | 443
test_brin | 3 | 3
(2 rows)
If I understand correctly pg_class.reltuples of indexes should have
the number of index tuples but especially for brin indexes it would be
hard to estimate it in the analyze code. I thought that we can change
brinvacuumcleanup so that it returns the estimated number of index
tuples and do vac_update_relstats using that value but it would break
API contract. Better ideas?
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
| From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: pg_class.reltuples of brin indexes |
| Date: | 2018-03-27 12:54:44 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On 03/27/2018 01:58 PM, Masahiko Sawada wrote:
> Hi,
>
> I found that pg_class.reltuples of brin indexes can be either the
> number of index tuples or the number of heap tuples.
>
> =# create table test as select generate_series(1,100000) as c;
> =# create index test_brin on test using brin (c);
> =# analyze test;
> =# select relname, reltuples, relpages from pg_class where relname in
> ('test', 'test_brin');
> relname | reltuples | relpages
> -----------+-----------+----------
> test | 100000 | 443
> test_brin | 100000 | 3
> (2 rows)
>
> =# vacuum test;
> =# select relname, reltuples, relpages from pg_class where relname in
> ('test', 'test_brin');
> relname | reltuples | relpages
> -----------+-----------+----------
> test | 100000 | 443
> test_brin | 3 | 3
> (2 rows)
>
Ouch!
> If I understand correctly pg_class.reltuples of indexes should have
> the number of index tuples but especially for brin indexes it would
> be hard to estimate it in the analyze code.
I'm not sure it's that clear, unfortunately - it's probably more a
question of how the value is used for costing, etc.
> I thought that we can change brinvacuumcleanup so that it returns the
> estimated number of index tuples and do vac_update_relstats using
> that value but it would break API contract. Better ideas?
>
I think number of index tuples makes sense, as long as that's what the
costing needs. That is, it's up to the index AM to define it. But it
clearly should not flap like this ...
And it's not just BRIN. This is what I get with a GIN index:
archie=# create index on messages using gin(subject_tsvector);
archie=# select relname, reltuples from pg_class
where relname = 'messages_subject_tsvector_idx';
relname | reltuples
-------------------------------+-------------
messages_subject_tsvector_idx | 6.58566e+06
(1 row)
archie=# vacuum messages;
archie=# select relname, reltuples from pg_class
where relname = 'messages_subject_tsvector_idx';
relname | reltuples
-------------------------------+-------------
messages_subject_tsvector_idx | 6.58566e+06
(1 row)
archie=# analyze messages;
archie=# select relname, reltuples from pg_class
where relname = 'messages_subject_tsvector_idx';
relname | reltuples
-------------------------------+-------------
messages_subject_tsvector_idx | 1.23463e+06
(1 row)
And it's even worse with a partial index:
archie=# create index on messages using gin(subject_tsvector)
where lower(substr(subject, 0, 4)) <> 're:'::text;
archie=# select relname, reltuples from pg_class
where relname = 'messages_subject_tsvector_idx';
relname | reltuples
-------------------------------+------------
messages_subject_tsvector_idx | 1.4397e+06
(1 row)
archie=# vacuum messages;
archie=# select relname, reltuples from pg_class
where relname = 'messages_subject_tsvector_idx';
relname | reltuples
-------------------------------+------------
messages_subject_tsvector_idx | 1.4397e+06
(1 row)
archie=# analyze messages;
archie=# select relname, reltuples from pg_class
where relname = 'messages_subject_tsvector_idx';
relname | reltuples
-------------------------------+-----------
messages_subject_tsvector_idx | 295107
(1 row)
The good thing is that in this case VACUUM/ANALYZE don't flap, it's just
the initial reltuples estimate set by CREATE INDEX.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
| Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: pg_class.reltuples of brin indexes |
| Date: | 2018-03-27 14:24:52 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> I think number of index tuples makes sense, as long as that's what the
> costing needs. That is, it's up to the index AM to define it. But it
> clearly should not flap like this ...
> And it's not just BRIN. This is what I get with a GIN index:
Sounds like the same kind of thing we just fixed for SP-GiST :-(
regards, tom lane
| From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: pg_class.reltuples of brin indexes |
| Date: | 2018-03-27 14:28:03 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> > I think number of index tuples makes sense, as long as that's what the
> > costing needs. That is, it's up to the index AM to define it. But it
> > clearly should not flap like this ...
>
> > And it's not just BRIN. This is what I get with a GIN index:
>
> Sounds like the same kind of thing we just fixed for SP-GiST :-(
Most likely I modelled the BRIN code after GIN.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: pg_class.reltuples of brin indexes |
| Date: | 2018-03-29 05:48:40 |
| Message-ID: | CAD21AoAA7+ETUJo=j2L8KAdKF8Q9_5uqwNx6H8rucFm6aRZSBA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Tue, Mar 27, 2018 at 11:28 PM, Alvaro Herrera
<alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Tom Lane wrote:
>> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> > I think number of index tuples makes sense, as long as that's what the
>> > costing needs. That is, it's up to the index AM to define it. But it
>> > clearly should not flap like this ...
>>
>> > And it's not just BRIN. This is what I get with a GIN index:
>>
>> Sounds like the same kind of thing we just fixed for SP-GiST :-(
>
> Most likely I modelled the BRIN code after GIN.
>
It's better to create a new index AM that estimates the number of
index tuples, and to update the index stats using that returned value?
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
| From: | Bruce Momjian <bruce(at)momjian(dot)us> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: pg_class.reltuples of brin indexes |
| Date: | 2023-11-21 20:48:36 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Tue, Mar 27, 2018 at 08:58:11PM +0900, Masahiko Sawada wrote:
> Hi,
>
> I found that pg_class.reltuples of brin indexes can be either the
> number of index tuples or the number of heap tuples.
>
> =# create table test as select generate_series(1,100000) as c;
> =# create index test_brin on test using brin (c);
> =# analyze test;
> =# select relname, reltuples, relpages from pg_class where relname in
> ('test', 'test_brin');
> relname | reltuples | relpages
> -----------+-----------+----------
> test | 100000 | 443
> test_brin | 100000 | 3
> (2 rows)
>
> =# vacuum test;
> =# select relname, reltuples, relpages from pg_class where relname in
> ('test', 'test_brin');
> relname | reltuples | relpages
> -----------+-----------+----------
> test | 100000 | 443
> test_brin | 3 | 3
> (2 rows)
>
> If I understand correctly pg_class.reltuples of indexes should have
> the number of index tuples but especially for brin indexes it would be
> hard to estimate it in the analyze code. I thought that we can change
> brinvacuumcleanup so that it returns the estimated number of index
> tuples and do vac_update_relstats using that value but it would break
> API contract. Better ideas?
I assume there is nothing to do on this issue.
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
| From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
|---|---|
| To: | Bruce Momjian <bruce(at)momjian(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: pg_class.reltuples of brin indexes |
| Date: | 2023-12-31 01:08:41 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On 11/21/23 21:48, Bruce Momjian wrote:
> On Tue, Mar 27, 2018 at 08:58:11PM +0900, Masahiko Sawada wrote:
>> ...
>>
>> If I understand correctly pg_class.reltuples of indexes should have
>> the number of index tuples but especially for brin indexes it would be
>> hard to estimate it in the analyze code. I thought that we can change
>> brinvacuumcleanup so that it returns the estimated number of index
>> tuples and do vac_update_relstats using that value but it would break
>> API contract. Better ideas?
>
> I assume there is nothing to do on this issue.
>
I'm not sure. I think the current behavior is (still) wrong - I just
rediscovered it during testing BRIN. I haven't checked, but I guess GIN
is still affected too.
What's not clear to me is if this is merely cosmetic issue (making
pg_class data confusing for people), or if it has some practical impact.
And I'm not sure there's a good way to improve this, except for some
basic guesswork. For BRIN I can imagine simply calculating the number of
page ranges (relpages / pages_per_range), but no idea about GIN.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
| From: | Michael Paquier <michael(at)paquier(dot)xyz> |
|---|---|
| To: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
| Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: pg_class.reltuples of brin indexes |
| Date: | 2023-12-31 05:35:48 |
| Message-ID: | [email protected] |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Lists: | pgsql-hackers |
On Sun, Dec 31, 2023 at 02:08:41AM +0100, Tomas Vondra wrote:
> I'm not sure. I think the current behavior is (still) wrong - I just
> rediscovered it during testing BRIN. I haven't checked, but I guess GIN
> is still affected too.
>
> What's not clear to me is if this is merely cosmetic issue (making
> pg_class data confusing for people), or if it has some practical impact.
> And I'm not sure there's a good way to improve this, except for some
> basic guesswork. For BRIN I can imagine simply calculating the number of
> page ranges (relpages / pages_per_range), but no idea about GIN.
FWIW, this area of the code rings a few bells:
https://www.postgresql.org/message-id/[email protected]
https://www.postgresql.org/message-id/[email protected]
--
Michael