BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.

Lists: pgsql-bugspgsql-hackers
From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: thusson(at)informiciel(dot)com
Subject: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-07 18:22:20
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

The following bug has been logged on the website:

Bug reference: 15840
Logged by: Thierry Husson
Email address: thusson(at)informiciel(dot)com
PostgreSQL version: 12beta1
Operating system: Ubuntu 18.04.2 LTS
Description:

I was doing tables COPY between my old server with PG10.8 and the new one
with 12Beta1. After each table is done, I make a vacuum of it.
However PG12 has stopped working for wraparound protection. I was doing it
on around 10 cpu, 1 table by cpu.
The is the end of the log of the copy program in Python3 with Psycopg2:
...
2019-06-06 23:15:26 prog_sync Vacuum
usr_ops.prg_hrdps_n1500n_voisin_ade_metar... 4s.
2019-06-06 23:15:30 prog_sync Vacuum
usr_ops.flt_hrdps_n1500n_voisin_ade_metar... 0s.
2019-06-06 23:15:30 prog_sync CPU 0 - Sync done 8857sec.
2019-06-06 23:15:30 prog_sync Tables Skipped:0, Already sync:0, Copied
from pravda:1. Copied from zhen:0.
2019-06-06 23:15:30 prog_sync Sync done for 1 tables of 106451311 records in
8858s. (12018 rec./sec.)

Traceback (most recent call last):
File "/home/semt700/emet/script/prog_sync.py", line 316, in syncTable
ioResult = e.flushCopyBuffer(ioResult, curPG[slave][procId],
progTable[slave], columns)
File "/fs/home/fs1/eccc/cmd/cmdn/semt700/emet/script/emetlib.py", line
607, in flushCopyBuffer
cursorObj.copy_from(ioBuffer, tableName, sep='\t', columns=columnName,
null='NULL')
psycopg2.OperationalError: database is not accepting commands to avoid
wraparound data loss in database "emet_zhen"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.
CONTEXT: SQL statement "INSERT INTO
usr_ops.prg_gdps_g1610n_voisin_ade_synop_swob_metar_201903 SELECT $1.*"
PL/pgSQL function prog_insert() line 17 at EXECUTE
COPY prg_gdps_g1610n_voisin_ade_synop_swob_metar, line 132822: "284532738
2019-03-20 00:00:00 2019-03-29 12:00:00 11011 37980000
-101750000 75597472 NULL -5.4617 1 ..."

I did a DB shutdown and started a vacuum with:
postgres --single emet_zhen
VACUUM FREEZE VERBOSE;

It worked a few hours and when I was thinking it was done as nothing was
loggin anymore, I made a ctrl-\ and restarted the DB.
I was still getting wraparound protection messages so I shutdown the DB
again & redo the vacuum command but it didn't work anymore:
zhen:semt700 $ postgres --single emet_zhen
2019-06-07 17:23:36 UTC 7251 WARNING: database with OID 16394 must be
vacuumed within 999995 transactions
2019-06-07 17:23:36 UTC 7251 HINT: To avoid a database shutdown, execute a
database-wide VACUUM in that database.
You might also need to commit or roll back old prepared
transactions, or drop stale replication slots.
PostgreSQL stand-alone backend 12beta1
backend> VACUUM VERBOSE;
2019-06-07 17:23:59 UTC 7251 WARNING: database "emet_zhen" must be
vacuumed within 999995 transactions
2019-06-07 17:23:59 UTC 7251 HINT: To avoid a database shutdown, execute a
database-wide VACUUM in that database.
You might also need to commit or roll back old prepared
transactions, or drop stale replication slots.
2019-06-07 17:23:59 UTC 7251 LOG: duration: 2417.639 ms statement: VACUUM
VERBOSE;

I tried with various options but none worked. It also tried to restard the
DB and use vacuumdb --all -v , or various options, but always get the same
message for each table:

INFO: aggressively vacuuming "pg_catalog.pg_publication"
INFO: index "pg_publication_oid_index" now contains 0 row versions in 1
pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "pg_publication_pubname_index" now contains 0 row versions in 1
pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pg_publication": found 0 removable, 0 nonremovable row versions in 0
out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin:
2146520116
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
WARNING: database "emet_zhen" must be vacuumed within 999995 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that
database.
You might also need to commit or roll back old prepared transactions, or
drop stale replication slots.

I out of clues of what to try next. I already got this situation with PG 9.x
& PG10.x but system wide in exclusive mode usualy worked.

Seems like a PG12 bug that will certainly prevent us from upgrading even if
the new fonctionnalities look really great.

Thanks a lot!

Thierry


From: Andres Freund <andres(at)anarazel(dot)de>
To: thusson(at)informiciel(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-07 19:02:31
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2019-06-07 18:22:20 +0000, PG Bug reporting form wrote:
> I was doing tables COPY between my old server with PG10.8 and the new one
> with 12Beta1. After each table is done, I make a vacuum of it.
> However PG12 has stopped working for wraparound protection. I was doing it
> on around 10 cpu, 1 table by cpu.

That was a new postgres 12 cluster, not a pg_upgraded one? And you just
did a bunch of COPYs? How many?

I'm not clear as to how the cluster got to wraparound if that's the
scenario. We use one xid per transaction, and copy doesn't use multiple
transactions internally. Any chance you have triggers on these tables
that use savepoints internally?

> postgres --single emet_zhen
> VACUUM FREEZE VERBOSE;

Don't FREEZE in wraparound cases, that just makes it take longer.

> It worked a few hours and when I was thinking it was done as nothing was
> loggin anymore, I made a ctrl-\ and restarted the DB.
> I was still getting wraparound protection messages so I shutdown the DB
> again & redo the vacuum command but it didn't work anymore:

> zhen:semt700 $ postgres --single emet_zhen
> 2019-06-07 17:23:36 UTC 7251 WARNING: database with OID 16394 must be
> vacuumed within 999995 transactions
> 2019-06-07 17:23:36 UTC 7251 HINT: To avoid a database shutdown, execute a
> database-wide VACUUM in that database.
> You might also need to commit or roll back old prepared
> transactions, or drop stale replication slots.
> PostgreSQL stand-alone backend 12beta1
> backend> VACUUM VERBOSE;
> 2019-06-07 17:23:59 UTC 7251 WARNING: database "emet_zhen" must be
> vacuumed within 999995 transactions
> 2019-06-07 17:23:59 UTC 7251 HINT: To avoid a database shutdown, execute a
> database-wide VACUUM in that database.
> You might also need to commit or roll back old prepared
> transactions, or drop stale replication slots.
> 2019-06-07 17:23:59 UTC 7251 LOG: duration: 2417.639 ms statement: VACUUM
> VERBOSE;

What do you mean by "didn't work anymore"? As far as I can tell the
VACUUM here succeeded?

> HINT: To avoid a database shutdown, execute a database-wide VACUUM in that
> database.
> You might also need to commit or roll back old prepared transactions, or
> drop stale replication slots.

Did you check whether any of these are the case?

SELECT * FROM pg_replication_slots;
SELECT * FROM pg_prepared_xacts;

Could you also show

SELECT oid, datname, datfrozenxid, age(datfrozenxid), datminmxid, mxid_age(datminmxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;
SELECT * FROM pg_control_checkpoint();

Greetings,

Andres Freund


From: Thierry Husson <thusson(at)informiciel(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-07 19:59:11
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi Andres,

Thank you for your anwser. Precisions bellow:

Andres Freund <andres(at)anarazel(dot)de> a écrit :

> Hi,
>
> On 2019-06-07 18:22:20 +0000, PG Bug reporting form wrote:
>> I was doing tables COPY between my old server with PG10.8 and the new one
>> with 12Beta1. After each table is done, I make a vacuum of it.
>> However PG12 has stopped working for wraparound protection. I was doing it
>> on around 10 cpu, 1 table by cpu.
>
> That was a new postgres 12 cluster, not a pg_upgraded one? And you just
> did a bunch of COPYs? How many?
>
> I'm not clear as to how the cluster got to wraparound if that's the
> scenario. We use one xid per transaction, and copy doesn't use multiple
> transactions internally. Any chance you have triggers on these tables
> that use savepoints internally?

Yes it was a new cluster. Around 30 copy were done.
Yes there is a trigger to manage partitions. Around 1200 tables were
created. 10 billions records transfered, I need to tranfert 180BR over
1700 tables.
I just realize I made vacuum on partitions for the first 8BR rows and
forgot for the last 2BR That would explain the wraparound protection.

>
>
>> postgres --single emet_zhen
>> VACUUM FREEZE VERBOSE;
>
> Don't FREEZE in wraparound cases, that just makes it take longer.
>
>
>> It worked a few hours and when I was thinking it was done as nothing was
>> loggin anymore, I made a ctrl-\ and restarted the DB.
>> I was still getting wraparound protection messages so I shutdown the DB
>> again & redo the vacuum command but it didn't work anymore:
>
>> zhen:semt700 $ postgres --single emet_zhen
>> 2019-06-07 17:23:36 UTC 7251 WARNING: database with OID 16394 must be
>> vacuumed within 999995 transactions
>> 2019-06-07 17:23:36 UTC 7251 HINT: To avoid a database shutdown, execute a
>> database-wide VACUUM in that database.
>> You might also need to commit or roll back old prepared
>> transactions, or drop stale replication slots.
>> PostgreSQL stand-alone backend 12beta1
>> backend> VACUUM VERBOSE;
>> 2019-06-07 17:23:59 UTC 7251 WARNING: database "emet_zhen" must be
>> vacuumed within 999995 transactions
>> 2019-06-07 17:23:59 UTC 7251 HINT: To avoid a database shutdown, execute a
>> database-wide VACUUM in that database.
>> You might also need to commit or roll back old prepared
>> transactions, or drop stale replication slots.
>> 2019-06-07 17:23:59 UTC 7251 LOG: duration: 2417.639 ms statement: VACUUM
>> VERBOSE;
>
> What do you mean by "didn't work anymore"? As far as I can tell the
> VACUUM here succeeded?
>
>
>> HINT: To avoid a database shutdown, execute a database-wide VACUUM in that
>> database.
>> You might also need to commit or roll back old prepared transactions, or
>> drop stale replication slots.
>
> Did you check whether any of these are the case?
>
> SELECT * FROM pg_replication_slots;
> SELECT * FROM pg_prepared_xacts;
These are empty.

emet_zhen=# SELECT max(age(pg_database.datfrozenxid)) / 2147483648.0 *
100.0 AS "Percentage of transaction ID's used" FROM pg_database;
Percentage of transaction ID's used
-------------------------------------
99.953434057533740997000

>
> Could you also show
>
> SELECT oid, datname, datfrozenxid, age(datfrozenxid), datminmxid,
> mxid_age(datminmxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;
oid | datname | datfrozenxid | age | datminmxid | mxid_age
-------+-----------+--------------+------------+------------+----------
16394 | emet_zhen | 36464 | 2146483652 | 1 | 0
12672 | template0 | 504982897 | 1641537219 | 1 | 0
12673 | postgres | 2096520116 | 50000000 | 1 | 0
1 | template1 | 2096520116 | 50000000 | 1 | 0

> SELECT * FROM pg_control_checkpoint();
checkpoint_lsn | redo_lsn | redo_wal_file |
timeline_id | prev_timeline_id | full_page_writes | next_xid |
next_oid | next_multixact_id | next_multi_offset | oldest_xid |
oldest_xid_dbid | oldest_active_xid | oldest_multi_xid |
oldest_multi_dbid | oldest_commit_ts_xid | newest_commit_ts_xid |
checkpoint_time
32D/54074EC0 | 32D/54074E88 | 000000010000032D00000054 |
1 | 1 | t | 0:2146520116 | 475782 |
1 | 0 | 36464 | 16394 |
2146520116 | 1 | 16394 |
0 | 0 | 2019-06-07 18:11:39+00
(1 row)

Could it be that PG12 considers "vacuum" as a transaction and trigger
wraparound protection against it?

>
> Greetings,
>
> Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Thierry Husson <thusson(at)informiciel(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-07 20:10:02
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2019-06-07 14:59:11 -0500, Thierry Husson wrote:
> Thank you for your anwser. Precisions bellow:
> Andres Freund <andres(at)anarazel(dot)de> a écrit :
> > On 2019-06-07 18:22:20 +0000, PG Bug reporting form wrote:
> > > I was doing tables COPY between my old server with PG10.8 and the new one
> > > with 12Beta1. After each table is done, I make a vacuum of it.
> > > However PG12 has stopped working for wraparound protection. I was doing it
> > > on around 10 cpu, 1 table by cpu.
> >
> > That was a new postgres 12 cluster, not a pg_upgraded one? And you just
> > did a bunch of COPYs? How many?
> >
> > I'm not clear as to how the cluster got to wraparound if that's the
> > scenario. We use one xid per transaction, and copy doesn't use multiple
> > transactions internally. Any chance you have triggers on these tables
> > that use savepoints internally?
>
> Yes it was a new cluster. Around 30 copy were done.
> Yes there is a trigger to manage partitions. Around 1200 tables were
> created. 10 billions records transfered, I need to tranfert 180BR over 1700
> tables.
> I just realize I made vacuum on partitions for the first 8BR rows and forgot
> for the last 2BR That would explain the wraparound protection.

Do those triggers use savepoints / EXCEPTION handling?

Might be worthwhile to check - independent of this issue - if you still
need the partition handling via trigger, now that pg's builtin
partitioning can handle COPY (and likely *much* faster).

> > Could you also show
> >
> > SELECT oid, datname, datfrozenxid, age(datfrozenxid), datminmxid,
> > mxid_age(datminmxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;
> oid | datname | datfrozenxid | age | datminmxid | mxid_age
> -------+-----------+--------------+------------+------------+----------
> 16394 | emet_zhen | 36464 | 2146483652 | 1 | 0

Ok, so it's xids, and clearly not multixids. Could you connect to
emet_zhen and show the output of:

SELECT oid, oid::regclass, relkind, relfrozenxid, age(relfrozenxid) FROM pg_class WHERE relfrozenxid <> 0 AND age(relfrozenxid) > 1800000000 ORDER BY age(relfrozenxid) DESC;

that will tell us which relations need to be vacuumed, and then we can
see why that doesn't work.

> Could it be that PG12 considers "vacuum" as a transaction and trigger
> wraparound protection against it?

I'm still somewhat confused - the output you showed didn't include
vacuum failing, as far as I can tell?

- Andres


From: Thierry Husson <thusson(at)informiciel(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-07 21:40:27
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Thanks again Andres,

Andres Freund <andres(at)anarazel(dot)de> a écrit :

> Hi,
>
> On 2019-06-07 14:59:11 -0500, Thierry Husson wrote:
>> Thank you for your anwser. Precisions bellow:
>> Andres Freund <andres(at)anarazel(dot)de> a écrit :
>> > On 2019-06-07 18:22:20 +0000, PG Bug reporting form wrote:
>> > > I was doing tables COPY between my old server with PG10.8 and
>> the new one
>> > > with 12Beta1. After each table is done, I make a vacuum of it.
>> > > However PG12 has stopped working for wraparound protection. I
>> was doing it
>> > > on around 10 cpu, 1 table by cpu.
>> >
>> > That was a new postgres 12 cluster, not a pg_upgraded one? And you just
>> > did a bunch of COPYs? How many?
>> >
>> > I'm not clear as to how the cluster got to wraparound if that's the
>> > scenario. We use one xid per transaction, and copy doesn't use multiple
>> > transactions internally. Any chance you have triggers on these tables
>> > that use savepoints internally?
>>
>> Yes it was a new cluster. Around 30 copy were done.
>> Yes there is a trigger to manage partitions. Around 1200 tables were
>> created. 10 billions records transfered, I need to tranfert 180BR over 1700
>> tables.
>> I just realize I made vacuum on partitions for the first 8BR rows and forgot
>> for the last 2BR That would explain the wraparound protection.
>
> Do those triggers use savepoints / EXCEPTION handling?
>
> Might be worthwhile to check - independent of this issue - if you still
> need the partition handling via trigger, now that pg's builtin
> partitioning can handle COPY (and likely *much* faster).

Yes, those triggers use exception handling (if partition doesn't
exist, create it) but no savepoint.
Thanks for the suggestion, I take that in note!

>> > Could you also show
>> >
>> > SELECT oid, datname, datfrozenxid, age(datfrozenxid), datminmxid,
>> > mxid_age(datminmxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;
>> oid | datname | datfrozenxid | age | datminmxid | mxid_age
>> -------+-----------+--------------+------------+------------+----------
>> 16394 | emet_zhen | 36464 | 2146483652 | 1 | 0
>
> Ok, so it's xids, and clearly not multixids. Could you connect to
> emet_zhen and show the output of:
>
> SELECT oid, oid::regclass, relkind, relfrozenxid, age(relfrozenxid)
> FROM pg_class WHERE relfrozenxid <> 0 AND age(relfrozenxid) >
> 1800000000 ORDER BY age(relfrozenxid) DESC;
> that will tell us which relations need to be vacuumed, and then we can
> see why that doesn't work.
>> Could it be that PG12 considers "vacuum" as a transaction and trigger
>> wraparound protection against it?
>
> I'm still somewhat confused - the output you showed didn't include
> vacuum failing, as far as I can tell?
>
> - Andres

oid | oid | relkind |
relfrozenxid | age
--------+--------------------------------------+---------+--------------+------------
460564 | pg_temp_3.cur_semt700_progsync_4996 | r |
36464 | 2146483652
460764 | pg_temp_8.cur_semt700_progsync_5568 | r |
19836544 | 2126683572
460718 | pg_temp_4.cur_semt700_progsync_5564 | r |
19836544 | 2126683572
460721 | pg_temp_5.cur_semt700_progsync_5565 | r |
19836544 | 2126683572
461068 | pg_temp_22.cur_semt700_progsync_5581 | r |
19836544 | 2126683572

These are temporary tables to manage concurrency & server load. It
seems the sudden disconnection due to wraparound protection didn't get
them removed. I removed them manually under single mode and there is
no more warning now, vacuum command included. Your command is very
interesting to know.

It annoying PG create a xId for empty temporary tables. You can't
clear it with a vacuum as there is no record. I have to terminate
connexions of my deamon processes daily to avoid wraparound
protection. Is there a way to tell PG to forget these tables on its
age estimation?

Thank you so much Andres! You saved me!

Thierry


From: Andres Freund <andres(at)anarazel(dot)de>
To: Thierry Husson <thusson(at)informiciel(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-07 21:47:47
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2019-06-07 16:40:27 -0500, Thierry Husson wrote:
> oid | oid | relkind | relfrozenxid |
> age
> --------+--------------------------------------+---------+--------------+------------
> 460564 | pg_temp_3.cur_semt700_progsync_4996 | r | 36464 |
> 2146483652
> 460764 | pg_temp_8.cur_semt700_progsync_5568 | r | 19836544 |
> 2126683572
> 460718 | pg_temp_4.cur_semt700_progsync_5564 | r | 19836544 |
> 2126683572
> 460721 | pg_temp_5.cur_semt700_progsync_5565 | r | 19836544 |
> 2126683572
> 461068 | pg_temp_22.cur_semt700_progsync_5581 | r | 19836544 |
> 2126683572
>
> These are temporary tables to manage concurrency & server load. It seems the
> sudden disconnection due to wraparound protection didn't get them removed. I
> removed them manually under single mode and there is no more warning now,
> vacuum command included. Your command is very interesting to know.

Hm. But you do have autovacuum enabled, is that right? If enabled, have
you tuned it at all? It seems quite possible that given your load (10
parallel loads), the default settings werent aggressive enough.

> It annoying PG create a xId for empty temporary tables. You can't clear it
> with a vacuum as there is no record. I have to terminate connexions of my
> deamon processes daily to avoid wraparound protection. Is there a way to
> tell PG to forget these tables on its age estimation?

Normally postgres would drop such "orphaned" temp tables on its own, in
autovacuum (triggering it when close to a wraparound, even if
disabled). But if it can't keep up for some reason, then that's not
necessarily good enough with very rapid xid usage as you seem to have.

I'll start a thread about this subtopic on -hackers.

Greetings,

Andres Freund


From: Thierry Husson <thusson(at)informiciel(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-07 22:49:52
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi Andres,

Andres Freund <andres(at)anarazel(dot)de> a écrit :

> Hi,
>
> On 2019-06-07 16:40:27 -0500, Thierry Husson wrote:
>> oid | oid | relkind | relfrozenxid |
>> age
>> --------+--------------------------------------+---------+--------------+------------
>> 460564 | pg_temp_3.cur_semt700_progsync_4996 | r | 36464 |
>> 2146483652
>> 460764 | pg_temp_8.cur_semt700_progsync_5568 | r | 19836544 |
>> 2126683572
>> 460718 | pg_temp_4.cur_semt700_progsync_5564 | r | 19836544 |
>> 2126683572
>> 460721 | pg_temp_5.cur_semt700_progsync_5565 | r | 19836544 |
>> 2126683572
>> 461068 | pg_temp_22.cur_semt700_progsync_5581 | r | 19836544 |
>> 2126683572
>>
>> These are temporary tables to manage concurrency & server load. It seems the
>> sudden disconnection due to wraparound protection didn't get them removed. I
>> removed them manually under single mode and there is no more warning now,
>> vacuum command included. Your command is very interesting to know.
>
> Hm. But you do have autovacuum enabled, is that right? If enabled, have
> you tuned it at all? It seems quite possible that given your load (10
> parallel loads), the default settings werent aggressive enough.

Yes autovacuum is enabled. Aggressiveness was effectively a recent
problem I had and putting its max_worker to 8 wasn't a solution, there
were all busy 24/7 and I had to do a daily script to help it. The
solution was to push vacuum_cost_limit to 2000, since then it works
like a charm. Another issue was autovaccuums were taking the lock over
my running vacuums, making them waiting for 5 days instead of taking
around 1 hour. I could do another post on that but it's not PG12
specific, I have it with 10.x

>> It annoying PG create a xId for empty temporary tables. You can't clear it
>> with a vacuum as there is no record. I have to terminate connexions of my
>> deamon processes daily to avoid wraparound protection. Is there a way to
>> tell PG to forget these tables on its age estimation?
>
> Normally postgres would drop such "orphaned" temp tables on its own, in
> autovacuum (triggering it when close to a wraparound, even if
> disabled). But if it can't keep up for some reason, then that's not
> necessarily good enough with very rapid xid usage as you seem to have.
>
> I'll start a thread about this subtopic on -hackers.
> Greetings,
>
> Andres Freund

What is the link to this forum? I'm very very interested to follow
that subtopic & I could make some tests if necessary.

Have a great weekend & thanks for your time :)

Thierry


From: Andres Freund <andres(at)anarazel(dot)de>
To: Thierry Husson <thusson(at)informiciel(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-07 22:58:43
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

(Moving a part of this discussion to hackers)

In #15840 Thierry had the situation that autovacuum apparently could not
keep up, and he ended up with a wraparound situation. Following the
hints and shutting down the cluster and vacuuming the relevant DB in
single user mode did not resolve the issue however. That's because there
was a session with temp tables:

On 2019-06-07 16:40:27 -0500, Thierry Husson wrote:
> oid | oid | relkind | relfrozenxid |
> age
> --------+--------------------------------------+---------+--------------+------------
> 460564 | pg_temp_3.cur_semt700_progsync_4996 | r | 36464 |
> 2146483652
> 460764 | pg_temp_8.cur_semt700_progsync_5568 | r | 19836544 |
> 2126683572
> 460718 | pg_temp_4.cur_semt700_progsync_5564 | r | 19836544 |
> 2126683572
> 460721 | pg_temp_5.cur_semt700_progsync_5565 | r | 19836544 |
> 2126683572
> 461068 | pg_temp_22.cur_semt700_progsync_5581 | r | 19836544 |
> 2126683572
>
> These are temporary tables to manage concurrency & server load. It seems the
> sudden disconnection due to wraparound protection didn't get them removed. I
> removed them manually under single mode and there is no more warning now,
> vacuum command included. Your command is very interesting to know.

And our logic for dropping temp tables only kicks in autovacuum, but not
in a database manual VACUUM.

Which means that currently the advice we give, namely to shut down and
vacuum the database in singleuser mode plainly doesn't work. Without any
warnings hinting in the right direction.

Do we need to move the orphan temp cleanup code into database vacuums or
such?

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Thierry Husson <thusson(at)informiciel(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-07 23:01:03
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2019-06-07 17:49:52 -0500, Thierry Husson wrote:
> Andres Freund <andres(at)anarazel(dot)de> a écrit :
> > I'll start a thread about this subtopic on -hackers.

> What is the link to this forum? I'm very very interested to follow that
> subtopic & I could make some tests if necessary.

It's now (was interrupted by something else) at:
https://postgr.es/m/20190607225843.z73jqqyy6hhc6qnp%40alap3.anarazel.de

and you're CCed in the discussion.

Have a nice weekend as well!

Andres


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thierry Husson <thusson(at)informiciel(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-07 23:59:37
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Fri, Jun 07, 2019 at 03:58:43PM -0700, Andres Freund wrote:
> Do we need to move the orphan temp cleanup code into database vacuums or
> such?

When entering into the vacuum() code path for an autovacuum, only one
relation at a time is processed, and we have prior that extra
processing related to toast relations when selecting the relations to
work on, or potentially delete orphaned temp tables. For a manual
vacuum, we finish by deciding which relation to process in
get_all_vacuum_rels(), so the localized processing is a bit different
than what's done in do_autovacuum() when scanning pg_class for
relations.

Technically, I think that it would work to give up on the gathering of
the orphaned OIDs in a gathering and let them be gathered in the list
of items to vacuum, and then put the deletion logic down to
vacuum_rel(). However, there is a take: for autovacuum we gather the
orphaned entries and the other relations to process, then drop all the
orphaned OIDs, and finally vacuum/analyze the entries collected. So
if you put the deletion logic down into vacuum_rel() then we won't be
able to drop orphaned tables before working on a database, which would
be bad if we know about an orphaned set, but autovacuum works for a
long time on other legit entries first.
--
Michael


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Thierry Husson <thusson(at)informiciel(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15840: Vacuum does not work after database stopped for wraparound protection. Database seems unrepearable.
Date: 2019-06-08 00:25:42
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2019-Jun-07, Thierry Husson wrote:

> Yes autovacuum is enabled. Aggressiveness was effectively a recent problem I
> had and putting its max_worker to 8 wasn't a solution, there were all busy
> 24/7 and I had to do a daily script to help it. The solution was to push
> vacuum_cost_limit to 2000, since then it works like a charm.

Note the I/O cost balancing thing, which seems to bite many people: if
you raise max_workers without changing cost_delay or cost_limit, it
doesn't have much of an effect, because each worker goes slower to
accomodate. Raising the cost limit (or lowering the cost delay) does
have a useful impact. In pg12 we changed the default cost_delay to 2ms
(from 20ms).

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Thierry Husson <thusson(at)informiciel(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-08 00:26:32
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2019-06-08 08:59:37 +0900, Michael Paquier wrote:
> On Fri, Jun 07, 2019 at 03:58:43PM -0700, Andres Freund wrote:
> > Do we need to move the orphan temp cleanup code into database vacuums or
> > such?
>
> When entering into the vacuum() code path for an autovacuum, only one
> relation at a time is processed, and we have prior that extra
> processing related to toast relations when selecting the relations to
> work on, or potentially delete orphaned temp tables. For a manual
> vacuum, we finish by deciding which relation to process in
> get_all_vacuum_rels(), so the localized processing is a bit different
> than what's done in do_autovacuum() when scanning pg_class for
> relations.

Yea, I know. I didn't mean that we should only handle orphan cleanup
only within database wide vacuums, just *also* there. ISTM that'd mean
that at least some of the code ought to be in vacuum.c, and then also
called by autovacuum.c.

> Technically, I think that it would work to give up on the gathering of
> the orphaned OIDs in a gathering and let them be gathered in the list
> of items to vacuum, and then put the deletion logic down to
> vacuum_rel().

I don't think it makes much sense to go there. The API would probably
look pretty bad.

I was more thinking that we'd move the check for orphaned-ness into a
separate function (maybe IsOrphanedRelation()), and move the code to
drop orphan relations into a separate function (maybe
DropOrphanRelations()). That'd limit the amount of code duplication for
doing this both in autovacuum and all-database vacuums quite
considerably.

A more aggressive approach would be to teach vac_update_datfrozenxid()
to ignore orphaned temp tables - perhaps even by heap_inplace'ing an
orphaned table's relfrozenxid/relminmxid with InvalidTransactionId. We'd
not want to do that in do_autovacuum() - otherwise the schema won't get
cleaned up, but for database widevacuums that seems like it could be
good approach.

Random observation while re-reading this code: Having do_autovacuum()
and ExecVacuum() both go through vacuum() seems like it adds too much
complexity to be worth it. Like half of the file is only concerned with
checks related to that.

Greetings,

Andres Freund


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thierry Husson <thusson(at)informiciel(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-08 01:45:40
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Fri, Jun 07, 2019 at 05:26:32PM -0700, Andres Freund wrote:
> I was more thinking that we'd move the check for orphaned-ness into a
> separate function (maybe IsOrphanedRelation()), and move the code to
> drop orphan relations into a separate function (maybe
> DropOrphanRelations()). That'd limit the amount of code duplication for
> doing this both in autovacuum and all-database vacuums quite
> considerably.

A separation makes sense. At some point we should actually try to
separate vacuum and orphan relation cleanup, so separate functions
make sense. The only reason why we are doing it with autovacuum is
that it is the only thing in-core spawning a worker connected to a
database which does a full scan of pg_class.
--
Michael


From: Thierry Husson <thusson(at)informiciel(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-08 09:06:39
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I like the approach proposed by Andres: A more aggressive approach
would be to teach vac_update_datfrozenxid() to ignore orphaned temp
tables... In fact, I suppose all temporary tables and their content
could be completly ignored by MVCC principles as they are not subject
to concurrency being unmodifiable/unreadable by other connections.

That would solve a major problem I have because I automaticaly create
an empty temporary table for each connection in each DB process to
manage users' activities/system resources. Even when everything goes
well, these tables take age as long as they exists, even if I
explicitly vacuum them, frozen or not. So any connection kept open for
a long time will finish by causing a anti-wraparound shutdown. For now
the only solution I have is to kill my deamons connections every day.

I suppose this could be tested by a simple PSQL left open after a
CREATE TEMP TABLE toto (a INT). Any vacuum can't reduce its age.

The separate situation, as noted by Michael, could be done at
connection time, when PG gives a temporay schema to it. When it create
a pg_temp_XXX schema, it could make sure it's completely empty and
otherwise remove everything in it. I already had a DB corruption
because system tables weren't in sync about these tables/schemas after
a badly closed connection, so it was impossible to make a drop table
on them. So it could be even safer to clear everything directly from
system tables instead of calling drop table for each leftover temp
table.

Thierry

Michael Paquier <michael(at)paquier(dot)xyz> a écrit :

> On Fri, Jun 07, 2019 at 05:26:32PM -0700, Andres Freund wrote:
>> I was more thinking that we'd move the check for orphaned-ness into a
>> separate function (maybe IsOrphanedRelation()), and move the code to
>> drop orphan relations into a separate function (maybe
>> DropOrphanRelations()). That'd limit the amount of code duplication for
>> doing this both in autovacuum and all-database vacuums quite
>> considerably.
>
> A separation makes sense. At some point we should actually try to
> separate vacuum and orphan relation cleanup, so separate functions
> make sense. The only reason why we are doing it with autovacuum is
> that it is the only thing in-core spawning a worker connected to a
> database which does a full scan of pg_class.
> --
> Michael


From: Andres Freund <andres(at)anarazel(dot)de>
To: Thierry Husson <thusson(at)informiciel(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-09 21:31:26
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

(on postgres lists, please do not top-quote).

On 2019-06-08 04:06:39 -0500, Thierry Husson wrote:
> In fact, I suppose all temporary tables and their content could be
> completly ignored by MVCC principles as they are not subject to
> concurrency being unmodifiable/unreadable by other connections.

That'd cause corruption, because vacuum would then remove resources that
the temp table might rely on (commit log, multixacts, ...).

> The separate situation, as noted by Michael, could be done at connection
> time, when PG gives a temporay schema to it. When it create a pg_temp_XXX
> schema, it could make sure it's completely empty and otherwise remove
> everything in it.

That already happens, but unfortunately only too late. IIRC We only do
so once the first temp table in a session is created.

> I already had a DB corruption because system tables weren't in sync
> about these tables/schemas after a badly closed connection, so it was
> impossible to make a drop table on them. So it could be even safer to
> clear everything directly from system tables instead of calling drop
> table for each leftover temp table.

Hm, I'd like to know more about that corruption. Did you report it when
it occured?

Greetings,

Andres Freund


From: - - <thusson(at)informiciel(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-10 16:47:42
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

<!DOCTYPE html>
<html><head>
<meta charset="UTF-8">
</head><body><table style="text-align: center; line-height: 1em; margin: 0.5em auto;" border="0" cellspacing="0" cellpadding="0" class="mce-item-table"><tbody><tr><td style="border-bottom: 1px solid #aaaaaa; padding: 0.4em 1em; border-top: 1px solid #555555;"><table style="text-align: center; line-height: 1em; margin: 0.5em auto;" border="0" cellspacing="0" cellpadding="0" bgcolor="#5c99dc" class="mce-item-table"><tbody><tr><td style="border-bottom: 1px solid #aaaaaa; padding: 0.4em 1em; border-top: 1px solid #555555;"><table style="text-align: center; line-height: 1em; margin: 0.5em auto;" border="0" cellspacing="0" cellpadding="0" class="mce-item-table"><tbody><tr bgcolor="#ffffff"><td style="border-bottom: 1px solid #aaaaaa; padding: 0.4em 1em; border-top: 1px solid #555555;"><table style="text-align: left; line-height: 1em; margin: 0.5em auto;" border="0" cellspacing="0" cellpadding="0" class="mce-item-table"><tbody><tr><td style="text-align: left; border-bottom: 1px solid #555555; padding: 0.4em 1em; border-top: 1px solid #555555;"><div class="bodyclass"><div>&#62; Hm, I&#39;d like to know more about that corruption. Did you report it when</div><div>&#62; it occured?</div><div><br></div><div>Thanks Andres for explanations. I didn&#39;t reported the corruption when it</div><div>occured as it was my fault, not a PG bug, as the main cause was that I was using<span style="background-color: transparent;">&#160;network drive, knowingly it&#39;s unreliable for DB but management didn&#39;t</span></div><div>believe me.</div><div>I had these kind of errors:</div><div>pg_dump emet_istina -F c -n usr_...</div><div>pg_dump: schema with OID 308991 does not exist</div><div>\dt+ pg_temp*.*</div><div>ERROR:&#160; catalog is missing 1 attribute(s) for relid 5733555</div><div>drop schema pg_temp_9;</div><div>ERROR:&#160; cache lookup failed for relation 5733715</div><div>drop schema pg_temp_6;</div><div>ERROR:&#160; cannot drop schema pg_temp_6 because other objects depend on it</div><div>DETAIL:&#160; table pg_temp_6.cur_dde000_105577 depends on schema pg_temp_6</div><div>HINT:&#160; Use DROP ... CASCADE to drop the dependent objects too.</div><div>I had to manualy remove/edit records from pg_class, pg_type, pg_namespace,</div><div>pg_depend, pg_shdepend.</div><div>I finaly managed to make it works and could dump everything and rebuild the</div><div>DB for more security. Server was down for 1 week, and that event gave me</div><div>proven arguments to have local storage. That was with 9.6 and I took the opportunity<span style="background-color: transparent;">&#160;to upgrade to 10.3 at the same time.</span></div><div>Now it&#39;s more clear it&#39;s a PG9/10/12 (I didn&#39;t tried 11) problem with</div><div>vacuum/autovacuum not changing xid on temp tables. So, as long a temp table</div><div>exists, it take age and finish by causing a wraparound protection.</div><div>Thierry</div></div></td></tr></tbody></table></td></tr></tbody></table></td></tr></tbody></table></td></tr></tbody></table></body></html>

Attachment Content-Type Size
unknown_filename text/html 3.0 KB

From: - - <thusson(at)informiciel(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-10 18:22:41
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

<!DOCTYPE html>
<html><head>
<meta charset="UTF-8">
</head><body><p class="default-style">Sorry for previous messup.</p><p class="default-style">&#62; Hm, I&#39;d like to know more about that corruption. Did you report it when</p><p class="default-style">&#62; it occured?</p><p class="default-style">Thanks Andres for explanations. I didn&#39;t reported the corruption when it</p><p class="default-style">occured as it was my fault, not a PG bug, as the main cause was that I was</p><p class="default-style">using network drive, knowingly it&#39;s unreliable for DB but management didn&#39;t</p><p class="default-style">believe me.</p><p class="default-style">I had these kind of errors:</p><p class="default-style">pg_dump emet_istina -F c -n usr_...</p><p class="default-style">pg_dump: schema with OID 308991 does not exist</p><p class="default-style">\dt+ pg_temp*.*</p><p class="default-style">ERROR:&#160; catalog is missing 1 attribute(s) for relid 5733555</p><p class="default-style">drop schema pg_temp_9;</p><p class="default-style">ERROR:&#160; cache lookup failed for relation 5733715</p><p class="default-style">drop schema pg_temp_6;</p><p class="default-style">ERROR:&#160; cannot drop schema pg_temp_6 because other objects depend on it</p><p class="default-style">DETAIL:&#160; table pg_temp_6.cur_dde000_105577 depends on schema pg_temp_6</p><p class="default-style">HINT:&#160; Use DROP ... CASCADE to drop the dependent objects too.</p><p class="default-style">I had to manualy remove/edit records from pg_class, pg_type, pg_namespace,</p><p class="default-style">pg_depend, pg_shdepend.</p><p class="default-style">I finaly managed to make it works and could dump everything and rebuild the</p><p class="default-style">DB for more security. Server was down for 1 week, and that event gave me</p><p class="default-style">proven arguments to have local storage. That was with 9.6 and I took the</p><p class="default-style">opportunity to upgrade to 10.3 at the same time.</p><p class="default-style">Now it&#39;s more clear it&#39;s a PG9/10/12 (I didn&#39;t tried 11) problem with</p><p class="default-style">vacuum/autovacuum not changing xid on temp tables. So, as long a temp table</p><p class="default-style">exists, it take age and finish by causing a wraparound protection.</p><p class="default-style">Thierry</p></body></html>

Attachment Content-Type Size
unknown_filename text/html 2.3 KB

From: Thierry Husson <thusson(at)informiciel(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-10 23:45:38
Message-ID: [email protected]
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

> Hm, I'd like to know more about that corruption. Did you report it when
> it occured?
>
> Greetings,
>
> Andres Freund

Thanks Andres for explanations, sorry for my previous mess. I didn't
reported the corruption when it occured as it was my fault, not a PG
bug, as the main cause was that I was using network drive, knowingly
it's unreliable for DB but management didn't believe me.

I had these kind of errors:

pg_dump emet_istina -F c -n usr_...
pg_dump: schema with OID 308991 does not exist

\dt+ pg_temp*.*
ERROR: catalog is missing 1 attribute(s) for relid 5733555

drop schema pg_temp_9;
ERROR: cache lookup failed for relation 5733715

drop schema pg_temp_6;
ERROR: cannot drop schema pg_temp_6 because other objects depend on it
DETAIL: table pg_temp_6.cur_dde000_105577 depends on schema pg_temp_6
HINT: Use DROP ... CASCADE to drop the dependent objects too.

I had to manualy remove/edit records from pg_class, pg_type, pg_namespace,
pg_depend, pg_shdepend.

I finaly managed to make it works and could dump everything and
rebuild the DB for more security. Server was down for 1 week, and that
event gave me proven arguments to have local storage. That was with
9.6 and I took the opportunity to upgrade to 10.3 at the same time.

Now it's more clear it's a PG9/10/12 problem (didn't tried 11) with
vacuum/autovacuum not changing xid on temp tables. So, as long a temp
table exists, it take age and finish by causing a wraparound protection.

Thierry


From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Thierry Husson <thusson(at)informiciel(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Temp table handling after anti-wraparound shutdown (Was: BUG #15840)
Date: 2019-06-11 12:08:36
Message-ID: CAD21AoDVqebWy6dOWV5iNsh8O_CT2ZDAGHQcX-W13OAW+XC2tA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sat, Jun 8, 2019 at 9:26 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>
> A more aggressive approach would be to teach vac_update_datfrozenxid()
> to ignore orphaned temp tables - perhaps even by heap_inplace'ing an
> orphaned table's relfrozenxid/relminmxid with InvalidTransactionId. We'd
> not want to do that in do_autovacuum() - otherwise the schema won't get
> cleaned up, but for database widevacuums that seems like it could be
> good approach.
>

FWIW I like this approach. We don't need to calculate new datfrozenxid
while including orphaned temp tables. It both improves behavior and
fixes this issue. Also with that approach we will not need stop
database cluster and do vacuuming in single user mode. The making the
vacuum command cleanup orphaned temp tables would be helpful in the
case where we reached to wraparound while having active temp tables,
it doesn't happen in normal use case though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center