Lists: | pgsql-hackers |
---|
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 17:53:42 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hello,
fd.c[1] will remove files from pgsql_tmp on a restart but not a
crash-restart per this comment:
/*
* NOTE: we could, but don't, call this during a post-backend-crash restart
* cycle. The argument for not doing it is that someone might want to
examine
* the temp files for debugging purposes. This does however mean that
* OpenTemporaryFile had better allow for collision with an existing temp
* file name.
*/
I understand that this is designed this way. I think it is a bad idea
because:
1. The majority crash-restarts in the wild are going to be diagnosed
rather easily within the OS itself. They fall into things like OOM
killer and out of disk space.
2. It can cause significant issues, we ran into this yesterday:
-bash-4.1$ ls pgsql_tmp31227*|du -sh
250G
There is no active process/backend with PID 31227. The database itself
is only 55G, but we are taking up an 5x that with dead files.
3. The problem can get worse over time. If you have a very long running
instance, any time the backend crash-restarts you have to potential to
increase disk space used for no purpose.
Sincerely,
JD
P.S. Thanks to AndrewG for his assistance in finding this.
1. http://doxygen.postgresql.org/fd_8c_source.html
--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 17:56:29 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2016-03-16 10:53:42 -0700, Joshua D. Drake wrote:
> Hello,
>
> fd.c[1] will remove files from pgsql_tmp on a restart but not a
> crash-restart per this comment:
>
> /*
> * NOTE: we could, but don't, call this during a post-backend-crash restart
> * cycle. The argument for not doing it is that someone might want to
> examine
> * the temp files for debugging purposes. This does however mean that
> * OpenTemporaryFile had better allow for collision with an existing temp
> * file name.
> */
>
> I understand that this is designed this way. I think it is a bad idea
> because:
>
> 1. The majority crash-restarts in the wild are going to be diagnosed rather
> easily within the OS itself. They fall into things like OOM killer and out
> of disk space.
I don't buy 1), like at all. I've seen numerous production instances
with crashes outside of os triggered things.
> 2. It can cause significant issues, we ran into this yesterday:
>
> -bash-4.1$ ls pgsql_tmp31227*|du -sh
> 250G
>
> There is no active process/backend with PID 31227. The database itself is
> only 55G, but we are taking up an 5x that with dead files.
>
> 3. The problem can get worse over time. If you have a very long running
> instance, any time the backend crash-restarts you have to potential to
> increase disk space used for no purpose.
But I think these outweigh the debugging benefit.
Andres
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 18:02:09 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 03/16/2016 10:56 AM, Andres Freund wrote:
>> I understand that this is designed this way. I think it is a bad idea
>> because:
>>
>> 1. The majority crash-restarts in the wild are going to be diagnosed rather
>> easily within the OS itself. They fall into things like OOM killer and out
>> of disk space.
>
> I don't buy 1), like at all. I've seen numerous production instances
> with crashes outside of os triggered things.
I don't argue that. I argue that 9 times out of 10, it will not be one
of those things but ....
>> 3. The problem can get worse over time. If you have a very long running
>> instance, any time the backend crash-restarts you have to potential to
>> increase disk space used for no purpose.
>
> But I think these outweigh the debugging benefit.
Well as Andrew said, we could also create postmaster start option that
defaults to don't save.
JD
--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 18:04:00 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote:
> >>3. The problem can get worse over time. If you have a very long running
> >>instance, any time the backend crash-restarts you have to potential to
> >>increase disk space used for no purpose.
> >
> >But I think these outweigh the debugging benefit.
>
> Well as Andrew said, we could also create postmaster start option that
> defaults to don't save.
I think these days you'd simply use restart_after_crash = false. For
debugging I found that to be rather valuable.
Andres
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 18:05:13 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> Hello,
> fd.c[1] will remove files from pgsql_tmp on a restart but not a
> crash-restart per this comment:
> /*
> * NOTE: we could, but don't, call this during a post-backend-crash restart
> * cycle. The argument for not doing it is that someone might want to
> examine
> * the temp files for debugging purposes. This does however mean that
> * OpenTemporaryFile had better allow for collision with an existing temp
> * file name.
> */
> I understand that this is designed this way. I think it is a bad idea
> because:
Possible compromise: remove files only in non-Assert builds?
regards, tom lane
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 18:06:23 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 03/16/2016 11:04 AM, Andres Freund wrote:
> On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote:
>>>> 3. The problem can get worse over time. If you have a very long running
>>>> instance, any time the backend crash-restarts you have to potential to
>>>> increase disk space used for no purpose.
>>>
>>> But I think these outweigh the debugging benefit.
>>
>> Well as Andrew said, we could also create postmaster start option that
>> defaults to don't save.
>
> I think these days you'd simply use restart_after_crash = false. For
> debugging I found that to be rather valuable.
That would have created an extended outage for this installation. I am
not sure we can force that for something that should not happen in the
first place (filling up the hard drive with dead files).
JD
>
>
> Andres
>
>
--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 18:07:06 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 03/16/2016 11:05 AM, Tom Lane wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>> Hello,
>> fd.c[1] will remove files from pgsql_tmp on a restart but not a
>> crash-restart per this comment:
>
>> /*
>> * NOTE: we could, but don't, call this during a post-backend-crash restart
>> * cycle. The argument for not doing it is that someone might want to
>> examine
>> * the temp files for debugging purposes. This does however mean that
>> * OpenTemporaryFile had better allow for collision with an existing temp
>> * file name.
>> */
>
>> I understand that this is designed this way. I think it is a bad idea
>> because:
>
> Possible compromise: remove files only in non-Assert builds?
Oh, that seems absolutely reasonable. If we are using assert builds it
is because we are debugging something (or should be) anyway.
Sincerely,
JD
>
> regards, tom lane
>
--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 18:08:52 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2016-03-16 11:06:23 -0700, Joshua D. Drake wrote:
> On 03/16/2016 11:04 AM, Andres Freund wrote:
> >On 2016-03-16 11:02:09 -0700, Joshua D. Drake wrote:
> >>>>3. The problem can get worse over time. If you have a very long running
> >>>>instance, any time the backend crash-restarts you have to potential to
> >>>>increase disk space used for no purpose.
> >>>
> >>>But I think these outweigh the debugging benefit.
> >>
> >>Well as Andrew said, we could also create postmaster start option that
> >>defaults to don't save.
> >
> >I think these days you'd simply use restart_after_crash = false. For
> >debugging I found that to be rather valuable.
>
> That would have created an extended outage for this installation. I am not
> sure we can force that for something that should not happen in the first
> place (filling up the hard drive with dead files).
Nah, I meant that if you want to debug something you'd set that (i.e. in
the case you need the temp files), not you should set that.
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 18:15:18 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 03/16/2016 11:08 AM, Andres Freund wrote:
>>>> Well as Andrew said, we could also create postmaster start option that
>>>> defaults to don't save.
>>>
>>> I think these days you'd simply use restart_after_crash = false. For
>>> debugging I found that to be rather valuable.
>>
>> That would have created an extended outage for this installation. I am not
>> sure we can force that for something that should not happen in the first
>> place (filling up the hard drive with dead files).
>
> Nah, I meant that if you want to debug something you'd set that (i.e. in
> the case you need the temp files), not you should set that.
Aww, yes that would be reasonable as well.
JD
--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 19:14:02 |
Message-ID: | CA+TgmoZwGNFnng25xkJbqfkiaaKcnks6s7rBPKo_gM9HZqwqjQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>> Hello,
>> fd.c[1] will remove files from pgsql_tmp on a restart but not a
>> crash-restart per this comment:
>
>> /*
>> * NOTE: we could, but don't, call this during a post-backend-crash restart
>> * cycle. The argument for not doing it is that someone might want to
>> examine
>> * the temp files for debugging purposes. This does however mean that
>> * OpenTemporaryFile had better allow for collision with an existing temp
>> * file name.
>> */
>
>> I understand that this is designed this way. I think it is a bad idea
>> because:
>
> Possible compromise: remove files only in non-Assert builds?
That sorta seems like tying two things together that aren't obviously
related. I think building with --enable-cassert is support to enable
debugging cross-checks, not change behavior.
I would vote for removing them always, and if somebody doesn't want to
remove them, well then they can comment the code out.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 19:16:30 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Possible compromise: remove files only in non-Assert builds?
> That sorta seems like tying two things together that aren't obviously
> related. I think building with --enable-cassert is support to enable
> debugging cross-checks, not change behavior.
Well, it's support to enable debugging, and I would classify not
destroying evidence as being debugging support.
regards, tom lane
From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 19:19:39 |
Message-ID: | CAM3SWZREPVKc80Dog+R2sxW=wt+LfCjTyCBPStWo4yA3+zigiA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Mar 16, 2016 at 10:53 AM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
> fd.c[1] will remove files from pgsql_tmp on a restart but not a
> crash-restart per this comment:
>
> /*
> * NOTE: we could, but don't, call this during a post-backend-crash restart
> * cycle. The argument for not doing it is that someone might want to
> examine
> * the temp files for debugging purposes. This does however mean that
> * OpenTemporaryFile had better allow for collision with an existing temp
> * file name.
> */
>
> I understand that this is designed this way. I think it is a bad idea
FWIW, I've seen this get out of hand several times myself.
--
Peter Geoghegan
From: | Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: fd.c doesn't remove files on a crash-restart |
Date: | 2016-03-16 22:16:14 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 3/16/16 2:16 PM, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Mar 16, 2016 at 2:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Possible compromise: remove files only in non-Assert builds?
>
>> That sorta seems like tying two things together that aren't obviously
>> related. I think building with --enable-cassert is support to enable
>> debugging cross-checks, not change behavior.
>
> Well, it's support to enable debugging, and I would classify not
> destroying evidence as being debugging support.
Another option: keep stuff around for a single restart. I don't think
this would be that hard by having a file that's a list of files to
remove on the next restart. On restart, remove everything in that file
(and the file itself). If there's anything left, create a new file
that's the list of what's left.
The other nice thing about having this list is it would tell the DBA
exactly what files were left after the crash vs what's new.
Actually, I guess another option would be to have a separate directory
to move all these files into. On restart, nuke the directory if it
exists, then move stuff in there if necessary.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com