create singleton ExceptionSink object to centralize logging of fatal errors#6533
Conversation
stuhood
left a comment
There was a problem hiding this comment.
Thanks! I think moving Exiter._log_exception to this class blocks landing this, but it looks fine otherwise.
| return cls() | ||
|
|
||
| def __init__(self): | ||
| self._cached_destination = None |
There was a problem hiding this comment.
This should probably default to cwd so that it is always useable, even before the destination has been set the first time.
Also I'm not sure it's really a "cached" destination... maybe just "destination".
There was a problem hiding this comment.
That sounds great, implementing this and the below now.
|
|
||
| class ExceptionSinkError(Exception): pass | ||
|
|
||
| def _get_timestamp(self): |
There was a problem hiding this comment.
Rather than adding timestamp logic, I'd recommend moving the Exiter._log_exception method into this class.
And I think that if you guarantee that self._cached_destination is always "something" by initializing it to cwd, then you should be able to have set_destination log to the previous destination if it fails.
There was a problem hiding this comment.
I have done that and added thorough testing in c562a49!
|
Moving def _log_exception(self, msg):
if self._workdir:
ExceptionSink.instance.set_destination(self._workdir)
ExceptionSink.instance.log_exception(msg)It's great. |
stuhood
left a comment
There was a problem hiding this comment.
Looks good, thanks!
Please: add the per-pid logs before landing.
| class ExceptionSink(object): | ||
| """A mutable singleton object representing where exceptions should be logged to.""" | ||
|
|
||
| @memoized_classproperty |
There was a problem hiding this comment.
A pattern we've used elsewhere to do the "mutable singleton" thing would be to ban calls to __new__, and then put all properties directly on the class.
| return cls() | ||
|
|
||
| def __init__(self, destination=None): | ||
| # TODO: what assumptions can we make about the current directory of a process? Does it have to |
There was a problem hiding this comment.
The point of initializing it to something is just to increase the chances that even before it's been configured that we're still able to capture exceptions somewhere... so we're not really make assumptions, other than trying to increase our chances. The goal isn't every to actually use it without configuring it... rather to just still catch errors if we happen to fail that early.
There was a problem hiding this comment.
Yes, my concern here was more "can we assume we can write to os.getcwd()?"
There was a problem hiding this comment.
We can't, but that's not cromulent given the above.
| def _iso_timestamp_for_now(self): | ||
| return datetime.datetime.now().isoformat() | ||
|
|
||
| # NB: This includes a trailing newline, but no leading newline. |
There was a problem hiding this comment.
Making the logs structured is fine. But we are still going to want to have a separate log per $pid. The client shouldn't have to parse anything... just look for a log with the relevant pid.
…-specific error log file
| @classmethod | ||
| def set_destination(cls, dir_path): | ||
| if not is_writable_dir(dir_path): | ||
| # TODO: when this class sets up excepthooks, raising this should be safe, because we always |
There was a problem hiding this comment.
I can see you've removed it in #6539, sounds great.
There was a problem hiding this comment.
If you need any of this in your next PR, let me know and I can backport it. Or feel free to just cherry-pick to your branch.
There was a problem hiding this comment.
Alternatively, I could land #6539 with the test marked xfail.
There was a problem hiding this comment.
I'll cherry-pick whatever I need, I was thinking -- the changes are mostly orthogonal to what I have now. Landing with it marked xfail is a-ok to me as long as the changes don't break other tests (wouldn't think so).
| class ExceptionSink(object): | ||
| """A mutable singleton object representing where exceptions should be logged to.""" | ||
|
|
||
| _destination = os.getcwd() |
There was a problem hiding this comment.
Should probably explain why we do this ("blah blah increase the chances that if an exception occurs early in initialization we still record it somewhere").
There was a problem hiding this comment.
Absolutely agree, this is highly mysterious as is.
| # avoids any errors with concurrent modification of the shared log affecting the per-pid log. | ||
| with safe_open(cls._exceptions_log_path(for_pid=pid), 'a') as pid_error_log: | ||
| pid_error_log.write(fatal_error_log_entry) | ||
| # TODO: we should probably guard this against concurrent modification somehow. |
There was a problem hiding this comment.
I think that because this is a single write call it's fairly likely it goes to the log atomically. Not sure though.
There was a problem hiding this comment.
The implication from a few minutes of searching is that people have lots of ways to create a file atomically, but appending atomically is a less common use case. I might check the CPython source for fun later. My first question would be: if I open a file to append with mode 'a', does python seek to the end of the file in the open call, or in the write() call? I would assume the latter and since this is a small message I could see atomicity. I will flesh out this TODO somewhat in the followup PR.
### Problem #6530 describes cases where when `pantsd` (or a `pantsd-runner` instance) is killed, there can be gaps in our exception logging (and rendering) that result in exceptions not reaching either the client or a log. ### Solution Build atop #6533 to consume written logs and render them for a `pantsd` client. When either `pantsd` or a `pantsd-runner` instance die, if the `ExceptionSink` (or `Exiter`) has installed an exception hook that renders the exception to the appropriate `pid`, it will be logged, and rendered to the user. ### Result Killed `pantsd-runner` or `pantsd` processes have increased visibility and debuggability.
…errors (#6533) ### Problem See #6530. We want to make it simpler to understand where logging of fatal errors occurs and to consume those logs in a structured way. ### Solution - First step in implementing #6530, a (process-)global singleton to manage where logs are kept (no logging is done yet). - Add some very basic validation of input and very basic tests. ### Result We now have an object that can be used to centralize the logging of fatal errors and enables the creation of a single point of failure (by which I mean, a single area of code which describes where this process's exceptions are logged and how to access and append to them), to be implemented in followup PRs.
### Problem #6530 describes cases where when `pantsd` (or a `pantsd-runner` instance) is killed, there can be gaps in our exception logging (and rendering) that result in exceptions not reaching either the client or a log. ### Solution Build atop #6533 to consume written logs and render them for a `pantsd` client. When either `pantsd` or a `pantsd-runner` instance die, if the `ExceptionSink` (or `Exiter`) has installed an exception hook that renders the exception to the appropriate `pid`, it will be logged, and rendered to the user. ### Result Killed `pantsd-runner` or `pantsd` processes have increased visibility and debuggability.
…errors (#6533) ### Problem See #6530. We want to make it simpler to understand where logging of fatal errors occurs and to consume those logs in a structured way. ### Solution - First step in implementing #6530, a (process-)global singleton to manage where logs are kept (no logging is done yet). - Add some very basic validation of input and very basic tests. ### Result We now have an object that can be used to centralize the logging of fatal errors and enables the creation of a single point of failure (by which I mean, a single area of code which describes where this process's exceptions are logged and how to access and append to them), to be implemented in followup PRs.
### Problem #6530 describes cases where when `pantsd` (or a `pantsd-runner` instance) is killed, there can be gaps in our exception logging (and rendering) that result in exceptions not reaching either the client or a log. ### Solution Build atop #6533 to consume written logs and render them for a `pantsd` client. When either `pantsd` or a `pantsd-runner` instance die, if the `ExceptionSink` (or `Exiter`) has installed an exception hook that renders the exception to the appropriate `pid`, it will be logged, and rendered to the user. ### Result Killed `pantsd-runner` or `pantsd` processes have increased visibility and debuggability.
Problem
See #6530. We want to make it simpler to understand where logging of fatal errors occurs and to consume those logs in a structured way.
Solution
Result
We now have an object that can be used to centralize the logging of fatal errors and enables the creation of a single point of failure (by which I mean, a single area of code which describes where this process's exceptions are logged and how to access and append to them), to be implemented in followup PRs.