create singleton ExceptionSink object to centralize logging of fatal errors by cosmicexplorer · Pull Request #6533 · pantsbuild/pants

cosmicexplorer · 2018-09-19T22:54:51Z

Problem

See #6530. We want to make it simpler to understand where logging of fatal errors occurs and to consume those logs in a structured way.

Solution

First step in implementing nondeterministic nailgun errors tracking issue #6530, a (process-)global singleton to manage where logs are kept (no logging is done yet).
Add some very basic validation of input and very basic tests.

Result

We now have an object that can be used to centralize the logging of fatal errors and enables the creation of a single point of failure (by which I mean, a single area of code which describes where this process's exceptions are logged and how to access and append to them), to be implemented in followup PRs.

…gleton

stuhood

Thanks! I think moving Exiter._log_exception to this class blocks landing this, but it looks fine otherwise.

stuhood · 2018-09-19T23:07:57Z

+    return cls()
+
+  def __init__(self):
+    self._cached_destination = None


This should probably default to cwd so that it is always useable, even before the destination has been set the first time.

Also I'm not sure it's really a "cached" destination... maybe just "destination".

That sounds great, implementing this and the below now.

stuhood · 2018-09-19T23:14:35Z

+
+  class ExceptionSinkError(Exception): pass
+
+  def _get_timestamp(self):


Rather than adding timestamp logic, I'd recommend moving the Exiter._log_exception method into this class.

And I think that if you guarantee that self._cached_destination is always "something" by initializing it to cwd, then you should be able to have set_destination log to the previous destination if it fails.

I have done that and added thorough testing in c562a49!

cosmicexplorer · 2018-09-20T04:22:18Z

Moving Exiter._log_exception() to ExceptionSink was larger in scope (conceptually, in my head) than I expected this PR to be, but it works perfectly -- the implementation is now just:

  def _log_exception(self, msg):
    if self._workdir:
      ExceptionSink.instance.set_destination(self._workdir)
    ExceptionSink.instance.log_exception(msg)

It's great.

stuhood

Looks good, thanks!

Please: add the per-pid logs before landing.

stuhood · 2018-09-20T17:10:59Z

+class ExceptionSink(object):
+  """A mutable singleton object representing where exceptions should be logged to."""
+
+  @memoized_classproperty


A pattern we've used elsewhere to do the "mutable singleton" thing would be to ban calls to __new__, and then put all properties directly on the class.

stuhood · 2018-09-20T17:12:59Z

+    return cls()
+
+  def __init__(self, destination=None):
+    # TODO: what assumptions can we make about the current directory of a process? Does it have to


The point of initializing it to something is just to increase the chances that even before it's been configured that we're still able to capture exceptions somewhere... so we're not really make assumptions, other than trying to increase our chances. The goal isn't every to actually use it without configuring it... rather to just still catch errors if we happen to fail that early.

Yes, my concern here was more "can we assume we can write to os.getcwd()?"

We can't, but that's not cromulent given the above.

stuhood · 2018-09-20T17:14:43Z

+  def _iso_timestamp_for_now(self):
+    return datetime.datetime.now().isoformat()
+
+  # NB: This includes a trailing newline, but no leading newline.


Making the logs structured is fine. But we are still going to want to have a separate log per $pid. The client shouldn't have to parse anything... just look for a log with the relevant pid.

…-specific error log file

stuhood

Thanks!

stuhood · 2018-09-20T18:54:55Z

+  @classmethod
+  def set_destination(cls, dir_path):
+    if not is_writable_dir(dir_path):
+      # TODO: when this class sets up excepthooks, raising this should be safe, because we always


This could just be a note.

I can see you've removed it in #6539, sounds great.

If you need any of this in your next PR, let me know and I can backport it. Or feel free to just cherry-pick to your branch.

Alternatively, I could land #6539 with the test marked xfail.

I'll cherry-pick whatever I need, I was thinking -- the changes are mostly orthogonal to what I have now. Landing with it marked xfail is a-ok to me as long as the changes don't break other tests (wouldn't think so).

stuhood · 2018-09-20T18:55:47Z

+class ExceptionSink(object):
+  """A mutable singleton object representing where exceptions should be logged to."""
+
+  _destination = os.getcwd()


Should probably explain why we do this ("blah blah increase the chances that if an exception occurs early in initialization we still record it somewhere").

Absolutely agree, this is highly mysterious as is.

stuhood · 2018-09-20T18:57:14Z

+      # avoids any errors with concurrent modification of the shared log affecting the per-pid log.
+      with safe_open(cls._exceptions_log_path(for_pid=pid), 'a') as pid_error_log:
+        pid_error_log.write(fatal_error_log_entry)
+      # TODO: we should probably guard this against concurrent modification somehow.


I think that because this is a single write call it's fairly likely it goes to the log atomically. Not sure though.

The implication from a few minutes of searching is that people have lots of ways to create a file atomically, but appending atomically is a less common use case. I might check the CPython source for fun later. My first question would be: if I open a file to append with mode 'a', does python seek to the end of the file in the open call, or in the write() call? I would assume the latter and since this is a small message I could see atomicity. I will flesh out this TODO somewhat in the followup PR.

### Problem #6530 describes cases where when `pantsd` (or a `pantsd-runner` instance) is killed, there can be gaps in our exception logging (and rendering) that result in exceptions not reaching either the client or a log. ### Solution Build atop #6533 to consume written logs and render them for a `pantsd` client. When either `pantsd` or a `pantsd-runner` instance die, if the `ExceptionSink` (or `Exiter`) has installed an exception hook that renders the exception to the appropriate `pid`, it will be logged, and rendered to the user. ### Result Killed `pantsd-runner` or `pantsd` processes have increased visibility and debuggability.

…errors (#6533) ### Problem See #6530. We want to make it simpler to understand where logging of fatal errors occurs and to consume those logs in a structured way. ### Solution - First step in implementing #6530, a (process-)global singleton to manage where logs are kept (no logging is done yet). - Add some very basic validation of input and very basic tests. ### Result We now have an object that can be used to centralize the logging of fatal errors and enables the creation of a single point of failure (by which I mean, a single area of code which describes where this process's exceptions are logged and how to access and append to them), to be implemented in followup PRs.

### Problem #6530 describes cases where when `pantsd` (or a `pantsd-runner` instance) is killed, there can be gaps in our exception logging (and rendering) that result in exceptions not reaching either the client or a log. ### Solution Build atop #6533 to consume written logs and render them for a `pantsd` client. When either `pantsd` or a `pantsd-runner` instance die, if the `ExceptionSink` (or `Exiter`) has installed an exception hook that renders the exception to the appropriate `pid`, it will be logged, and rendered to the user. ### Result Killed `pantsd-runner` or `pantsd` processes have increased visibility and debuggability.

…errors (#6533) ### Problem See #6530. We want to make it simpler to understand where logging of fatal errors occurs and to consume those logs in a structured way. ### Solution - First step in implementing #6530, a (process-)global singleton to manage where logs are kept (no logging is done yet). - Add some very basic validation of input and very basic tests. ### Result We now have an object that can be used to centralize the logging of fatal errors and enables the creation of a single point of failure (by which I mean, a single area of code which describes where this process's exceptions are logged and how to access and append to them), to be implemented in followup PRs.

### Problem #6530 describes cases where when `pantsd` (or a `pantsd-runner` instance) is killed, there can be gaps in our exception logging (and rendering) that result in exceptions not reaching either the client or a log. ### Solution Build atop #6533 to consume written logs and render them for a `pantsd` client. When either `pantsd` or a `pantsd-runner` instance die, if the `ExceptionSink` (or `Exiter`) has installed an exception hook that renders the exception to the appropriate `pid`, it will be logged, and rendered to the user. ### Result Killed `pantsd-runner` or `pantsd` processes have increased visibility and debuggability.

cosmicexplorer added 2 commits September 19, 2018 15:44

create basic ExceptionSink object for an exception logging entrypoint

a15c240

add instance memoized_classproperty to ExceptionSink to make it a sin…

ec29017

…gleton

cosmicexplorer requested a review from stuhood September 19, 2018 22:54

stuhood reviewed Sep 19, 2018

View reviewed changes

cosmicexplorer mentioned this pull request Sep 19, 2018

nondeterministic nailgun errors tracking issue #6530

Closed

move ExceptionSink to base and use it to log exceptions in Exiter

c562a49

stuhood approved these changes Sep 20, 2018

View reviewed changes

use class attributes instead of memoizing and additional log to a pid…

9ecf77a

…-specific error log file

stuhood approved these changes Sep 20, 2018

View reviewed changes

stuhood merged commit 17ddd39 into pantsbuild:master Sep 20, 2018

stuhood mentioned this pull request Sep 21, 2018

pantsd client logs exceptions from server processes #6539

Merged

cosmicexplorer mentioned this pull request Oct 2, 2018

first attempt at centralizing more global error logging state in ExceptionSink #6552

Merged


		class ExceptionSinkError(Exception): pass

		def _get_timestamp(self):

Uh oh!

Conversation

cosmicexplorer commented Sep 19, 2018

Problem

Solution

Result

Uh oh!

stuhood left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cosmicexplorer Sep 20, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cosmicexplorer commented Sep 20, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stuhood left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

stuhood left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cosmicexplorer Sep 21, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cosmicexplorer Sep 20, 2018 •

edited

Loading

cosmicexplorer commented Sep 20, 2018 •

edited

Loading

cosmicexplorer Sep 21, 2018 •

edited

Loading