first attempt at centralizing more global error logging state in ExceptionSink by cosmicexplorer · Pull Request #6552 · pantsbuild/pants

cosmicexplorer · 2018-09-24T22:29:13Z

Problem

See #6530. This is bullet point 2 of that issue.

Solution

rename ExceptionSink#set_destination() to reset_log_location().
(re)?acquire open file handles and persist them as ExceptionSink private fields so we can write to them in signal handlers.
- use faulthandler.enable() to log fatal signals (SIGABRT, SIGSEGV, ...) and stacktraces to the process-specific error stream from pantsd client logs exceptions from server processes #6539.
add reset_exiter() and delegate to the exiter to exit on an unhandled exception.
add reset_interactive_output_stream() to change which stream the Exiter is provided as the out argument, and to enable stacktraces to be sent to that stream when the process receives SIGUSR2 (on par with current functionality)
move everything besides exiting out of the Exiter class

Result

We can un-xfail the integration test introduced in #6539. This makes it significantly easier to diagnose pantsd communication errors and their causes, both locally and in CI.

stuhood

Thanks... I think the biggest blocker to getting that test passing will be the fact that the signal handler will not currently write to the exception log. Doing that will require a bit of refactoring.

stuhood · 2018-09-24T22:45:59Z

+  class ExceptionSinkError(Exception): pass
+
+  # TODO (in response to the above): we can avoid a single entry point and make everything work with
+  # great ease if we can ensure all set_* methods are idempotent!


I agree that making these methods idempotent would be good.

stuhood · 2018-09-24T22:47:16Z

  def set_destination(cls, dir_path):
-    cls._destination = dir_path
+    cls._destination = cls._check_or_create_new_destination(dir_path)
+    cls._all_destinations.append(cls._destination)


I don't think a history of destinations is necessarily useful, so would drop this property.

After thinking about the use case of RemotePantsRunner a bit, I think I agree, because if we're going to want to dive into any exception logging like we do there we'll probably know where to look instead of wanting to dredge up all our possible log files (?).

However, after thinking, I'm thinking now mostly about the situation which is completely unhandled now, which is that if something happens to call ExceptionSink.set_destination() after RemotePantsRunner does, we immediately lose any logs we might have wanted for the thin client's error message.

The _all_destinations bit doesn't really address that except by way of the commented-out incomplete method body which would just dredge up all the logs (not useful for the same/opposite reason), but that situation might be something we want to special-case.

I think you addressed exactly this in your comment below re: set_pid, actually. Figuring out the log file in a signal handler part now though.

stuhood · 2018-09-24T22:51:52Z

+    if faulthandler.is_enabled():
+      faulthandler.disable()
+    faulthandler.enable(trace_stream)
+    # This permits a non-fatal `kill -31 <pants pid>` for stacktrace retrieval.


This should likely be written to both the current trace_stream and the current exception log(s).

Doing the latter thing would require keeping an open file handle for the current exception log(s), which is a bit harder... but I think that getting the test I added to pass will require this, because killing the process with a signal will only result in this firing, rather than also logging an exception.

Additionally, (re)-opening the exception log for our current pid (on a best effort basis...) would likely require an explicit step, like set_pid: we'd call it as soon as we know we've forked.

The reason I didn't consider this such a huge change when we talked was I think because the biggest change I see is just "having an actual signal handler for SIGUSR2 instead of using faulthandler", and the code I threw away was already doing that in order to handle SIGINT and SIGQUIT. I'm not sure now whether the handling of those signals should be done in this PR. We currently set handlers for those signals in RemotePantsRunner to send a signal to the pantsd-runner process on control-c.

Well, the biggest change is figuring out where to write in a signal handler as described in your comment, but after that's figured out.

having an actual signal handler for SIGUSR2 instead of using faulthandler

If you haven't already, take a look at the restrictions on signal handlers: https://docs.python.org/3/library/faulthandler.html ...

The fault handler is called on catastrophic cases and therefore can only use signal-safe functions (e.g. it cannot allocate memory on the heap). Because of this limitation traceback dumping is minimal compared to normal Python tracebacks:

That's extremely helpful! As an aside, I have used the glibc docs on signal handlers before and found the bit on re-entrancy and signal handlers to be super illuminating to understand what's allowed in a signal handler and why and have found it applicable to other languages. I will peruse the python docs you linked because I was looking for that exactly and literally have a tab open to that page but just was only looking for method names and didn't read it. Thanks!

stuhood · 2018-09-24T22:55:49Z

+      safe_mkdir(destination)
+    except Exception as e:
+      # NB: When this class sets up excepthooks, raising this should be safe, because we always
+      # have a destination to log to (os.getcwd() if not otherwise set).


Now that this field defaults to None this comment isn't accurate.

But regardless: I think that if this is refactored to send the signal-handler output to the exception log, it won't matter, because we'll need to eagerly open the exception/signal log in order to register it with faulthandler.

Now that this field defaults to None this comment isn't accurate.

Will update.

But regardless: I think that if this is refactored to send the signal-handler output to the exception log, it won't matter, because we'll need to eagerly open the exception/signal log in order to register it with faulthandler.

I'm understanding this as: we need to handle "opening the exception log in a signal/exception handler" anyway so in doing so we don't have to worry about the validity of the exception log in other methods (because it'd be handled specifically somewhere). Which is cool.

"opening the exception log in a signal/exception handler" anyway

See the faulthandler docs linked above.

stuhood · 2018-09-24T23:04:24Z

+    # TODO: this is messy. Is it less surprising to have the Exiter modified during construction, or
+    # in the run() method we do here?
+    ExceptionSink.set_destination(self._global_options.pants_workdir)
+    self._exiter.should_print_backtrace = self._global_options.print_exception_stacktrace


Question I do not know the answer to: should the should_print_backtrace property be on the ExceptionSink instead? It applies to the current errstream, rather than to the Exiter, per-se.

I moved this around many times, including letting the traceback formatter be None to denote not printing tracebacks. On master, should_print_backtrace applies to the exception message printed to stderr on exit only, not to the log message, so this code conforms to the current status, which also seems to make most sense to me (I was going to delete it, but obviously this is required for the specific case of exiting with error). We take an Exiter as a param in order to avoid subsuming exiting as well as fatal logging functionality. The current errstream is an implementation detail, and btw, also not used yet, I need to make ExceptionSink call its Exiter with the out arg provided.

I don't think should_print_backtrace should be a property of the errstream or the ExceptionSink as opposed to the Exiter unless there are more situations besides "we are printing an error message on an unexpected error, possibly to a terminal" that we would want should_print_backtrace to apply to (which I'm all ears for, but don't see right now).

Note to self to make ExceptionSink use the out arg when exiting.

stuhood · 2018-09-24T23:06:10Z

    try:
      PantsRunner(exiter, start_time=start_time).run()
    except KeyboardInterrupt:
+      # TODO: ExceptionSink could probably take care of this too, as well as converting SIGINT into


Not sure about that one. Where possible, I feel like treating the Exiter as a high level concept associated with a particular run, and ExceptionSink as the "always available manager of exception logs (and maybe an err stream if it's available)" is a cleanish separation.

With regard to handling keyboard interrupts: those are pretty clearly a per-run concept, and so if possible, their Handling should be part of the Exiter.

Hm, ok, defining it like that I think is better than what I had in my head, which was "ExceptionSink makes sure your logs are logged" (subsuming trapping as well as logging -- the first half of "ExceptionSink"). If we can move global state e.g. changing signal handlers to the Exiter, or to its file, I think that's a great move.

baroquebobcat · 2018-09-26T16:39:13Z

+      if log_location.pid is None:
+        raise cls.ExceptionSinkError(
+          "No-op call to reset_log_location(): argument was {!r}."
+          .format(log_location))


I found this message confusing. I don't think this is a noop. Maybe it should be "log_location must specify either a pid or a directory, but both were None"

Alternatively, you could change LogLocation#__new__ so that it throws when both are None. That would make it easier to trace where the LogLocation in question came from.

The difficulty here is that I'm overloading LogLocation (it used to be called LogLocationRequest) for both setting state and getting state, and those should have different semantics, like you describe. I think the clear answer is to split the two out (one for exceptions_log_path and one for reset_log_location).

baroquebobcat · 2018-09-26T16:42:09Z

+      # Ensure the directory is suitable for writing to, or raise.
+      cls._check_or_create_new_destination(log_location.log_dir)
+    if log_location.pid is None:
+      log_location = log_location.copy(pid=os.getpid())


You could also default the pid in this way in LogLocation#__new__

Will do as part of above.

stuhood · 2018-09-28T17:49:41Z

+      if exception_text:
+        yield exception_text
+
+  def broadcast_signal_to_client(self, signum):


IMO, we should refactor out all changes to Ctrl+C handling, because that sends signals from the nailgun client to the nailgun server. It should be easy to add later now that we have the pid and pgrp.

Reverted the changes to this file.

stuhood · 2018-09-28T17:56:06Z

    # process that the stacktrace output lands deterministically in a known place vs to a stray
    # terminal window.
-    self._exiter.set_except_hook(sys.stderr)
+    ExceptionSink.reset_log_location(LogLocation(os.getpid(), os.getcwd()))


We should keep the location we have before this call, because otherwise we will switch away from the configured location to the cwd here.

We do need to set the pid though, because we just finished forking, and definitely have a new pid.

I have removed the ability to do anything about the pid at all. This is still broken for a totally different reason but won't be in a bit.

stuhood · 2018-09-28T17:57:45Z

+    return super(GetLogLocationRequest, cls).__new__(cls, log_dir=log_dir, pid=pid)
+
+
+class LogLocation(datatype([


I think this datatype might be overkill for this case, especially since these properties can be change independently.

This datatype is gone.

ity

lets address these and will re-review

ity · 2018-09-28T18:09:52Z

+      raise KeyboardInterrupt('User interrupted execution with control-c!')
+    # TODO: determine the appropriate signal-safe behavior here (to avoid writing to our file
+    # descriptors re-entrantly, which raises an IOError). This is a best-effort solution.
+    if cls._is_handling_signal:


a docstring here to explain what signals this is actually handling, would be great.

ity · 2018-09-28T18:22:54Z

 logger = logging.getLogger(__name__)


+class RunTrackingExiter(Exiter):


lets extract RunTrackingExiter changes out of local_pants_runner since its unrelated to this change and to keep this PR limited to pants_daemon_runner changes.

The changes from this file have now also been reverted entirely!

ity · 2018-09-28T18:31:08Z

    options_bootstrapper = OptionsBootstrapper(env=self._env, args=self._args)
    bootstrap_options = options_bootstrapper.get_bootstrap_options()
-    ExceptionSink.set_destination(bootstrap_options.for_global_scope().pants_workdir)
+    ExceptionSink.reset_log_location(LogLocation.from_options_for_current_process(


as we discussed, separating out pid and log location between daemon_runner and pants_runner is

This has been changed to just update the location and the exiter from the bootstrap options.

ity · 2018-09-28T18:32:12Z

 logger = logging.getLogger(__name__)


+class RemoteExiter(Exiter):


this may not be required at all, since server (pantsd) never sends anything to the client

Indeed! This file has been reverted entirely!

ity · 2018-09-28T18:37:01Z

-        logger.fatal('lost active connection to pantsd!')
-        raise_with_traceback(self._extract_remote_exception(pantsd_handle.pid, e))
+        error_log_msg = '\nlost active connection to pantsd!'
+        ExceptionSink.log_exception(error_log_msg)


this wont be required at all, since you are already handling this earlier

This file has been reverted entirely!

ity · 2018-09-28T18:37:37Z


  def run(self, args=None):
-    self._setup_logging()
+    # Redirect fatal error logging to the current workdir, set the stream to log stacktraces to on


all of this can be reverted since we are removing RemoteExiteretc from this change.

This file has been reverted entirely!

ity · 2018-09-28T18:40:31Z

  def _emit(self, s):
    """Append content to the main report file."""
-    if os.path.exists(self._html_dir):  # Make sure we're not immediately after a clean-all.
+    report_file_probably_writable = (


we can remove this too, as we are removing any RunTracker changes from this.

ity · 2018-09-28T18:42:09Z

  def _trapped_signals(self, client):
    """A contextmanager that overrides the SIGINT (control-c) and SIGQUIT (control-\) handlers
    and handles them remotely."""
+    # TODO: if this control-c is done at the wrong time (before the pantsd-runner process begins and


lets extract all the ctrl-c changes from this PR into a separate one.

This file has been reverted entirely!

…ptionSink - make it handle SIGUSR2 and unhandled exceptions so we can move that logic out of Exiter

ity

nice work! this is looking great to me - will wait for your changes to +1.

ity · 2018-10-02T23:34:18Z

    print(bold(cyan('\ncompleted in {} seconds'.format(elapsed))))

+    # TODO: uncomment this and add an issue link!
+    run_count = self._run_count(workdir)


this was the culprit ;)

cosmicexplorer · 2018-10-02T23:52:55Z

Just tried cherry-picking, and remembered we'll probably need #6533 and #6539 first, trying that now (EDIT: that worked, so this should be fine to hotfix onto 1.10.x).

stuhood

Thanks, this looks good. It would be great to continue to (lightly) refine the relationship between the Exiter and the ExceptionSink to make their individual responsibilities clearer... but it looks logically correct from my end.

stuhood · 2018-10-04T21:12:27Z

        except KeyboardInterrupt:
-          po.send_signal(signal.SIGINT)
+          # The process may still have exited, even if we were interrupted.
+          safe_kill(po.pid, signal.SIGINT)


You can leave this in this PR, but a reminder: please do not conflate independent changes in one PR like this in the future. If this line causes a bug, or if the rest of the PR contains a bug, they won't be able to be cherry-picked independently. And it also makes reviewing more challenging.

This was necessary to pass testing. It may not be necessary now with the removal of the signal handlers.

Again, I'm fine with leaving it... just a note for future PRs.

If it's required to pass testing because of the no longer existing changes in signal handling, it would point to signal handling (correctly) being an instance of changing multiple things at once (along with the logging changes in this PR), which I agree is made very clear here. I don't understand why this specific change is something to avoid if we needed it to pass CI?

stuhood · 2018-10-04T21:14:25Z

+      # Create the directory if possible, or raise if not writable.
+      cls._check_or_create_new_destination(new_log_location)
+    else:
+      # Interpret a None argument to mean to just refresh the file handles for the current pid.


This method is never called with None, afaict, so would be good to remove this case to improve clarity.

Yes, after talking with you last week about where we need to reset the logs (whenever we get bootstrap options or whenever we change pid) and we can remove that now.

stuhood · 2018-10-04T21:17:52Z

+    This is where the the error message on exit will be printed to as well.
+    """
+    # NB: mutate process-global state!
+    # TODO: chain=True will log tracebacks to previous values of the trace stream as well -- what


The faulthandler docs indicate that you can call unregisteror disable to remove previous copies of handlers. Is that worth doing here?

At first I had it doing that, but ended up just copying the current code (hence the TODO, because writing to maybe-closed maybe-unrelated streams after explicitly changing it sounds like the wrong thing to do). I think it would make more sense to unregister and disable here.

stuhood · 2018-10-04T21:20:57Z

+                     out=cls._interactive_output_stream)
+
+  @classmethod
+  def _log_unhandled_exception_and_exit(cls, exc_class=None, exc=None, tb=None, add_newline=False):


This method is called from outside of the class, and should maybe be public.

It's also worth considering whether this should (mostly) stay in the Exiter, and just call back to the ExceptionSink for the purposes of writing to the exception logs...? I'm not very clear on that though, given the fact that they both call one another at various times.

I think I like that idea a lot. I think it specifically should be done, along with the comment on the Exiter holding onto its output stream, in a followup. I will continue addressing your comments and seeing what can be done here without lots more code changes.

Broke out #6596!

stuhood · 2018-10-04T21:21:48Z

+    faulthandler.register(signal.SIGUSR2, interactive_output_stream, chain=True)
+
+    # NB: mutate the class variables!
+    # We don't *necessarily* need to keep a reference to this, but we do here for clarity.


This field is used in _exit_with_failure, so this comment isn't accurate. As commented in _log_unhandled_exception_and_exit, maybe the Exiter should just hold onto the stream it wants to use to render a failure to the user, and the ExceptionSink should just handle the exception logs?

That sounds like a fantastic idea.

Broke out #6596!

stuhood · 2018-10-04T21:32:53Z

    self.purge_metadata()
    self.pre_fork(**pre_fork_opts or {})
    is_parent, is_child = fork_func()
+    # No arguments reuses the same log location, but with the new pid after forking.


I think this comment is in the wrong place?

stuhood · 2018-10-04T21:33:41Z

+      out = out or sys.stderr
+      if not is_fileobj_definitely_closed(out):
+        print(msg, file=out)
+        # NB: Ensure we write everything out in case it's not an unbuffered stream like stderr.


I think that f.close() implicitly flushes.

If we close stderr here we may see an error when invoking pants as a pex because the pex executor will call sys.stderr.flush() directly (haven't checked yet, but that was the error we saw in our internal CI). pex writing directly to stderr is something that I believe could be configurable and which would be extremely useful. maybe. i'll make a followup ticket.

As in, it calls that in the body of a finally: after the pex exits, and then raises an exception if stderr is closed.

The code is now all the same, but wrapped in a try/catch which calls logger.exception(e). If it is almost definitely better to close it here, I can do it -- just not sure if there's a reason to do it over the flush alone. I've left a detailed TODO above the flush() call about this.

stuhood · 2018-10-04T21:35:28Z

+
+
+def is_fileobj_definitely_closed(fileobj):
+  """Return whether fileobj points to a file object which is definitely closed.


This method is a bit too context-specific to put in util I think.

Agreed, it's not that pythonic either.

stuhood · 2018-10-04T21:39:53Z

-    self.nailgun_client.send_control_c()
-    mock_kill.assert_not_called()
-
-  @mock.patch('os.kill', **PATCH_OPTS)


This test seems useful... would be good to preserve if possible. If it was broken beforehand, leaving it in as xfail would be good, because there is a race between getting a pid from the server and Ctrl+C arriving at the client that means it's important to be able to noop.

That race is erased in #6579 with the pid/pgrp being required before the rest of the communication continues, btw. I have added these back, fixed the mock so they pass, and added an additional test for the new maybe_send_signal() method.

stuhood · 2018-10-04T21:41:43Z

+      self.assertRegexpMatches(waiter_run.stderr_data, """\
+Signal {signum} was raised\\. Exiting with failure\\.
+\\(backtrace omitted\\)
+""".format(pid=parent_runner_pid, signum=signal.SIGTERM))


The pid isn't used here, but that's probably fine: telling someone the dead pid is less necessary than rendering the error.

This is an artifact of a previous implementation which logged the pid with a timestamp to stderr, which is now only supposed to occur when writing to a log file, which is why this error stayed hidden. I think it can be removed.

The pid has been removed here, replaced with a note about why we only need to check the stderr.

stuhood

The last commit re-xfails the test that this was supposed to be fixing. We need that test to be passing here. Let's discuss when you're available.

This reverts commit f70e7c0.

stuhood

Thanks Danny.

cosmicexplorer · 2018-10-08T21:16:30Z

This is the only remaining failure -- this is what the safe_kill() call was for earlier. I have a list of quick followups (separate from #6596, which doesn't need to happen immediately) but given the size of this PR I'm just going to fix the failure, land this, and then make the followup.

this fixes the keyboardinterrupt integration test

### Problem This PR contains several deemed-necessary-by-me-personally fixes to fatal error logging which weren't put into #6552 because that diff was already massive. This should be considered the same PR as that one. ### Solution - Default the `logs/` directory to be `.pants.d/logs` so we don't litter the buildroot with a top-level `logs/` directory. - Add a timestamp to any fatal errors printed to the terminal to more easily debug race conditions. - Add the process title (from `setproctitle`) to the fatal error log message (this makes it more clear when the process is pantsd, compared to the `sys.argv`, which are arbitrary). - Remove any mention of exception stacktraces from Exiter. - Add `ExceptionSink.reset_should_print_backtrace()` (with a new mutable field `_should_print_backtrace`) to determine whether to print stacktraces to the terminal on exit. - Testing for all of the above.

…ptionSink (#6552) See #6530. This is bullet point 2 of that issue. - rename `ExceptionSink#set_destination()` to `reset_log_location()`. - (re)?acquire open file handles and persist them as `ExceptionSink` private fields so we can write to them in signal handlers. - use `faulthandler.enable()` to log fatal signals (`SIGABRT`, `SIGSEGV`, ...) and stacktraces to the process-specific error stream from #6539. - add `reset_exiter()` and delegate to the exiter to exit on an unhandled exception. - add `reset_interactive_output_stream()` to change which stream the Exiter is provided as the `out` argument, and to enable stacktraces to be sent to that stream when the process receives SIGUSR2 (on par with current functionality) - move everything besides exiting out of the Exiter class We can un-xfail [the integration test](https://github.com/pantsbuild/pants/blob/f70e7c0b7d55a389d31b9b3b706d0901bc9f655e/tests/python/pants_test/pantsd/test_pantsd_integration.py#L416) introduced in #6539. This makes it significantly easier to diagnose pantsd communication errors and their causes, both locally and in CI.

### Problem This PR contains several deemed-necessary-by-me-personally fixes to fatal error logging which weren't put into #6552 because that diff was already massive. This should be considered the same PR as that one. ### Solution - Default the `logs/` directory to be `.pants.d/logs` so we don't litter the buildroot with a top-level `logs/` directory. - Add a timestamp to any fatal errors printed to the terminal to more easily debug race conditions. - Add the process title (from `setproctitle`) to the fatal error log message (this makes it more clear when the process is pantsd, compared to the `sys.argv`, which are arbitrary). - Remove any mention of exception stacktraces from Exiter. - Add `ExceptionSink.reset_should_print_backtrace()` (with a new mutable field `_should_print_backtrace`) to determine whether to print stacktraces to the terminal on exit. - Testing for all of the above.

### Problem #6552 has some changes around signal handling. This was an attempt to move those out, and to gain greater visibility into and control over when pants errors out due to a signal (helping to address difficulties in flaky tests around exiting, such as #6708). Resolves #7014, resolves #6708, resolves #6847, resolves #7199. ### Solution - Create a very simple `SignalHandler` class which can be subclassed to set pants's behavior upon receiving different signals in different execution contexts. - Add a `_signal_handler` field to `ExceptionSink`. - Apply `SignalHandler` everywhere that tries to modify pants's exception handling. - Add the `--pantsd-pailgun-quit-timeout` bootstrap option and implement a timeout strategy to continue to forward output from the remote `pantsd-runner` process for a brief period of time upon receiving a signal, then killing the remote process if it's still alive at the end. - Unskip some flaky tests that we hopefully won't have to reskip. ### Result It's much more clear when and where signal handlers are set, and the flow of control is made more clear. It is easier to extend signal handlers in a hygienic way, allowing for the resolution of multiple flaky tests.

cosmicexplorer requested a review from stuhood September 24, 2018 22:29

stuhood reviewed Sep 24, 2018

View reviewed changes

cosmicexplorer force-pushed the catch-everything branch from 88df4d1 to fd183ec Compare September 26, 2018 02:49

baroquebobcat reviewed Sep 26, 2018

View reviewed changes

cosmicexplorer force-pushed the catch-everything branch 5 times, most recently from 20effeb to a2cf6d7 Compare September 28, 2018 16:27

stuhood reviewed Sep 28, 2018

View reviewed changes

ity requested changes Sep 28, 2018

View reviewed changes

cosmicexplorer force-pushed the catch-everything branch 3 times, most recently from eddcea0 to e2ac5a2 Compare September 30, 2018 02:03

cosmicexplorer mentioned this pull request Sep 30, 2018

Handle signals gracefully #6574

Merged

cosmicexplorer added 15 commits October 2, 2018 15:03

first attempt at centralizing more global error logging state in Exce…

0d07c66

…ptionSink - make it handle SIGUSR2 and unhandled exceptions so we can move that logic out of Exiter

remove _all_destinations

3eeac82

make exiter print to the trace stream

8ad5689

fix a TODO

c809c9d

refactor ExceptionSink to hold persistent handles to error log streams

353a977

add test for try_find_exception_logs_for_pids()

3cd1b0a

add exception sink integration test stubs

4233faa

add LogLocation type to simplify the interface

bc291d3

keep a log location variable, not two separate fields

7fe1787

handle SIGTERM and SIGQUIT, maybe

ce0a71f

split LogLocation out into GetLogLocationRequest

fb68b02

catch just SIGTERM

08d6e94

make tests pass on osx

b1a95c6

memoize RunTracker#end

c2708e5

move cleanup logic to exiter subclasses to catch signals

08ccce9

ity reviewed Oct 2, 2018

View reviewed changes

revert run tracking calculation

adee768

ity reviewed Oct 2, 2018

View reviewed changes

fix expected run tracking calculation for kill-pantsd invocations

a8ed3d9

revert signal handling changes

f70e7c0

cosmicexplorer force-pushed the catch-everything branch from 898e34c to f70e7c0 Compare October 4, 2018 20:20

stuhood approved these changes Oct 4, 2018

View reviewed changes

cosmicexplorer changed the title ~~[WIP] first attempt at centralizing more global error logging state in ExceptionSink~~ first attempt at centralizing more global error logging state in ExceptionSink Oct 4, 2018

cosmicexplorer added 3 commits October 4, 2018 19:32

address review comments

9111c5f

add back expects_pid arh to NailgunClient

d3934fa

fix outdated method name and add test for added arg to signal send

3252202

This was referenced Oct 5, 2018

make Exiter do the exiting and ExceptionSink do the sinking #6596

Closed

nondeterministic nailgun errors tracking issue #6530

Closed

cosmicexplorer added 2 commits October 4, 2018 23:03

fix a comment about what faulthandler does

7abcf10

add an xfail

865efe3

ity approved these changes Oct 5, 2018

View reviewed changes

stuhood requested changes Oct 5, 2018

View reviewed changes

cosmicexplorer added 2 commits October 5, 2018 09:28

Revert "revert signal handling changes"

700b2ea

This reverts commit f70e7c0.

make pantsd exit with os._exit

17f6787

stuhood approved these changes Oct 8, 2018

View reviewed changes

use safe_kill() in case the process has already stopped

0dcf448

this fixes the keyboardinterrupt integration test

cosmicexplorer merged commit 295ea60 into pantsbuild:master Oct 9, 2018

cosmicexplorer mentioned this pull request Oct 9, 2018

Fatal error logging followup fixes #6610

Merged

cosmicexplorer mentioned this pull request Oct 12, 2018

NodeTestIntegrationTest.test_test_simple flaky in CI #6621

Closed

		return super(GetLogLocationRequest, cls).__new__(cls, log_dir=log_dir, pid=pid)


		class LogLocation(datatype([

		logger = logging.getLogger(__name__)


		class RunTrackingExiter(Exiter):

		logger = logging.getLogger(__name__)


		class RemoteExiter(Exiter):



		def is_fileobj_definitely_closed(fileobj):
		"""Return whether fileobj points to a file object which is definitely closed.

Uh oh!

Conversation

cosmicexplorer commented Sep 24, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Result

Uh oh!

stuhood left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cosmicexplorer Sep 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cosmicexplorer Sep 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cosmicexplorer Sep 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cosmicexplorer Sep 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ity left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cosmicexplorer commented Sep 24, 2018 •

edited

Loading

cosmicexplorer Sep 25, 2018 •

edited

Loading

cosmicexplorer Sep 25, 2018 •

edited

Loading

cosmicexplorer Sep 25, 2018 •

edited

Loading

cosmicexplorer Sep 25, 2018 •

edited

Loading

cosmicexplorer Sep 29, 2018 •

edited

Loading