GNOME Bugzilla – Bug 727439
branch review: danw/dispatch (dispatcher bug roll-up)
Last modified: 2014-04-18 15:47:06 UTC
Rather than post this to any of the individual bugs, I'm opening a new bug. danw/dispatch contains: 9a2ba99 dispatcher: bump script timeout up to 20 seconds https://bugzilla.redhat.com/show_bug.cgi?id=1048345 05e9d76 dispatcher: only dispatch one request at a time https://bugzilla.redhat.com/show_bug.cgi?id=1061212 522b54b dispatcher: tell systemd to not kill dispatcher children https://bugzilla.gnome.org/show_bug.cgi?id=725492 e6e4742 dispatcher: Leave PATH set in the dispatcher script environment https://bugzilla.gnome.org/show_bug.cgi?id=724657 d03ec5b dispatcher: add better debugging (added to make it easier to test) 48b7e33 man: updates to the dispatcher section bug 721971 suggests making it so that if an interface goes down before all of its "up" scripts finish running, then the pending "up" scripts should be cancelled. This is perhaps even more important after the "only dispatch one request at a time" bug? The patch there no longer applies after this patch series, but it would be simpler to implement now if we want to.
> d03ec5b dispatcher: add better debugging diff --git i/callouts/nm-dispatcher-action.c w/callouts/nm-dispatcher-action.c index a16be9e..23ae110 100644 --- i/callouts/nm-dispatcher-action.c +++ w/callouts/nm-dispatcher-action.c @@ -506,7 +506,7 @@ impl_dispatch (Handler *h, vpn_ip6_props, &iface); - if (debug) { + if (request->debug) { g_message ("------------ Action ID %p '%s' Interface %s Environment ------------", context, str_action, iface ? iface : "(none)"); for (p = request->envp; *p; p++) > man: updates to the dispatcher section Could you reference https://bugzilla.gnome.org/show_bug.cgi?id=721971 in the commit message as: The following bug is similar but proposes changing this behavior, not documenting it. Related: https://bugzilla.gnome.org/show_bug.cgi?id=721971 Rest looks good to me.
(In reply to comment #1) > - if (debug) { > + if (request->debug) { > g_message ("------------ Action ID %p '%s' Interface %s Environment > ------------", I'd skipped that intentionally actually... it seemed like it was logging more information than we'd normally log even at LOGL_DEBUG. But I guess that's a dumb reason. If it's useful for debugging, then we should log it... > Could you reference https://bugzilla.gnome.org/show_bug.cgi?id=721971 in the > commit message as: I'm thinking at this point that since we are no longer in a hurry to get this committed immediately, that maybe I should just (re-)implement the behavior change from that bug, for the reasons mentioned at the end of comment 0.
(In reply to comment #2) > I'm thinking at this point that since we are no longer in a hurry to get this > committed immediately, that maybe I should just (re-)implement the behavior > change from that bug, for the reasons mentioned at the end of comment 0. So actually, we need to decide which invariant we want: 1. all scripts are always dispatched, even if the device is no longer in the right state for the script when it is dispatched (current behavior) 2. scripts are only dispatched if the associated state is still correct at the time the script would be started (and it's possible your DOWN script will be dispatched even if your UP script wasn't) (bug 721971) 3. scripts are only dispatched if the associated state is still correct at the time the script would be started, but if your UP script gets skipped then your DOWN script will be skipped too. 3 seems more consistent than 2, but perhaps there are situations where you want to guarantee that you always get *some* notification when the network state changes, even if you missed it?
I strongly vote for 1. Otherwise it's hard to do anything reliable in the scripts because some might get skipped. It is quite expected that the state can change anytime, so when you are in an UP script, the state could already have changed to something else. This is not avoidable. Even if we check, that the state is still current before calling g_spawn, it could change the very moment later. A script, if it really cares about the current state, has to check it anyway anew. And with 1., it can see that the state changed and behave as with 2. or 3. But with 2. or 3. it cannot behave like 1. Also, 1. is slightly simpler for dispatcher.
I think #1 is my choice too, since it seems the most consistent. Bad scripts will be bad no matter what we do, so if some script mishandles states then we should make sure we point the blame finger at the script and get that fixed in the right place. We could also have the dispatcher listen for new state signals on the device it's dispatching for, and put that new state into an environment variable so that scripts would at least be aware that the state had changed. Possibly a future enhancement, but talking of state changes also assumes that we don't block on script execution, right? My thought for a long time was that since some scripts do take a while to execute (NFS shutdown comes to mind) that we should just make NM block certain state changes like SECONDARIES->ACTIVATED and DEACTIVATING->DISCONNECTED until all the scripts were run, and if that made things really slow, well, again we point the blame finger where it's due and make script authors fix their stuff. A large timeout is effectively the same thing as blocking on scripts until they're done too. Thoughts?
(In reply to comment #5) > I think #1 is my choice too, since it seems the most consistent. Bad scripts > will be bad no matter what we do, so if some script mishandles states then we > should make sure we point the blame finger at the script and get that fixed in > the right place. > > We could also have the dispatcher listen for new state signals on the device > it's dispatching for, and put that new state into an environment variable so > that scripts would at least be aware that the state had changed. Possibly a > future enhancement, but talking of state changes also assumes that we don't > block on script execution, right? I would not do this. Script who care can also not trust this variable to be up to date, and the others don't care anyway. You simply cannot assume that NM is still in the same state an instance after you looked at the state. > My thought for a long time was that since some scripts do take a while to > execute (NFS shutdown comes to mind) that we should just make NM block certain > state changes like SECONDARIES->ACTIVATED and DEACTIVATING->DISCONNECTED until > all the scripts were run, and if that made things really slow, well, again we > point the blame finger where it's due and make script authors fix their stuff. > A large timeout is effectively the same thing as blocking on scripts until > they're done too. Thoughts? That sounds the right thing to do. For some state transitions it might not be possible to wait (especially if something else happens in the meantime). But *trying* to do so in several cases seems right.
OK, I pushed the branch keeping the existing behavior (#1); further discussion can happen on bug 721971.