Bug 727439 – branch review: danw/dispatch (dispatcher bug roll-up)

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 727439 - branch review: danw/dispatch (dispatcher bug roll-up)


Summary:	branch review: danw/dispatch (dispatcher bug roll-up)


Status:	RESOLVED FIXED

Product:	NetworkManager
Classification:	Platform
Component:	general
Version:	0.9.8
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	NetworkManager maintainer(s)
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2014-04-01 14:30 UTC by Dan Winship
Modified:	2014-04-18 15:47 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Dan Winship 2014-04-01 14:30:29 UTC

Rather than post this to any of the individual bugs, I'm opening a new bug. danw/dispatch contains:

9a2ba99 dispatcher: bump script timeout up to 20 seconds
https://bugzilla.redhat.com/show_bug.cgi?id=1048345

05e9d76 dispatcher: only dispatch one request at a time
https://bugzilla.redhat.com/show_bug.cgi?id=1061212

522b54b dispatcher: tell systemd to not kill dispatcher children
https://bugzilla.gnome.org/show_bug.cgi?id=725492

e6e4742 dispatcher: Leave PATH set in the dispatcher script environment
https://bugzilla.gnome.org/show_bug.cgi?id=724657

d03ec5b dispatcher: add better debugging
(added to make it easier to test)

48b7e33 man: updates to the dispatcher section


bug 721971 suggests making it so that if an interface goes down before all of its "up" scripts finish running, then the pending "up" scripts should be cancelled. This is perhaps even more important after the "only dispatch one request at a time" bug? The patch there no longer applies after this patch series, but it would be simpler to implement now if we want to.

Comment 1 Thomas Haller 2014-04-01 17:55:51 UTC

> d03ec5b dispatcher: add better debugging

diff --git i/callouts/nm-dispatcher-action.c w/callouts/nm-dispatcher-action.c
index a16be9e..23ae110 100644
--- i/callouts/nm-dispatcher-action.c
+++ w/callouts/nm-dispatcher-action.c
@@ -506,7 +506,7 @@ impl_dispatch (Handler *h,
                                                         vpn_ip6_props,
                                                         &iface);
 
-    if (debug) {
+    if (request->debug) {
          g_message ("------------ Action ID %p '%s' Interface %s Environment ------------",
                     context, str_action, iface ? iface : "(none)");
          for (p = request->envp; *p; p++)


> man: updates to the dispatcher section

Could you reference https://bugzilla.gnome.org/show_bug.cgi?id=721971 in the commit message as:


The following bug is similar but proposes changing this behavior, not documenting it.
Related: https://bugzilla.gnome.org/show_bug.cgi?id=721971




Rest looks good to me.

Comment 2 Dan Winship 2014-04-07 15:54:16 UTC

(In reply to comment #1)
> -    if (debug) {
> +    if (request->debug) {
>           g_message ("------------ Action ID %p '%s' Interface %s Environment
> ------------",

I'd skipped that intentionally actually... it seemed like it was logging more information than we'd normally log even at LOGL_DEBUG. But I guess that's a dumb reason. If it's useful for debugging, then we should log it...

> Could you reference https://bugzilla.gnome.org/show_bug.cgi?id=721971 in the
> commit message as:

I'm thinking at this point that since we are no longer in a hurry to get this committed immediately, that maybe I should just (re-)implement the behavior change from that bug, for the reasons mentioned at the end of comment 0.

Comment 3 Dan Winship 2014-04-07 16:15:33 UTC

(In reply to comment #2)
> I'm thinking at this point that since we are no longer in a hurry to get this
> committed immediately, that maybe I should just (re-)implement the behavior
> change from that bug, for the reasons mentioned at the end of comment 0.

So actually, we need to decide which invariant we want:

  1. all scripts are always dispatched, even if the device is no longer
     in the right state for the script when it is dispatched (current
     behavior)

  2. scripts are only dispatched if the associated state is still correct
     at the time the script would be started (and it's possible your
     DOWN script will be dispatched even if your UP script wasn't)
     (bug 721971)

  3. scripts are only dispatched if the associated state is still correct
     at the time the script would be started, but if your UP script gets
     skipped then your DOWN script will be skipped too.

3 seems more consistent than 2, but perhaps there are situations where you want to guarantee that you always get *some* notification when the network state changes, even if you missed it?

Comment 4 Thomas Haller 2014-04-10 15:04:04 UTC

I strongly vote for 1.

Otherwise it's hard to do anything reliable in the scripts because some might get skipped.


It is quite expected that the state can change anytime, so when you are in an UP script, the state could already have changed to something else. This is not avoidable. Even if we check, that the state is still current before calling g_spawn, it could change the very moment later.

A script, if it really cares about the current state, has to check it anyway anew. And with 1., it can see that the state changed and behave as with 2. or 3.

But with 2. or 3. it cannot behave like 1.


Also, 1. is slightly simpler for dispatcher.

Comment 5 Dan Williams 2014-04-17 16:45:35 UTC

I think #1 is my choice too, since it seems the most consistent.  Bad scripts will be bad no matter what we do, so if some script mishandles states then we should make sure we point the blame finger at the script and get that fixed in the right place.

We could also have the dispatcher listen for new state signals on the device it's dispatching for, and put that new state into an environment variable so that scripts would at least be aware that the state had changed.  Possibly a future enhancement, but talking of state changes also assumes that we don't block on script execution, right?

My thought for a long time was that since some scripts do take a while to execute (NFS shutdown comes to mind) that we should just make NM block certain state changes like SECONDARIES->ACTIVATED and DEACTIVATING->DISCONNECTED until all the scripts were run, and if that made things really slow, well, again we point the blame finger where it's due and make script authors fix their stuff.  A large timeout is effectively the same thing as blocking on scripts until they're done too.  Thoughts?

Comment 6 Thomas Haller 2014-04-17 19:55:22 UTC

(In reply to comment #5)
> I think #1 is my choice too, since it seems the most consistent.  Bad scripts
> will be bad no matter what we do, so if some script mishandles states then we
> should make sure we point the blame finger at the script and get that fixed in
> the right place.
> 
> We could also have the dispatcher listen for new state signals on the device
> it's dispatching for, and put that new state into an environment variable so
> that scripts would at least be aware that the state had changed.  Possibly a
> future enhancement, but talking of state changes also assumes that we don't
> block on script execution, right?

I would not do this. Script who care can also not trust this variable to be up to date, and the others don't care anyway. You simply cannot assume that NM is still in the same state an instance after you looked at the state.



> My thought for a long time was that since some scripts do take a while to
> execute (NFS shutdown comes to mind) that we should just make NM block certain
> state changes like SECONDARIES->ACTIVATED and DEACTIVATING->DISCONNECTED until
> all the scripts were run, and if that made things really slow, well, again we
> point the blame finger where it's due and make script authors fix their stuff. 
> A large timeout is effectively the same thing as blocking on scripts until
> they're done too.  Thoughts?

That sounds the right thing to do. For some state transitions it might not be possible to wait (especially if something else happens in the meantime). But *trying* to do so in several cases seems right.

Comment 7 Dan Winship 2014-04-18 15:47:06 UTC

OK, I pushed the branch keeping the existing behavior (#1); further discussion can happen on bug 721971.