Bug 746703 – split out sync dispatcher actions from async queue

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 746703 - split out sync dispatcher actions from async queue


Summary:	split out sync dispatcher actions from async queue


Status:	RESOLVED FIXED

Product:	NetworkManager
Classification:	Platform
Component:	general
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	NetworkManager maintainer(s)
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:	nm-1-2

Reported:	2015-03-24 18:53 UTC by Dan Winship
Modified:	2016-01-20 14:58 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
[PATCH] nm-dispatcher: always run pre-up and pre-down scripts immediately (4.54 KB, patch) 2015-05-29 16:10 UTC, Beniamino Galvani	none	Details \| Review
Subject: [PATCH v2] nm-dispatcher: allow scripts to be marked as non-blocking (9.22 KB, patch) 2015-06-15 15:43 UTC, Beniamino Galvani	none	Details \| Review
[PATCH v3] nm-dispatcher: allow scripts to be marked as non-blocking (11.33 KB, patch) 2015-06-16 20:35 UTC, Beniamino Galvani	none	Details \| Review
[PATCH v4] nm-dispatcher: allow scripts to be marked as non-blocking (10.72 KB, patch) 2015-06-22 20:14 UTC, Beniamino Galvani	none	Details \| Review
contrib/rpm: mark "10-ifcfg-rh-routes.sh" script as no-wait (2.02 KB, patch) 2015-10-09 19:40 UTC, Thomas Haller	none	Details \| Review

Description Dan Winship 2015-03-24 18:53:16 UTC

731ca771 changed the dispatcher to globally serialize dispatcher actions, so that if two interfaces are brought up at the same time, the same script won't end up getting run for both of them simultaneously, since having to deal with that possibility would make some kinds of scripts much more complicated.

This doesn't interact well with the fact that pre-up scripts are run synchronously though; it means that if you have a (post-)up script that takes 1 second to run, then activating 10 connections is now guaranteed to take at least 10 seconds. (Well, OK, 9 seconds.)

The fix is probably to make pre-up and pre-down scripts always run immediately, independently of the queue used for other dispatcher actions.

Comment 1 Beniamino Galvani 2015-05-29 16:10:56 UTC

Created attachment 304273 [details] [review]
[PATCH] nm-dispatcher: always run pre-up and pre-down scripts immediately

Comment 2 Thomas Haller 2015-06-01 10:50:42 UTC

(In reply to Dan Winship from comment #0)
> 731ca771 changed the dispatcher to globally serialize dispatcher actions, so
> that if two interfaces are brought up at the same time, the same script
> won't end up getting run for both of them simultaneously, since having to
> deal with that possibility would make some kinds of scripts much more
> complicated.
> 
> This doesn't interact well with the fact that pre-up scripts are run
> synchronously though; it means that if you have a (post-)up script that
> takes 1 second to run, then activating 10 connections is now guaranteed to
> take at least 10 seconds. (Well, OK, 9 seconds.)
> 
> The fix is probably to make pre-up and pre-down scripts always run
> immediately, independently of the queue used for other dispatcher actions.

is this really a bug? Works as designed.

If we don't guarantee an order when running scripts, it becomes much harder to write certain dispatcher scripts. Introducing this ordering is IMO a major feature, we should stick with it.

As a user, it's easy to fork if you don't care about ordering (or handle it somehow yourself). But you cannot get ordering guarantees once dispatcher runs the scripts immediately.

Of course, you never have a strong guarantee that the state is as expected anyway. But best-effort is still better then no guarantee at all.


Maybe there is a need to improve the documentation about this issue and do better logging -- so that the user sees which script is to blame.

Comment 3 Thomas Haller 2015-06-01 10:54:21 UTC

Or how about having a configuration value like

[dispatcher]
block-timeout=100ms


If a script doesn't return within this timeout, dispatcher would start the next one. With a default value of "unlimited".

Comment 4 Beniamino Galvani 2015-06-03 14:12:44 UTC

(In reply to Thomas Haller from comment #2)
> > The fix is probably to make pre-up and pre-down scripts always run
> > immediately, independently of the queue used for other dispatcher actions.
>
> is this really a bug? Works as designed.
>
> If we don't guarantee an order when running scripts, it becomes much harder
> to write certain dispatcher scripts. Introducing this ordering is IMO a
> major feature, we should stick with it.
>
> As a user, it's easy to fork if you don't care about ordering (or handle it
> somehow yourself). But you cannot get ordering guarantees once dispatcher
> runs the scripts immediately.

The change tries to solve the particular situation in which a pre-up
script is scheduled when there are already other scripts in the
dispatch queue that take long to execute. Pre-up (and pre-down)
scripts are different from other scripts because they are synchronous
- NM waits their termination before changing the device status.

So the idea is to find a tradeoff and to guarantee serialization for
async scripts, while running sync scripts ASAP. This works well in the
above scenario and I think that when serialization is enforced for
every kind of script it's not simple to solve the same problem
changing only the scripts.

Your point is valid and for sure a global ordering is an advantage as
it makes scripts simpler, while the patch introduces some complexity
and also an inconsistency. So I'm not sure, any suggestions are
welcome.

>
> Of course, you never have a strong guarantee that the state is as expected
> anyway. But best-effort is still better then no guarantee at all.
>
>
> Maybe there is a need to improve the documentation about this issue and do
> better logging -- so that the user sees which script is to blame.

The start and end times of script executions are already logged and
this information should be enough to find the guilty script in case
there is a major delay.

Also, the documentation already explains that scripts are serialized
and suggests to fork if the script may take long to complete, but
yeah, this part could be expanded a bit.

Comment 5 Beniamino Galvani 2015-06-03 14:33:31 UTC

(In reply to Thomas Haller from comment #3)
> Or how about having a configuration value like
> 
> [dispatcher]
> block-timeout=100ms
> 
> 
> If a script doesn't return within this timeout, dispatcher would start the
> next one. With a default value of "unlimited".

Yes this could be a possible solution as the option would basically allow to turn on or off serialization.

Comment 6 Dan Winship 2015-06-08 12:56:48 UTC

(In reply to Thomas Haller from comment #3)
> Or how about having a configuration value like
> 
> [dispatcher]
> block-timeout=100ms
> 
> 
> If a script doesn't return within this timeout, dispatcher would start the
> next one. With a default value of "unlimited".

This combines the disadvantages of both approaches; activation gets blocked/serialized whenever you have a pre-up script, but all scripts need to be able to deal with being run in parallel as well.

The goal for this fix is:

  - We can have /etc/NetworkManager/dispatcher.d/pre-up.d/10-ifcfg-rh-routes.sh
    installed by default

  AND

  - A machine with dozens of ethernet interfaces can still bring them all up
    in parallel more-or-less instantly.

Comment 7 Thomas Haller 2015-06-08 13:04:49 UTC

(In reply to Dan Winship from comment #6)
> (In reply to Thomas Haller from comment #3)
> > Or how about having a configuration value like
> > 
> > [dispatcher]
> > block-timeout=100ms
> > 
> > 
> > If a script doesn't return within this timeout, dispatcher would start the
> > next one. With a default value of "unlimited".
> 
> This combines the disadvantages of both approaches; activation gets
> blocked/serialized whenever you have a pre-up script, but all scripts need
> to be able to deal with being run in parallel as well.
> 
> The goal for this fix is:
> 
>   - We can have
> /etc/NetworkManager/dispatcher.d/pre-up.d/10-ifcfg-rh-routes.sh
>     installed by default
> 
>   AND
> 
>   - A machine with dozens of ethernet interfaces can still bring them all up
>     in parallel more-or-less instantly.


the issue only happened during testing. In a proper setups, the user is advised not to have long-running dispatcher scripts.

Very few users have 
 - long-running dispatcher scripts
 - many interfaces



How about having a directory:

  /etc/NetworkManager/dispatcher.d/non-blocking/

and if a script is a symlink into that directory, then schedule it immediately. So we would install:

  /etc/NetworkManger/dispatcher.d/non-blocking/10-ifcfg-rh-routes.sh
  ln -s ../non-blocking/10-ifcfg-rh-routes.sh \
     /etc/NetworkManager/dispatcher.d/pre-up.d/

Comment 8 Dan Winship 2015-06-08 14:57:14 UTC

(In reply to Thomas Haller from comment #7)
> the issue only happened during testing.

Well, sure, because we fixed it before it made it into a stable release...

> In a proper setups, the user is
> advised not to have long-running dispatcher scripts.
> 
> Very few users have 
>  - long-running dispatcher scripts
>  - many interfaces

The problem is that if you have many interfaces, then even fairly-quick dispatcher scripts will slow things down tremendously. Last time I checked, on my machine it took about 2 seconds to run all of the up scripts for a device. So if you have 10 interfaces, then adding a pre-up script will add 20 seconds to startup. (And if you have 16 interfaces, then adding a pre-up script will cause NetworkManager-wait-online.service to time out and fail, causing other cascading failures...)

> How about having a directory:
> 
>   /etc/NetworkManager/dispatcher.d/non-blocking/
> 
> and if a script is a symlink into that directory, then schedule it
> immediately.

But under what circumstances would you *not* want a pre-up script to have that behavior?

Comment 9 Thomas Haller 2015-06-08 16:09:16 UTC

(In reply to Dan Winship from comment #8)
> (In reply to Thomas Haller from comment #7)

> But under what circumstances would you *not* want a pre-up script to have
> that behavior?

Hm, I don't know. I didn't write many dispatcher scripts :)

I think strict ordering is a feature, we should not drop entirely.
What was the reason to add the serialization of the scripts? Why would that reason not apply to pre-up scripts?


But is it also useful to have global strict ordering? How about we only serialize the inactivation of each script individually? Different scripts, can run in parallel.
For pre-up/down, we could support to follow symlinks+hardlinks, and if different scripts point to the same file, we would serialize them together.
Hm, sounds already complicated...

Comment 10 Dan Winship 2015-06-08 18:41:14 UTC

(In reply to Thomas Haller from comment #9)
> What was the reason to add the serialization of the scripts? Why would that
> reason not apply to pre-up scripts?

We have always run the scripts for a given interface in serial. (ie, when bringing up eth0, 20-bar.sh can't run until after 10-foo.sh finishes, but if we bring up eth1 as well, it doesn't care what's running on eth0). I don't know if there was any particular reason why.

In 1.0 we also added serialization between interfaces, because otherwise a script that always rewrites the same file (eg resolv.conf) could end up racing with another copy of itself. It's not *necessary* for us to do this, but it makes it simpler to write dispatcher scripts.

The same reasons that apply to other scripts apply to pre-up scripts, it's just that we decided that "you have to make sure your pre-up script won't conflict with another running copy of itself" is less bad than "installing a pre-up script may cause activation to become really really slow, depending on what other dispatcher scripts are installed, and there's no way you [the pre-up script author] can know if that's going to happen, and nothing you can do to prevent it".

> But is it also useful to have global strict ordering? How about we only
> serialize the inactivation of each script individually? Different scripts,
> can run in parallel.

That will still introduce a delay of at least the duration of the slowest up script. And again, there's nothing the pre-up script author can do to prevent that. Meaning we can't make the routing pre-up script be installed by default.

Comment 11 Thomas Haller 2015-06-08 20:11:59 UTC

(In reply to Dan Winship from comment #10)
> (In reply to Thomas Haller from comment #9)

> That will still introduce a delay of at least the duration of the slowest up
> script. And again, there's nothing the pre-up script author can do to
> prevent that. Meaning we can't make the routing pre-up script be installed
> by default.

Actually, all I ask is that a script can either opt-in to blocking, or opt-out of blocking (I don't mind which is the default, but I think blocking should be preferred as default).


The problem is only how a script can signalize this...

We could interpret a file
  /etc/NetworkManager/dispatcher.d/script.no-wait~
that comes together with
  /etc/NetworkManager/dispatcher.d/script

(the '~' is already now considered special, in that we would ignore such a file)

or how about the symlink suggestion?

Comment 12 Beniamino Galvani 2015-06-12 08:01:31 UTC

(In reply to Thomas Haller from comment #11)
> Actually, all I ask is that a script can either opt-in to blocking, or
> opt-out of blocking (I don't mind which is the default, but I think blocking
> should be preferred as default).

If we make scripts blocking by default and allow the user to override this on a per-script basis we have the advantages of: (a) not breaking any existing setup that may require serialization between pre-up scripts; (b) making it possible to install 10-ifcfg-rh-routes.sh by default, provided that it's marked as non-blocking; and (c) more flexibility even for other events as the serialization can be disabled for scripts that don't require it.

So this seems a good solution to me.

> The problem is only how a script can signalize this...
> 
> We could interpret a file
>   /etc/NetworkManager/dispatcher.d/script.no-wait~
> that comes together with
>   /etc/NetworkManager/dispatcher.d/script
> 
> (the '~' is already now considered special, in that we would ignore such a
> file)
> 
> or how about the symlink suggestion?

I prefer the symlink solution to the one requiring an additional file with .no-wait~ suffix.

Comment 13 Thomas Haller 2015-06-14 14:32:26 UTC

(In reply to Beniamino Galvani from comment #12)
> (In reply to Thomas Haller from comment #11)
> > Actually, all I ask is that a script can either opt-in to blocking, or
> > opt-out of blocking (I don't mind which is the default, but I think blocking
> > should be preferred as default).
> 
> If we make scripts blocking by default and allow the user to override this
> on a per-script basis we have the advantages of: (a) not breaking any
> existing setup that may require serialization between pre-up scripts; (b)
> making it possible to install 10-ifcfg-rh-routes.sh by default, provided
> that it's marked as non-blocking; and (c) more flexibility even for other
> events as the serialization can be disabled for scripts that don't require
> it.
> 
> So this seems a good solution to me.
> 
> > The problem is only how a script can signalize this...
> > 
> > We could interpret a file
> >   /etc/NetworkManager/dispatcher.d/script.no-wait~
> > that comes together with
> >   /etc/NetworkManager/dispatcher.d/script
> > 
> > (the '~' is already now considered special, in that we would ignore such a
> > file)
> > 
> > or how about the symlink suggestion?
> 
> I prefer the symlink solution to the one requiring an additional file with
> .no-wait~ suffix.


IMO that sounds like a good plan

Comment 14 Beniamino Galvani 2015-06-15 15:43:07 UTC

Created attachment 305313 [details] [review]
Subject: [PATCH v2] nm-dispatcher: allow scripts to be marked as non-blocking

Comment 15 Thomas Haller 2015-06-16 07:50:25 UTC


+    if (realpath (path, real) && g_str_has_prefix (real, NMD_SCRIPT_DIR_NON_BLOCKING))
+         return FALSE;


This doesn't work in this case:

  /etc/NetworkManager/dispatcher.d/script -> ./non-blocking.d/script
  /etc/NetworkManager/dispatcher.d/non-blocking.d/script -> /opt/...


Maybe:
   
   readlink(path, dst);
   basedir = dirname (dst);
   if (is_relative(dst))
       dst = build_path(base_dir, dst);
   realpath(basedir);
   if (strcmp (basedir, NMD_SCRIPT_DIR_DEFAULT) == 0)
       return TRUE;





could you create the non-blocking.d via make-install? Also, that directory should be owned by NetworkManager.rpm (contrib/fedora).

Comment 16 Beniamino Galvani 2015-06-16 20:35:16 UTC

Created attachment 305431 [details] [review]
[PATCH v3] nm-dispatcher: allow scripts to be marked as non-blocking

Comment 17 Beniamino Galvani 2015-06-16 20:36:27 UTC

(In reply to Thomas Haller from comment #15)
>
> This doesn't work in this case:
> 
>   /etc/NetworkManager/dispatcher.d/script -> ./non-blocking.d/script
>   /etc/NetworkManager/dispatcher.d/non-blocking.d/script -> /opt/...

Should be fixed now.

> could you create the non-blocking.d via make-install? Also, that directory
> should be owned by NetworkManager.rpm (contrib/fedora).

Done, thanks.

Comment 18 Dan Williams 2015-06-18 21:47:25 UTC

Review of attachment 305431 [details] [review]:

::: callouts/nm-dispatcher.c
@@ +133,3 @@
 
+	guint *script_watch_id;
+	guint *script_timeout_id;

I think we could just move these into the ScriptInfo struct and leave them guint (eg not pointer).  And then have script_free() run nm_clear_g_source() on them.  That would be clearer and less complicated I think than keeping indexes around and such.  then we don't need script->idx either.  It looks like everywhere the touches them already has the ScriptInfo that those values would be contained in.

Comment 19 Beniamino Galvani 2015-06-22 20:14:46 UTC

Created attachment 305857 [details] [review]
[PATCH v4] nm-dispatcher: allow scripts to be marked as non-blocking

Comment 20 Beniamino Galvani 2015-06-22 20:16:37 UTC

(In reply to Dan Williams from comment #18)
> Review of attachment 305431 [details] [review] [review]:
> 
> ::: callouts/nm-dispatcher.c
> @@ +133,3 @@
>  
> +	guint *script_watch_id;
> +	guint *script_timeout_id;
> 
> I think we could just move these into the ScriptInfo struct and leave them
> guint (eg not pointer).

Right, fixed in v4.

Comment 21 Thomas Haller 2015-06-26 07:21:10 UTC

you should mark the script as non-blocking by
+      using a symlink to a file in the /etc/NetworkManager/dispatcher.d/non-blocking.d
+      directory, or spawn a child process and have the parent return immediately.

this sounds like you could do one or the other and achieve the same result. But non-blocking.d scrips will run immediately, while other scripts have to wait in line.



Lets omit the buffer argument to realpath() to let the function allocate a buffer of appropriate size.




in dispatch_one_script(), doesn't this mean that a non-blocking script still has to wait for it's turn? Shouldn't you invoke the non-blocking scripts just right away, always?
Also, for example in handle_action() the request is queued 
  "g_queue_push_tail (h->pending_requests, request);"
so even non-blocking scripts have to wait for their turn.
Ok, with my comment above to the man page, it seems the man page ~does~ describe this behavior :) ... But I think that non-blocking.d scripts should run right away and not wait until they are in line.

Maybe non-serialized.d/ would be a better name... or no-wait.d/ ?

Comment 22 Thomas Haller 2015-06-26 07:22:16 UTC

>> and not wait until they are in line.

I mean: "and not wait in line to be started".

Comment 23 Beniamino Galvani 2015-06-29 10:10:14 UTC

(In reply to Thomas Haller from comment #21)
> this sounds like you could do one or the other and achieve the same result.
> But non-blocking.d scrips will run immediately, while other scripts have to
> wait in line.

See below.

> Lets omit the buffer argument to realpath() to let the function allocate a
> buffer of appropriate size.

Yes, as suggested in the BUGS section of realpath(3), this is the
preferred way.

> in dispatch_one_script(), doesn't this mean that a non-blocking script still
> has to wait for it's turn? Shouldn't you invoke the non-blocking scripts
> just right away, always?
> Also, for example in handle_action() the request is queued
>   "g_queue_push_tail (h->pending_requests, request);"
> so even non-blocking scripts have to wait for their turn.

Yes, this was the idea, the only difference of non-blocking.d/ scripts
is that they return immediately, and this is also consistent with the
name of the directory :) The order in which they are scheduled doesn't
change and so a non-blocking script has to wait for the termination of
previous blocking ones.

> Ok, with my comment above to the man page, it seems the man page ~does~
> describe this behavior :) ... But I think that non-blocking.d scripts should
> run right away and not wait until they are in line.

Executing non-blocking scripts before all other scripts has the
advantage of making things faster by default -- non-blocking scripts are
supposed to take long to execute and so they will run in parallel with
other scripts of the same request.

But the same result can be achieved also with the current
implementation: users have control on the order of scripts and they
can just put non-blocking ones earlier in the list.

And with the current implementation it's also possible to have a
non-blocking script depend on the output of a blocking script of the
same request, since they are run in order.

But I don't have a strong preference about this, I'm fine with both
approaches.

Comment 24 Thomas Haller 2015-06-29 10:40:23 UTC

Hm, seems it is unclear what issue this BZ is trying to solve.


If non-blocking.d/ only means that the dispatcher does not wait for the termination of the script, then the user can already achieve the very same effect by replacing the script with

  #!/bin/sh
  /path/to/real/script & 



as it is now, only one script runs at a time and scripts never pass other scripts that are scheduled before.

The issue of this BZ (as I understand it, but I don't really agree with that being an issue) is that especially for pre-up and pre-down scripts, the device state does not proceed until the scripts are run. So, ~other~ script can delay the execution of a pre-* script, and thus block the device activation.

How about renaming non-blocking.d to "no-wait.d" dir and scripts that link there are scheduled immediately (and in parallel).

Comment 25 Beniamino Galvani 2015-07-16 07:41:48 UTC

(In reply to Thomas Haller from comment #24)
> How about renaming non-blocking.d to "no-wait.d" dir and scripts that link
> there are scheduled immediately (and in parallel).

Sounds good. Pushed branch bg/dispatch-sync-requests-rh746703.

Comment 26 Thomas Haller 2015-07-16 12:14:46 UTC

(In reply to Beniamino Galvani from comment #25)
> (In reply to Thomas Haller from comment #24)
> > How about renaming non-blocking.d to "no-wait.d" dir and scripts that link
> > there are scheduled immediately (and in parallel).
> 
> Sounds good. Pushed branch bg/dispatch-sync-requests-rh746703.

It looked correct, but it was confusing to me.
While trying to understand it, I reworked it. Pushed a fixup. How is that?

Comment 27 Beniamino Galvani 2015-07-16 13:15:13 UTC

(In reply to Thomas Haller from comment #26)
> (In reply to Beniamino Galvani from comment #25)
> > (In reply to Thomas Haller from comment #24)
> > > How about renaming non-blocking.d to "no-wait.d" dir and scripts that link
> > > there are scheduled immediately (and in parallel).
> > 
> > Sounds good. Pushed branch bg/dispatch-sync-requests-rh746703.
> 
> It looked correct, but it was confusing to me.
> While trying to understand it, I reworked it. Pushed a fixup. How is that?

Now script_{watch,timeout}_cb() call unconditionally next_script() which would dispatch a new script also when a "nowait" script terminates. I think this can result in more "wait" scripts to run simultaneously, isn't it?

And also, now we wait the termination of all scripts for a request before continuing with the next request. This isn't wrong, it depends if it's the behavior we want. Before the fixup, the next request was started as soon as the last "wait" script had terminated, while there could still be some "nowait" script running and the D-Bus response was delayed until all scripts ("wait" and "nowait") had finished.

Comment 28 Thomas Haller 2015-07-16 13:58:58 UTC

(In reply to Beniamino Galvani from comment #27)
> (In reply to Thomas Haller from comment #26)
> > (In reply to Beniamino Galvani from comment #25)
> > > (In reply to Thomas Haller from comment #24)
> > > > How about renaming non-blocking.d to "no-wait.d" dir and scripts that link
> > > > there are scheduled immediately (and in parallel).
> > > 
> > > Sounds good. Pushed branch bg/dispatch-sync-requests-rh746703.
> > 
> > It looked correct, but it was confusing to me.
> > While trying to understand it, I reworked it. Pushed a fixup. How is that?
> 
> Now script_{watch,timeout}_cb() call unconditionally next_script() which
> would dispatch a new script also when a "nowait" script terminates. I think
> this can result in more "wait" scripts to run simultaneously, isn't it?

Oh correct. Fixed.


> And also, now we wait the termination of all scripts for a request before
> continuing with the next request. This isn't wrong, it depends if it's the
> behavior we want. Before the fixup, the next request was started as soon as
> the last "wait" script had terminated, while there could still be some
> "nowait" script running and the D-Bus response was delayed until all scripts
> ("wait" and "nowait") had finished.

Correct too. Fixed too (I think?)


How is this?

Comment 29 Beniamino Galvani 2015-07-18 12:57:00 UTC

(In reply to Thomas Haller from comment #28)
> Correct too. Fixed too (I think?)
> 
> How is this?

Looks ok; I pushed an additional fixup that solves some problems already present in my initial implementation and which were found during testing.

Comment 30 Thomas Haller 2015-07-19 10:43:09 UTC

(In reply to Beniamino Galvani from comment #29)
> (In reply to Thomas Haller from comment #28)
> > Correct too. Fixed too (I think?)
> > 
> > How is this?
> 
> Looks ok; I pushed an additional fixup that solves some problems already
> present in my initial implementation and which were found during testing.

At a quick glance, your fixup look good to me.

Another issue that my fixup had, was that it would schedule the quit-timeout, even if there were some no-wait scripts running. I didn't yet carefully look at your fix, but is that fixed there?

Comment 31 Beniamino Galvani 2015-07-20 12:50:25 UTC

(In reply to Thomas Haller from comment #30)
> Another issue that my fixup had, was that it would schedule the
> quit-timeout, even if there were some no-wait scripts running. I didn't yet
> carefully look at your fix, but is that fixed there?

Yes, the quit timeout is scheduled only when the number of pending requests is zero and the value is decreased after all scripts for a request have finished.

Comment 32 Thomas Haller 2015-07-22 21:16:54 UTC

I pushed a large number of fixups.

I think there were some issues, and the fixup aim to either do a refactoring (without functional change) or fixing an issue. I did them in many small steps, so that it might be easier to understand what happens and why.

(or you just remove the result...)


How does this look?

Comment 33 Thomas Haller 2015-07-22 21:21:56 UTC

-(or you just remove the result...)
-(or you just review the result as a whole...) 


(hfff, expressing myself at late hours... :) ).

Comment 34 Thomas Haller 2015-07-23 05:27:32 UTC

hmm... complete_request() still seems wrong. If it does next_request(), but that request fails to schedule_one_script(), we must ensure to do next_request() again.

Pushed another fixup.



This is so complicated :(


I think the way to review it is really, one-by-one, ensuring that the NO-CHANGE commit doesn't introduce any functional change, and that the FIX commit are correct...

Comment 35 Beniamino Galvani 2015-07-24 11:24:32 UTC

(In reply to Thomas Haller from comment #34)
> I think the way to review it is really, one-by-one, ensuring that the
> NO-CHANGE commit doesn't introduce any functional change, and that the FIX
> commit are correct...

It seems to me that

47babbb52a0c55f68ee45a1e879fdfe4fb4aa97d "NO-CHANGE move the call to
next_script() into complete_script()"

changes the behavior since after the commit next_request() is no
longer called after the termination of a "wait" script when there are
still "nowait" scripts running. Pushed a couple of other fixups.

Overall looks good, I repeated some tests and it seems that everything
is working as expected.

Comment 36 Beniamino Galvani 2015-08-04 15:14:03 UTC

Thomas, the additional commits you pushed look good.

I squashed all the existing fixup commits and added a couple more.

Comment 37 Thomas Haller 2015-08-04 16:06:33 UTC

(In reply to Beniamino Galvani from comment #36)
> Thomas, the additional commits you pushed look good.
> 
> I squashed all the existing fixup commits and added a couple more.

LGTM too. I also squashed your fixups now...



This is IMO ready. I only wonder about one thing...

now we mark no-wait scripts by sym-linking to no-wait.d.
It is well understood, that dispatcher runs the scripts sorted by filename, but that is not true for the no-wait scripts.

So, if you have:

   01-first.sh
   10-no-wait.sh -> ./no-wait.d/10-no-wait.sh
   20-no-wait.sh -> ./no-wait.d/20-no-wait.sh
   90-last.sh
   99-no-wait.sh -> ./no-wait.d/99-no-wait.sh


the actual order is

  10-no-wait.sh &
  20-no-wait.sh &
  99-no-wait.sh &
  wait $PID_10 $PID_20 $PID_99
  01-first.sh
  90-last.sh


IOW, the filename and order is meaningless for no-wait scripts.

Maybe, we should instead indicate no-wait scripts by having a special filename. Like:
  something.no-wait



Pushed a fixup commit with that suggestion.

Comment 38 Beniamino Galvani 2015-08-05 14:46:00 UTC

(In reply to Thomas Haller from comment #37) 
> IOW, the filename and order is meaningless for no-wait scripts.
> 
> Maybe, we should instead indicate no-wait scripts by having a special
> filename. Like:
>   something.no-wait
>
> Pushed a fixup commit with that suggestion.

Both approaches (symlink and filename suffix) seem equally good to me. Any other thoughts or preferences?

Comment 39 Dan Williams 2015-08-20 17:32:33 UTC

I'm slightly on the side of a directory, because that's something you only need to do once (and distros can create it at package install time).  That means that if you cd into the directory, it's pretty clear where no-wait scripts would go, and you don't need to find a manpage to tell you the exact suffix.

But besides that, both approaches are fine with me and the patches look OK.

Comment 40 Beniamino Galvani 2015-10-09 15:37:40 UTC

A directory was used to indicate no-wait scripts. Merged to master:

http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=127a4c5d9e0786b665d5f3be71fcd988d7fcf88c

Comment 41 Thomas Haller 2015-10-09 19:40:07 UTC

Created attachment 312974 [details] [review]
contrib/rpm: mark "10-ifcfg-rh-routes.sh" script as no-wait

Since commit 127a4c5d9e0786b665d5f3be71fcd988d7fcf88c we support
marking scripts as no-wait. This is especially interesting for pre-up
and pre-down scripts because those scripts block activation (of other
devices).

Mark the script as no-wait.

Comment 42 Thomas Haller 2016-01-20 14:40:47 UTC

(In reply to Thomas Haller from comment #41)
> Created attachment 312974 [details] [review] [review]
> contrib/rpm: mark "10-ifcfg-rh-routes.sh" script as no-wait
> 
> Since commit 127a4c5d9e0786b665d5f3be71fcd988d7fcf88c we support
> marking scripts as no-wait. This is especially interesting for pre-up
> and pre-down scripts because those scripts block activation (of other
> devices).
> 
> Mark the script as no-wait.

a similar patch was merged to master:
http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=bc7ad75d991c2d9b367d5f432c43a456a7faa8db

Comment 43 Thomas Haller 2016-01-20 14:41:14 UTC

feature is already merged. Closing BZ