GNOME Bugzilla – Bug 734347
NM should notify disconnection to MM only after pppd exits
Last modified: 2015-02-25 14:09:02 UTC
ModemManager needs to have CLOCAL set in the TTY termios configuration, in order to notify the kernel that modem control lines are not in effect (e.g. so that a transition to LOW in the DCD input control line doesn't trigger a hangup in the TTY). pppd in the other hand, needs CLOCAL unset in order to have proper modem control lines in effect during the PPP session. So, when pppd starts it will store the original termios settings, and before exiting it will restore the original settings in the TTY. In other words, if CLOCAL was set before launching pppd, CLOCAL will be also set after pppd exits. Now, in order for this sequence to work correctly, NetworkManager also needs to make sure that ModemManager is notified about the disconnection only after pppd has really finished re-configuring the TTY.
Created attachment 282652 [details] [review] [PATCH 1/4] device: new deactivate_async() method to be run during
Created attachment 282653 [details] [review] [PATCH 2/4] ppp-manager: new async stop() method to request stop and
Created attachment 282654 [details] [review] [PATCH 3/4] wwan,modem: let disconnect() be an async operation
Created attachment 282655 [details] [review] [PATCH 4/4] wwan: wait for pppd to exit before relaying the port to ModemManager
Review of attachment 282652 [details] [review]: ::: src/devices/nm-device.c @@ +4148,3 @@ goto out; } + g_assert (ret == NM_ACT_STAGE_RETURN_SUCCESS); Maybe just kill this hunk, I can't actually tell what it changes except for whitespace? @@ +6665,3 @@ dispatcher_cleanup (self); + if (priv->deactivating_cancellable) + g_cancellable_cancel (priv->deactivating_cancellable); This immediately calls deactivate_async_ready() and clears priv->deactivating_cancellable right? @@ +7371,3 @@ const char *hw_addr, *p; guint count; + Kill this hunk too, just whitespace changes. Yeah it's right, but it should be a separate cleanup patch or something.
Review of attachment 282653 [details] [review]: Looks good to me.
Review of attachment 282654 [details] [review]: ::: src/devices/wwan/nm-modem-broadband.c @@ +914,3 @@ + ctx->cancellable = cancellable ? g_object_ref (cancellable) : NULL; + if (disconnect_context_complete_if_cancelled (ctx)) + return; Won't this patch warn by itself? nm-modem.c passes NULL for the cancellable, which then gets disconnect_context_complete_if_cancelled() called on it which will try to use the cancellable...
Review of attachment 282654 [details] [review]: ::: src/devices/wwan/nm-modem-broadband.c @@ +914,3 @@ + ctx->cancellable = cancellable ? g_object_ref (cancellable) : NULL; + if (disconnect_context_complete_if_cancelled (ctx)) + return; Never mind, g_cancellable_set_error_if_cancelled() handles NULL cancellable.
Review of attachment 282655 [details] [review]: Overall looks good, just the log nitpick. ::: src/devices/wwan/nm-modem.c @@ +1027,3 @@ + case DEACTIVATE_CONTEXT_STEP_LAST: + nm_log_info (LOGD_MB, "(%s) modem deactivation finished", + nm_modem_get_uid (ctx->self)); Two requests here: 1) make this nm_log_dbg() 2) put a ':' after the modem uid (eg, "(ttyACM0): modem deactivation finished")
(In reply to comment #5) > Review of attachment 282652 [details] [review]: > > ::: src/devices/nm-device.c > @@ +4148,3 @@ > goto out; > } > + g_assert (ret == NM_ACT_STAGE_RETURN_SUCCESS); > > Maybe just kill this hunk, I can't actually tell what it changes except for > whitespace? > Hunk killed in the next updated patch. > @@ +6665,3 @@ > dispatcher_cleanup (self); > + if (priv->deactivating_cancellable) > + g_cancellable_cancel (priv->deactivating_cancellable); > > This immediately calls deactivate_async_ready() and clears > priv->deactivating_cancellable right? > That depends on how the cancellation is implemented in the deactivating() callbacks. The code should work both if cancelled directly or cancelled later in idle. > @@ +7371,3 @@ > const char *hw_addr, *p; > guint count; > + > > Kill this hunk too, just whitespace changes. Yeah it's right, but it should be > a separate cleanup patch or something. Hunk killed in the next updated patch.
(In reply to comment #9) > Review of attachment 282655 [details] [review]: > > Overall looks good, just the log nitpick. > > ::: src/devices/wwan/nm-modem.c > @@ +1027,3 @@ > + case DEACTIVATE_CONTEXT_STEP_LAST: > + nm_log_info (LOGD_MB, "(%s) modem deactivation finished", > + nm_modem_get_uid (ctx->self)); > > Two requests here: > > 1) make this nm_log_dbg() > 2) put a ':' after the modem uid (eg, "(ttyACM0): modem deactivation finished") Changes done in the next updated patch.
Created attachment 294538 [details] [review] Patch 1/5
Created attachment 294539 [details] [review] Patch 2/5
Created attachment 294540 [details] [review] Patch 3/5
Created attachment 294541 [details] [review] Patch 4/5
Created attachment 294542 [details] [review] Patch 5/5 This one is new w.r.t the original ones, just rewrites some log messages with a consistent format
Review of attachment 294538 [details] [review]: ::: src/devices/nm-device.c @@ +7436,3 @@ +deactivate_async_ready (NMDevice *self, + GAsyncResult *res, + gpointer user_data) whitespace... @@ +7445,3 @@ + + /* If operation cancelled, just return */ + if ( (error && g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED)) g_error_matches() already returns FALSE if error == NULL @@ +7472,3 @@ + + g_return_if_fail (call_id == priv->dispatcher.call_id); + Perhaps also g_return_if_fail() here that priv->dispatcher.post_state == NM_DEVICE_STATE_DISCONNECTED? This function doesn't use post_state at all, but does use the reason, so we should make sure that post_state is what we expect it to be (even if we don't use it).
Review of attachment 294539 [details] [review]: ::: src/ppp-manager/nm-ppp-manager.c @@ +818,2 @@ if (err != 0) ppp_exit_code (err, priv->pid); The function is using priv->pid, but the removal of the g_assert (pid == priv->pid); implies that this function might be called when the two don't match? If so, should we keep the 'pid' argument, but simply use 'pid' everywhere here? Also, I think we don't actually need to call process_exit() at all from the nm_ppp_manager_stop() path. process_exit() does three things: 1) prints some warnings - we expect pppd to have exited from the _stop() path so we don't care about the warnings 2) sets priv->pid = 0 - even though _ppp_cleanup() is modified below to no longer clear priv->pid, it doesn't seem worth calling this function from the _stop() paths only to get this functionality. Might as well just clear it manually from kill_child_ready() 3) emits the STATE_CHANGED signal - in the _stop() path I'd argue we don't actually care about the signal, because the caller *expects* pppd to die, so they should already be handling cleanup stuff So what I'd do here instead is revert the ppp_watch_cb() hunks of the patch, and simply not bother calling process_exit() from the _stop() code paths. Since _ppp_cleanup() will remove the child watch that triggers ppp_watch_cb() before killing the child, this code would never be called from the _stop() path. @@ +1182,3 @@ + + if (priv->pid) { + nm_utils_kill_child_async (priv->pid, SIGTERM, LOGD_PPP, "pppd", 2000, NULL, NULL); The function is named ppp_sync_kill(), but it actually kills the child asynchronously? Should this be nm_utils_kill_child_sync() instead?
Review of attachment 294540 [details] [review]: Looks good to me.
Review of attachment 294541 [details] [review]: Looks good to me.
Review of attachment 294542 [details] [review]: Looks good to me.
Created attachment 294708 [details] [review] Patch 1/5 Updated patch 1/5, addressing the things from comment 17.
Created attachment 294709 [details] [review] Patch 2/5 Updated patch 2 as per comment 18.
Review of attachment 294708 [details] [review]: LGTM
Review of attachment 294709 [details] [review]: LGTM
Will do a smoketest tomorrow and commit, thanks!
I added a priv->pid = 0; as the last line of nm_ppp_manager_stop(), because otherwise I see _ppp_kill() being called in this call chain, and that causes nm_utils_kill_child_async() to report an error about ECHLD because the child has already been killed by nm_ppp_manager_stop(): nm_ppp_manager_stop() nm_utils_kill_child_async() kill_child_ready() stop_context_complete() g_object_unref (ctx->manager); dispose() _ppp_kill() and really, after attempting to kill it, if the kill fails there's not much we can do anyway, so might as well forget the PID.
Merged to git master. Thanks!