After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 750595 - [review] lr/master-activations: don't enqueue activation of master connection if it has pending actions
[review] lr/master-activations: don't enqueue activation of master connection...
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks: nm-review
 
 
Reported: 2015-06-08 20:45 UTC by Lubomir Rintel
Modified: 2016-09-07 18:29 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Lubomir Rintel 2015-06-08 20:45:55 UTC
manager: don't enqueue activation of master connection if it has pending actions
    
Maybe it's being activated already and one of the activations would supersede    the another causing the slave connection to loose track and get stuck forever    in prepared state.
    
This happens when a master apperars that has autoconnect=true slaves.

+ one small fix:
manager: trivial: log the address, not the path of the master active connection
Comment 1 Thomas Haller 2015-06-08 21:27:37 UTC
>> manager: trivial: log the address, not the path of the master active 

Logging the pointer value is also not very helpful. It would be useful for NMDevice instances, where we do that regularly and can correlate the pointer values...

How about adding the connection UUID of the ActiveConnection?


>> manager: don't enqueue activation of master connection if it has pending 

If it works for you, I doesn't look wrong to me :)


ACK
Comment 2 Dan Williams 2015-06-11 15:27:41 UTC
I think this might break some guarantees that we have, where subsequent requested activations are always started.  For example, if NM starts a master at startup, but then the old network scripts (or somebody's manual 'nmcli con up') happens; wouldn't this change then ignore those activations?

Also, I think just checking 'has pending action' isn't fine grained enough, because we have things like 'carrier wait' and device states (see nm_device_queue_state) and autoconnect checking.  I think we need to check if the device has a pending activation queued instead?

I *think* the reason that slaves get stuck is that:

1) when a queued activation is superceded, nm-device.c::_clear_queued_act_request() gets called
2) that sets the master AC's state to DEACTIVATED
3) the slave's AC gets nm-active-connection.c::master_state_cb() called in response to the master state chnage
4) but master_state_cb() only acts on DEACTIVATING and so the slave gets stuck

Possibly instead, we should do what nm-manager.c::_internal_activation_failed() does:

nm_active_connection_set_state (active, NM_ACTIVE_CONNECTION_STATE_DEACTIVATING);
nm_active_connection_set_state (active, NM_ACTIVE_CONNECTION_STATE_DEACTIVATED);

to make sure that the master AC goes through all the states?
Comment 3 Lubomir Rintel 2015-06-23 19:22:19 UTC
(In reply to Thomas Haller from comment #1)
> >> manager: trivial: log the address, not the path of the master active 
> 
> Logging the pointer value is also not very helpful. It would be useful for
> NMDevice instances, where we do that regularly and can correlate the pointer
> values...
> 
> How about adding the connection UUID of the ActiveConnection?

We log pointers elsewhere.

(In reply to Dan Williams from comment #2)
> I think this might break some guarantees that we have, where subsequent
> requested activations are always started.  For example, if NM starts a
> master at startup, but then the old network scripts (or somebody's manual
> 'nmcli con up') happens; wouldn't this change then ignore those activations?
> 
> Also, I think just checking 'has pending action' isn't fine grained enough,
> because we have things like 'carrier wait' and device states (see
> nm_device_queue_state) and autoconnect checking.  I think we need to check
> if the device has a pending activation queued instead?
> 
> I *think* the reason that slaves get stuck is that:
> 
> 1) when a queued activation is superceded,
> nm-device.c::_clear_queued_act_request() gets called
> 2) that sets the master AC's state to DEACTIVATED
> 3) the slave's AC gets nm-active-connection.c::master_state_cb() called in
> response to the master state chnage
> 4) but master_state_cb() only acts on DEACTIVATING and so the slave gets
> stuck
> 
> Possibly instead, we should do what
> nm-manager.c::_internal_activation_failed() does:
> 
> nm_active_connection_set_state (active,
> NM_ACTIVE_CONNECTION_STATE_DEACTIVATING);
> nm_active_connection_set_state (active,
> NM_ACTIVE_CONNECTION_STATE_DEACTIVATED);
> 
> to make sure that the master AC goes through all the states?

Yes. Fixing this leaves the slaves disconnected though.

Reworked a bit:

65b979c manager: reuse an active connection, if the same activation is pending
a169f5e active-connection: fail the activation if the master AC disconnect
2a16f34 ifcfg-rh,vlan: fall back to VLAN_ID if vlan id can't be determined from DEVICE
8a0ce5b manager: trivial: log the address, not the path of the master active connection
Comment 4 Lubomir Rintel 2015-06-24 16:51:15 UTC
44c36f1 merge: branch 'lr/master-activations'
7a066a3 manager: reuse an active connection, if the same activation is pending
4078834 active-connection: fail the activation if the master AC disconnect
f23a46d ifcfg-rh,vlan: fall back to VLAN_ID if vlan id can't be determined from DEVICE
edbf766 manager: log the address, not the path of the master active connection

Merged in, since it blocked testing.

Nevertheless, keeping it open if we need to revisit this:

10:41 <thaller> hey
10:41 <thaller> can you remove the work "trivial" from manager: trivial: log the address, not the path of the master active connection?
10:43 <thaller> typo in commit message: "connections with race"
10:43 <thaller> which
10:45 <thaller> maybe rename "pending_connection" to something better?
10:46 <thaller> authorizing_connections?
10:46 <thaller> this also needs a code comment :)
10:46 <thaller> other then that, it doesn't look wrong. But I don't know
16:06 <lrintel> re-pushed lr/master-activations with suggested corrections
16:06 <lrintel> do you think it's good to merge?
16:17 <thaller> the first two certainly look right
16:17 <thaller> I don't know about the others. They fix the bug?
16:19 <lrintel> yes
16:20 <thaller> I see nothing wrong, but let's ask dcbw too.
16:20 <thaller> for now, let's merge it (we can improve/revert later)
16:20 <thaller> but let's get a dcbw opinion on this :)
Comment 5 Dan Williams 2015-07-02 20:30:04 UTC
Yeah, the last two patches look OK to me.  The activating_connections thing is kinda icky though; we should clean that up in the future by perhaps creating a new internal property for authorizing, and then adding the AC to the priv->activating_connections, and filtering those 'authorizing' ones out of the D-Bus property.  Might need some other checks in places too.
Comment 6 Thomas Haller 2016-09-07 18:29:57 UTC
closing. If the above issues are actual issues, please reopen.