GNOME Bugzilla – Bug 728432
[review] dcbw/wifi-plugin: convert WiFi to a plugin
Last modified: 2014-05-13 20:33:23 UTC
See the branch. Less work than I expected actually...
After reviews I'll merge this branch *after* danw's WOL branch, since that moves the wifi utils into the platform and thus conflicts with this branch a bit.
Rebased on top of git master after danw/wol got merged.
Added a bunch of cleanup patches that make the code simpler and fix bugs/crashes in some of the platform's OLPC Mesh handling. The first three patches: wifi: remove old ipw rfkill polling functionality wifi: make Wi-Fi support a plugin wifi: use nm_device_get_connection() to simplify some code haven't changed recently and could be merged independently of the cleanups if anyone is interested in merging lowest-risk patches first, higher-risk patches later.
> wifi: make Wi-Fi support a plugin >+ -I${top_srcdir}/libnm-util \ >+ -DNM_VERSION_MAX_ALLOWED=NM_VERSION_NEXT_STABLE \ >+ $(DBUS_CFLAGS) \ add -DG_LOG_DOMAIN=\""NetworkManager-wifi"\" before the other -D line, in both wifi/Makefile.am and wifi/tests/Makefile.am. (For consistency with the just-merged danw/errors2 changes.) >+ nm_olpc_mesh_error_get_type; >+ nm_wifi_error_get_type; why do these need to be exported? >+hidden_ap_found (NMDevice *device, This got moved from nm-manager to nm-wifi-factory, but there's no reason NMDeviceWifi couldn't just call nm_connection_provider_get() itself and do all of this internally now. >+ PROP_0 = 0x1000, Hm? >+#include "devices/wifi/nm-wifi-ap.h" That got added to nm-manager.c, but I think it shouldn't be necessary now since you moved the hidden_ap_found() stuff out? > platform: don't crash on link_change() error when ifname is NULL Couldn't you just copy ifindex and ifname from rtnllink to change in link_change() before calling to_string_link()? > platform: better detection of OLPC Mesh interfaces Why is this needed? / How do we know it won't have false positives? > olpc-mesh: mesh device is only available when Wi-Fi companion is found Same comment as I made in some other bug about not liking nm_log_dbg() inside an is_available() function. I assume the olpc-mesh changes are tested?
(In reply to comment #4) > > wifi: make Wi-Fi support a plugin > > >+ -I${top_srcdir}/libnm-util \ > >+ -DNM_VERSION_MAX_ALLOWED=NM_VERSION_NEXT_STABLE \ > >+ $(DBUS_CFLAGS) \ > > add -DG_LOG_DOMAIN=\""NetworkManager-wifi"\" before the other -D line, in both > wifi/Makefile.am and wifi/tests/Makefile.am. (For consistency with the > just-merged danw/errors2 changes.) Fixed. > >+ nm_olpc_mesh_error_get_type; > >+ nm_wifi_error_get_type; > > why do these need to be exported? They don't, fixed. > >+hidden_ap_found (NMDevice *device, > > This got moved from nm-manager to nm-wifi-factory, but there's no reason > NMDeviceWifi couldn't just call nm_connection_provider_get() itself and do all > of this internally now. Done. > >+ PROP_0 = 0x1000, > > Hm? Copied from WiMAX I think, but now fixed in WiFi. > >+#include "devices/wifi/nm-wifi-ap.h" > > That got added to nm-manager.c, but I think it shouldn't be necessary now since > you moved the hidden_ap_found() stuff out? Removed. > > platform: don't crash on link_change() error when ifname is NULL > > Couldn't you just copy ifindex and ifname from rtnllink to change in > link_change() before calling to_string_link()? Could do that, but I was operating under the assumption that the called function should continue not to modify the argument passed in. Can change that if you'd like though. > > platform: better detection of OLPC Mesh interfaces > > Why is this needed? / How do we know it won't have false positives? I added it because my OLPC mesh device was getting detected as wifi, because I apparently don't have the udev rules anywhere, I think they get shipped with OLPC stuff. But I believe this is safe, since only the libertas driver's mesh interface exports the "anycast_mask" since it's specific to the Libertas OLPC mesh protocol implementation. > > olpc-mesh: mesh device is only available when Wi-Fi companion is found > > Same comment as I made in some other bug about not liking nm_log_dbg() inside > an is_available() function. I looked for that comment but I don't recall it; dcbw/wwan-fixes has a comment about is_available but it's about indentation. What's the issue you had in mind? > I assume the olpc-mesh changes are tested? Yeah, I busted out one of my usb8388 dongles which is what spawned the platform fixes. I tested insertion/deletion (and thus the dispose() patch) and also verified that companion WiFi interface detection works correctly no matter which order the interfaces are detected in (eg OLPC then WiFi, and WiFi then OLPC).
(In reply to comment #4) > > olpc-mesh: mesh device is only available when Wi-Fi companion is found > > Same comment as I made in some other bug about not liking nm_log_dbg() inside > an is_available() function. I see this now, it was a comment for dcbw/wwan-fixes: --- I'm dubious about logging (even at debug level) from a "get" function... seems like there's a lot of possibility of log spammage. --- I've found it useful in the past to log why a device isn't available for activation, at least for debugging stuff. Also, nm_device_is_available() is called once from nm-manager.c() at late stages of software device activation, and once from nm-device.c when entering the UNAVAILABLE state. So it's really not called that often, and shouldn't be triggerable by a user AFAICT. That said, I'm OK to remove this if you'd like it gone.
everything looks good now. keep the debug log I guess.
I'd like to get one more review on this... Thomas/Jirka/anyone else?
(In reply to comment #4) > > platform: don't crash on link_change() error when ifname is NULL > > Couldn't you just copy ifindex and ifname from rtnllink to change in > link_change() before calling to_string_link()? I don't really like the new copious calls to rtnl_link_set_ifindex() either, so alternative to what I have in the branch currently: --- a/src/platform/nm-linux-platform.c +++ b/src/platform/nm-linux-platform.c @@ -1838,19 +1838,19 @@ link_change (NMPlatform *platform, int ifindex, struct rtnl_link *change) * potentially be improved. */ switch (nle) { case -NLE_SUCCESS: case -NLE_EXIST: break; case -NLE_OBJ_NOTFOUND: - error ("Firmware not found for changing link %s; Netlink error: %s)", to_string_link (platform, change), nl_geterror (nle)); + error ("Firmware not found for changing link %s; Netlink error: %s)", to_string_link (platform, rtnllink), nl_geterror (nle)); platform->error = NM_PLATFORM_ERROR_NO_FIRMWARE; return FALSE; default: - error ("Netlink error changing link %s: %s", to_string_link (platform, change), nl_geterror (nle)); + error ("Netlink error changing link %s: %s", to_string_link (platform, rtnllink), nl_geterror (nle)); return FALSE; } return refresh_object (platform, (struct nl_object *) rtnllink, FALSE, NM_PLATFORM_REASON_INTERNAL); } static gboolean ie, just log the actual link we got from the cache, instead of the likely-incomplete 'change' template which won't have ifindex or name. We'd still keep the ifname checks in link_extract_type() and ethtool_get_driver() though. Would you like that better?
(In reply to comment #9) > (In reply to comment #4) > > > platform: don't crash on link_change() error when ifname is NULL > > > > Couldn't you just copy ifindex and ifname from rtnllink to change in > > link_change() before calling to_string_link()? > > I don't really like the new copious calls to rtnl_link_set_ifindex() either, so > alternative to what I have in the branch currently: > > --- a/src/platform/nm-linux-platform.c > +++ b/src/platform/nm-linux-platform.c > @@ -1838,19 +1838,19 @@ link_change (NMPlatform *platform, int ifindex, struct > rtnl_link *change) > * potentially be improved. > */ > switch (nle) { > case -NLE_SUCCESS: > case -NLE_EXIST: > break; > case -NLE_OBJ_NOTFOUND: > - error ("Firmware not found for changing link %s; Netlink error: > %s)", to_string_link (platform, change), nl_geterror (nle)); > + error ("Firmware not found for changing link %s; Netlink error: > %s)", to_string_link (platform, rtnllink), nl_geterror (nle)); > platform->error = NM_PLATFORM_ERROR_NO_FIRMWARE; > return FALSE; > default: > - error ("Netlink error changing link %s: %s", to_string_link > (platform, change), nl_geterror (nle)); > + error ("Netlink error changing link %s: %s", to_string_link > (platform, rtnllink), nl_geterror (nle)); > return FALSE; > } > > return refresh_object (platform, (struct nl_object *) rtnllink, FALSE, > NM_PLATFORM_REASON_INTERNAL); > } > > static gboolean > > ie, just log the actual link we got from the cache, instead of the > likely-incomplete 'change' template which won't have ifindex or name. > > We'd still keep the ifname checks in link_extract_type() and > ethtool_get_driver() though. > > Would you like that better? Maybe it should actually log both of them? Would be interesting for debug logging to see the template (@rtnllink) and the @change. Anyway, I would be more interested in the @change object, because the @rtnllink object should be the same as at its last LINK_CHANGED signal (which we do log). Anyway, to_string_* must not cause any problems however incomplete the passed in rtnl_link object is. And I agree, that they should never modify the passed in objects. Of course, the whole to_string is suboptimal, because we first convert it to the NMPlatform* object and then use nm_platform_*_to_string functions. A more true approach would be to implement separate to_string functions for the libnl objects. But I think it is not worth the extra coding effort... (and I always plan of let platform cache the native NMPlatform* objects, instead of libnl objects).
Side note: maybe nm_device_factory_get_device_type() should be a property of NMDeviceFactory instead. Why did you choose to do it this way? Actually, there is already PROP_DEVICE_TYPE... Why do we need this function?
> wifi: make Wi-Fi support a plugin break; - case NM_LINK_TYPE_OLPC_MESH: - device = nm_device_olpc_mesh_new (plink); - break; - case NM_LINK_TYPE_WIFI: - device = nm_device_wifi_new (plink); - break; case NM_LINK_TYPE_BOND: if the plugin is not available, it does default: device = nm_device_generic_new (plink); break; I guess, this is correct? Only because in case of WiMAX, we choose to ignore the device...
(In reply to comment #11) > Side note: > > maybe nm_device_factory_get_device_type() should be a property of > NMDeviceFactory instead. Why did you choose to do it this way? > > Actually, there is already PROP_DEVICE_TYPE... > Why do we need this function? Also, the plugin supports NM_LINK_TYPE_WIFI/NM_DEVICE_TYPE_WIFI and NM_LINK_TYPE_OLPC_MESH/NM_DEVICE_TYPE_OLPC_MESH. Is it correct, that the nm_device_factory_get_device_type() and NMDeviceFactory() only can report NM_DEVICE_TYPE_WIFI?
The branch looks good to me. But I think the logs could mention that Wi-Fi is not available if a plugin is not loaded (installed).
> platform: don't crash on link_change() error when ifname is NULL + if (!ifname) { + g_warn_if_fail (ifname != NULL); + return_type (NM_LINK_TYPE_UNKNOWN, type); + } I would remove the g_warn_if_fail. I think this can happen under normal conditions, so a warning seems inappropriate. > olpc-mesh: reorganize functions Could you mention "trivial" in commit message subject? Pushed several fixup commits on top of yours, and two things that bothered me along the way...
(In reply to comment #10) > Maybe it should actually log both of them? Would be interesting for debug > logging to see the template (@rtnllink) and the @change. Perhaps the link_change() caller could log on failure, since it knows what details of @change are relevant anyway? > Anyway, to_string_* must not cause any problems however incomplete the passed > in rtnl_link object is. also true (In reply to comment #12) > if the plugin is not available, it does > default: > device = nm_device_generic_new (plink); > break; > I guess, this is correct? Only because in case of WiMAX, we choose to ignore > the device... That's from http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=6b8bf26b. I think the nm-manager.c change in that bug was unnecessary/incorrect; before that commit, WiMAX devices were being identified as ethernet, but the platform part of that patch should have fixed that, and made them get turned into NMDeviceGeneric instead. So, I think we want to remove the NM_LINK_TYPE_WIMAX check in nm-manager.c
(In reply to comment #11) > Side note: > > maybe nm_device_factory_get_device_type() should be a property of > NMDeviceFactory instead. Why did you choose to do it this way? > > Actually, there is already PROP_DEVICE_TYPE... > Why do we need this function? The function exists to ensure we don't double-load plugins, since nm-manager calls the function before we create the factory object. This saves memory and time because we dont' bother creating the factory object if we already have a plugin for that type. The property exists on the factory object so that we can compare the plugin we are deciding to load against the factories we have already loaded, because we don't save the function pointer to each factory's nm_device_factory_get_device_type() method. We could use g_object_set_data() to save the nm_device_factory_get_device_type() pointer on the factory object, which would remove a bunch of code in the plugins. Thoughts on that?
(In reply to comment #13) > (In reply to comment #11) > > Side note: > > > > maybe nm_device_factory_get_device_type() should be a property of > > NMDeviceFactory instead. Why did you choose to do it this way? > > > > Actually, there is already PROP_DEVICE_TYPE... > > Why do we need this function? > > Also, the plugin supports NM_LINK_TYPE_WIFI/NM_DEVICE_TYPE_WIFI and > NM_LINK_TYPE_OLPC_MESH/NM_DEVICE_TYPE_OLPC_MESH. > > Is it correct, that the nm_device_factory_get_device_type() and > NMDeviceFactory() > only can report NM_DEVICE_TYPE_WIFI? I guess it's not 100% technically correct, but the OLPC Mesh stuff is provided by the same plugin as the WiFi stuff, so it's not an issue in practice. In reality, the "plugin device type" stuff is only there to prevent the same plugin from being loaded twice. I don't think adding the ability to return an array of types is really necessary at this point; if we have future plugins that require this we could add it though.
(In reply to comment #17) > (In reply to comment #11) > > Side note: > > > > maybe nm_device_factory_get_device_type() should be a property of > > NMDeviceFactory instead. Why did you choose to do it this way? > > > > Actually, there is already PROP_DEVICE_TYPE... > > Why do we need this function? > > The function exists to ensure we don't double-load plugins, since nm-manager > calls the function before we create the factory object. This saves memory and > time because we dont' bother creating the factory object if we already have a > plugin for that type. > > The property exists on the factory object so that we can compare the plugin we > are deciding to load against the factories we have already loaded, because we > don't save the function pointer to each factory's > nm_device_factory_get_device_type() method. > > We could use g_object_set_data() to save the > nm_device_factory_get_device_type() pointer on the factory object, which would > remove a bunch of code in the plugins. Thoughts on that? I implemented this as "devices: simplify plugin type checking", and it's a very nice code reduction (-140 LoC net). I also added symbol visibility restrictions to the other plugins too. Re-pushed and squashed with Thomas' fixups and the plugin simplify/symbols patches. Hopefully final review...
In general, I would prefer only exposing one single function from the plugin, which then either returns a vtable (with additional functions) or preferably a glib object (which implements virtual functions). In principle I found the property on part of the factory object a better approach then having a second public function to return the type. OTOH, this approach to detect duplicate plugins is not sophisticated anyway (e.g. the wifi plugin handles two device types, but the mechanism can only check for one type). So, I would say, it is good enough at the moment. When packaging plugins, we only have to make sure that a plugin package tightly depends on the NM version, because the ABI is subject to change. ACK to the whole branch.
I rebased dcbw/wifi-plugin after merging dcbw/wwan-fixes. This lets us remove and simplify some of the rfkill stuff in nm-manager.c. So, if people wouldn't mind re-checking the nm-manager.c changes "wifi: remove old ipw rfkill polling functionality" that would be great. Specifically, we can remove "other_enabled_func" and thus simplify the rfkill state checking. Previously, manager_rfkill_update_one_type() checked "other_enabled_func" and the final rfkill state for a radio type was the "worst" state of [ other_enabled_func state, kernel rfkill state ]. Now that we can remove other_enabled_func, we can simplify that and just have update_rstate_from_rfkill() read the kernel rfkill state directly.
Looks good to me.
I agree with Jirka's suggestion above that we should log about missing plugins. (Ideally per-interface. Eg, "(wlan0): NetworkManager-wifi plugin is not available; treating this as a generic device"). I assume you aren't planning to land the whole branch with --no-ff? The wifi/mesh cleanups seem like they could be their own merges, and some of the patches (eg, the two from Thomas) aren't really related to making wifi a plugin at all, and should probably just be committed separately > platform: assert libnl alloc functions against out of memory I'd prefer if the wrapper functions had named that started with "nm_", to make it clear when they're used later that they're local functions, not libnl internals. > *: implement plugin symbol visibility >+check-local: >+ $(top_srcdir)/tools/check-exports.sh $(builddir)/.libs/libnm-device-plugin-wwan.so $(SYMBOL_VIS_FILE) I was actually about to propose removing these tests from libnm-util and libnm-glib; they run from the same data that we pass to libtool, so basically they're checking that libtool isn't broken? Which isn't our job. (Or else maybe they're checking that we didn't forget to pass the symbol file to libtool, but that's silly since if we forgot to do that then we probably forgot to add the check-local: test too...)
(In reply to comment #23) > I agree with Jirka's suggestion above that we should log about missing plugins. > (Ideally per-interface. Eg, "(wlan0): NetworkManager-wifi plugin is not > available; treating this as a generic device"). Done: "core: log when creating Generic device when a plugin is missing". Obviously we can only figure this out for device types that create netlink objects (wifi, mesh, wimax). > I assume you aren't planning to land the whole branch with --no-ff? The > wifi/mesh cleanups seem like they could be their own merges, and some of the > patches (eg, the two from Thomas) aren't really related to making wifi a plugin > at all, and should probably just be committed separately Wasn't planning on landing the whole thing like this, it's really ballooned. I'll split the branch into (1) just the wifi plugin stuff and cleanups and (2) a grab-bag of the rest of the stuff. Sound OK? > > platform: assert libnl alloc functions against out of memory > > I'd prefer if the wrapper functions had named that started with "nm_", to make > it clear when they're used later that they're local functions, not libnl > internals. Done. > > *: implement plugin symbol visibility > > >+check-local: > >+ $(top_srcdir)/tools/check-exports.sh $(builddir)/.libs/libnm-device-plugin-wwan.so $(SYMBOL_VIS_FILE) > > I was actually about to propose removing these tests from libnm-util and > libnm-glib; they run from the same data that we pass to libtool, so basically > they're checking that libtool isn't broken? Which isn't our job. (Or else maybe > they're checking that we didn't forget to pass the symbol file to libtool, but > that's silly since if we forgot to do that then we probably forgot to add the > check-local: test too...) You're right, though it does do a few things: 1) before you add the -Wl,--version-script if you run 'make check' you can see everything the library exports, so you can make a better decision about which exact symbols should be exported and which should not 2) it ensures that the symbol file is alphabetical, which isn't a big deal 3) it ensures that symbols removed from the library are also removed from the .ver file, and ensures testcases fail if we remove public API inadvertently without intentionally updating the .ver file We could retain #1 by just having a helper script to check the exports which we could run manually. I don't care that much about #2. #3 is nice though.
Rebased and pushed: dcbw/wifi-plugin - only the original wifi plugin and mesh/supplicant cleanups dcbw/plugin-cleanups - thaller's fixes, plugin symbol visibility stuff, and the plugin type func/property simplification I would merge these independently with --no-ff if that sounds OK to you.
(In reply to comment #24) > 1) before you add the -Wl,--version-script if you run 'make check' you can see > everything the library exports, so you can make a better decision about which > exact symbols should be exported and which should not > > 2) it ensures that the symbol file is alphabetical, which isn't a big deal > > 3) it ensures that symbols removed from the library are also removed from the > .ver file, and ensures testcases fail if we remove public API inadvertently > without intentionally updating the .ver file ok, I'm sold. updated branches look good
looks good to me too (just quick look this time). but: > 2) it ensures that the symbol file is alphabetical, which isn't a big deal I think this is nice... I wouldn't remove this check, seeing that it's already there.
Both branches (dcbw/wifi-plugin and dcbw/plugin-cleanups) merged to master.