Bug 696434 – [review] pavlix/platform

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 696434 - [review] pavlix/platform


Summary:	[review] pavlix/platform


Status:	RESOLVED FIXED

Product:	NetworkManager
Classification:	Platform
Component:	general
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Pavel Simerda
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Duplicates:	683173 683174 (view as bug list)
Depends on:
Blocks:	nm-0.9.10 697360

Reported:	2013-03-23 04:05 UTC by Pavel Simerda
Modified:	2013-04-10 14:54 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Pavel Simerda 2013-03-23 04:05:44 UTC

Hi,

I'm submitting pavlix/platform for the first round of review. Currently you have to use ./autogen.sh --disable-valgrind. As valgrind support is very much new, this will be resolved later by either fixing the code or adding more valgrind exceptions.

For the beginning, any suggestions are welcome.

Pavel

Comment 1 Pavel Simerda 2013-03-23 04:06:39 UTC

(In reply to comment #0)
> ./autogen.sh --disable-valgrind.

Sorry, it's ./autogen.sh --without-valgrind.

Comment 2 Pavel Simerda 2013-03-25 20:28:36 UTC

Update:

All memory leaks are somehow related to libnl. Therefore you can use:

./autogen.sh (without arguments)

Or for root checks:

./autogen.sh --with-tests=root --without-valgrind

Or, if you want to look at valgrind error:

./autogen.sh --with-tests=root (with valgrind installed)

Comment 3 Pavel Simerda 2013-03-27 21:17:18 UTC

*** Bug 683173 has been marked as a duplicate of this bug. ***

Comment 4 Pavel Simerda 2013-03-27 21:17:39 UTC

*** Bug 683174 has been marked as a duplicate of this bug. ***

Comment 5 Dan Williams 2013-03-28 20:06:31 UTC

Code docs for things that return or signals that emit NMPlatformLink or NMPlatform*Route or NMPlatform*Address should probably indicate that the object is a *snapshot* of the information they contain, and not long-lived.

In NMLinuxPlatformPrivate, 'event_id' should be a guint, not an int.  I have no idea why gcc doesn't warn about that.

In the Linux platform's setup() function, how much error checking do we need to do?  Do we need to ensure that all the caches we've created are !NULL?  Obviously if we can't initialize the platform, we should fail NM startup though, so maybe we should have these checked and return FALSE so NM dies.  Or g_assert() I guess.

In nm_platform_setup(), no need to g_assert (klass).  If the object has been allocated, it's class will always be valid.

Next, when is it valid to call nm_platform_link_add() or nm_platform_dummy_add() with a NULL interface name?  If it's not valid, then we should g_return_val_if_fail() check that.

IN nm_platform_link_delete() there are two calls to nm_platform_link_get_name(), one can be removed and you can just check if (!name) return FALSE;

Random comment about the errors: nm_platform_link_get_ifindex() is a great example.  If the link isn't found, it sets platform->error.  But the specific platform method it calls *also* sets platform->error, so the specific platform  error gets overwritten.  It's not a problem here because the error is the same, but in the future if the platform gets more complicated, this could cause more detailed error information to be overwritten and unavailable for debugging.

Instead of memcmp() for IPv6 address comparison (like in the fake platform), how about IN6_ARE_ADDR_EQUAL() from <netinet/in.h> ?  That's clearer.

In Fake's ip4_route_delete() and other functions like ip4_route_exists(), the if() condition formatting is inconsistent:

		if (route->ifindex == ifindex && route->plen == plen
				&& route->metric == metric
				&& route->network == network
				&& route->gateway == gateway) {

either each condition should be on a single line, or they should be combined up to 80 chars wide.  Since we're using 4-space tabs, 'network' can go on the same line as 'metric', but to me, that's less visually clear than putting them all on their own lines.  Or, since this exact same code is used in a few places, how about a static inline helper like ip4_route_equal()?

For Linux, honestly I'd rather see nm_nl_cache_search() used instead of #defining it to nl_cache_search().  Then it's clearer when the workaround is used and when it's not supposed to be used.  Otherwise it's just confusing later in the code whether or not you care about the address family or not.

Also, for multi-line if() statements like in link_extract_type(), even for subsequent switch statements, I'd like to see {} for the if block.  That makes it a lot less error prone later on if more code gets added to the if block and then you wonder why your code always gets executed.  In this case, even worse if in the future another if() block was added just below the switch.  Always using {} helps prevent errors like that.

Why does the Linux platform's link_get_flags() function return IFF_NOARP when the link can't be found?  Shouldn't it return 0?  This would cause nm_platform_link_uses_arp (-1) to return TRUE, which doesn't seem right.

Linux platform's link_change_flags() method should g_return_val_if_fail (change != NULL, FALSE); not "1".

Is there a reason that the socket credentials code is no longer required, eg:

http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/src/nm-netlink-monitor.c?id=cadc56abc80b62b328ff345f22a3642e124c4740

Comment 6 Pavel Simerda 2013-03-29 23:29:01 UTC

(In reply to comment #5)
> Code docs for things that return or signals that emit NMPlatformLink or
> NMPlatform*Route or NMPlatform*Address should probably indicate that the object
> is a *snapshot* of the information they contain, and not long-lived.

Done for links, will revisit addresses and routes later.

> In NMLinuxPlatformPrivate, 'event_id' should be a guint, not an int.  I have no
> idea why gcc doesn't warn about that.

Fixed.

> In the Linux platform's setup() function, how much error checking do we need to
> do?  Do we need to ensure that all the caches we've created are !NULL?

Yes, that's a sensible requirement.

> Obviously if we can't initialize the platform, we should fail NM startup
> though, so maybe we should have these checked and return FALSE so NM dies.  Or
> g_assert() I guess.

I'm ok with g_assert.

> In nm_platform_setup(), no need to g_assert (klass).  If the object has been
> allocated, it's class will always be valid.

Fixed.

> Next, when is it valid to call nm_platform_link_add() or
> nm_platform_dummy_add() with a NULL interface name?  If it's not valid, then we
> should g_return_val_if_fail() check that.

It's there (nm_platform_link_add checks the name).

> IN nm_platform_link_delete() there are two calls to
> nm_platform_link_get_name(), one can be removed and you can just check if
> (!name) return FALSE;

Fixed.

> Random comment about the errors: nm_platform_link_get_ifindex() is a great
> example.  If the link isn't found, it sets platform->error.  But the specific
> platform method it calls *also* sets platform->error, so the specific platform 
> error gets overwritten.  It's not a problem here because the error is the same,
> but in the future if the platform gets more complicated, this could cause more
> detailed error information to be overwritten and unavailable for debugging.

Fixed by removing error handling from the specific one.

> Instead of memcmp() for IPv6 address comparison (like in the fake platform),
> how about IN6_ARE_ADDR_EQUAL() from <netinet/in.h> ?  That's clearer.

Definitely. Fixed.

> In Fake's ip4_route_delete() and other functions like ip4_route_exists(), the
> if() condition formatting is inconsistent:
> 
>         if (route->ifindex == ifindex && route->plen == plen
>                 && route->metric == metric
>                 && route->network == network
>                 && route->gateway == gateway) {
> 
> either each condition should be on a single line, or they should be combined up
> to 80 chars wide.  Since we're using 4-space tabs, 'network' can go on the same
> line as 'metric', but to me, that's less visually clear than putting them all
> on their own lines.

Better?

> Or, since this exact same code is used in a few places,
> how about a static inline helper like ip4_route_equal()?

Good idea, I expanded it a little bit and added ip4_route_get and ip6_route_get private functions.

> For Linux, honestly I'd rather see nm_nl_cache_search() used instead of
> #defining it to nl_cache_search().  Then it's clearer when the workaround is
> used and when it's not supposed to be used.

It's supposed to be used at *all* times. I was going to ask Thomas to fix this upstream in the future (it may not be possible now for compatibility reasons).

> Otherwise it's just confusing later in the code whether or not you care about the address family or not.

AFAIK links can't have address families associated with them. The concept is IMO wrong from the beginning.

It is trivial to change the code to use nm_nl_cache_search() but the whole point of this macro was to fix the behavior of nl_cache_search() to something sensible. For now I'm just adding more comments to workaround.

> Also, for multi-line if() statements like in link_extract_type(), even for
> subsequent switch statements, I'd like to see {} for the if block.

That was a mistake.

> That makes
> it a lot less error prone later on if more code gets added to the if block and
> then you wonder why your code always gets executed.  In this case, even worse
> if in the future another if() block was added just below the switch.  Always
> using {} helps prevent errors like that.

Fixed.

> Why does the Linux platform's link_get_flags() function return IFF_NOARP when
> the link can't be found?

Because the kernel flag is negative and we want nm_platform_link_uses_arp (nonsense) return FALSE. And it only returns FALSE if IFF_NOARP is present.

> Shouldn't it return 0?  This would cause
> nm_platform_link_uses_arp (-1) to return TRUE, which doesn't seem right.

It's the exact other way round. I modified tests/test-link.c and added a test_bogus() function so that you can see what exact results I expect from a nonexistant device.

I also added more tests for up/connected/arp and found another kernel bug (documented in the testing code).

> Linux platform's link_change_flags() method should g_return_val_if_fail (change
> != NULL, FALSE); not "1".

Fixed.

> Is there a reason that the socket credentials code is no longer required, eg:
> 
> http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/src/nm-netlink-monitor.c?id=cadc56abc80b62b328ff345f22a3642e124c4740

I don't expect to parse IFLA_PROTINFO using nm-platform but it looks like socket credentials are a better way to check stuff. Please look at the current code.

Thanks!

Pavel

Comment 7 Dan Winship 2013-04-02 17:11:40 UTC

"received" is misspelled as "recieved" in a bunch of places (both strings and symbol names)

>+/* libnl inclues LINK_ATTR_FAMILY in oo_id_attrs of link_obj_ops and thus
>+ * refuses to search for items that lack this attribute. I believe this is a
>+ * bug or a bad design at the least.

did you file a bug / discuss this with Thomas? (If there's a bug filed somewhere, put a link to it in the comment so that later on when someone comes across this code, they can follow the link to see if the bug has been fixed upstream and the workaround can be removed.)

Likewise for the various other libnl helper methods.

>+		switch (rtnl_link_get_arptype (rtnllink)) {
>+		case ARPHRD_LOOPBACK:
>+			return NM_LINK_TYPE_LOOPBACK;
>+		case ARPHRD_ETHER:
>+			return NM_LINK_TYPE_ETHERNET;
>+		default:
>+			return NM_LINK_TYPE_GENERIC;
>+		}

This means Wi-Fi devices will be NM_LINK_TYPE_ETHERNET, correct? That's not really obvious given the name. There's no docs on precisely what the NMLinkType values mean though, so it's hard to say if this is wrong or not.

>+#define NM_PLATFORM_IP4_ADDRESS_CHANGED "ip4-address-changed"
>+#define NM_PLATFORM_IP6_ADDRESS_CHANGED "ip6-address-changed"

Is it actually possible for those to be emitted? Aren't addresses only addable and removable? Likewise for ip4-route-changed and ip6-route-changed.

A bunch of route-related functions have incorrect alignment in the function header. Eg:

>+gboolean
>+nm_platform_ip6_route_exists (int ifindex,
>+		struct in6_addr network, int plen, struct in6_addr gateway, int metric)


>+link_supports_vlans (NMPlatform *platform, int ifindex)

VLAN is part of the 802.whatever protocol family, so for links representing actual hardware, only ARPHRD_ETHER ones support VLANs. In particular, InfiniBand links do not (and afaict, do not have NETIF_F_VLAN_CHALLENGED set).

Comment 8 Dan Winship 2013-04-04 17:30:12 UTC

I think I'd suggested this before, but it would be useful if the NMPlatform signals used the "detail" mechanism so that it was possible to receive signals only for a particular interface. eg, I could connect to "ip4-address-added::em1" if I only wanted to know about new addresses on that interface.

(I would use this in the NMDeviceGeneric code. For now, I can just connect to the signal generically and filter on ifindex within the signal handler, but it would be nice to not have to.)

Comment 9 Pavel Simerda 2013-04-04 19:11:27 UTC

(In reply to comment #7)
> "received" is misspelled as "recieved" in a bunch of places (both strings and
> symbol names)

Thanks, fixed.

> >+/* libnl inclues LINK_ATTR_FAMILY in oo_id_attrs of link_obj_ops and thus
> >+ * refuses to search for items that lack this attribute. I believe this is a
> >+ * bug or a bad design at the least.
>
> did you file a bug

There is no upstream libnl bugzilla to my knowledge. Could file with Fedora, though.

> / discuss this with Thomas?

Not yet.

> (If there's a bug filed
> somewhere, put a link to it in the comment so that later on when someone comes
> across this code, they can follow the link to see if the bug has been fixed
> upstream and the workaround can be removed.)
> 
> Likewise for the various other libnl helper methods.

They are all in one place and my plan was to discuss that now that Thomas doesn't have time to review the whole stuff.

> >+		switch (rtnl_link_get_arptype (rtnllink)) {
> >+		case ARPHRD_LOOPBACK:
> >+			return NM_LINK_TYPE_LOOPBACK;
> >+		case ARPHRD_ETHER:
> >+			return NM_LINK_TYPE_ETHERNET;
> >+		default:
> >+			return NM_LINK_TYPE_GENERIC;
> >+		}
> 
> This means Wi-Fi devices will be NM_LINK_TYPE_ETHERNET, correct?

Correct. Thanks for pointing this out. My proposal is to live with it until I add the NM_LINK_TYPE_WIFI type. Until then, nm-platform would not distinguish wireless ethernet from disconnected wired ethernet, leaving the wifi-specific features unsupported.

The wifi-specific features are now handled separately anyway through generic netlink sockets.

> That's not
> really obvious given the name. There's no docs on precisely what the NMLinkType
> values mean though, so it's hard to say if this is wrong or not.

It should represent nm-platform's view of the device type. I think that wifi is a different device type than ethernet. I believe the nm-platform device types should match the NMDevice types as much as possible.

Any suggestions?

> >+#define NM_PLATFORM_IP4_ADDRESS_CHANGED "ip4-address-changed"
> >+#define NM_PLATFORM_IP6_ADDRESS_CHANGED "ip6-address-changed"
> 
> Is it actually possible for those to be emitted? Aren't addresses only addable
> and removable? Likewise for ip4-route-changed and ip6-route-changed.

From the kernel? I didn't check that and the code is safe whether kernel always signals RTM_DELLINK and RTM_NEWLINK for changes, or even whether it doesn't signal the change at all.

From inside NetworkManager? Not yet. But we are going to support address/route updates in the future.

The most imporant example of address/route change is the update of the lifetimes that we are going to support in the future.

> A bunch of route-related functions have incorrect alignment in the function
> header. Eg:
> 
> >+gboolean
> >+nm_platform_ip6_route_exists (int ifindex,
> >+		struct in6_addr network, int plen, struct in6_addr gateway, int metric)

We spoke with Dan Williams in brno and he said that he's ok with a bit relaxed indentation/alignment rules for practical reasons. But if you insist on using precise space alignment everywhere, I can go through the patches and change it.

> >+link_supports_vlans (NMPlatform *platform, int ifindex)
> 
> VLAN is part of the 802.whatever protocol family, so for links representing
> actual hardware, only ARPHRD_ETHER ones support VLANs.

Thanks, fixed:

+       if (!rtnllink || rtnl_link_get_arptype (rtnllink) != ARPHRD_ETHER)
+               return FALSE;

> In particular,
> InfiniBand links do not (and afaict, do not have NETIF_F_VLAN_CHALLENGED set).

On the other hand, the loopback interface is ARPHRD_LOOPBACK and VLAN_CHALLANGED.

Thanks for your feedback.

Comment 10 Dan Williams 2013-04-04 19:40:18 UTC

(In reply to comment #8)
> I think I'd suggested this before, but it would be useful if the NMPlatform
> signals used the "detail" mechanism so that it was possible to receive signals
> only for a particular interface. eg, I could connect to
> "ip4-address-added::em1" if I only wanted to know about new addresses on that
> interface.
> 
> (I would use this in the NMDeviceGeneric code. For now, I can just connect to
> the signal generically and filter on ifindex within the signal handler, but it
> would be nice to not have to.)

Interfaces can get renamed by the user though, so we can't do this unless you want to have a bunch of code that listens for these signals also listen for the "interface-name-changed' signal and then do a bunch of g_signal_handler_disconnect/g_signal_connect stuff when that happens.

Comment 11 Dan Williams 2013-04-04 19:43:12 UTC

(In reply to comment #9)
> > A bunch of route-related functions have incorrect alignment in the function
> > header. Eg:
> > 
> > >+gboolean
> > >+nm_platform_ip6_route_exists (int ifindex,
> > >+		struct in6_addr network, int plen, struct in6_addr gateway, int metric)
> 
> We spoke with Dan Williams in brno and he said that he's ok with a bit relaxed
> indentation/alignment rules for practical reasons. But if you insist on using
> precise space alignment everywhere, I can go through the patches and change it.

I did pretty much say that, though I'd prefer to have the coding style the same as the rest of NM, I don't want to hold up the platform code for that right now.

BTW, I just got smacked down by Dave Miller for some kernel patch coding style stuff, and I was even matching some code style already in that module, but Dave wants everything in standard kernel style.  Which is fine, it's the kernel.  At some point I'll probably just go through and re-indent the platform code.

But we should try to figure out how to get an emacs auto-styling template or whatever it is into the source tree.

Comment 12 Pavel Simerda 2013-04-05 00:10:47 UTC

(In reply to comment #10)
> (In reply to comment #8)
> > I think I'd suggested this before, but it would be useful if the NMPlatform
> > signals used the "detail" mechanism so that it was possible to receive signals
> > only for a particular interface. eg, I could connect to
> > "ip4-address-added::em1" if I only wanted to know about new addresses on that
> > interface.

I was already thinking about it. But first of all I am trying to avoid using interface names for identification in favor of ifindex. So it would be more like ip4-address-added::26.

> > (I would use this in the NMDeviceGeneric code.

I still believe that NMDevice subclasses shouldn't interface with NMPlatform unless absolutely necessary. I already have code for NMDevice to do link events using nm-platform. I just haven't yet adapted it to your recent carrier detection changes.

Then the filtering would happen in NMDevice code itself. It is still possible to filter using the detail mechanism but I always viewed the detail as way to specialize a generic signal with a limited set of specializations.

My opinion is that ifindex is a data attribute and not a specialization. And it would require more bookkeeping on all sides.

> > For now, I can just connect to
> > the signal generically and filter on ifindex within the signal handler, but it
> > would be nice to not have to.)

I think that address and route keeping would benefit the same approach as link state keeping. NMDevice abstract class could be the central point connected to the type-specific classes via virtual methods.

> Interfaces can get renamed by the user though, so we can't do this unless you
> want to have a bunch of code that listens for these signals also listen for the
> "interface-name-changed' signal and then do a bunch of
> g_signal_handler_disconnect/g_signal_connect stuff when that happens.

Agreed. There might be similar cases for ifindex too, though. My impression is that the filtering in NMDevice abstract class is less fragile. But we can certainly revisit the idea later.

Comment 13 Pavel Simerda 2013-04-05 00:16:10 UTC

(In reply to comment #11)
> I did pretty much say that, though I'd prefer to have the coding style the same
> as the rest of NM, I don't want to hold up the platform code for that right
> now.

So... I propose to take your path and be tolerant for now.

> BTW, I just got smacked down by Dave Miller for some kernel patch coding style
> stuff, and I was even matching some code style already in that module, but Dave
> wants everything in standard kernel style.  Which is fine, it's the kernel.  At
> some point I'll probably just go through and re-indent the platform code.

That might be a good time for me to learn the coding style better looking at your patch.
 
> But we should try to figure out how to get an emacs auto-styling template or
> whatever it is into the source tree.

And I would be happy to get it working with vim also. Editor is not my only concern, I still don't know how I would allign some of the stuff without losing the advantage of fixed-width indentation to keep the code on the screen.

Comment 14 Dan Winship 2013-04-05 14:46:24 UTC

(In reply to comment #12)
> I still believe that NMDevice subclasses shouldn't interface with NMPlatform
> unless absolutely necessary.
...
> I think that address and route keeping would benefit the same approach as link
> state keeping. NMDevice abstract class could be the central point connected to
> the type-specific classes via virtual methods.

At the moment, no NMDevice class except NMDeviceGeneric cares about the platform's view of addresses/routes though. Although that will change in the future if we're tracking live reconfiguration I guess.


Another note: your route structs need to either include the value of rtnl_route_get_type(), or else nm_platform_ip*_route_get_all() needs to filter more based on that. In particular, NM will probably never care about RTN_UNREACHABLE routes, and I don't really care about RTN_BROADCAST routes for what I'm doing either.

("ip route list table all" will show the full route list, which corresponds to what nm_platform_ip*_route_get_all() returns now.)

Comment 15 Pavel Simerda 2013-04-09 14:52:19 UTC

(In reply to comment #14)
> (In reply to comment #12)
> > I still believe that NMDevice subclasses shouldn't interface with NMPlatform
> > unless absolutely necessary.
> ...
> > I think that address and route keeping would benefit the same approach as link
> > state keeping. NMDevice abstract class could be the central point connected to
> > the type-specific classes via virtual methods.
> 
> At the moment, no NMDevice class except NMDeviceGeneric cares about the
> platform's view of addresses/routes though. Although that will change in the
> future if we're tracking live reconfiguration I guess.

They call nm-system functions of which some will be replaced by nm-platform functions and some will use nm-platform functions. Those would be of interest then.

But I don't think we need to care about it right now.

> Another note: your route structs need to either include the value of
> rtnl_route_get_type(), or else nm_platform_ip*_route_get_all() needs to filter
> more based on that. In particular, NM will probably never care about
> RTN_UNREACHABLE routes, and I don't really care about RTN_BROADCAST routes for
> what I'm doing either.

I think I'll go the "filtering" way. Thanks for pointing this out.

> ("ip route list table all" will show the full route list, which corresponds to
> what nm_platform_ip*_route_get_all() returns now.)

Comment 16 Pavel Simerda 2013-04-09 16:43:58 UTC

(In reply to comment #14)
> ("ip route list table all" will show the full route list, which corresponds to
> what nm_platform_ip*_route_get_all() returns now.)

Plus I found out another gotchas (writing down mostly for my own record):

1) The table should be filtered as well.

2) *_route_get_all should return an array *with* size (no zero termination needed), most probably a pointer to a GArray because 0.0.0.0/0 via 0.0.0.0 is a valid route. Same for other *_get_all for consistency.

3) tests/dump doesn't dump routes

Comment 17 Pavel Simerda 2013-04-10 11:50:51 UTC

(In reply to comment #16)
> (In reply to comment #14)
> > ("ip route list table all" will show the full route list, which corresponds to
> > what nm_platform_ip*_route_get_all() returns now.)

Fixed.

> 1) The table should be filtered as well.

Fixed.

> 2) *_route_get_all should return an array *with* size (no zero termination
> needed), most probably a pointer to a GArray because 0.0.0.0/0 via 0.0.0.0 is a
> valid route. Same for other *_get_all for consistency.

Fixed. GArray is now used for all *_get_all functions.

> 3) tests/dump doesn't dump routes

Fixed.

Comment 18 Pavel Simerda 2013-04-10 12:42:29 UTC

* Added log messages for ipv4 and ipv6 routes
* Removed nm_platform_route_flush which was not meant to deliver now (I spotted it thanks to 'make check-code-coveragre')

I'm going to push the branch. Will go through all the tests for all commits and through distcheck for the whole bunch before pushing.

Comment 19 Pavel Simerda 2013-04-10 14:42:30 UTC

* Replaced some assertions with safer error handling
* Added valgrind to EXTRA_DIST to fix 'make distcheck'

Comment 20 Pavel Simerda 2013-04-10 14:54:08 UTC

Pushed. Thank you. If you want to keep track of some of the minor issues, please start new bugreports for them.