After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 721771 - dhcp4 legacy static routes missing classful subnet masks
dhcp4 legacy static routes missing classful subnet masks
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: IP and DNS config
0.9.x
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2014-01-08 05:18 UTC by Scott Shambarger
Modified: 2014-01-23 16:50 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to correctly set subnet mask of classful static routes (1.09 KB, patch)
2014-01-08 05:36 UTC, Scott Shambarger
none Details | Review
Patch to correctly set subnet mask of classful static routes (1.09 KB, patch)
2014-01-08 06:23 UTC, Scott Shambarger
none Details | Review
Update patch to correctly detect host part for all classes, fix comment (1.73 KB, patch)
2014-01-08 23:21 UTC, Scott Shambarger
none Details | Review
test program for *_to_* functions (1.05 KB, text/plain)
2014-01-09 16:05 UTC, Pavel Simerda
  Details
Patch including updated mask calc and logging (1.78 KB, patch)
2014-01-09 23:05 UTC, Scott Shambarger
accepted-commit_now Details | Review

Description Scott Shambarger 2014-01-08 05:18:30 UTC
Currently, all dhcp4 static routes are added for target hosts only (mask plen 32), and not based on their address class.

The Introduction to RFC 3442 states:

The Static Routes option (option 33) does not provide a subnet mask
   for each route - it is assumed that the subnet mask is implicit in
   whatever network number is specified in each route entry.

This allows static routes for (example) 10.x.x.x addresses to be specified with option 33.

The ISC dhclient-script correctly sets the routes based on the address class for this option.  NetworkManager should as well.
Comment 1 Scott Shambarger 2014-01-08 05:36:01 UTC
Created attachment 265639 [details] [review]
Patch to correctly set subnet mask of classful static routes

Created a patch to correctly set the subnet mask for classful static routes.  The patch still uses host routing if the target address is not a network address (ie low byte != 0).
Comment 2 Scott Shambarger 2014-01-08 05:38:32 UTC
Note: I'm working away from home atm so can't fully test the patch (no access to my dhcp server).  Would be great if someone can actually test this... :)
Comment 3 Scott Shambarger 2014-01-08 06:23:55 UTC
Created attachment 265659 [details] [review]
Patch to correctly set subnet mask of classful static routes

Updated for fix rookie rushed coding error.

I've found a way to test this, and the code now works as expected.
Comment 4 Pavel Simerda 2014-01-08 09:40:54 UTC
Looks good except it special cases the C class but not B and A for host part bits. Also, the notes in RFC 3442 are not normative for this option which is defined in RFC 2132. Please use that one (or both) for reference.

http://tools.ietf.org/search/rfc2132
Comment 5 Scott Shambarger 2014-01-08 23:21:32 UTC
Created attachment 265800 [details] [review]
Update patch to correctly detect host part for all classes, fix comment

You are correct, I wasn't taking the entire host part into consideration, thanks.

I've updated the comment to refer to all the relevant RFCs as well.
Comment 6 Pavel Simerda 2014-01-09 11:43:21 UTC
(In reply to comment #5)
> Created an attachment (id=265800) [details] [review]
> Update patch to correctly detect host part for all classes, fix comment
> 
> You are correct, I wasn't taking the entire host part into consideration,
> thanks.

Better. But I would still like to check whether this is the correct behavior. Does the standard explicitly talk about host routes?

Examples (using private addresses):

  192.168.1.0 -> class C -> 192.168.1.0/24 (looks OK)
  192.168.0.0 -> class C -> 192.168.0.0/24 (looks OK)
  10.0.0.0 -> class A -> 10.0.0.0/8 (looks OK)
  10.1.0.0 -> class A -> host -> 10.1.0.0/32 (looks weird but probably correct)

Also, I'm still curious about your motivation to work on this. Are you actually using equipment that sends souch routes?

> I've updated the comment to refer to all the relevant RFCs as well.

Thanks.
Comment 7 Thomas Haller 2014-01-09 11:55:49 UTC
(In reply to comment #5)

Maybe you could use "~nm_utils_ip4_prefix_to_netmask (route.plen)" instead of "htonl(((guint32)0xFFFFFFFF) >> route.plen)" ?
Comment 8 Pavel Simerda 2014-01-09 12:01:37 UTC
(In reply to comment #7)
> (In reply to comment #5)
> 
> Maybe you could use "~nm_utils_ip4_prefix_to_netmask (route.plen)" instead of
> "htonl(((guint32)0xFFFFFFFF) >> route.plen)" ?

Or there could be nm_utils_ip4_prefix_to_hostmask(), for that matter.
Comment 9 Thomas Haller 2014-01-09 12:18:09 UTC
(In reply to comment #8)
> Or there could be nm_utils_ip4_prefix_to_hostmask(), for that matter.

Such a function does not (yet) exist. IMO it's much needed, because hostmask==~netmask -- and that is not going to change.
Comment 10 Pavel Simerda 2014-01-09 12:39:51 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Or there could be nm_utils_ip4_prefix_to_hostmask(), for that matter.
> 
> Such a function does not (yet) exist. IMO it's much needed, because
> hostmask==~netmask -- and that is not going to change.

I don't actually care about the way it's expressed. It might be best to just us macros for all of those instead of the current loop-based implementation.

#define nm_utils_ip4_prefix_to_hostmask htonl(0xFFFFFFFF >> route.plen)
#define nm_utils_ip4_prefix_to_netmask ~htonl(0xFFFFFFFF >> route.plen)
Comment 11 Pavel Simerda 2014-01-09 13:12:21 UTC
Just tested it, that doesn't work... sorry.
Comment 12 Pavel Simerda 2014-01-09 13:36:34 UTC
Correct implementation would be:

#define nm_utils_ip4_prefix_to_hostmask(prefix) \
    (prefix < 32 ? htonl(0xffffffff >> prefix) : 0)
#define nm_utils_ip4_prefix_to_netmask(prefix) \
    ~nm_utils_ip4_prefix_to_hostmask(prefix)

Unfortunately you cannot expect >>32 to work on 32bit values.
Comment 13 Scott Shambarger 2014-01-09 13:49:28 UTC
The nm_utils forms will make for cleaner code (although plen=32 isn't a factor as get_default_prefix doesn't return it).  Should I wait for prefix_to_hostmask to appear and re-submit?
Comment 14 Scott Shambarger 2014-01-09 13:53:06 UTC
(In reply to comment #6)
>   10.1.0.0 -> class A -> host -> 10.1.0.0/32 (looks weird but probably correct)

... should be ok based on how this option is expected to behave.

> Also, I'm still curious about your motivation to work on this. Are you actually
> using equipment that sends souch routes?

Hooked a machine up to an old dhcp server trying to debug another issue, and couldn't get to my desktop -- turned into this bug... :)
Comment 15 Pavel Simerda 2014-01-09 15:57:41 UTC
On IRC we realized we're talking about distinct locations with duplicate code:

./libnm-util/nm-utils.h:guint32 nm_utils_ip4_prefix_to_netmask (guint32 prefix);
./libnm-util/nm-utils.c:nm_utils_ip4_prefix_to_netmask (guint32 prefix)
./src/NetworkManagerUtils.c:nm_utils_ip4_prefix_to_netmask (guint32 prefix)

I assumed we're talking about a private API while thomas about a public API. The duplication should be removed.

It's safe to add nm_utils_ip4_prefix_to_hostmask() macro, so I'd go for it. On the other hand, it's not safe to replace a function with a macro (that would cause ABI breakage). I propose to just simplify the implementation using the new hostmask macro.

guint32
nm_utils_ip4_prefix_to_netmask (int prefix)
{
    return ~nm_utils_ip4_prefix_to_hostmask(prefix);
}

Or we could even have both of them so that newly built apps use the macro but the function is retained for backwards compatibility.

*.h:

#define nm_utils_ip4_prefix_to_netmask(prefix) ~nm_utils_ip4_prefix_to_hostmask(prefix)

*.c:

#undef nm_utils_ip4_prefix_to_netmask
guint nm_utils_ip4_prefix_to_netmask(int prefix);

guint
nm_utils_ip4_prefix_to_netmask(int prefix)
{
    return ~nm_utils_ip4_prefix_to_hostmask(prefix);
}

It's up to you.
Comment 16 Pavel Simerda 2014-01-09 16:00:58 UTC
(In reply to comment #14)
> Hooked a machine up to an old dhcp server trying to debug another issue, and
> couldn't get to my desktop -- turned into this bug... :)

Understood.

(In reply to comment #13)
> The nm_utils forms will make for cleaner code (although plen=32 isn't a factor
> as get_default_prefix doesn't return it).  Should I wait for prefix_to_hostmask
> to appear and re-submit?

Yes, please wait until thomas is done with the functions but I think he can adjust your patch and apply it, it's a trivial change, unless there's something else he wants to change.
Comment 17 Pavel Simerda 2014-01-09 16:05:06 UTC
Created attachment 265853 [details]
test program for *_to_* functions

While thinking about the *_to_* functions/macros, I created a simple test program with the two macros above. Plus I tried to reconstruct the netmask_to_prefix function (including the optional byte optimization used in the current code). Once I wrote it, there's no reason not to publish it for reference. I'm not saying the netmask_to_prefix implementation is better or worse than the current one, though.

Thomas, I also like your testing code and it seems to be similar to what I'm using in the test program.
Comment 18 Thomas Haller 2014-01-09 21:22:25 UTC
Talking to dcbw, we agreed not to add prefix_to_hostmask (for now).

Please just just ~prefix_to_netmask() instead.


and maybe?

-         nm_log_info (LOGD_DHCP, "  static route %s/%d gw %s", *s, route.plen, *(s + 1));
+         nm_log_info (LOGD_DHCP, "  static route %s", nm_platform_ip4_route_to_string (&route));



Thank you!! Thomas
Comment 19 Scott Shambarger 2014-01-09 23:05:26 UTC
Created attachment 265884 [details] [review]
Patch including updated mask calc and logging

OK, here's an updated patch including the bug in the commit log as well.

I'm guessing it's ok to use LOGD_DHCP here and not LOGD_IP?

As an aside, in the process of testing this I found a related issue... if the route gateway is invalid (unreachable), the connection goes into a reset loop (bouncing up and down with errors in the log).  My guess is that a bad (non-default) route probably shouldn't cause the connection fail outright, and should just be logged?  Would it be correct to log a new bug for that?
Comment 20 Thomas Haller 2014-01-10 00:42:57 UTC
(In reply to comment #19)

> As an aside, in the process of testing this I found a related issue... if the
> route gateway is invalid (unreachable), the connection goes into a reset loop
> (bouncing up and down with errors in the log).  My guess is that a bad
> (non-default) route probably shouldn't cause the connection fail outright, and
> should just be logged?  Would it be correct to log a new bug for that?

I think what you see is caused by http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=fbde824584266d9f6ac7f14916fdcc9cd179a73a .

We want to fail for user-configured routes and continue connecting for DHCP routes, but NM at that point does not know, where the route comes from. So, this is known, and cannot be fixed easily. There is already bug https://bugzilla.redhat.com/show_bug.cgi?id=1005416 for this.

Could this be the issue you're seeing?


Also, the bouncing up-and-down, should happen exactly 5 times. And then NM should wait for 5 minutes before trying 5 times again (if your connection is autoconnect=TRUE, otherwise no automatic retry). Does it loop for you indefinitely? That would be an unexpected bug.
Comment 21 Thomas Haller 2014-01-10 00:45:54 UTC
(In reply to comment #19)
> Created an attachment (id=265884) [details] [review]
> Patch including updated mask calc and logging
> 
> OK, here's an updated patch including the bug in the commit log as well.
> 
> I'm guessing it's ok to use LOGD_DHCP here and not LOGD_IP?

ACK, this looks good to me. Thank you!!
Comment 22 Scott Shambarger 2014-01-10 01:02:53 UTC
(In reply to comment #20)
> I think what you see is caused by
> http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=fbde824584266d9f6ac7f14916fdcc9cd179a73a

Looks like the probable cause.

> We want to fail for user-configured routes and continue connecting for DHCP
> routes, but NM at that point does not know, where the route comes from. So,
> this is known, and cannot be fixed easily. There is already bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1005416 for this.

Well, I can't access the bug (restricted?), but I question if there's a need to fail the connection even if manually configured (non default) routes can't be added.  Some access is better than none, provided clear warnings are logged (if a route isn't active, the reason why can be determined).

initscripts has traditionally worked this way (see ifup-routes), and it's always been easy to figure out problems.  I'm in the (careful) process of moving my server from initscripts to NM, so I'm finding a few places where NM is more fragile than the classic system -- the benefits outweigh the costs though (automatic carrier tracking with hooks, etc).

Be nice for NM to follow it's mantra as closely as possible "networking that just works" :)

> Also, the bouncing up-and-down, should happen exactly 5 times. And then NM
> should wait for 5 minutes before trying 5 times again (if your connection is
> autoconnect=TRUE, otherwise no automatic retry). Does it loop for you
> indefinitely? That would be an unexpected bug.

autoconnect is true, I didn't test the server long enough to see if looped forever as I lost access (no console, it's remote), I just reverted the dhcpd config and it came back online (nice!).
Comment 23 Scott Shambarger 2014-01-10 01:47:34 UTC
Just thought of a couple of case studies (still relevant as the bug's about routing? :)

1) Novice at office, IT left your desktop with a broken static route for the DMZ.  You could google for a fix, but you can't get any access...

2) Expert with laptop in a cafe in Bratislava, DHCP has a bad routing config and you can't communicate the problem with the barista and they won't let you near the wifi router.  You can't work around the problem on the laptop with a manual config as the router only gives access after you perform the web form login. (It's likely Windows/Mac gracefully handle the bad route - I should test it - and so nobody's noticed until now).  No Skyping mom today.

In both cases, basic internet access "could" be had regardless of the bogus route.  Just thinking this might make life easier...
Comment 24 Thomas Haller 2014-01-10 02:00:54 UTC
(In reply to comment #23)

Based on the other bug report, it was decided, that it's better to fail in this
case. One problem is, that there is no mechanism to notify the UI clients of
the failure. And moreover it's not easy designing the UI to display the warning
in a useful way, without annoying the user. What if there are several clients?
Or none?

The connection attempt fails, because something in the configuration or network
environment is faulty. A user able to understand the logfile, can find the
error and fix the setup(1). At the very least, the user becomes aware that there
is a problem (sledgehammer like :) ).

(1) Ok, you cannot fix the DHCP setup of the cafe in Bratislava, but this will be fixed by having an origin-attribute for the route, so NM can fail only for
dynamic routes. I think dcbw is working on that.
Comment 25 Thomas Haller 2014-01-10 02:05:06 UTC
(In reply to comment #24)
> be fixed by having an origin-attribute for the route, so NM can fail only for
> dynamic routes. I think dcbw is working on that.

fail only for *manual* routes of course :)
Comment 26 Scott Shambarger 2014-01-10 02:07:44 UTC
If you're configuring local static routes, I agree the user should be capable of resolving the problem -- but until the origin-attribute is available wouldn't the conservative approach allow for connections to work when the configuration is out of your control?  Once the origin tracking is implemented, then make connections fail conditionally. After all, it may be months (several releases) before the origin-attribute is backported :)
Comment 27 Pavel Simerda 2014-01-10 11:23:34 UTC
I agree with Scott, here. Network connectivity is often so important that a conservative approach based on principle of least damage is better. It's rather easy to (1) log errors and (2) flag the failed routes so that UI can handle it accordingly.

If I understood correctly that a single failed route will bring down the whole configuration, in some cases I would even see this as a reason to use another solution including network configuration scripts instead of NetworkManager.
Comment 28 Dan Williams 2014-01-10 16:06:10 UTC
Review of attachment 265884 [details] [review]:

Also looks good & correct to me.
Comment 29 Thomas Haller 2014-01-21 20:12:18 UTC
Patch pushed to master as:

http://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=2b461bfa19f7862081522c64e90399dc1f30b59a

(I adjusted the commit message and white-space according our style)


Thanks for your contribution, Scott.


Closing this bug now, for discussion about failure on adding routes, there is downstream bug for RHEL7 (1005416) or we should open a new upstream bug here.
Comment 30 Scott Shambarger 2014-01-22 05:07:34 UTC
Thanks.  As must as I'd like to participate in the discussion on bug 1005416, I don't have access (is it possible for you to add me? I'm scott-redhat@ on their system).
Comment 31 Thomas Haller 2014-01-23 16:50:43 UTC
(In reply to comment #30)
> Thanks.  As must as I'd like to participate in the discussion on bug 1005416, I
> don't have access (is it possible for you to add me? I'm scott-redhat@ on their
> system).

Scott, I am unable to do that, sorry :)
But I created upstream bug 722843 with the very same content.