After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 720760 - [review] dcbw/rdisc-fixes: fix issues with DNS and domain lifetime handling
[review] dcbw/rdisc-fixes: fix issues with DNS and domain lifetime handling
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: IP and DNS config
git master
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2013-12-19 18:14 UTC by Dan Williams
Modified: 2013-12-20 00:03 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Dan Williams 2013-12-19 18:14:36 UTC
Downstream bug is https://bugzilla.redhat.com/show_bug.cgi?id=1044757.  DNS server and domain timestamps aren't updated when a new RA comes in.  This means that when they get to their half-life in clean_domains() and clean_servers(), an IPv6 Router Solicitation is triggered, which causes a new RA to be processed, but since the timestamp isn't updated, the solicit is triggered again, repeat.

This branch fixes up that issue and a few more things noticed from code inspection.
Comment 1 Pavel Simerda 2013-12-19 18:58:04 UTC
> rdisc: clear domain list when updating from RA

Looks good. The cumulative nature of the list is IMO handled in the rdisc code.

> trivial: show interface name in IPv6 rdisc log message
> rdisc: log interface name in some messages
> rdisc: fix copy & paste error in detecting RA changes

Sure.

> rdisc: fix possible overflow in large RA option lifetimes
>
> *nextevent = (guint32) expiry;

The typecast is redundant.

> guint32 never = G_MAXUINT32 / 2;

I know what's in the comment but... The patch is removing G_MAXINT32 and using half of G_MAXUINT32 instead. Does it there was indeed some problem with G_MAXINT32?

> rdisc: don't leak DNSSL domains if they aren't added
>
> +/* Always copies new->domain */

Not true. Only copies new->domain when new is not included already. Actually the right thing to do but the comment looks confusing. Makes sense to move the copying to the function that decides whether it's needed at all.

> rdisc: ensure RDNSS and DNSSL lifetimes are updated (rh #1044757)
>
> -add_server (NMRDisc *rdisc, const NMRDiscDNSServer *new)
> +add_dns_server (NMRDisc *rdisc, const NMRDiscDNSServer *new)

Also add_domain should be renamed to add_dns_domain to stay consistent.

The timestamp/lifetime change detection code makes sense.

> This obviously results in a new Router Advertisement, but since the
> timestamps don't get updated, clean_dns_servers() and clean_domains()
> simply request another solicitation because 'now' is still greater
> than half the old lifetime.  This casues another solicit request,
> which causes another RA, which... etc.

It sounds like your solution still relies on good data from the router advertisement server. We might want to make the code a bit more robust. For the beginning we might want to rate limit the outgoing router solicitation.

If we do that *before* fixing this bug, we can easily test that thanks to the bug and then be done with it.

> rdisc: don't add new RDNSS and DNSSL options with zero lifetime
>
> DNS servers and domains that are already known and become zero
> lifetime in the next RA are already correctly handled by
> clean_dns_servers() and clean_domains().

Not clear at the first glance as the important bit is that the lifetimes are still updated, we just don't add new zero lifetime records. Looks good.
Comment 2 Dan Williams 2013-12-19 20:43:09 UTC
(In reply to comment #1)
> > rdisc: clear domain list when updating from RA
> 
> Looks good. The cumulative nature of the list is IMO handled in the rdisc code.
> 
> > trivial: show interface name in IPv6 rdisc log message
> > rdisc: log interface name in some messages
> > rdisc: fix copy & paste error in detecting RA changes
> 
> Sure.
> 
> > rdisc: fix possible overflow in large RA option lifetimes
> >
> > *nextevent = (guint32) expiry;
> 
> The typecast is redundant.

Removed.

> > guint32 never = G_MAXUINT32 / 2;
> 
> I know what's in the comment but... The patch is removing G_MAXINT32 and using
> half of G_MAXUINT32 instead. Does it there was indeed some problem with
> G_MAXINT32?

The original comment was:

/* Use a magic date in distant enough future as there's no guint32 max macro. */

which was also misleading, as there actually is G_MAXUINT32.  However, that value isn't very useful as a the "magic future date", so I kept 0x7FFFFFFF.  G_MAXINT32 is typecast to a 'gint32' and then assigned to a guint32.  I was trying to clear up that possible error, but in practice it makes no difference.  I'll keep the original code but update the comment.

> > rdisc: don't leak DNSSL domains if they aren't added
> >
> > +/* Always copies new->domain */
> 
> Not true. Only copies new->domain when new is not included already. Actually
> the right thing to do but the comment looks confusing. Makes sense to move the
> copying to the function that decides whether it's needed at all.

Comment updated.

> > rdisc: ensure RDNSS and DNSSL lifetimes are updated (rh #1044757)
> >
> > -add_server (NMRDisc *rdisc, const NMRDiscDNSServer *new)
> > +add_dns_server (NMRDisc *rdisc, const NMRDiscDNSServer *new)
> 
> Also add_domain should be renamed to add_dns_domain to stay consistent.

Done.

> The timestamp/lifetime change detection code makes sense.
> 
> > This obviously results in a new Router Advertisement, but since the
> > timestamps don't get updated, clean_dns_servers() and clean_domains()
> > simply request another solicitation because 'now' is still greater
> > than half the old lifetime.  This casues another solicit request,
> > which causes another RA, which... etc.
> 
> It sounds like your solution still relies on good data from the router
> advertisement server. We might want to make the code a bit more robust. For the
> beginning we might want to rate limit the outgoing router solicitation.
> 
> If we do that *before* fixing this bug, we can easily test that thanks to the
> bug and then be done with it.

Actually, I don't think it can fail on bad DNSSL/RDNSS data anymore, since they both enforce a minimum lifetime of 7200 seconds.  The only way we could get another solicit() loop is if the router sent very, very small lifetimes, but that's prevented by the 7200 minimum.

However, I definitely agree that there should be solicit ratelimiting.  Do you mind if that patch is a later addition?  I fear it may be a bit tricky (and require a couple passes) to get the timeouts exactly correct and with the holidays coming up I'm not sure we'll get it done in a timely manner.

Should be easy enough to test though, by disabling the timestamp updating code.

> > rdisc: don't add new RDNSS and DNSSL options with zero lifetime
> >
> > DNS servers and domains that are already known and become zero
> > lifetime in the next RA are already correctly handled by
> > clean_dns_servers() and clean_domains().
> 
> Not clear at the first glance as the important bit is that the lifetimes are
> still updated, we just don't add new zero lifetime records. Looks good.

Thanks.
Comment 3 Pavel Simerda 2013-12-19 21:11:08 UTC
I like the current state of the patchset.
Comment 4 Dan Williams 2013-12-19 23:41:31 UTC
I've filed the enhancement for solicit ratelimiting as bug 720791.
Comment 5 Dan Williams 2013-12-20 00:03:35 UTC
Merged to master.