After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 791846 - ipip6: no way to ignore encap limit
ipip6: no way to ignore encap limit
Status: RESOLVED FIXED
Product: NetworkManager
Classification: Platform
Component: general
git master
Other Linux
: Normal normal
: ---
Assigned To: NetworkManager maintainer(s)
NetworkManager maintainer(s)
Depends on:
Blocks: nm-review
 
 
Reported: 2017-12-21 14:34 UTC by Kenny
Modified: 2018-01-05 17:30 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH] ip-tunnel: add support for tunnel flags (24.69 KB, patch)
2017-12-22 17:31 UTC, Beniamino Galvani
none Details | Review

Description Kenny 2017-12-21 14:34:57 UTC
In Linux there is a flag to ignore the encap limit for ip6 tunnels (ip6ip6 and ipip6). In order to convey the encap limit, Linux uses an IPv6 Destination Option (DSTOPT in Linux kernel) as defined in RFC 2473 (https://tools.ietf.org/html/rfc2473#page-13). Unfortunately there is some equipment that will reject your ip6 tunnel traffic if it has this DSTOPT. The one I'm concerned about is the NTT equipment used in all of Japan.

There is another benefit, though. If you tell Linux to ignore the encap limit and not send this DSTOPT, you get back 8 bytes of the MTU for data!

It appears that NetworkManager just has nowhere to set the netlink flags that can enable the encap limit ignore. You can definitely set it via netlink by using IFLA_IPTUN_FLAGS. Refer to:
http://elixir.free-electrons.com/linux/v4.9/source/net/ipv6/ip6_tunnel.c#L1895


If you're curious about how the encap limit ends up adding a DSTOPT and reducing the MTU by 8 bytes, see these parts of the kernel source:

1. If the flag IP6_TNL_F_IGN_ENCAP_LIMIT is not enabled, take off 8 bytes to fit the DSTOPT:
https://elixir.free-electrons.com/linux/v4.7/source/net/ipv6/ip6_tunnel.c#L1340

2. During transmission of ip6tnl packets, start with encap_limit set to -1:
https://elixir.free-electrons.com/linux/v4.7/source/net/ipv6/ip6_tunnel.c#L1160

3. If IP6_TNL_F_IGN_ENCAP_LIMIT is not set, then set the encap_limit to something >= 0:
https://elixir.free-electrons.com/linux/v4.7/source/net/ipv6/ip6_tunnel.c#L1173

4. Going into ip6_tnl_xmit, if encap_limit is >= 0 then reduce the MTU to fit the DSTOPT:
https://elixir.free-electrons.com/linux/v4.7/source/net/ipv6/ip6_tunnel.c#L1068

5. Finally add the DSTOPT if encap_limit is >= 0:
https://elixir.free-electrons.com/linux/v4.7/source/net/ipv6/ip6_tunnel.c#L1118
Comment 1 Beniamino Galvani 2017-12-21 15:45:53 UTC
It the ip-tunnel.encapsulation-limit property wasn't an uint with range 0-255 we could have used a -1 value to set the IGN_ENCAP_LIMIT flag. But the property type and range are now part of the API and can't be changed, so I think we should add a generic ip-tunnel.flags property, that can be used by other tunnel types as well.
Comment 2 Kenny 2017-12-22 00:46:51 UTC
That sounds reasonable given the constraints.

iproute2 CLI format is to use "encaplimit none" as the argument. Internally it sets encap_limit to 0 and then enables the IP6_TNL_F_IGN_ENCAP_LIMIT flag (see https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/tree/ip/link_ip6tnl.c?id=9135c4d6037ff9f1818507bac0049fc44db8c3d2#n209)

Example:

$ ip -6 tunnel add mode ipip6 remote 2001:db8:100::1 local 2001:db8:9000::1 dev enp1s0 encaplimit none
$ ip -6 tunnel
ip6tnl1: ip/ipv6 remote 2001:db8:100::1 local 2001:db8:9000::1 dev enp1s0 encaplimit none hoplimit 64 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
Comment 3 Beniamino Galvani 2017-12-22 17:31:43 UTC
Created attachment 365887 [details] [review]
[PATCH] ip-tunnel: add support for tunnel flags
Comment 4 Thomas Haller 2018-01-03 11:39:59 UTC
(In reply to Beniamino Galvani from comment #3)
> Created attachment 365887 [details] [review] [review]
> [PATCH] ip-tunnel: add support for tunnel flags



+              .typ_flags =                  NM_META_PROPERTY_TYP_FLAG_ENUM_GET_PARSABLE_TEXT
+                                          | NM_META_PROPERTY_TYP_FLAG_ENUM_GET_PRETTY_TEXT,

Don't specify .typ_flags. These flags only exist to preserve the format of pre-existing values which use a different format. Newly added enums should all use the same format: the default format, which is considered the preferred one. You get that by not specifying .typ_flags.



+          g_param_spec_uint (NM_SETTING_IP_TUNNEL_FLAGS, "", "",
+                             0, G_MAXUINT, 0,

on D-Bus, the type is "u", which is 32 bit. G_MAXUINT32.

for that reason, make also and NM_SETTING_IP_TUNNEL_GET_PRIVATE (setting)->flags a guint32, because it is not possible to set it to anything else (in case guint != guint32).


+guint
+nm_setting_ip_tunnel_get_flags (NMSettingIPTunnel *setting)

the libnm API can and should use NMIPTunnelFlags.



+         flags &= ~(  NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT
+                    | NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_TCLASS
+                    | NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FLOWLABEL
+                    | NM_IP_TUNNEL_FLAG_IP6_MIP6_DEV
+                    | NM_IP_TUNNEL_FLAG_IP6_RCV_DSCP_COPY
+                    | NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FWMARK);

flags &= (guint32) (~(  NM_IP...));




+} ip6tnl_flags_map[] = {
+    { NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT,    IP6_TNL_F_IGN_ENCAP_LIMIT },
+    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_TCLASS,    IP6_TNL_F_USE_ORIG_TCLASS },
+    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FLOWLABEL, IP6_TNL_F_USE_ORIG_FLOWLABEL },
+    { NM_IP_TUNNEL_FLAG_IP6_MIP6_DEV,           IP6_TNL_F_MIP6_DEV },
+    { NM_IP_TUNNEL_FLAG_IP6_RCV_DSCP_COPY,      IP6_TNL_F_RCV_DSCP_COPY },
+    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FWMARK,    IP6_TNL_F_USE_ORIG_FWMARK },

do these defines all exist with older kernels? Just make sure that it builds on our TravisCI.



+    { NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT,    IP6_TNL_F_IGN_ENCAP_LIMIT },
+    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_TCLASS,    IP6_TNL_F_USE_ORIG_TCLASS },
+    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FLOWLABEL, IP6_TNL_F_USE_ORIG_FLOWLABEL },
+    { NM_IP_TUNNEL_FLAG_IP6_MIP6_DEV,           IP6_TNL_F_MIP6_DEV },
+    { NM_IP_TUNNEL_FLAG_IP6_RCV_DSCP_COPY,      IP6_TNL_F_RCV_DSCP_COPY },
+    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FWMARK,    IP6_TNL_F_USE_ORIG_FWMARK },

why don't you make them numerically identical to the kernel flags? They are closely related.

If you make NMIPTunnelFlags numerically overlap with the guint32 flags from kernel, you can trivially do ip6tnl_flags_setting_to_plat().


     guint flow_label;
+    NMIPTunnelFlags flags;
 } NMPlatformLnkIp6Tnl;

I think the NMPlatform instances should be close to their kernel counterparts. Don't use NMIPTunnelFlags here, it's guint32. Otherwise, you needlessly drop flags that kernel exposes (even if we currently don't make use of them, it might be useful just to see them in the logfile).
This also means, in nm_platform_lnk_ip6tnl_to_string() you might not want to use "nm_utils_enum_to_str (nm_ip_tunnel_flags_get_type(), lnk->flags));" but create a explit platformflags2str (with NM_UTILS_FLAGS2STR_DEFINE_STATIC).


With the above (contrary to ip6tnl_flags_setting_to_plat()) ip6tnl_flags_plat_to_setting() is not trivial (because the platform flags might recognize more values). But turns out, you don't need it if you track flags in NMPlatformLnkIp6Tnl as their native guint32.



When adding a new property to the NMPlatform* structs, you *always* must extend the hash_update() and cmp() functions. Otherwise, the platform cache treats the property wrong.




+    obj_properties[PROP_FLAGS] =
+         g_param_spec_uint (NM_DEVICE_IP_TUNNEL_FLAGS, "", "",
+                            0, G_MAXUINT, 0,

G_MAXUINT32.


+#define NM_DEVICE_IP_TUNNEL_FLAGS               "flags"

libnm doesn't support it, is that intentional?


     guint32 flow_label;
+    NMIPTunnelFlags flags;
 } NMDeviceIPTunnelPrivate;

I think this should be again guint32. It's the same flags that come from the platform cache (from kernel), not the ones that are specified in the user's connection.

If you really want, then the getter

+    case PROP_FLAGS:
+         g_value_set_uint (value, priv->flags);
+         break;

could do (priv->flags & _NM_IP_TUNNEL_FLAG_ALL), but that seems like an artificial limitation. Why not just expose all kernel flags as guint32 as-is?
Comment 5 Beniamino Galvani 2018-01-04 15:01:49 UTC
(In reply to Thomas Haller from comment #4)

> +} ip6tnl_flags_map[] = {
> +    { NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT,    IP6_TNL_F_IGN_ENCAP_LIMIT },
> +    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_TCLASS,    IP6_TNL_F_USE_ORIG_TCLASS },
> +    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FLOWLABEL,
> IP6_TNL_F_USE_ORIG_FLOWLABEL },
> +    { NM_IP_TUNNEL_FLAG_IP6_MIP6_DEV,           IP6_TNL_F_MIP6_DEV },
> +    { NM_IP_TUNNEL_FLAG_IP6_RCV_DSCP_COPY,      IP6_TNL_F_RCV_DSCP_COPY },
> +    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FWMARK,    IP6_TNL_F_USE_ORIG_FWMARK },
> 
> do these defines all exist with older kernels? Just make sure that it builds
> on our TravisCI.

Yes, they are all quite old.

> +    { NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT,    IP6_TNL_F_IGN_ENCAP_LIMIT },
> +    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_TCLASS,    IP6_TNL_F_USE_ORIG_TCLASS },
> +    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FLOWLABEL,
> IP6_TNL_F_USE_ORIG_FLOWLABEL },
> +    { NM_IP_TUNNEL_FLAG_IP6_MIP6_DEV,           IP6_TNL_F_MIP6_DEV },
> +    { NM_IP_TUNNEL_FLAG_IP6_RCV_DSCP_COPY,      IP6_TNL_F_RCV_DSCP_COPY },
> +    { NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_FWMARK,    IP6_TNL_F_USE_ORIG_FWMARK },
> 
> why don't you make them numerically identical to the kernel flags? They are
> closely related.
> 
> If you make NMIPTunnelFlags numerically overlap with the guint32 flags from
> kernel, you can trivially do ip6tnl_flags_setting_to_plat().

The 'flags' property is generic for all tunnel types; at the moment
only IPv6 tunnel flags are implemented, but in the future other
non-IPv6 flags could be added as well (this is why all current enum
values have a "IP6" prefix). Kernel flags for other tunnel types may
clash with the existing ones and so it's not possible to define the
setting enums identical to kernel flags (if we want to keep the
mechanism generic).

>      guint flow_label;
> +    NMIPTunnelFlags flags;
>  } NMPlatformLnkIp6Tnl;
> 
> I think the NMPlatform instances should be close to their kernel
> counterparts. Don't use NMIPTunnelFlags here, it's guint32. Otherwise, you
> needlessly drop flags that kernel exposes (even if we currently don't make
> use of them, it might be useful just to see them in the logfile).
> This also means, in nm_platform_lnk_ip6tnl_to_string() you might not want to
> use "nm_utils_enum_to_str (nm_ip_tunnel_flags_get_type(), lnk->flags));" but
> create a explit platformflags2str (with NM_UTILS_FLAGS2STR_DEFINE_STATIC).

If flags are interpreted in the platform, other parts of code
(nm-device-ip-tunnel.c, tests) don't need to know how to translate
from kernel to setting values and vice versa, which seemed a better
approach to me. I implemented your suggestion in a fixup commit.


> With the above (contrary to ip6tnl_flags_setting_to_plat())
> ip6tnl_flags_plat_to_setting() is not trivial (because the platform flags
> might recognize more values). But turns out, you don't need it if you track
> flags in NMPlatformLnkIp6Tnl as their native guint32.


> +#define NM_DEVICE_IP_TUNNEL_FLAGS               "flags"
> 
> libnm doesn't support it, is that intentional?

No, fixed.


>      guint32 flow_label;
> +    NMIPTunnelFlags flags;
>  } NMDeviceIPTunnelPrivate;
> 
> I think this should be again guint32. It's the same flags that come from the
> platform cache (from kernel), not the ones that are specified in the user's
> connection.
> 
> If you really want, then the getter
> 
> +    case PROP_FLAGS:
> +         g_value_set_uint (value, priv->flags);
> +         break;
> 
> could do (priv->flags & _NM_IP_TUNNEL_FLAG_ALL), but that seems like an
> artificial limitation. Why not just expose all kernel flags as guint32 as-is?

I think that device and setting should use the same values to avoid potential confusion (as stated above, kernel and setting values can be different in principle).

Fixed the rest and pushed branch bg/ip-tunnel-flags-bgo791846.
Comment 6 Thomas Haller 2018-01-04 17:34:02 UTC
In verify(),
+    flags = priv->flags;
+    if (family == AF_INET6) {
+         flags &= (guint32) ~(  NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT
+                              | NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_TCLASS

this allows the flags for all IPv6 types, but they are only supported for NM_IP_TUNNEL_MODE_IPIP6 and NM_IP_TUNNEL_MODE_IP6IP6.

> If flags are interpreted in the platform, other parts of code
> (nm-device-ip-tunnel.c, tests) don't need to know how to translate
> from kernel to setting values and vice versa, which seemed a better
> approach to me. I implemented your suggestion in a fixup commit.

As you say, the new flags depend on the tunnel type. So, future NMIPTunnelFlags possibly make no sense on a NMPlatformLnkIp6Tnl. For that reason, I like the fixup better. Also, nm_platform_lnk_ip6tnl_to_string() prints all flags, not only the supported ones.


+ props->flags = nla_get_u32 (tb[IFLA_IPTUN_FLAGS]) & 0xFFFF; /* ignore kernel internal flags */

why hiding them? Do they hurt?




+          g_param_spec_uint (NM_SETTING_IP_TUNNEL_FLAGS, "", "",
+                             0, G_MAXUINT32, 0,
+                             G_PARAM_READWRITE |
+                             G_PARAM_CONSTRUCT |

just to initialize with 0, you don't need G_PARAM_CONSTRUCT.




I agree that it makes sense that NM_SETTING_IP_TUNNEL_FLAGS and NM_DEVICE_IP_TUNNEL_FLAGS abstract the flags, as they are applicable to tunnel types. Just like we don't have distinct NMSettingIPTunnelIP6IP6, NMDeviceIPTunnelIP6IP6 types.

I agree, there is no particularly strong reason hence to have the flags NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT, IP6_TNL_F_IGN_ENCAP_LIMIT numerically identical. But you already made them the same. Pushed a fixup to make use of that fact (which isn't ever going to change, because these numeric values are stable API).




Regardless that NMIPTunnelFlags is an abstraction, I dislike a bit that the same enum type is used both for NM_SETTING_IP_TUNNEL_FLAGS and NM_DEVICE_IP_TUNNEL_FLAGS. The former is ~what should be configured~, the latter is ~what is configured~. There is an overlap, but their meaning in the future could be different.
E.g. for route flags, there is NM_IP_ROUTE_ATTRIBUTE_ONLINK which corresponds to RTNH_F_ONLINK. But RTNH_F_CLONED makes no sense ever on the configuration side.
Or, there is NM_METERED_GUESS_YES which is a valid property at runtime on the device. But it makes no sense to set it in the configuration.

I wish there would be a NMSettingIPTunnelFlags, NMDeviceIPTunnelFlags, and NMPlatformLnkIp6Tnl.flags (which for now all have identical values, but in future may not).

But ok, I really don't mind :) Fine as-is.
Comment 7 Beniamino Galvani 2018-01-05 14:01:58 UTC
(In reply to Thomas Haller from comment #6)
> In verify(),
> +    flags = priv->flags;
> +    if (family == AF_INET6) {
> +         flags &= (guint32) ~(  NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT
> +                              | NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_TCLASS
> 
> this allows the flags for all IPv6 types, but they are only supported for
> NM_IP_TUNNEL_MODE_IPIP6 and NM_IP_TUNNEL_MODE_IP6IP6.

At the moment the only ip6tnl types supported by NM are IPIP6 and IP6IP6. But those flags apply also to other ip6tnl types and so I don't think we should restrict to IPIP6 and IP6IP6 here in verify().
 
> + props->flags = nla_get_u32 (tb[IFLA_IPTUN_FLAGS]) & 0xFFFF; /* ignore
> kernel internal flags */
> 
> why hiding them? Do they hurt?

Yes, when we add a new link with certain flags, the actual flags differs because there are high-order bits set as:

include/net/ip6_tunnel.h:#define IP6_TNL_F_CAP_XMIT 0x10000
include/net/ip6_tunnel.h:#define IP6_TNL_F_CAP_RCV 0x20000
include/net/ip6_tunnel.h:#define IP6_TNL_F_CAP_PER_PACKET 0x40000

and this breaks platform test. Note that those defines are not available to userspace as they are not in uapi headers.

Okay, it's also easy to fix the test to ignore those bits.

> +          g_param_spec_uint (NM_SETTING_IP_TUNNEL_FLAGS, "", "",
> +                             0, G_MAXUINT32, 0,
> +                             G_PARAM_READWRITE |
> +                             G_PARAM_CONSTRUCT |
> 
> just to initialize with 0, you don't need G_PARAM_CONSTRUCT.

Fixed.

> I agree, there is no particularly strong reason hence to have the flags NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT, IP6_TNL_F_IGN_ENCAP_LIMIT numerically identical. But you already made them the same. Pushed a fixup to make use of that fact (which isn't ever going to change, because these numeric values are stable API).

Squashed, thanks.
Comment 8 Thomas Haller 2018-01-05 14:32:38 UTC
(In reply to Beniamino Galvani from comment #7)
> (In reply to Thomas Haller from comment #6)
> > In verify(),
> > +    flags = priv->flags;
> > +    if (family == AF_INET6) {
> > +         flags &= (guint32) ~(  NM_IP_TUNNEL_FLAG_IP6_IGN_ENCAP_LIMIT
> > +                              | NM_IP_TUNNEL_FLAG_IP6_USE_ORIG_TCLASS
> > 
> > this allows the flags for all IPv6 types, but they are only supported for
> > NM_IP_TUNNEL_MODE_IPIP6 and NM_IP_TUNNEL_MODE_IP6IP6.
> 
> At the moment the only ip6tnl types supported by NM are IPIP6 and IP6IP6.
> But those flags apply also to other ip6tnl types and so I don't think we
> should restrict to IPIP6 and IP6IP6 here in verify().

I really think we should. In the future relaxing a condition in verify() is perfectly backward compatible.

Unless NM support it, it makes no sense to configure it or to allow such configuration. It's a non-working configuration that should be rejected, for example by nmcli.


> and this breaks platform test. Note that those defines are not available to
> userspace as they are not in uapi headers.

In my opinion, once it's exposed by kernel on netlink, it becomes userspace API.  Regardless whether there is a #define for that number. If it's not intended for userspace, then kernel should mask the flags away.

> Okay, it's also easy to fix the test to ignore those bits.

I would suggest that. The test should follow reality, not the other way around.