Bug 733851 – same USN on several interfaces is a problem for ResourceBrowser




Summary:	same USN on several interfaces is a problem for ResourceBrowser


Status:	RESOLVED FIXED

Product:	gssdp
Classification:	Other
Component:	General
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	GUPnP Maintainers
QA Contact:	GUPnP Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2014-07-28 09:10 UTC by Jussi Kukkonen
Modified:	2019-02-22 09:29 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Accept messages from other networks on loopback (2.51 KB, patch) 2014-07-30 08:38 UTC, Jussi Kukkonen	committed
Use IN_PKTINFO to signalize source interface (3.71 KB, patch) 2014-07-30 18:56 UTC, Jens Georg	none

Description Jussi Kukkonen 2014-07-28 09:10:12 UTC

Even after fixing bug 733651, all GUPnP apps show devices disappearing and appearing several times during the first seconds after startup. This happens when those devices advertise over several interfaces.

When the same USN is available on several interfaces, the same ResourceBrowser is used for both interfaces and messages arriving on both of them are handled. So:
 1. there are quite a few resource-available messages
 2. the location in the message changes randomly between the two URLs

Bug 724030 added simulated byebyes when gssdp thinks we've missed the real byebyes. This creates an additional problem:
 3. a message from the other interface leads to a resource-unavailable
    signal because the code assumes we have missed a bye-bye (because
    the location does not match)


I don't know if we can fix 1 & 2 in any way... but for 3: should the location comparison allow the IP address to change? Or is that just adding another workaround on top of a workaround?

Comment 1 Jens Georg 2014-07-28 09:47:31 UTC

yikes.

Comment 2 Jens Georg 2014-07-28 09:51:23 UTC

If you revert 9c499f0dfa30c0c97e9a8eb7a94b28e8e2fce5c6, does that make things better?

Comment 3 Jens Georg 2014-07-28 10:17:39 UTC

But I think in essence we need more multi-homing tests :-/

Comment 4 Jussi Kukkonen 2014-07-28 14:52:22 UTC

Yes, I can see issues 1 and 2 before that commit but less so. Before 9c499f0dfa I get these resource-available signals from ResourceBrowser for a specific device on startup, with two actual networks and loopback:

Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://127.0.0.1:42460/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.237.72.61:50252/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml


After that commit it looks like this:

Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.237.72.61:50252/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://127.0.0.1:42460/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.237.72.61:50252/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.10.15.141:59135/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.237.72.61:50252/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml


So the problem was already there, but 9c499f0dfa makes it a bit more visible and quite a bit more annoying (with the added "unavailable" signals).


How should ResourceBrowser behave with multiple interfaces? I'm assuming this is not covered by any of the specs and we'll just have to figure something out. I like the fact that I don't get three instances of rygel in my control points, but is there any way to make this work seamlessly?

Comment 5 Jussi Kukkonen 2014-07-28 15:15:40 UTC

DLNA does kind of talk about it:

* If the control point sees IP address 1 and sees IP address 2 after 10 seconds, then the control point can conclude that the UPnP device has IP address 2 as the more reliable IP destination.
* If the control point sees IP address 1 and sees IP address 2 within 10 seconds, then the control point can conclude that the UPnP device has two IP destinations that seem equally reliable.


Same idea in the guidelines:

7.3.2.27.4 Upon receiving multiple advertisements for the same UPnP device UDN, a UPnP control point should select the vendor-defined preferred advertisement as the route to the device.

7.3.2.27.5 When a UPnP control point gets an advertisement for a UPnP device UDN on a different IP address from the one it has previously selected, it may continue to use its selected IP address provided that it has received an advertisement on the selected IP address in the last 10 seconds.
Otherwise, if the UPnP control point does not receive an advertisement for its selected IP address in the next 10 seconds, it may change its selection to the new IP address. Even if the control point keeps the selected IP address in this case, it should change its selection to the new IP address when an access to the selected IP address fails.

Comment 6 Jens Georg 2014-07-28 19:10:03 UTC

I remember multihoming being part of DLNA testsuite as well, IIRC as an option in the device profile.

Comment 7 Jens Georg 2014-07-29 01:46:52 UTC

Wait, that looks weird. With Rygel bound to lo, tun0 and wlan0, and gssdp-discover explicitly bound to lo, I get:

[rygel] jens@laptop-jge: ~/Source/gupnp/tests (master) $ gssdp-discover -i lo -t upnp:rootdevice
Using network interface lo
Scanning for resources matching upnp:rootdevice
resource available
  USN:      uuid:bd681d4d-4a62-4ef1-9b86-72f2382fa982::upnp:rootdevice
  Location: http://127.0.0.1:59830/bd681d4d-4a62-4ef1-9b86-72f2382fa982.xml
resource available
  USN:      uuid:8c812557-c668-40fb-8a94-a27132510a23::upnp:rootdevice
  Location: http://127.0.0.1:59830/8c812557-c668-40fb-8a94-a27132510a23.xml
resource available
  USN:      uuid:a8ec76de-b5e0-4464-abd4-56304f3426d3::upnp:rootdevice
  Location: http://127.0.0.1:59830/a8ec76de-b5e0-4464-abd4-56304f3426d3.xml

Are you sure that you have a libgssdp with bug 733651 fixed?

When I revert your #include <config.h> fix, I get the output from comment 4: a resource-available message for every device Rygel is listening on, because the filtering doesn't work.

Comment 8 Jussi Kukkonen 2014-07-29 08:59:09 UTC

(In reply to comment #7)
> Wait, that looks weird. With Rygel bound to lo,tun0 and wlan0, gssdp-discover
> explicitly bound to lo, I get:

My output was from printf-debugging with gupnp-universal-cp not bound to a specific interface; I should have mentioned that. So the multiple "available" messages are probably actually correct (just a result of multiple ResourceBrowsers). The problem is just that the location changes even with the 733651 fix.

> Are you sure that you have a libgssdp with bug 733651 fixed?

The control messages are now there, and lots of packets are being discarded because of the filter, so I would say that one is fixed on my machine.


I started looking at the messages and packet infos a little closer and found out that the problem only exists on loopback. It looks like rygel is advertising devices that are on another network on the lo interface.

Here's a packet that leads to a location change and "unavailable/available" signals. There are several similar ones, with different locations. I don't see this happening on other interfaces.

> PACKET INFO COMPARISON:
>   client iface:     1 127.0.0.1
>   new packet iface: 1 127.0.0.1
> PACKET LOCATION HEADER:
>   Location: http://10.10.15.141:56666/efc337a8-d833-4320-ab42-2c951d39582c.xml

Is there some sensible explanation to this?

Comment 9 Jens Georg 2014-07-29 09:38:49 UTC

Can you try to revert 3030bf4cb97db33fcdfe7e69706fdb0f19b0f446? Maybe the multicast interface binding is broken.

Comment 10 Jussi Kukkonen 2014-07-29 12:16:43 UTC

I decided to try with a new rygel first -- I've been testing against my 'production' mediaserver for now -- and it just keeps getting more complex :(

So with:
 * production rygel (0.22.2, old gssdp)
 * gupnp-universal-cp master (new gssdp)
I see some advertisements on the wrong interface (in other words the location advertised on an interface changes over time).


Then I built rygel and tried again.
 * rygel master (new gssdp)
 * gupnp-universal-cp master (new gssdp)

Now gupnp-universal-cp no longer sees rygel on my wireless interface at all. I think the PKTINFO filtering might be going wrong: a lot of packets are being discarded on both sides...

Comment 11 Jussi Kukkonen 2014-07-29 12:38:55 UTC

(In reply to comment #10)
> Then I built rygel and tried again.
>  * rygel master (new gssdp)
>  * gupnp-universal-cp master (new gssdp)
> 
> Now gupnp-universal-cp  no longer sees rygel on my wireless interface at all: I
> think the PKTINFO filtering might be going wrong: there's a lot of packets
> being discarded on both sides...

In this case the replies to the M-SEARCH are dropped because the interface index does not match the index in the client. This is what rygel's replies look like on the control point (rygel is running on the local machine, but this is the reply over wlan0):

  client     index 3, ip 10.10.15.141
  new packet index 1, ip 10.10.15.141

They all seem to get dropped because the interface index does not match.

Comment 12 Jussi Kukkonen 2014-07-29 14:59:50 UTC

Checking with Wireshark shows that when gupnp-universal-cp sends M-SEARCH over "wlan0", rygel (which is bound to "wlan0" only) does not answer on that interface: it sends the responses over "lo". These responses are of course discarded by the new interface filter...

Comment 13 Jussi Kukkonen 2014-07-30 08:38:38 UTC

Created attachment 282003
Accept messages from other networks on loopback

Kernel does smart things and routes traffic sent to other networks
through loopback. We need to accept messages like that.

Comment 14 Jussi Kukkonen 2014-07-30 08:42:03 UTC

I'm still testing that patch, but I think that's what we have to do: the kernel does do smart things with messages sent to the machine's own addresses on other interfaces and actually delivers them through loopback when that makes sense (this is easy to test even with a webserver on the local machine and Wireshark).

I'm not very familiar with the original purpose of the packet filter patch: I hope it's not lost by now...

Comment 15 Jussi Kukkonen 2014-07-30 08:48:58 UTC

Oh, and about hardcoding the if_index: I believe loopback devices have had index 1 for a long time, but I can't be 100% sure... It has definitely been a #define in the kernel for a couple of years.

Comment 16 Jens Georg 2014-07-30 18:06:21 UTC

That must have been there before, even with the old filtering.

Comment 17 Jens Georg 2014-07-30 18:56:19 UTC

Created attachment 282100
Use IN_PKTINFO to signalize source interface

Signed-off-by: Jens Georg <mail@jensge.org>

Comment 18 Jens Georg 2014-07-30 18:57:20 UTC

Does this work for you as well? I cannot test since I can't reproduce the issue.

Comment 19 Jussi Kukkonen 2014-07-31 14:14:33 UTC

(In reply to comment #18)
> Does this work for you as well? I cannot test since I can't reproduce the
> issue.

If you mean "do packets really go over the interface they were sent on", then no: I still see all replies from rygel going over loopback.


> I cannot test since I can't reproduce the issue.

Oh, interesting. So when rygel (or any process) sends packets to its own IP address on some real interface, they don't show up on loopback when you check with Wireshark?

Obviously NOTIFYs and M-SEARCHes do end up on the real interface because of multicast, but no other packets seem to go through the real interface here. It seems to make sense as an optimization: this way the kernel never has to touch the hardware at all.

Comment 20 Jens Georg 2014-08-02 08:48:22 UTC

Right, I can see it as well.

Comment 21 Jens Georg 2014-08-02 18:23:19 UTC

The interweb suggests that this is related to the local routing table, so we can't do much about this in code :-/ I don't like that work-around as it causes the message to originate from the wrong client, but it seems that's the only way to go.

Comment 22 Jussi Kukkonen 2014-08-03 10:06:37 UTC

(In reply to comment #21)
> I don't like that work-around as it causes the message to originate from
> the wrong client, but it seems that's the only way to go.

Could you explain that in a bit more detail? I think I've not totally understood what the problem is.

Comment 23 Jens Georg 2014-08-05 21:29:18 UTC

Well, the M-SEARCH response for interface wlan0 comes through the client/resource browser on interface lo, doesn't it?

Which will probably work out in the end, but still feels odd.

Comment 24 Jens Georg 2014-08-05 21:30:14 UTC

Also: What happens if we're not bound to lo?

Comment 25 Jens Georg 2014-08-05 23:00:26 UTC

Sorry, got it. The message IS arriving on the correct interface, but PKTINFO tells us it comes from index 1. I thought it was coming on the client bound to "lo".

Comment 26 Jens Georg 2014-08-05 23:13:04 UTC

Attachment 282003 pushed as dd001ff - Accept messages from other networks on loopback