GNOME Bugzilla – Bug 733851
same USN on several interfaces is a problem for ResourceBrowser
Last modified: 2019-02-22 09:29:47 UTC
Even after fixing bug 733651, all GUPnP apps show devices disappearing and appearing several times during the first seconds after startup. This happens when those devices advertise over several interfaces.

When the same USN is available on several interfaces, the same ResourceBrowser is used for both interfaces and messages arriving on both of them are handled. So:
1. there are quite a few resource-available messages
2. the location in the message changes randomly between the two URLs

Bug 724030 added simulated byebyes when gssdp thinks we've missed the real byebyes. This creates an additional problem:
3. a message from the other interface leads to a resource-unavailable signal because the code assumes we have missed a bye-bye (because the location does not match)

I don't know if we can fix 1 & 2 in any way... but for 3: should the location comparison allow the IP address to change? Or is that just adding another workaround on top of a workaround?
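To make the question in point 3 concrete, here is a minimal sketch of a relaxed location comparison. It is not gssdp code: `same_resource_location` is a hypothetical helper, and treating "same scheme and path" as "same resource" is an assumption. The port is ignored together with the host because a multi-homed device may listen on a different port per interface.

```python
from urllib.parse import urlsplit


def same_resource_location(old: str, new: str, ignore_host: bool = True) -> bool:
    """Hypothetical helper: decide whether two Location URLs name the same resource.

    With ignore_host=True only scheme and path are compared, so the same
    description document served on another interface still matches.
    With ignore_host=False the host and port must match as well.
    """
    a, b = urlsplit(old), urlsplit(new)
    if a.scheme != b.scheme or a.path != b.path:
        return False
    if ignore_host:
        return True
    return a.netloc == b.netloc
```

The trade-off is the one raised above: a device that really went away and came back on a new address with the same description path would no longer be detected as a missed bye-bye.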
yikes.
If you revert 9c499f0dfa30c0c97e9a8eb7a94b28e8e2fce5c6, does that make things better?
But I think in essence we need more multi-homing tests :-/
Yes, I can see issues 1 and 2 before that commit but less so. Before 9c499f0dfa I get these resource-available signals from ResourceBrowser for a specific device on startup, with two actual networks and loopback:

Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://127.0.0.1:42460/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.237.72.61:50252/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml

After that commit it looks like this:

Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.237.72.61:50252/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://127.0.0.1:42460/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.237.72.61:50252/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.10.15.141:59135/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://10.237.72.61:50252/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml
Resource unavailable
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
Resource available
  USN: uuid:0787b6df-1318-45a9-b1fd-8bc4b0c7a244::upnp:rootdevice
  LOC: http://192.168.100.1:47137/0787b6df-1318-45a9-b1fd-8bc4b0c7a244.xml

So the problem was already there, but 9c499f0dfa makes it a bit more visible and quite a bit more annoying (with the added "unavailable" signals).

How should ResourceBrowser behave with multiple interfaces? I'm assuming this is not covered by any of the specs, and we'll just have to figure out something. I like the fact that I don't get three instances of rygel in my controlpoints but is there any way to make this work in a seamless way?
DLNA does kind of talk about it:

* If the control point sees IP address 1 and sees IP address 2 after 10 seconds, then the control point can conclude that the UPnP device has IP address 2 as the more reliable IP destination.
* If the control point sees IP address 1 and sees IP address 2 within 10 seconds, then the control point can conclude that the UPnP device has two IP destinations that seem equally reliable.

Same idea as guidelines:

7.3.2.27.4 Upon receiving multiple advertisements for the same UPnP device UDN, a UPnP control point should select the vendor-defined preferred advertisement as the route to the device.

7.3.2.27.5 When a UPnP control point gets an advertisement for a UPnP device UDN on a different IP address from the one it has previously selected, it may continue to use its selected IP address provided that it has received an advertisement on the selected IP address in the last 10 seconds. Otherwise, if the UPnP control point does not receive an advertisement for its selected IP address in the next 10 seconds, it may change its selection to the new IP address. Even if the control point keeps the selected IP address in this case, it should change its selection to the new IP address when an access to the selected IP address fails.
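A rough sketch of how a control point could follow 7.3.2.27.5, one selector per UDN. The class and method names are hypothetical and this is not how GUPnP implements anything. It also simplifies the "next 10 seconds" clause: the switch happens when an advertisement for another address arrives and the selected address has not been seen for more than 10 seconds.

```python
from typing import Optional

STICKY_WINDOW = 10.0  # seconds, taken from guideline 7.3.2.27.5


class RouteSelector:
    """Hypothetical per-UDN state: which advertised IP address to use."""

    def __init__(self) -> None:
        self.selected: Optional[str] = None
        self.selected_seen = 0.0
        self.alternative: Optional[str] = None
        self.alternative_seen = 0.0

    def on_advertisement(self, ip: str, now: float) -> str:
        if self.selected is None or ip == self.selected:
            # First advertisement, or a refresh of the current selection.
            self.selected, self.selected_seen = ip, now
        elif now - self.selected_seen > STICKY_WINDOW:
            # The selected address has gone quiet: move to the new one.
            self.selected, self.selected_seen = ip, now
            self.alternative = None
        else:
            # Selection is still fresh: keep it, remember the other address.
            self.alternative, self.alternative_seen = ip, now
        return self.selected

    def on_access_failed(self) -> Optional[str]:
        # Last sentence of the guideline: switch when the selected address fails.
        if self.alternative is not None:
            self.selected, self.selected_seen = self.alternative, self.alternative_seen
            self.alternative = None
        return self.selected
```

With something like this, advertisements arriving within seconds of each other on different interfaces would not change the reported location, so no unavailable/available pair would be needed.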
I remember multihoming being part of DLNA testsuite as well, IIRC as an option in the device profile.
Wait, that looks weird. With Rygel bound to lo, tun0 and wlan0, gssdp-discover explicitly bound to lo, I get:

[rygel] jens@laptop-jge: ~/Source/gupnp/tests (master) $ gssdp-discover -i lo -t upnp:rootdevice
Using network interface lo
Scanning for resources matching upnp:rootdevice
resource available
  USN: uuid:bd681d4d-4a62-4ef1-9b86-72f2382fa982::upnp:rootdevice
  Location: http://127.0.0.1:59830/bd681d4d-4a62-4ef1-9b86-72f2382fa982.xml
resource available
  USN: uuid:8c812557-c668-40fb-8a94-a27132510a23::upnp:rootdevice
  Location: http://127.0.0.1:59830/8c812557-c668-40fb-8a94-a27132510a23.xml
resource available
  USN: uuid:a8ec76de-b5e0-4464-abd4-56304f3426d3::upnp:rootdevice
  Location: http://127.0.0.1:59830/a8ec76de-b5e0-4464-abd4-56304f3426d3.xml

Are you sure that you have a libgssdp with bug 733651 fixed? When I revert your #include <config.h> fix, I get the output from comment 4, a resource available message for every device Rygel is listening on, because the filtering doesn't work.
(In reply to comment #7)
> Wait, that looks weird. With Rygel bound to lo,tun0 and wlan0, gssdp-discover
> explicitly bound to lo, I get:

My output was from printf-debugging with gupnp-universal-cp not bound to a specific interface, should have mentioned it. So the multiple "available" messages are probably actually correct (just a result of multiple ResourceBrowsers). The problem is just that the location changes even with the 733651 fix.

> Are you sure that you have a libgssdp with bug 733651 fixed?

The control messages are now there and lots of packets are now being discarded because of the filter so I would say that one is fixed on my machine.

I started looking at the messages and packet infos a little closer, and found out the problem only exists on loopback. It looks like rygel is advertising devices that are on another network on the lo interface. Here's a packet that leads to a location change and "unavailable/available" signals. There are several similar ones, with different locations. I don't see this happening on other interfaces.

> PACKET INFO COMPARISON:
> client iface: 1 127.0.0.1
> new packet iface: 1 127.0.0.1
> PACKET LOCATION HEADER:
> Location: http://10.10.15.141:56666/efc337a8-d833-4320-ab42-2c951d39582c.xml

Is there some sensible explanation to this?
can you try to revert 3030bf4cb97db33fcdfe7e69706fdb0f19b0f446? Maybe the multicast interface binding is broken
I decided to try with a new rygel first -- I've been testing against my 'production' mediaserver for now -- and it just keeps getting more complex :(

So with:
* production rygel (0.22.2, old gssdp)
* gupnp-universal-cp master (new gssdp)
I see some advertisements on the wrong interface (in other words the location advertised on an interface changes over time).

Then I built rygel and tried again.
* rygel master (new gssdp)
* gupnp-universal-cp master (new gssdp)

Now gupnp-universal-cp no longer sees rygel on my wireless interface at all: I think the PKTINFO filtering might be going wrong: there's a lot of packets being discarded on both sides...
(In reply to comment #10)
> Then I built rygel and tried again.
> * rygel master (new gssdp)
> * gupnp-universal-cp master (new gssdp)
>
> Now gupnp-universal-cp no longer sees rygel on my wireless interface at all: I
> think the PKTINFO filtering might be going wrong: there's a lot of packets
> being discarded on both sides...

In this case the replies to the M-SEARCH are dropped because the interface index does not match the index in the client. This is what rygel replies look like on the controlpoint (rygel running on local machine but this is the reply over wlan0):

client index 3, ip 10.10.15.141
new packet index 1, ip 10.10.15.141

They all seem to get dropped because the if index does not match.
Checking with wireshark shows that when gupnp-universal-cp sends M-SEARCH over "wlan0", rygel (which is bound to "wlan0" only) does not answer on that interface: it sends the responses over "lo". These responses are of course discarded by the new interface filter...
Created attachment 282003
Accept messages from other networks on loopback

Kernel does smart things and routes traffic sent to other networks through loopback. We need to accept messages like that.
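For illustration only, the acceptance rule could look roughly like the sketch below. This is a reading of the discussion, not the contents of the attached patch: `accept_packet` and its parameters are hypothetical, and the values stand in for the interface index and address that the packet info reports for a received message.

```python
LOOPBACK_IFINDEX = 1  # assumption: loopback has interface index 1


def accept_packet(client_index: int, client_ip: str,
                  pkt_index: int, pkt_ip: str) -> bool:
    """Hypothetical filter: should a client bound to one interface handle this packet?"""
    if pkt_index == client_index:
        # Normal case: the packet arrived on the interface the client is bound to.
        return True
    # Locally generated traffic for one of our own addresses is delivered via
    # loopback, so the index says "lo" while the address is still the client's.
    return pkt_index == LOOPBACK_IFINDEX and pkt_ip == client_ip
```

With the values quoted earlier (client index 3, packet index 1, both with ip 10.10.15.141) this accepts the M-SEARCH reply, while a packet reported on an unrelated interface index is still dropped.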
I'm still testing that patch but I think that's what we have to do: kernel does do smart things with messages sent through other networks and actually delivers them through loopback when that makes sense (this is easy to test even with a webserver on local machine and wireshark). I'm not very familiar with the original purpose of the packet filter patch: I hope it's not lost by now...
Oh, and hard coding the if_index: I believe loopback devices have had index 1 for a long time, but I can't really be 100% sure... It has definitely been a #define in the kernel for a couple of years.
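One way to avoid relying on the hard-coded value would be to look the index up by name and only fall back to 1 if that fails. A small sketch, assuming the loopback interface is called "lo" as it is on Linux; `loopback_ifindex` is a hypothetical helper, while `socket.if_nametoindex` is the standard library call.

```python
import socket


def loopback_ifindex(name: str = "lo", fallback: int = 1) -> int:
    """Return the interface index for `name`, or `fallback` if the lookup fails."""
    try:
        return socket.if_nametoindex(name)
    except OSError:
        # No interface with that name (or lookup not possible): use the assumed value.
        return fallback
```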
That must have been there before, even with the old filtering
Created attachment 282100
Use IN_PKTINFO to signalize source interface

Signed-off-by: Jens Georg <mail@jensge.org>
Does this work for you as well? I cannot test since I can't reproduce the issue.
(In reply to comment #18)
> Does this work for you as well? I cannot test since I can't reproduce the
> issue.

If you mean "do packets really go over the interface they were sent on", then no: I still see all replies from rygel going over loopback.

> I cannot test since I can't reproduce the issue.

Oh interesting. So when rygel (or any process) is sending packets to its own IP address on some real interface, they don't show up on loopback if you check with wireshark? Obviously NOTIFYs and M-SEARCHes do end up on the real interface because of multicast, but no other packets seem to go through the real interface here. It seems to make sense as an optimization: this way the kernel never has to go through hardware at all.
Right, I can see it as well.
The interweb suggests that this is related to the local routing table, so we can't do much about this in code :-/

I don't like that work-around as it causes the message to originate from the wrong client, but it seems that's the only way to go.
(In reply to comment #21)
> I don't like that work-around as it causes the message to originate from
> the wrong client, but it seems that's the only way to go.

Could you explain that in a bit more detail? I think I've not totally understood what the problem is.
Well, the m-search response for interface wlan0 comes through the client/resource browser on interface lo, doesn't it? Which will probably work out in the end, but still feels odd.
Also: What happens if we're not bound to lo?
Sorry, got it. The message IS arriving on the correct interface, but PKTINFO tells us it comes from index 1. I thought it was coming on the client bound to "lo".
Attachment 282003 pushed as dd001ff - Accept messages from other networks on loopback