GNOME Bugzilla – Bug 601554
banshee spams w3c.org
Last modified: 2011-09-04 13:11:20 UTC
After we noticed that access to w3c.org had been banned for abuse, the traffic was traced to my laptop, and further diagnostics showed that banshee was the cause. Banshee attempts to fetch /TR/xhtml1/DTD/xhtml1-transitional.dtd from w3c.org for some operations (I'm unaware which, or when, or how). In this instance it attempted to connect several hundred times a second, and this occurred on multiple occasions. banshee was running, but not playing any music at the time. It had been running for several weeks. The installed banshee version comes from the banshee-unstable ppa at http://ppa.launchpad.net/banshee-unstable-team/ppa/ubuntu for karmic. Unfortunately, I'm not able to provide much diagnostic information, as the process was terminated once it was found.
Apologies for the copy/paste (although sanitised) but this should prove useful? I'm wondering if it might be caused by the podcast plugin (or similar). At the time I did have a network outage (caused by me moving my interface into a bridge while network manager was still running and still thought the interface was available), so there is a chance that is the cause, but I can't say for sure since I looked in retrospect and the timings might not coincide. That is, banshee believed network access was available, but nothing could be routed anywhere, so the requests just backed up. It may simply have regained network access before the timeout occurred. This makes a lot of sense, since the ubuntu UK LoCo podcast references a w3c.org DTD.
Very strange. It's not intentional at all for Banshee to be accessing w3c.org. What diagnostics did you do to determine it was Banshee issuing the requests?
Initially, it was drawn to my attention through firewall logs (I was at work at the time). To diagnose it I took the destination IP, set up a small watch using netstat -np && tcpdump to confirm the culprit, and went from there. After having played with this a little more, I believe (very strongly) it is caused by the podcast plugin and, in particular, the UK Ubuntu LoCo podcast, which directly references the DTD that banshee was issuing the GET for. At the time, NetworkManager was advertising that it was connected, but the interface that was marked up and working was actually unable to pass traffic. Speculating (without looking at the code at all!), it looks like the podcast plugin threaded the requests and, once it was able to route traffic correctly, attempted them all at once.
What is the URL of the podcast you are subscribed to? I didn't find any reference to the DTD you mentioned in the feeds I looked at on this page: http://podcast.ubuntu-uk.org/ The web page itself does reference this DTD, though...
Can you also provide the output of running this in a terminal: sqlite3 ~/.config/banshee-1/banshee.db "select url from podcastsyndications"
Indeed, I can do it tomorrow when in work. I'm subscribed to the same podcast at home and indeed, I see no purl.org redirect, or w3 link in the feed. I'm sure I saw this earlier. I'll confirm properly tomorrow. Presumably, this also opens up a potential attack vector (should this be the actual scenario) via malformed podcast feeds?
For some reason I see only the one podcast in my podcast subscriptions (through the actual banshee UI) but the query yielded the following:

# sqlite3 ~/.config/banshee-1/banshee.db "select url from podcastsyndications"
http://www.bbc.co.uk/radio/podcasts/moyles
http://www.talkshoe.com/tc/50667
http://feeds.feedburner.com/UbuntuUkPodcastOgg-high
http://subscribe.getmiro.com/?url1=http://feeds.feedburner.com/UbuntuUkPodcastOgg-high&trackback1=https://www.miroguide.com/feeds/5950/subscribe-hit&section1=video
http://podcast.ubuntu-uk.org

Once this is all confirmed I'll delete the superfluous entries. To confirm, though, this is what I see when checking for updated podcasts:

..U..U..GET /radio/podcasts/moyles HTTP/1.1
..U..U..GET /podcasts/series/moyles HTTP/1.1
..U.=...GET /tc/50667 HTTP/1.1
..U%=..2GET /talkshoe/web/tcForward.jsp?masterId=50667&cmd=tcf HTTP/1.1
..U%..Z.GET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..U8..\.GET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..UA..\7GET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..UU..\dGET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..UWD(.CGET /UbuntuUkPodcastOgg-high HTTP/1.1
..Uwh...GET /?url1=http://feeds.feedburner.com/UbuntuUkPodcastOgg-high&trackback1=https://www.miroguide.com/feeds/5950/subscribe-hit&section1=video HTTP/1.0
..U.9.k.GET / HTTP/1.1
..U...\.GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1
..U...]UGET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1
..U...\.GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1
..U...].GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1

Regards
I've been able to reproduce this after adding a podcast subscription to http://podcast.ubuntu-uk.org. This address is not a valid feed, it's a web page, so the parsing fails as expected. But my network capture shows 2 HTTP GET requests to www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd. Those queries don't have the Banshee user agent, so I think they aren't made directly by the podcast code. I think it might be the XML parser used in the RssParser class.
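To illustrate why a web page can trigger these requests: an XHTML page names its DTD in the DOCTYPE declaration, and a parser configured to load external DTDs will try to download that URL every time it parses the page. The following is only a minimal Python sketch (Banshee itself is C#, and this is not its code). The sample document and the function name external_dtd_reference are made up for illustration. It reads the DOCTYPE with the standard expat binding, which does not fetch anything in this configuration, and reports the URL such a parser would request.

```python
import xml.parsers.expat

# Hypothetical sample: a tiny XHTML page like the one behind the subscription.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Not a feed</title></head><body/></html>
"""


def external_dtd_reference(data):
    """Return (root name, public id, system id) from the DOCTYPE, or None.

    No network access happens here: no external entity handler is set,
    so expat only reports the declaration.
    """
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, public_id, system_id))

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)
    return found[0] if found else None


if __name__ == "__main__":
    print(external_dtd_reference(DOC))
```

The system identifier it reports is the same path that shows up in the capture, which is consistent with the parser, rather than the podcast code, issuing the GET.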
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report. I've just committed a fix to the RSS parser that prevents any external resource, like those DTDs, from being resolved: http://git.gnome.org/browse/banshee/commit/?id=1ae49caa
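For readers who want to see the general idea behind this kind of fix, here is a rough Python sketch. It is not the Banshee commit, which is C# and is not reproduced here. It assumes the standard xml.sax module, and the names ElementCollector, RecordingResolver and parse_without_external_resources are hypothetical. External general entity loading is switched off explicitly, and a recording resolver is installed so that any attempt to resolve the DTD would be visible.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

# Hypothetical sample: an XHTML page that names the W3C DTD in its DOCTYPE.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Not a feed</title></head><body/></html>
"""


class ElementCollector(ContentHandler):
    """Collects element names to show the document is still parsed."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


class RecordingResolver(EntityResolver):
    """Records every external resource the parser asks to resolve."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        return systemId


def parse_without_external_resources(data):
    parser = xml.sax.make_parser()
    # Do not load external entities, including the external DTD subset.
    parser.setFeature(feature_external_ges, False)
    collector = ElementCollector()
    resolver = RecordingResolver()
    parser.setContentHandler(collector)
    parser.setEntityResolver(resolver)
    parser.parse(io.BytesIO(data))
    return collector.names, resolver.requested


if __name__ == "__main__":
    names, requested = parse_without_external_resources(DOC)
    print("elements:", names)
    print("external resources requested:", requested)
```

With the feature disabled the resolver is never consulted, so no request for the DTD leaves the machine, while the document's elements are still reported. A feed parser has no need for the DTD, so turning resolution off loses nothing.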