Bug 601554 - banshee spams w3c.org
Status: RESOLVED FIXED
Product: banshee
Classification: Other
Component: general
Version: 1.5.1
Hardware: Other Linux
Importance: Normal critical
Target Milestone: 1.x
Assigned To: Banshee Maintainers
QA Contact: Banshee Maintainers
Depends on:
Blocks:
 
 
Reported: 2009-11-11 14:47 UTC by John Mylchreest
Modified: 2011-09-04 13:11 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description John Mylchreest 2009-11-11 14:47:47 UTC
After noticing that access to w3c.org was banned for abuse, it was traced to my laptop, and after further diagnostics it was found that banshee was the cause.

Banshee attempts to fetch /TR/xhtml1/DTD/xhtml1-transitional.dtd from w3c.org for some operations (I'm unaware which, or when, or how). In this instance it attempted to connect several hundred times a second, and this occurred on multiple occasions.

banshee was running, but not playing any music at the time. It had been running for several weeks. The installed banshee version comes from the banshee-unstable ppa at http://ppa.launchpad.net/banshee-unstable-team/ppa/ubuntu for karmic.

Unfortunately, I'm not able to provide much diagnostic information, as the process was terminated once found.
Comment 1 John Mylchreest 2009-11-11 15:00:20 UTC
Apologies for the copy/paste (although sanitised) but this should prove useful?

I'm wondering if it might be caused by the podcast plugin (or similar). At the time I did have a network outage (caused by me moving my interface into a bridge, while network manager was still running - thinking it was available) so there is a chance it could be that but I can't say for sure since I looked in retrospect and timings might not coincide.

i.e. if banshee believed network access was available but nothing could actually be routed anywhere, the requests would simply have backed up. It just so happened that it regained network access before the timeout occurred? This makes a lot of sense, since the ubuntu UK LoCo podcast references a w3c.org DTD.
Comment 2 Gabriel Burt 2009-11-11 18:23:11 UTC
Very strange.  It's not intentional at all for Banshee to be accessing w3c.org.

What diagnostics did you do to determine it was Banshee issuing the requests?
Comment 3 John Mylchreest 2009-11-11 21:13:48 UTC
Initially, it was drawn to my attention through firewall logs (I was at work at the time). To diagnose it, I took the destination IP, set up a small watch using netstat -np && tcpdump to confirm the culprit, and went from there.

After having played with this a little more, I believe (very strongly) it is caused by the podcast plugin and, in particular, the UK Ubuntu LoCo podcast, which directly references the DTD that banshee was issuing the GET for.

At the time, NetworkManager was advertising that it was connected, but the interface that was marked up and working was actually unable to pass traffic. Speculating (without looking at the code at all!), it looks like the podcast plugin threaded the requests, and once it was able to route traffic correctly it attempted them all at once.
Comment 4 Bertrand Lorentz 2009-11-11 21:31:43 UTC
What is the URL of the podcast you are subscribed to?
I didn't find any reference to the DTD you mentioned in the feeds I looked at on this page:
http://podcast.ubuntu-uk.org/

The web page itself does reference this DTD, though...
Comment 5 Gabriel Burt 2009-11-11 22:18:09 UTC
Can you also provide the output of running this in a terminal:

sqlite3 ~/.config/banshee-1/banshee.db "select url from podcastsyndications"
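For anyone who would rather script the same check, here is a minimal Python sketch of the query above. The database path, table name and column name are taken from the command in this comment; the function name `podcast_feed_urls` and the read-only open are illustrative choices, not anything Banshee provides.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Path taken from the sqlite3 command above.
DEFAULT_DB = Path.home() / ".config" / "banshee-1" / "banshee.db"


def podcast_feed_urls(db_path=DEFAULT_DB):
    """Return every subscribed podcast URL stored in the Banshee database.

    Hypothetical helper: it runs the same SELECT as the command above.
    """
    # Open read-only so the database cannot be modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute("SELECT url FROM podcastsyndications").fetchall()
    return [row[0] for row in rows]
```

Calling `podcast_feed_urls()` with no argument reads the default location; pass another path to inspect a copy of the database instead.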
Comment 6 John Mylchreest 2009-11-11 22:39:36 UTC
Indeed, I can do it tomorrow when in work. I'm subscribed to the same podcast at home and indeed, I see no purl.org redirect, or w3 link in the feed. I'm sure I saw this earlier. I'll confirm properly tomorrow.

Presumably, this also opens up a potential attack vector (should this be the actual scenario) via malformed podcast feeds?
Comment 7 John Mylchreest 2009-11-12 10:46:19 UTC
For some reason I see only the one podcast in my podcast subscriptions (through the actual banshee UI) but the query yielded the following:

# sqlite3 ~/.config/banshee-1/banshee.db "select url from podcastsyndications"
http://www.bbc.co.uk/radio/podcasts/moyles
http://www.talkshoe.com/tc/50667
http://feeds.feedburner.com/UbuntuUkPodcastOgg-high
http://subscribe.getmiro.com/?url1=http://feeds.feedburner.com/UbuntuUkPodcastOgg-high&trackback1=https://www.miroguide.com/feeds/5950/subscribe-hit&section1=video
http://podcast.ubuntu-uk.org

Once this is all confirmed I'll delete the superfluous ones. To confirm, though, this is what I see when checking for updated podcasts:

..U..U..GET /radio/podcasts/moyles HTTP/1.1
..U..U..GET /podcasts/series/moyles HTTP/1.1
..U.=...GET /tc/50667 HTTP/1.1
..U%=..2GET /talkshoe/web/tcForward.jsp?masterId=50667&cmd=tcf HTTP/1.1
..U%..Z.GET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..U8..\.GET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..UA..\7GET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..UU..\dGET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..UWD(.CGET /UbuntuUkPodcastOgg-high HTTP/1.1
..Uwh...GET /?url1=http://feeds.feedburner.com/UbuntuUkPodcastOgg-high&trackback1=https://www.miroguide.com/feeds/5950/subscribe-hit&section1=video HTTP/1.0
..U.9.k.GET / HTTP/1.1
..U...\.GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1
..U...]UGET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1
..U...\.GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1
..U...].GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1

Regards
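A capture like the one in Comment 7 can be tallied per request path to see how many of the GETs are for DTDs. This is only a sketch: the sample lines are copied from that capture, while the function name `count_requests` and the regular expression are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches the request line inside each captured packet, ignoring the
# binary residue that precedes "GET".
REQUEST_LINE = re.compile(r"GET (\S+) HTTP/1\.[01]")

# A few lines copied from the capture in Comment 7.
SAMPLE = r"""
..U..U..GET /radio/podcasts/moyles HTTP/1.1
..U%..Z.GET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..U8..\.GET /TR/xhtml1/DTD/xhtml1-strict.dtd HTTP/1.1
..U.9.k.GET / HTTP/1.1
..U...\.GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1
..U...]UGET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1
"""


def count_requests(capture_text):
    """Count how often each path was requested in the capture text."""
    return Counter(REQUEST_LINE.findall(capture_text))


def dtd_requests(capture_text):
    """Keep only the paths that end in .dtd."""
    counts = count_requests(capture_text)
    return {path: n for path, n in counts.items() if path.endswith(".dtd")}
```

Running `dtd_requests(SAMPLE)` separates the DTD fetches from the ordinary feed requests, which makes the repeated hits easy to spot.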
Comment 8 Bertrand Lorentz 2009-11-14 22:10:41 UTC
I've been able to reproduce this, after adding a podcast subscription to http://podcast.ubuntu-uk.org

This address is not a valid feed, it's a web page, so the parsing fails as expected. But my network capture shows 2 HTTP GET requests to www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

Those queries don't have the Banshee user agent, so I think they aren't made directly by the podcast code. I think it might be the XML parser used in the RssParser class.
Comment 9 Bertrand Lorentz 2011-09-04 13:11:20 UTC
This problem has been fixed in the development version. The fix will be available in the next major software release. Thank you for your bug report.

I've just committed a fix to the RSS parser that prevents any external resource, like those DTDs, from being resolved:
http://git.gnome.org/browse/banshee/commit/?id=1ae49caa
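The general idea of the fix (an XML parser that never resolves external resources) can be sketched in Python with the standard library's SAX parser. This is not the Banshee commit, which lives in the project's own code base. The sample document, `RecordingResolver` and `parse_page` are hypothetical names, and the resolver returns an empty in-memory stream so that nothing is fetched over the network even when external resolution is switched on.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# A web page (not a feed) whose DOCTYPE points at the W3C DTD.
PAGE = b"""<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>Not a feed</title></head><body/></html>
"""


class RecordingResolver(EntityResolver):
    """Records which external resources the parser asks for."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        # Hand back an empty stream instead of fetching the URL.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))
        return source


def parse_page(document, allow_external):
    """Parse the document and return the external URLs the parser asked for."""
    resolver = RecordingResolver()
    parser = xml.sax.make_parser()
    # With this feature off, the parser skips external resources entirely.
    parser.setFeature(feature_external_ges, allow_external)
    parser.setEntityResolver(resolver)
    parser.setContentHandler(ContentHandler())
    parser.parse(io.BytesIO(document))
    return resolver.requested
```

With `allow_external=False` the resolver is never consulted, so a page mistakenly subscribed to as a feed cannot trigger DTD requests. With `allow_external=True` the parser asks for the DTD URL named in the DOCTYPE, which matches the requests seen in the captures above.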