GNOME Bugzilla – Bug 525094
Grab lyrics from darklyrics.com
Last modified: 2010-05-29 06:25:11 UTC
In addition to the sites from which rhythmbox grab lyrics would be nice if rhythmbox could grab lyrics from www.darklyrics.com. A site containing high quality metal music lyrics. I have a patch that I'll put here.
Created attachment 108262 [details] [review] Patch to grab lyrics from darklyrics.com It only adds the engine. dosen't add the item in the plugin configuration dialog, so in order to use it, it is necesary to use gconf to add darklyrics.com to /apps/rhythmbox/plugins/lyrics/engines key.
This looks OK to me, with the usual reservations about html-scraping; do you know if the site owners are generally OK with this sort of usage? # Fix artist name first with unicodedata #unichr(int(decomposition(u"ÿ").split()[0], 16)) Did this turn out to be unnecessary? I'm not sure if you were involved in the recent relicensing of rhythmbox to allow GPL-incompatible gstreamer plugins, so I'll ask here just to be sure: do you agree to relicense this code?
The webmaster of darklyrics.com haven't answered my two mails about the topic (one was sent when you asked me the cuestion, the second one week ago). I don't know how to take that. The line of codes that are commented are an example of how to handle the international chars changin them to their ascii 127 version like changing ÿ to y. This is needed to the patch to be complete, but I didn't wanted to do it when I upload this patch. And I agree in that relicensing for this or any contribution that I could make to rhythmbox.
I think I'm OK with adding this without explicit consent as long as it's not enabled by default, especially if it's not even visible in the plugin preferences. It'd be nice if there was a way for the site owner to identify the source of the requests, but I don't see a good way to do that. + pattern = '<a href="../lyrics/%s/(.*).html#([^"]+)" target="_blank"><FONT COLOR="#CCCCCC">%s</FONT></a><br>' \ + % (self.wartist, re.escape(self.title.title())) + matches = re.findall (pattern, songlist) Instead of this, it might be better to find all the song links, extract the titles, and use the string_match utility function I added a few days ago. This will match even when there are punctuation and spelling differences. + titleline = '(?ms)<a name=%s><FONT color=#DDDDDD><b>%s. %s</b></font>(.+?)<BR>\r\n<BR>' % \ + (self.titlenumber, self.titlenumber, re.escape(self.title.title())) + lyricmatch = re.split (titleline, album) and if you do that, this should use the title that was matched in the previous step, rather than the input title.
Created attachment 162256 [details] [review] New DarkLyrics Grabber patch I did what Jonathan Matthew recommended. Regards
Pushed as commit 06d9966. Thanks for working on this.