GNOME Bugzilla – Bug 685823
rhythmbox lyrics plugin: service Terra does not always return information
Last modified: 2018-05-24 17:42:56 UTC
In Lyrics plugin if you enable Terra service you will not able to get alwasy lyrics about song. This happens often when song_title or artist are not well defined. For istance: 'Metallica - One' is not defined due to the genirc title 'One'. More code is needed to get the right lyric page. See attachment that adds more code to the parser in order to get correct lyrics.
Created attachment 226134 [details] [review] Improve Terra Parser
Review of attachment 226134 [details] [review]: ::: plugins/lyrics/TerraParser.py @@ +69,3 @@ self.title = title + self.div_search = '<div id=[\'|"]letra|div_letra[\'|"]>' + self.div_accurate_search = '' you don't use this at all @@ +96,3 @@ callback (None, *data) + elif re.search(self.div_search, result): + if re.search('Confira abaixo a lista completa do artist', result): I don't like this much - can you see if there's a better way to identify this? Checking for exact string matches in the page text is not very reliable. @@ +102,3 @@ else: callback (None, *data) + return why did you add this? @@ +109,1 @@ + def get_moreaccurate_lyrics(self, source, callback, *data): this is a confusing name for this function @@ +136,3 @@ + if url_bestmatch: + print "best match: %s %s %s" % (smvalue_bestmatch, artist, url_bestmatch) + self.more_accurate = True should pass this as callback data instead of storing state in the parser object.
Review of attachment 226134 [details] [review]: ::: plugins/lyrics/TerraParser.py @@ +69,3 @@ self.title = title + self.div_search = '<div id=[\'|"]letra|div_letra[\'|"]>' + self.div_accurate_search = '' You're right, I forgot to delete this line: "self.div_accurate_search = ''" @@ +96,3 @@ callback (None, *data) + elif re.search(self.div_search, result): + if re.search('Confira abaixo a lista completa do artist', result): That's a way, checking on some div class... I will do that. @@ +102,3 @@ else: callback (None, *data) + return I thought was correct. Nevermind. I will remove it in the next patch's version. @@ +109,1 @@ + def get_moreaccurate_lyrics(self, source, callback, *data): Ok, I'll find a new name! @@ +136,3 @@ + if url_bestmatch: + print "best match: %s %s %s" % (smvalue_bestmatch, artist, url_bestmatch) + for link in links: Ok, I will figure out how to do that.
Ok, here an update. But I have not understood how to pass a variable with *data, keeping *data clean. So, I still use self.new_html as an internal state of parser.
Created attachment 226178 [details] [review] Improve Terra Parser [1]
Review of attachment 226178 [details] [review]: To pass the information you're currently storing in self.new_html through the callback data instead, just add it to the argument lists for loader.get_url and for the got_lyrics method, making sure to keep the order the same. I'd suggest adding it after the 'callback' argument. ::: plugins/lyrics/TerraParser.py @@ +90,3 @@ return if result is not None: I realise you're not adding this yourself, but this check seems silly when the first thing we do in this method is return if the result is None. @@ +93,1 @@ if re.search('Música não encontrada', result): This check is not working currently. The 'not found' response doesn't include the ú or ã entries any more. Can you see if there's a better way to identify this response than this string match? @@ +96,3 @@ + elif re.search(self.div_search, result): + if re.search("<div id='info'>", result): + self.get_lyrics_from_songs_list(result, callback, *data) Should make sure not to recurse here - if we've already tried a song list search, we shouldn't try it again. @@ +141,2 @@ def parse_lyrics(self, source): + source = source.decode('iso-8859-1').encode('UTF-8') This is wrong, I see UTF-8 encoded responses coming from the server. @@ +142,3 @@ + source = source.decode('iso-8859-1').encode('UTF-8') + # Replace some splecial char: + source = re.sub('\xc2\x92', "'", source) Why this character in particular?
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/rhythmbox/issues/1229.