GNOME Bugzilla – Bug 331691
Podcast feeds don't handle timezones in <pubDate> tag
Last modified: 2006-02-21 10:04:43 UTC
This bug was originally opened at https://launchpad.net/distros/ubuntu/+source/rhythmbox/+bug/29631: "Rhythmbox doesn't recognize the date in the podcasts. It doesn't recognize the PubDate tag. I use the last dapper cvs release 20060124. I experience the problem with the following feed: http://internetradio.vrt.be/podcast/StuBru/rss-41_spod.xml"
I can confirm this problem. It uses a bogus date of 1969-12-31. The reason is that the <pubDate> tag uses a timezone which the parser doesn't recognize. I'm attaching a patch that *should* fix it, but doesn't quite.
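For context, a date of 1969-12-31 is what a Unix timestamp of 0 looks like when displayed in a time zone west of UTC, which would be consistent with the parse failing and the timestamp being left unset. A small sketch of that arithmetic follows. Python is used purely for illustration (Rhythmbox itself is C), and both the zero timestamp and the UTC-5 zone are assumptions, not something taken from the Rhythmbox source.

```python
from datetime import datetime, timedelta, timezone

# Assumed example zone: five hours behind UTC.
utc_minus_5 = timezone(timedelta(hours=-5))

# A timestamp of 0 is the Unix epoch, 1970-01-01 00:00:00 UTC.
# Seen from a zone behind UTC, it falls on the previous calendar day.
unparsed = datetime.fromtimestamp(0, tz=utc_minus_5)

print(unparsed.strftime("%Y-%m-%d %H:%M:%S"))
```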
Created attachment 59662: Non-working patch
This patch doesn't quite work. It has the right date format according to the manual for strptime(), but it doesn't seem to recognize the timezone "%Z" part.
Retitling to reflect underlying bug.
That is because "CET" isn't a timezone. Timezones are like "+1100" or "-0730".
CET is a timezone name; +1100 is an offset from GMT, and according to the strptime() manual, offsets are handled by %z (lowercase). The manual does say that timezone names (%Z) aren't properly handled, but it also says that they are consumed (presumably read, but not stored in the tm structure), so the failure is weird.
From `info strptime`:
`%z'
    The offset from GMT in ISO 8601/RFC822 format.
`%Z'
    The timezone name. _Note:_ Currently, this is not fully implemented. The format is recognized, input is consumed but no field in TM is set.
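To illustrate the %z side of that distinction, here is a short sketch of parsing a numeric offset. It uses Python's datetime.strptime, which is a separate implementation from the C library function quoted above, so it shows the idea only and says nothing about glibc's behaviour. The sample date string is made up.

```python
from datetime import datetime, timedelta

# RFC 822 style date with a numeric offset (made-up sample value).
stamp = "Fri, 17 Feb 2006 16:34:06 +0100"

# %z consumes the signed four-digit offset and stores it in the result.
parsed = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S %z")

print(parsed.utcoffset())  # the offset from UTC, as a timedelta
print(parsed.isoformat())
```

A numeric offset needs no lookup table of names, which is why it parses unambiguously.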
From the ISO time standard: "There exists no international standard that specifies abbreviations for civil time zones like CET, EST, etc. and sometimes the same abbreviation is even used for two very different time zones. In addition, politicians enjoy modifying the rules for civil time zones, especially for daylight saving times, every few years, so the only really reliable way of describing a local time zone is to specify numerically the difference of local time to UTC." The CET timezone could mean different things depending on what country you are in, so it isn't really a valid timezone specifier.
I understand (and the manual implies as much), but the description of the library function implies that it expects a *name* (i.e. text) for the "%Z" option (%z is used for the GMT offset) and will parse it as free text; later on, the info manual mentions "EDT" explicitly. So I would expect it to consume text with no whitespace, but not necessarily *use* it (which we don't need anyway).
(In reply to comment #8)
> So I would expect it to consume text with no whitespace, but not necessarily
> *use* it (which we don't need anyway).
It says of %Z: "The format is recognized, input is consumed". Clearly it just doesn't work, so we need another way to parse this.
Created attachment 59829: Updated patch to get around lousy timezone handling in strptime
Updated patch to get around the handling of timezone names. It matches all the characters up to the timezone name, then looks for a timezone, which is a non-empty run of capital letters such as CET or AEST, then checks the remainder as a year. It's not pretty, but it works for this feed, and I have checked it against made-up <pubDate> values. For example, this will succeed:
    Fri Feb 17 16:34:06 CET 2006
but both of these will fail:
    Fri Feb 17 16:00:00 1999 aaa
    Fri Feb 10 18:00:00 cet 2006
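The matching described in this comment can be sketched roughly as follows. This is not the attached patch (which is C and is not reproduced in this thread); it is a Python illustration of the same idea, and the names PUBDATE_RE and parse_pubdate are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical pattern: date and time, an upper-case zone name, then a year.
PUBDATE_RE = re.compile(
    r"^(?P<head>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<zone>[A-Z]+) "
    r"(?P<year>\d{4})$"
)


def parse_pubdate(text):
    """Return a naive datetime, or None if the text does not match."""
    match = PUBDATE_RE.match(text.strip())
    if match is None:
        return None
    # The zone name is matched but then discarded, as the thread proposes.
    rest = "{} {}".format(match.group("head"), match.group("year"))
    try:
        return datetime.strptime(rest, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None


print(parse_pubdate("Fri Feb 17 16:34:06 CET 2006"))
print(parse_pubdate("Fri Feb 17 16:00:00 1999 aaa"))
print(parse_pubdate("Fri Feb 10 18:00:00 cet 2006"))
```

Only the first of the three calls yields a date; the other two return None because no upper-case zone name sits between the time and the year.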
Created attachment 59831: Better patch which also handles timezone offsets
Better patch which also handles timezone offsets like "-1100" and "-0730", as per comment #4.
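A possible shape for accepting both zone names and numeric offsets is sketched below. Again, this is a Python illustration, not the committed C patch. The names PUBDATE_RE and parse_pubdate are hypothetical, and the thread does not say whether the real patch applies the offset or simply skips over it; this sketch assumes it is kept.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern: the zone is either an upper-case name or a signed
# four-digit offset such as -0730.
PUBDATE_RE = re.compile(
    r"^(?P<head>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?:(?P<name>[A-Z]+)|(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})) "
    r"(?P<year>\d{4})$"
)


def parse_pubdate(text):
    """Return a datetime (aware if an offset was given), or None."""
    match = PUBDATE_RE.match(text.strip())
    if match is None:
        return None
    rest = "{} {}".format(match.group("head"), match.group("year"))
    try:
        stamp = datetime.strptime(rest, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None
    if match.group("sign") is None:
        return stamp  # named zone: ignored, result stays naive
    delta = timedelta(hours=int(match.group("hh")),
                      minutes=int(match.group("mm")))
    if match.group("sign") == "-":
        delta = -delta
    try:
        return stamp.replace(tzinfo=timezone(delta))
    except ValueError:
        return None  # offset outside the range timezone() accepts


print(parse_pubdate("Fri Feb 17 16:34:06 -0730 2006"))
print(parse_pubdate("Fri Feb 17 16:34:06 CET 2006"))
```

Returning a naive value for names and an aware value for offsets is a simplification made for brevity; a real implementation would probably settle on one representation.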
Looks good, and works fine for me. Committed to cvs, thanks.