GNOME Bugzilla – Bug 567607
Need to regenerate Locations.xml.in from current observations
Last modified: 2009-03-06 20:27:04 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=470099: some of the weather stations currently in Locations.xml.in are no longer reporting. For the handful of bugs that required regenerating Locations.xml.in this cycle, I rebuilt with OBSERVATIONS_URL=http://gnome.org/~danw/observations-2.24.txt to keep the same set of locations as in 2.24. If we regenerate from the current set of observations, that will remove some locations and add others. It might be better to just edit the old observations file by hand to remove the now-bad ones.
This bug was reported as Launchpad bug #329881, "Weather applet uses defunct station for Berlin weather":

When Berlin is selected as a location in GNOME's weather applet or clock applet, it hits the URL http://weather.noaa.gov/cgi-bin/mgetmetar.pl?cccc=EDDI, which is the Tempelhof weather station. Since Tempelhof airport closed on Oct 30 last year, it's not very surprising that the last weather report for that station is from that same day. It was certainly a little surprising to see it reporting 4 °C for the last month or so, day and night, even when it was snowing outside.
I installed the latest packages from the Ubuntu Jaunty repository:

libgweather-common 2.25.5-0ubuntu1
libgweather1 2.25.5-0ubuntu1

Up until fairly recently, the weather applet showed three choices for Berlin - Tegel, Tempelhof, and Schoenefeld. Now it only shows one choice, and just happens to be using the defunct Tempelhof weather station.

The Tegel and Schoenefeld stations should have been left alone and the Tempelhof station should have been removed. Instead the opposite happened. Tempelhof was left on the list and Tegel and Schoenefeld were deleted.
(In reply to comment #2)
> Up until fairly recently, the weather applet showed three choices for Berlin -
> Tegel, Tempelhof, and Schoenefeld. Now it only shows one choice, and just
> happens to be using the defunct Tempelhof weather station.
>
> The Tegel and Schoenefeld stations should have been left alone and the
> Tempelhof station should have been removed. Instead the opposite happened.
> Tempelhof was left on the list and Tegel and Schoenefeld were deleted.

No station was removed. They just don't appear in the UI by default, since that was confusing for users. (But yes, the station that no longer works should be removed; that is what this bug is about.)
I'm sorry, but I just have to ask: what is confusing about having multiple options? Users probably know which airport or weather station is closest to them. GNOME is taking the "user is a stupid idiot, so remove all confusing options" thinking a BIT too far. For example, the METAR station for Malmö (Sturup) is about 25 km from Malmö; it's not in Malmö at all. If you want a weather station in Malmö, try http://weather.gladstonefamily.net/site/02635
Multiple-locations-per-city was removed primarily to lower the number of strings that translators need to translate. "Users don't always know which location is closest anyway" was just a nice side benefit. This is not about "users are stupid", it's about "weather station names are stupid". Yes, in some cases you have weather stations which are nicely named after suburbs of the city and so it's totally obvious where they are, but in many many other cases, they're named after microscopically small airports that even the residents of the city haven't heard of, or else they're just named "NWS Observation Station" or something, which no one is ever going to pick, even if it happens to be right in the center of town, and the airport with the recognizable name is 25 km away. The long-term fix is something like bug 562141, which it looks like there's interest in having someone do as a Summer of Code project.
To avoid string changes, I've created http://gnome.org/~danw/observations-2.26.txt, which contains the intersection of "stations that were reporting in the 2.24 timeframe" and "stations that are reporting now" (i.e., it does not include any *new* stations relative to 2.24; it only removes dead ones). This drops 42 cities (none terribly dramatic), plus another 18 stations in cities that either already had another weather station or could switch to a nearby one (including Berlin, which will now default to EDDT rather than EDDI).
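The intersection described above can be pictured as a filter over the old file. Here is a minimal Python sketch. It assumes, hypothetically, that each line of an observations file starts with the station code as its first whitespace-separated field; the real layout of observations-*.txt isn't described in this thread. The result is drawn only from the old file's lines, kept in their original order, so no station that is new since 2.24 can appear in it.

```python
# Sketch: keep only stations present in BOTH an old observations snapshot
# and the current one. ASSUMPTION: each non-blank line begins with the
# station code as its first whitespace-separated field.


def station_id(line):
    """Return the first field of a line (assumed station code), or None."""
    fields = line.split()
    return fields[0] if fields else None


def intersect_observations(old_lines, current_lines):
    """Keep old lines whose station also appears in the current snapshot.

    Because the output comes only from old_lines, no new stations (and so
    no new translatable location strings) can be introduced.
    """
    current = {station_id(line) for line in current_lines} - {None}
    return [line for line in old_lines if station_id(line) in current]


if __name__ == "__main__":
    # Illustrative input only: EDDI (Tempelhof) has stopped reporting,
    # and ZZZZ stands in for a hypothetical station that is new since 2.24.
    old = ["EDDI", "EDDT", "EDDB"]
    current = ["EDDT", "EDDB", "ZZZZ"]
    print(intersect_observations(old, current))  # ['EDDT', 'EDDB']
```

Filtering the old list, rather than taking a fresh snapshot, is what keeps the string set frozen. Stations can only drop out, never appear.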
More stats:

$ wc -l observations-2.24.txt observations-2.26.txt
4538 observations-2.24.txt
4446 observations-2.26.txt

IIUC, this file is needed to build Locations.xml.in, but it doesn't seem to be in http://svn.gnome.org/viewvc/libgweather/trunk/data/.
(In reply to comment #7)
> IIUC, this file is needed to build Locations.xml.in,
> but it doesn't seem to be in http://svn.gnome.org/viewvc/libgweather/trunk/data/.

See data/sources/README.sources.

A new observations file is generated every day on gnome.org, containing only the weather stations that have reported at least once each of the last three days. But even that ends up not working very well, because sometimes a weather station just randomly stops reporting for a week and then starts again, or something like that. So if you use today's observations, it will remove some (mostly-working) stations and add others.

So I made observations-2.24.txt so that we could regenerate Locations.xml.in for 2.24.x releases without adding any new weather stations and breaking string freeze. In theory, for 2.26 we would have grabbed a snapshot sometime a few months ago and added some new stations then, but we didn't, because I only committed patches that *removed* locations during the 2.24->2.26 cycle.
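For illustration, the daily "reported at least once each of the last three days" rule might look like the Python sketch below. The exact rule used on gnome.org isn't given here. This sketch assumes that "last three days" means the three UTC calendar dates ending today, inclusive, and that reports arrive as (station_id, datetime) pairs; the function name is hypothetical.

```python
from datetime import date, datetime, timedelta


def reported_each_day(reports, today, days=3):
    """Station IDs with at least one report on each of the `days` UTC
    calendar dates ending at `today` (inclusive).

    reports: iterable of (station_id, datetime) pairs, datetimes in UTC.
    ASSUMPTION: this mirrors, but is not, the real gnome.org filter.
    """
    window = {today - timedelta(days=i) for i in range(days)}
    seen = {}
    for station, ts in reports:
        day = ts.date()
        if day in window:
            seen.setdefault(station, set()).add(day)
    # A station survives only if every date in the window was covered.
    return {station for station, got in seen.items() if got == window}


if __name__ == "__main__":
    today = date(2009, 3, 6)
    # Made-up timestamps for illustration.
    reports = [
        ("EDDT", datetime(2009, 3, 4, 20, 50)),
        ("EDDT", datetime(2009, 3, 5, 20, 50)),
        ("EDDT", datetime(2009, 3, 6, 20, 20)),
        # EDDB skipped one day, so it is dropped even though it is
        # "mostly working": the weakness described above.
        ("EDDB", datetime(2009, 3, 4, 20, 50)),
        ("EDDB", datetime(2009, 3, 6, 20, 20)),
    ]
    print(reported_each_day(reports, today))  # {'EDDT'}
```

A single missed day is enough to drop a station. That is why a snapshot taken on a bad day removes mostly-working stations.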
(In reply to comment #8)
> A new observations file is generated every day on gnome.org, containing only
> the weather stations that have reported at least once each of the last three
> days. But even that ends up not working very well, because sometimes a weather
> station just randomly stops reporting for a week and then starts again, or
> something like that.

I have noticed that too while working on my own weather app. I can hear the airplanes flying, but the weather data is stale.

"randomly": Maybe a statistic could be calculated for each station:

Probability of a report in the previous N days

This could be calculated from a database with two columns:
station_id, report_datetime

> So if you use today's observations, it will remove some
> (mostly-working) stations and add others.

"Mostly working" stations would have a high probability of reporting in the previous 3 days (say). A probability threshold (e.g. 90%) could be used to select stations. I'm not sure how long data would need to be collected, though.
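One way the proposed per-station statistic could be computed from the two-column table is sketched below with Python's built-in sqlite3 module. The sketch takes "probability of a report in the previous N days" to mean the fraction of the last N days with at least one report. That interpretation, the table name `reports`, and the 90% default threshold are assumptions for illustration.

```python
import sqlite3
from datetime import date, timedelta


def report_probabilities(conn, today, n_days):
    """Map station_id -> fraction of the last n_days (ending at `today`,
    a 'YYYY-MM-DD' string, inclusive) with at least one report.

    Stations with no reports in the window are absent (probability 0).
    """
    rows = conn.execute(
        """
        SELECT station_id,
               COUNT(DISTINCT date(report_datetime)) * 1.0 / ?
        FROM reports
        WHERE date(report_datetime) >  date(?, ?)
          AND date(report_datetime) <= date(?)
        GROUP BY station_id
        """,
        (n_days, today, f"-{n_days} days", today),
    )
    return dict(rows.fetchall())


def reliable_stations(conn, today, n_days=30, threshold=0.9):
    """Stations whose report probability meets a (hypothetical) threshold."""
    probs = report_probabilities(conn, today, n_days)
    return sorted(s for s, p in probs.items() if p >= threshold)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reports (station_id TEXT, report_datetime TEXT)")
    today = date(2009, 3, 6)
    # Made-up data: EDDT reports every day, EDDI on only two days.
    for i in range(10):
        day = today - timedelta(days=i)
        conn.execute("INSERT INTO reports VALUES (?, ?)", ("EDDT", f"{day} 12:00:00"))
    conn.execute("INSERT INTO reports VALUES ('EDDI', '2009-02-25 06:00:00')")
    conn.execute("INSERT INTO reports VALUES ('EDDI', '2009-02-26 06:00:00')")
    print(reliable_stations(conn, today.isoformat(), n_days=10))  # ['EDDT']
```

Counting distinct dates, not raw rows, means an hourly station and a once-a-day station both score 1.0 if they never miss a day.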
(In reply to comment #9)
> "randomly": Maybe a statistic could be calculated for each station:
>
> Probability of a report in the previous N days
>
> This could be calculated from a database with two columns:
> station_id, report_datetime

The average interarrival time (time between reports) could be calculated from the database. A time interval threshold (e.g. 3 days) could be used to select "reliable" stations. IIRC, the "exponential distribution" can be used to model the times between random events: http://en.wikipedia.org/wiki/Exponential_distribution

Disclaimer: IANAS (I am not a statistician) :-)
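As a rough sketch of the interarrival idea, the Python below computes a station's mean gap between reports. It then applies the exponential model, with rate equal to 1 / mean gap, to estimate how likely a silence of a given length would be. The function names and the 3-day default are illustrative assumptions. Scheduled METAR reporting is not really a random (Poisson) process, so treat the model as a heuristic, as the comment itself suggests.

```python
import math
from datetime import datetime, timedelta


def mean_interarrival(timestamps):
    """Mean gap between consecutive reports as a timedelta (None if < 2)."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        return None
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def prob_no_report(mean_gap, window):
    """P(no report within `window`) if gaps are exponential with this mean:
    exp(-window / mean_gap)."""
    return math.exp(-(window / mean_gap))


def is_reliable(timestamps, max_mean_gap=timedelta(days=3)):
    """Hypothetical rule: reliable if the mean gap is within the threshold."""
    mean_gap = mean_interarrival(timestamps)
    return mean_gap is not None and mean_gap <= max_mean_gap


if __name__ == "__main__":
    t0 = datetime(2009, 3, 6, 0, 0)
    hourly = [t0 + timedelta(hours=i) for i in range(5)]
    gap = mean_interarrival(hourly)
    print(gap)  # 1:00:00
    print(is_reliable(hourly))  # True
    # Under the model, a 12-hour silence from an hourly station has
    # probability exp(-12), roughly 6 in a million.
    print(prob_no_report(gap, timedelta(hours=12)))
```

A very small `prob_no_report` for the current silence is a hint that the station has stopped, not just skipped a report.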
But you don't want the panel to be telling you what the weather was like 3 days ago. Even 12 hours ago is really too much, since the temperature isn't likely to be the same at midnight and noon. Ideally though, we'd want to be making the good/bad decision at runtime, not at build time.
(In reply to comment #11)
> But you don't want the panel to be telling you what the weather was like 3 days
> ago. Even 12 hours ago is really too much, since the temperature isn't likely
> to be the same at midnight and noon.

Yes. I completely agree.

> Ideally though, we'd want to be making the good/bad decision at runtime, not at
> build time.

Maybe both. "Reliable" stations would be chosen at build time, and stale observations would be identified at run time. The proposal above was addressing comment #8 about the current system for identifying stations, which "ends up not working very well".

I've added bug 574296: weather apps do not clearly identify stale observations
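The run-time half of this could be as simple as comparing an observation's timestamp against the current time. In the Python sketch below, the 12-hour cutoff echoes the remark quoted above but is otherwise a hypothetical choice. The sketch also assumes that each NWS station file begins with a UTC timestamp line in the form "YYYY/MM/DD HH:MM"; check this against the real files.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff, taken from the "even 12 hours ago is really too
# much" remark above.
MAX_AGE = timedelta(hours=12)


def parse_station_timestamp(line):
    """Parse an assumed 'YYYY/MM/DD HH:MM' UTC line into an aware datetime."""
    naive = datetime.strptime(line.strip(), "%Y/%m/%d %H:%M")
    return naive.replace(tzinfo=timezone.utc)


def is_stale(observed_at, now=None, max_age=MAX_AGE):
    """True if the observation is older than max_age at time `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - observed_at > max_age


if __name__ == "__main__":
    now = datetime(2009, 3, 6, 20, 27, tzinfo=timezone.utc)
    # Tempelhof's last report vs. a recent (made-up) Tegel report.
    print(is_stale(parse_station_timestamp("2008/10/30 12:00"), now))  # True
    print(is_stale(parse_station_timestamp("2009/03/06 19:50"), now))  # False
```

A stale result would be the point where an applet could flag the reading or fall back to another nearby station, rather than showing 4 °C for months.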
Created attachment 130215: perl script to create stations database from NWS metar directory

The attached Perl script downloads the METAR stations directory from the NWS at ftp://tgftp.nws.noaa.gov/data/observations/metar/stations/ and inserts stations with their timestamps into an SQLite database.

Usage: ./wx_get_stations.pl

After running, the database will be in "wx_reports_db.sqlite". The database has two tables, "reports" and "log". The "reports" table records a station ID and the date/time of a report from the station. Timestamps are in UTC, formatted as "YYYY-MM-DD HH:MM:SS". Stations whose timestamps have not changed since the previous run of the script are ignored. The "log" table records, for each run of the script, the completion date/time, the number of stations downloaded, the number of stations in the database at the end of the run, and the elapsed time in seconds (including download time, time to generate the SQL, and time to update the database).

Several views are included. The most important one is v_stations_by_report_count, which counts the number of times each station appears in the "reports" table and sorts the counts with the most frequent at the top. After the script has been run repeatedly over a period of time, v_stations_by_report_count will show which stations are most reliable about reporting relative to their peers.

Example report generation:

$ echo 'SELECT * FROM v_stations_by_report_count LIMIT 6;' | sqlite3 wx_reports_db.sqlite
AGGH|2
BGUK|2
BIAR|2
BIEG|2
BIKF|2
BIRK|2

The script takes about 24 seconds to run (on a 1.1 GHz processor), and the database is about 500 KB after two runs. Suggestions for improving either are welcome. Possible enhancements include additional views and automatic removal of old reports.
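The attached script itself isn't reproduced here. As a rough illustration of the scheme described above, here is a Python/sqlite3 sketch of the "reports" table and the v_stations_by_report_count view. The column names are assumptions, not taken from the attachment. Skipping unchanged timestamps is done here with a UNIQUE constraint plus INSERT OR IGNORE, one possible approach. The prune_before helper is a hypothetical take on "automatic removal of old reports", and the "log" table is not modeled.

```python
import sqlite3

# ASSUMED schema: column names are illustrative, not from the Perl script.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    station_id      TEXT NOT NULL,
    report_datetime TEXT NOT NULL,        -- UTC, 'YYYY-MM-DD HH:MM:SS'
    UNIQUE (station_id, report_datetime)  -- a repeated timestamp is a no-op
);
CREATE VIEW IF NOT EXISTS v_stations_by_report_count AS
    SELECT station_id, COUNT(*) AS report_count
    FROM reports
    GROUP BY station_id
    ORDER BY report_count DESC, station_id;
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_run(conn, listing):
    """Insert (station_id, timestamp) pairs from one directory listing;
    pairs already present (unchanged timestamps) are silently skipped."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO reports (station_id, report_datetime) VALUES (?, ?)",
            listing,
        )


def prune_before(conn, cutoff):
    """Hypothetical cleanup: delete reports older than `cutoff` (UTC string)."""
    with conn:
        conn.execute("DELETE FROM reports WHERE report_datetime < ?", (cutoff,))


def top_stations(conn, limit=6):
    """Most frequently reporting stations, most frequent first."""
    return conn.execute(
        "SELECT station_id, report_count FROM v_stations_by_report_count "
        "ORDER BY report_count DESC, station_id LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_db()
    # Made-up listings from two runs; EDDI's timestamp never changes.
    record_run(conn, [("EDDT", "2009-03-06 19:50:00"), ("EDDI", "2008-10-30 12:00:00")])
    record_run(conn, [("EDDT", "2009-03-06 20:20:00"), ("EDDI", "2008-10-30 12:00:00")])
    print(top_stations(conn))  # [('EDDT', 2), ('EDDI', 1)]
    prune_before(conn, "2009-01-01 00:00:00")
    print(top_stations(conn))  # [('EDDT', 2)]
```

Breaking count ties by station ID matches the alphabetical order in the sample output above, but that is an assumption about the original view. The explicit ORDER BY in top_stations avoids relying on a view's internal ordering.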