Bug 567607 - Need to regenerate Locations.xml.in from current observations
Status: RESOLVED FIXED
Product: libgweather
Classification: Core
Component: locations
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
: 2.22.0
Assigned To: libgweather-maint
libgweather-maint
Depends on:
Blocks:
Reported: 2009-01-13 14:50 UTC by Dan Winship
Modified: 2009-03-06 20:27 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
perl script to create stations database from NWS metar directory (4.02 KB, text/plain)
2009-03-06 20:27 UTC, Steve Tyler

Description Dan Winship 2009-01-13 14:50:10 UTC
qv https://bugzilla.redhat.com/show_bug.cgi?id=470099, some of the weather stations currently in Locations.xml.in are no longer reporting.

For the handful of bugs that required regenerating Locations.xml.in this cycle, I rebuilt with OBSERVATIONS_URL=http://gnome.org/~danw/observations-2.24.txt, to keep the same set of locations as in 2.24. If we redo with the current set of observations, that will remove some locations and add others. It might be better to just edit the old observations file by hand to remove the now-bad ones.
Comment 1 duanedesign 2009-02-16 03:01:05 UTC
This bug was also reported as Launchpad bug #329881:

Weather applet uses defunct station for Berlin weather

When Berlin is selected as a location in Gnome's weather applet or clock applet, it hits the URL http://weather.noaa.gov/cgi-bin/mgetmetar.pl?cccc=EDDI which is the Tempelhof weather station. Since Tempelhof airport closed on Oct 30 last year, it's not very surprising that the last weather report for that station is on that same day. It was certainly a little surprising to see it reporting 4° C for the last month or so, day and night, even when it was snowing outside.
Comment 2 duanedesign 2009-02-16 15:06:24 UTC
I installed the latest package from the Ubuntu Jaunty repository

libgweather-common 2.25.5-0ubuntu1
libgweather1 2.25.5-0ubuntu1

Up until fairly recently, the weather applet showed three choices for Berlin - Tegel, Tempelhof, and Schoenefeld. Now it only shows one choice, and just happens to be using the defunct Tempelhof weather station.

The Tegel and Schoenefeld stations should have been left alone and the Tempelhof station should have been removed. Instead the opposite happened. Tempelhof was left on the list and Tegel and Schoenefeld were deleted.
Comment 3 Vincent Untz 2009-02-16 16:20:25 UTC
(In reply to comment #2)
> Up until fairly recently, the weather applet showed three choices for Berlin -
> Tegel, Tempelhof, and Schoenefeld. Now it only shows one choice, and just
> happens to be using the defunct Tempelhof weather station.
> 
> The Tegel and Schoenefeld stations should have been left alone and the
> Tempelhof station should have been removed. Instead the opposite happened.
> Tempelhof was left on the list and Tegel and Schoenefeld were deleted.

No station was removed. They just don't appear in the UI by default since it was confusing for the user.

(but yes, the one not working anymore should be removed, and this bug is about this)

Comment 4 Mårten Woxberg 2009-03-01 19:09:17 UTC
I'm sorry but I just have to ask.

What is confusing about having multiple options?
The user probably knows which airport/weather station is closest to his/her position.
Gnome is taking it a BIT too far with the "user is a stupid idiot, so remove all confusing options" thinking.


I mean, the METAR station for Malmö (Sturup) is about 25 km from Malmö.
It's not in Malmö at all.

If you want a weather station in Malmö, try:
http://weather.gladstonefamily.net/site/02635
Comment 5 Dan Winship 2009-03-01 20:49:54 UTC
Multiple-locations-per-city was removed primarily to lower the number of strings that translators need to translate. "Users don't always know which location is closest anyway" was just a nice side benefit. This is not about "users are stupid", it's about "weather station names are stupid". Yes, in some cases you have weather stations which are nicely named after suburbs of the city and so it's totally obvious where they are, but in many many other cases, they're named after microscopically small airports that even the residents of the city haven't heard of, or else they're just named "NWS Observation Station" or something, which no one is ever going to pick, even if it happens to be right in the center of town, and the airport with the recognizable name is 25 km away.

The long-term fix is something like bug 562141, which it looks like there's interest in having someone do as a Summer of Code project.
Comment 6 Dan Winship 2009-03-03 17:48:03 UTC
To avoid string changes, I've created http://gnome.org/~danw/observations-2.26.txt, which contains the intersection of "stations that were reporting in the 2.24 timeframe" and "stations that are reporting now" (i.e., it does not include any *new* stations relative to 2.24; it only removes dead ones). This causes us to drop 42 cities (none terribly dramatic) and then drop another 18 stations in cities that either already had another weather station or could switch to a nearby one (including Berlin, which will now default to EDDT rather than EDDI).
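The intersection described above amounts to a set operation over station codes. A minimal Python sketch, assuming (hypothetically) that each observations file lists one METAR station code per line; the function names and the station codes in the usage lines are illustrative only:

```python
from typing import Iterable, List, Set


def read_station_ids(lines: Iterable[str]) -> Set[str]:
    """Collect station codes, assuming one code per line (hypothetical format)."""
    return {line.strip() for line in lines if line.strip()}


def surviving_stations(old_ids: Set[str], current_ids: Set[str]) -> List[str]:
    """Stations reporting in both snapshots.

    The result is a subset of old_ids, so no station that is new relative
    to the old snapshot can appear, and no new strings are introduced.
    """
    return sorted(old_ids & current_ids)


def dead_stations(old_ids: Set[str], current_ids: Set[str]) -> List[str]:
    """Stations in the old snapshot that have stopped reporting."""
    return sorted(old_ids - current_ids)


old = read_station_ids(["EDDI\n", "EDDT\n", "EDDB\n"])
now = read_station_ids(["EDDT\n", "EDDB\n", "EDDF\n"])
print(surviving_stations(old, now))  # ['EDDB', 'EDDT']
print(dead_stations(old, now))       # ['EDDI']
```

If the one-code-per-line assumption holds, an open file object (e.g. for observations-2.24.txt) can be passed straight to `read_station_ids`.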
Comment 7 Steve Tyler 2009-03-04 21:09:26 UTC
More stats:

$ wc -l observations-2.24.txt observations-2.26.txt
 4538 observations-2.24.txt
 4446 observations-2.26.txt

IIUC, this file is needed to build Locations.xml.in,
but it doesn't seem to be in http://svn.gnome.org/viewvc/libgweather/trunk/data/.
Comment 8 Dan Winship 2009-03-04 21:34:13 UTC
(In reply to comment #7)
> IIUC, this file is needed to build Locations.xml.in,
> but it doesn't seem to be in http://svn.gnome.org/viewvc/libgweather/trunk/data/.

see data/sources/README.sources

A new observations file is generated every day on gnome.org, containing only the weather stations that have reported at least once on each of the last three days. But even that ends up not working very well, because sometimes a weather station just randomly stops reporting for a week and then starts again, or something like that. So if you use today's observations, it will remove some (mostly-working) stations and add others. So I made observations-2.24.txt so that we could regenerate Locations.xml.in for 2.24.x releases without adding any new weather stations and breaking string freeze.
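The daily filter mentioned above (keep a station only if it reported at least once on each of the last three days) could look roughly like the following. This is a hypothetical reconstruction, not the gnome.org script: it assumes reports arrive as (station code, UTC datetime) pairs and that "the last three days" means the given day and the two days before it.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, Set, Tuple


def reported_each_day(
    reports: Iterable[Tuple[str, datetime]], today: date, days: int = 3
) -> Set[str]:
    """Stations with at least one report on every one of the last `days` days."""
    required = {today - timedelta(days=i) for i in range(days)}
    seen: Dict[str, Set[date]] = {}
    for station, when in reports:
        seen.setdefault(station, set()).add(when.date())
    # A single silent day drops the station, which is why a station that
    # goes quiet for a week disappears even if it later resumes.
    return {station for station, dates in seen.items() if required <= dates}


reports = [
    ("EDDT", datetime(2009, 3, 3, 6, 20)),
    ("EDDT", datetime(2009, 3, 4, 6, 20)),
    ("EDDT", datetime(2009, 3, 5, 6, 20)),
    ("EDDI", datetime(2009, 3, 5, 6, 20)),
]
print(reported_each_day(reports, date(2009, 3, 5)))  # {'EDDT'}
```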

In theory, for 2.26 we would have grabbed a snapshot sometime a few months ago and added some new stations then, but we didn't, because I only committed patches that *removed* locations during the 2.24->2.26 cycle.
Comment 9 Steve Tyler 2009-03-05 11:14:58 UTC
(In reply to comment #8)

> A new observations file is generated every day on gnome.org, containing only
> the weather stations that have reported at least once each of the last three
> days. But even that ends up not working very well, because sometimes a weather
> station just randomly stops reporting for a week and then starts again, or
> something like that.

I have noticed that too while working on my own weather app. I can hear the airplanes flying, but the weather data is stale.

"randomly": Maybe a statistic could be calculated for each station:

Probability of a report in the previous N days

This could be calculated from a database with two columns:
station_id, report_datetime

> So if you use today's observations, it will remove some
> (mostly-working) stations and add others.

"Mostly working" stations would have a high probability of reporting in the previous 3 days (say) ...

A probability threshold (e.g. 90%) could be used to select stations.

Not sure how long data would need to be collected ...
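One way to read "probability of a report in the previous N days" is the fraction of those days on which a station reported at least once. A sketch over the proposed two-column table, using Python's built-in sqlite3 module; the table name, column names, and the 90% default threshold are illustrative assumptions:

```python
import sqlite3
from typing import Dict, List


def report_fraction(conn: sqlite3.Connection, today: str, days: int) -> Dict[str, float]:
    """Fraction of the `days` calendar days ending on `today` with >= 1 report."""
    rows = conn.execute(
        """
        SELECT station_id, COUNT(DISTINCT date(report_datetime)) * 1.0 / ?
        FROM reports
        WHERE date(report_datetime) BETWEEN date(?, ?) AND date(?)
        GROUP BY station_id
        """,
        (days, today, f"-{days - 1} days", today),
    ).fetchall()
    return dict(rows)


def reliable_stations(
    conn: sqlite3.Connection, today: str, days: int = 3, threshold: float = 0.9
) -> List[str]:
    """Stations whose daily report fraction meets the threshold."""
    fractions = report_fraction(conn, today, days)
    return sorted(s for s, p in fractions.items() if p >= threshold)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (station_id TEXT, report_datetime TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [
        ("EDDT", "2009-03-03 06:20:00"),
        ("EDDT", "2009-03-04 06:20:00"),
        ("EDDT", "2009-03-05 06:20:00"),
        ("EDDI", "2009-03-05 06:20:00"),
    ],
)
print(reliable_stations(conn, "2009-03-05"))  # ['EDDT']
```

Collecting data for longer than N days lets the same query be run over several windows to see whether a station's fraction is stable.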
Comment 10 Steve Tyler 2009-03-05 13:18:44 UTC
(In reply to comment #9)

> "randomly": Maybe a statistic could be calculated for each station:
> 
> Probability of a report in the previous N days
> 
> This could be calculated from a database with two columns:
> station_id, report_datetime

The average interarrival time (time between reports) could be calculated from the database. A time interval threshold (e.g. 3 days) could be used to select "reliable" stations.

IIRC, the "exponential distribution" can be used to model the times between random events:
http://en.wikipedia.org/wiki/Exponential_distribution

Disclaimer: IANAS (I am not a statistician) :-)
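The interarrival-time idea can be sketched as follows. The mean gap between consecutive reports is (last − first) / (n − 1); if gaps are modelled as exponentially distributed with that mean, as suggested above, the probability of no report at all in a window of length t is exp(−t / mean). This is only a rough model: routine METAR reports are mostly issued on a schedule rather than at random, so treat the numbers as a heuristic. Function names and the 3-day threshold are illustrative.

```python
import math
from datetime import datetime, timedelta
from typing import List, Optional


def mean_interarrival(times: List[datetime]) -> Optional[timedelta]:
    """Average gap between consecutive reports (None if fewer than two)."""
    ts = sorted(times)
    if len(ts) < 2:
        return None
    # The sum of consecutive gaps telescopes to (last - first).
    return (ts[-1] - ts[0]) / (len(ts) - 1)


def prob_no_report(mean_gap: timedelta, window: timedelta) -> float:
    """P(no report within `window`) under an exponential model with mean `mean_gap`."""
    return math.exp(-(window / mean_gap))


def is_reliable(times: List[datetime], threshold: timedelta = timedelta(days=3)) -> bool:
    """Hypothetical rule: mean gap must be below the threshold."""
    gap = mean_interarrival(times)
    return gap is not None and gap < threshold


hourly = [datetime(2009, 3, 5, h) for h in range(4)]
print(mean_interarrival(hourly))  # 1:00:00
print(is_reliable(hourly))        # True
```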
Comment 11 Dan Winship 2009-03-05 14:33:08 UTC
But you don't want the panel to be telling you what the weather was like 3 days ago. Even 12 hours ago is really too much, since the temperature isn't likely to be the same at midnight and noon.

Ideally though, we'd want to be making the good/bad decision at runtime, not at build time.
Comment 12 Steve Tyler 2009-03-05 18:52:48 UTC
(In reply to comment #11)
> But you don't want the panel to be telling you what the weather was like 3 days
> ago. Even 12 hours ago is really too much, since the temperature isn't likely
> to be the same at midnight and noon.

Yes. I completely agree.

> Ideally though, we'd want to be making the good/bad decision at runtime, not at
> build time.

Maybe both. "Reliable" stations would be chosen at build time, stale observations would be identified at run time.

The proposal above was addressing Comment #8 about the current system for identifying stations, which "ends up not working very well".

I've filed bug 574296:
weather apps do not clearly identify stale observations
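A runtime staleness check along the lines of bug 574296 needs only the observation time and the current time. The sketch below assumes timestamps in the "YYYY-MM-DD HH:MM:SS" UTC form used elsewhere in this bug; the 3-hour cutoff is a hypothetical value, chosen only because comment 11 calls 12 hours too much.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=3)  # hypothetical cutoff, not a libgweather setting


def parse_utc(stamp: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' as a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def is_stale(observed_at: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """True if the observation is older than max_age at time `now`."""
    return now - observed_at > max_age


def describe(observed_at: datetime, now: datetime) -> str:
    """Short label a panel applet could show next to the reading."""
    age_hours = (now - observed_at) / timedelta(hours=1)
    label = "STALE" if is_stale(observed_at, now) else "current"
    return f"{label} ({age_hours:.0f} h old)"


now = parse_utc("2009-03-05 12:00:00")
print(describe(parse_utc("2009-03-05 11:00:00"), now))  # current (1 h old)
print(describe(parse_utc("2008-10-30 12:00:00"), now))  # STALE (3024 h old)
```

The second line mirrors the Tempelhof case from comment 1: a station whose last report predates the airport closure would be flagged rather than silently shown as current.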

Comment 13 Steve Tyler 2009-03-06 20:27:04 UTC
Created attachment 130215 [details]
perl script to create stations database from NWS metar directory

The attached Perl script downloads the metar stations directory from the NWS at
ftp://tgftp.nws.noaa.gov/data/observations/metar/stations/
and inserts stations with their timestamps into an Sqlite database.

Usage: ./wx_get_stations.pl

After running, the database will be in "wx_reports_db.sqlite".

The database has two tables: "reports" and "log".

The "reports" table records a station id and the date/time of a report from the station.
Timestamps are in UTC formatted as "YYYY-MM-DD HH:MM:SS".
Stations with timestamps that have not changed since the previous run of the script are ignored.

The log records completion date/time for each run of the script, the number of stations downloaded,
the number of stations in the database at the end of the run, and the elapsed time in seconds
(including download time, time to generate the SQL, and time to update the database).
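The update step described above can be sketched with Python's sqlite3 module. This is not a translation of the attached Perl script (its schema is not reproduced in this bug); the column names are hypothetical, and fetching and parsing the NWS directory listing is left out, so the input is simply a list of (station id, UTC timestamp) pairs. A UNIQUE constraint plus INSERT OR IGNORE is one way to skip stations whose timestamp has not changed since the previous run.

```python
import sqlite3
import time
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    station_id  TEXT NOT NULL,
    report_time TEXT NOT NULL,           -- UTC, 'YYYY-MM-DD HH:MM:SS'
    UNIQUE (station_id, report_time)     -- unchanged timestamp => duplicate row
);
CREATE TABLE IF NOT EXISTS log (
    completed_at        TEXT NOT NULL,   -- UTC completion time of the run
    stations_downloaded INTEGER NOT NULL,
    stations_in_db      INTEGER NOT NULL,
    elapsed_seconds     REAL NOT NULL
);
"""


def record_run(conn: sqlite3.Connection, listing: List[Tuple[str, str]], started: float) -> int:
    """Insert new (station, timestamp) pairs and log the run; returns stations in DB."""
    conn.executescript(SCHEMA)
    with conn:  # one transaction for the whole run
        conn.executemany(
            "INSERT OR IGNORE INTO reports (station_id, report_time) VALUES (?, ?)",
            listing,
        )
        (in_db,) = conn.execute("SELECT COUNT(DISTINCT station_id) FROM reports").fetchone()
        conn.execute(
            "INSERT INTO log VALUES (datetime('now'), ?, ?, ?)",
            (len(listing), in_db, time.monotonic() - started),
        )
    return in_db


conn = sqlite3.connect(":memory:")
# In real use, take the start time before downloading the directory listing.
record_run(conn, [("EDDT", "2009-03-06 18:50:00"), ("EDDI", "2008-10-30 12:20:00")], time.monotonic())
record_run(conn, [("EDDT", "2009-03-06 19:20:00"), ("EDDI", "2008-10-30 12:20:00")], time.monotonic())
print(conn.execute("SELECT COUNT(*) FROM reports").fetchone()[0])  # 3
```

On the second run only EDDT has a new timestamp, so only one row is added; the unchanged EDDI entry is ignored by the constraint.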

Several views are included.

The most important one is v_stations_by_report_count. This view counts the number of times each station appears in the "reports" table and sorts the counts with the most frequent at the top. After repeatedly running the script over a period of time, v_stations_by_report_count will show which stations are most reliable about reporting relative to their peers.

Example report generation:

$ echo 'SELECT * FROM v_stations_by_report_count LIMIT 6;' | sqlite3 wx_reports_db.sqlite
AGGH|2
BGUK|2
BIAR|2
BIEG|2
BIKF|2
BIRK|2
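The definition of v_stations_by_report_count is not included in the comment. A plausible definition matching the description (count each station's rows in "reports", most frequent first) is sketched below against a minimal in-memory table; the column names and the tie-break on station id are assumptions.

```python
import sqlite3

VIEW = """
CREATE VIEW IF NOT EXISTS v_stations_by_report_count AS
SELECT station_id, COUNT(*) AS report_count
FROM reports
GROUP BY station_id
ORDER BY report_count DESC, station_id  -- tie-break by id (assumption)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (station_id TEXT, report_time TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [
        ("BIKF", "2009-03-06 18:00:00"),
        ("BIKF", "2009-03-06 18:30:00"),
        ("AGGH", "2009-03-06 18:00:00"),
        ("AGGH", "2009-03-06 19:00:00"),
        ("EDDI", "2008-10-30 12:20:00"),
    ],
)
conn.execute(VIEW)
# Same query as the sqlite3 shell example, printed in its "id|count" layout.
for station, count in conn.execute("SELECT * FROM v_stations_by_report_count LIMIT 6"):
    print(f"{station}|{count}")
```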

The script takes about 24 seconds to run (1.1 GHz processor), and
the database is about 500 KB after two runs.
Suggestions for improving either are solicited.

Possible enhancements include additional views and automatic removal of old reports.
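Automatic removal of old reports could be a single DELETE keyed on the timestamp column, since the "YYYY-MM-DD HH:MM:SS" format sorts chronologically as text. A sketch, assuming a hypothetical reports(station_id, report_time) layout and a caller-chosen retention period; `now` is passed in explicitly so the example is reproducible:

```python
import sqlite3


def prune_old_reports(conn: sqlite3.Connection, now: str, keep_days: int) -> int:
    """Delete reports older than `keep_days` before `now` (UTC); returns rows removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM reports WHERE report_time < datetime(?, ?)",
            (now, f"-{keep_days} days"),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (station_id TEXT, report_time TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [("EDDI", "2008-10-30 12:20:00"), ("EDDT", "2009-03-06 18:50:00")],
)
print(prune_old_reports(conn, "2009-03-06 20:27:00", keep_days=30))  # 1
```

Deleting rows does not by itself shrink the database file; SQLite's VACUUM command rebuilds the file and reclaims the freed space.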