GNOME Bugzilla – Bug 76283
gweather fails to display if DNS failed
Last modified: 2005-01-06 10:09:50 UTC
If gweather is started before a network interface is brought up, it will never display anything. If my panel starts up before I bring up my network interface, I have to restart the panel for gweather to work.
Hmm, there is something very fishy about gweather. I get this also, but only on occasion. This is heinous, marking up the severity
Would be great to have a fix before 2.0.0 but not a showstopper IMHO.
I tried reproducing this bug: I brought down the network interface, started the panel, and tried updating gweather, which failed. Later I brought up the network interface and tried updating gweather again, and it updated successfully. I didn't need to restart the panel for gweather to update.
Funny that. Don't know if it's related, I have a couple of inbox monitor applets on my panel. If I start up my laptop without the network connected, even after connecting to the network and checking my mail with Evolution, they refuse to connect. This doesn't happen every time for me either. In fact, strangely enough, it starts working fine every time I go to investigate ;oP -- Ross
The real problem is that gweather reads /etc/resolv.conf when it starts, but doesn't reread it if it changes after the network connection is up. If you have a dial-up connection you probably rewrite /etc/resolv.conf when you connect, but gweather ignores this. (Well, in fact it's glibc that ignores this.)
This is what I observed:
* I shut down the network with 'sh network stop', start the panel, add the applet and update. It gives "retrieval failed". Now I start the network with 'sh network start'. When I update the applet, it displays the required information.
* I commented out the nameserver info in /etc/resolv.conf, started the panel and added the applet. It gives "retrieval failed". Then I uncommented the nameserver info. When I update, it still gives "retrieval failed". But if I remove the applet from the panel and add it back, the applet displays the information.
It is the same behaviour for the stock-ticker applet too.
I don't think the problem is in the applet. gnome_vfs_inet_connection_create () returns an error because gethostbyname () returns NULL. gethostbyname () reads /etc/resolv.conf only when it is called for the first time, i.e. when adding the applet to the panel. Further calls to gethostbyname () do not seem to read /etc/resolv.conf. I did an strace of gweather, but could not find any open of /etc/resolv.conf after the first one. Hence it fails to detect any changes to /etc/resolv.conf after the applet is added to the panel.
However, this works on Solaris. truss output on Solaris shows that it reads /etc/resolv.conf every time I do an update. Hence, I feel this is a problem with the behaviour of gethostbyname () and not a problem with the applets.
Both gweather and stock-ticker use gnome-vfs for network retrieval. Any network issues are because of gnome-vfs. Personally I use dial-up for my network connection and gweather works okay for me even though I'm often not connected.
Kevin: should this bug be moved to gnome-vfs then ?
Yeah, it's probably a gnome-vfs bug. I remember chris or shaver mentioning this bug in mozilla. I guess there's some API call in glibc to reload /etc/resolv.conf, and you can do this after a failed DNS lookup.
The relevant bugs at bugzilla.mozilla.org are 64857 and 117628, and the API call is res_init(3). According to the bug report, you have to unset the RES_INIT bit in _res.options to make libc reread /etc/resolv.conf. Hope this helps.
*** Bug 71199 has been marked as a duplicate of this bug. ***
Thanks for looking that up, Miguel.
Yeah, res_init() is the call that you need to make. Mozilla just makes it whenever there's a DNS failure.
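For illustration, the "call res_init() on a DNS failure" approach could look roughly like the sketch below. This is a minimal sketch, not the Mozilla or gnome-vfs code: lookup_with_reload is a hypothetical helper name, and it leaves out the RES_INIT bit handling mentioned above, which may or may not be needed depending on the libc version.

```c
#include <stddef.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Hypothetical helper (not part of gnome-vfs): look up a host and, if the
 * first attempt fails, ask libc to re-read its resolver configuration and
 * try exactly once more. */
struct hostent *
lookup_with_reload (const char *hostname)
{
    struct hostent *host = gethostbyname (hostname);

    if (host == NULL) {
        /* res_init() returns 0 on success and -1 on error. */
        if (res_init () == 0)
            host = gethostbyname (hostname);
    }
    return host;
}
```

A caller would use lookup_with_reload () wherever it currently calls gethostbyname () directly. Note that res_init() works on global resolver state, so this variant is not a good fit for threaded code.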
Could this be the cause of my perpetually updating gweather? It has the question mark icon and 52 degrees (it's snowing and 36 here now. :), and the tooltip says "Updating..." No matter how many times I right-click on it and tell it to update, it never goes out and gets it. If I change the location, though, it seems to work, and I can then change it back. If it's not caused by this, I'll open another bug.
*** Bug 96161 has been marked as a duplicate of this bug. ***
Blah. Still a problem in 2.1.x, still irritating as all get out and making things break.
In an attempt to fix this bug, I looked at the two mozilla bugs mentioned in this bug report; I also found #166479, which is related. Here is a quick summary of my readings: res_ninit should be preferred to res_init since the former is thread safe. The problem is that res_ninit only appeared with glibc 2.2 (which rh6 doesn't have, for instance). I think this bug could be fixed by calling res_ninit when a DNS lookup fails, and retrying the lookup after that. As mentioned in the mozilla bugs, if one moves from one network to another, and the DNS server of the first network is accessible from the second one, DNS requests will travel from the second network to the first unnecessarily. But the most annoying problem will be fixed :) It should probably be enough to add this res_ninit call in gnome_vfs_inet_connection_create after the gethostbyname call. I'll try to make a patch tomorrow.
Created attachment 13660 proposed patch
I haven't tested this patch at all (apart from checking that it compiles and that gweather still works), but I'd expect it to fix this gweather problem. Can someone who was seeing the problem test it?
You should look at the updated mozilla source code. There's a more portable way that works with a whole bunch of different versions of glibc. (I think you have to call _res_init() instead of res_init() but I can't remember the details clearly.)
I looked at mozilla source code (in mozilla/netwerk/dns/src/nsDnsService.cpp), and I only found a hack to be able to use res_ninit (on platforms which have it) even if the binary was compiled on a machine which had a libc without it (I guess that is meant to be used to build the nightly builds on rh6.2 or some older distro while being able to take advantage of res_ninit when run on rh7). This doesn't look like something necessary for gnome.
*** Bug 108690 has been marked as a duplicate of this bug. ***
did anyone check if the patch really works?
The patch seems basically all right, but it doesn't fix the whole issue. gethostbyname() and getaddrinfo() are called from several other places, and on some systems there are thread-safety issues with gethostbyname(). I think the correct way to fix this is to add a common resolver call in gnome-vfs that is threadsafe and supports IPv6 and res_ninit. Then we use this function in all the places where we now call getaddrinfo() and gethostbyname(). teuf, are you interested in this? It requires some thinking about the API so that all users in gnome-vfs get what they want and it can be implemented threadsafely using whatever calls the OS offers.
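To make the idea of a single common resolver call more concrete, here is a rough sketch. The name common_resolve and its signature are assumptions for illustration only and are not the API that ended up in gnome-vfs. It relies on getaddrinfo(), which handles both IPv4 and IPv6, and reinitialises the calling thread's resolver state with res_ninit(&_res) before a single retry.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Hypothetical common entry point; name and signature are illustrative.
 * Returns 0 and stores a result list in *result (release it with
 * freeaddrinfo()), or a non-zero getaddrinfo() error code. */
int
common_resolve (const char *hostname, struct addrinfo **result)
{
    struct addrinfo hints;
    int err;

    memset (&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 and IPv6 answers */
    hints.ai_socktype = SOCK_STREAM;

    err = getaddrinfo (hostname, NULL, &hints, result);
    if (err != 0 && res_ninit (&_res) == 0) {
        /* One retry after reloading the resolver configuration. */
        err = getaddrinfo (hostname, NULL, &hints, result);
    }
    return err;
}
```

Callers that only need a connected socket could walk the returned list and try each address in turn, which keeps the IPv4/IPv6 choice out of the individual modules.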
For info about threadsafeness issues, read resolv/README in the glibc sources.
I believe there's a threadsafe gethostbyname() in libsoup which should work on at least Solaris and Linux. You might want to look at that.
gethostbyname() is actually threadsafe in glibc 2.2, since the data used is stored in thread-specific data. However, this is not true for all systems, so there are thread-safe extensions that you can use, and in the end you fall back to locking around gethostbyname(). I haven't looked at the libsoup resolver, but if it is a complete independent resolver implementation, that sounds a bit overkill and will cause problems with e.g. having to update it for IPv6 etc.
It's not a complete reimplementation. It uses the native gethostbyname_r() on glibc, Solaris and HP-UX, and does locking around gethostbyname() when that isn't available. And it handles IPv6.
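The "locking around gethostbyname()" fallback can be sketched as follows. This is not the libsoup code, which I have not reproduced here; locked_lookup_ipv4 and lookup_lock are hypothetical names, and the sketch only handles IPv4 to stay short. The important detail is that the address is copied out while the lock is still held, because the returned hostent may live in static storage.

```c
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>

/* Serialises all calls into the non-reentrant gethostbyname(). */
static pthread_mutex_t lookup_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: resolve hostname and copy its first IPv4 address
 * into *out. Returns 0 on success and -1 on failure. */
int
locked_lookup_ipv4 (const char *hostname, struct in_addr *out)
{
    struct hostent *host;
    int ret = -1;

    pthread_mutex_lock (&lookup_lock);
    host = gethostbyname (hostname);
    if (host != NULL
        && host->h_addrtype == AF_INET
        && host->h_length == (int) sizeof *out
        && host->h_addr_list[0] != NULL) {
        /* Copy before unlocking: the hostent may be overwritten by the
         * next caller. */
        memcpy (out, host->h_addr_list[0], sizeof *out);
        ret = 0;
    }
    pthread_mutex_unlock (&lookup_lock);
    return ret;
}
```

The lock only helps if every caller in the process goes through the wrapper; any code that still calls gethostbyname() directly can race with it.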
So do we cut'n'paste it from libsoup?
Mass resetting target milestone to clean up gnome-vfs milestones. Please complain if I removed a milestone which should have been kept. Sorry for the spam
We really should have a threadsafe resolver in gnome-vfs and use it from everywhere. One that supported async resolving would be nice also.
This is a really bad bug for laptop users; it'd be really nice if we could finally get it fixed, even if the fix is suboptimal.
As I have now added a generic resolver API for gnome-vfs, I am adding support for resolver reloading to this new API. I modified the autoconf stuff to pick up the proper library if necessary and reordered the whole resolver part so that only libresolve links against the resolver libraries if necessary. I also added a maximum reload interval. (The idea is based on mozilla code :)
Created attachment 29782 Here we go!
Sorry .. not "that only libresolve" but "that only libgnomevfs" .. sorry for the spam.
Created attachment 30059 patch to reload resolver on error Updated version (I have tested this version and it works for me)! If somebody could test it and report any errors to me, that would be great! :)
Created attachment 30106 patch committed with CVS Here is the latest update of the patch which was committed to CVS. I feared the previous patch could go into an infinite loop, so gicmo changed it so that res_ninit is called at most once for each resolving call.
The configure.in changes are the same as in attachment #30059. Can anyone test whether this patch helps or not (gnome-vfs > 2.7.5 is needed)?
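For readers who don't want to dig through the attachment, the two safeguards discussed here (a reload interval, and at most one reload per resolving call) can be sketched like this. It is not the committed patch: the names, the callback types and the 2-second interval are all assumptions made for illustration, and the lookup and reload steps are passed in as callbacks so that the policy is visible on its own.

```c
#include <time.h>

/* Assumed minimum number of seconds between two resolver reloads. */
#define RELOAD_MIN_INTERVAL 2

/* Returns 1 if a reload is allowed at time `now`: either no reload has
 * happened yet (last_reload == 0) or the interval has elapsed. */
int
reload_allowed (time_t now, time_t last_reload)
{
    return last_reload == 0 || now - last_reload >= RELOAD_MIN_INTERVAL;
}

/* Hypothetical callback types: both return 0 on success. */
typedef int (*lookup_fn) (const char *hostname);
typedef int (*reload_fn) (void);

/* Try the lookup; on failure, reload at most once (and only if the
 * interval allows it), then retry a single time. There is no loop, so
 * a persistent failure cannot spin forever. */
int
resolve_with_limited_reload (const char *hostname,
                             lookup_fn   lookup,
                             reload_fn   reload,
                             time_t      now,
                             time_t     *last_reload)
{
    int err = lookup (hostname);

    if (err != 0 && reload_allowed (now, *last_reload)) {
        *last_reload = now;
        if (reload () == 0)
            err = lookup (hostname);
    }
    return err;
}
```

In real code the reload callback would wrap res_ninit or res_init, and the timestamp would have to be protected if several threads resolve at once.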
*** Bug 90111 has been marked as a duplicate of this bug. ***