GNOME Bugzilla – Bug 121003
Don't ask to reconnect, just do it
Last modified: 2007-09-14 15:20:11 UTC
If you get disconnected, why wouldn't you want to be reconnected as soon as it works again? We could try connecting, say every minute, until it works.
What happens if you run the Gossip client twice. It would get into a recursive loop of connecting and reconnecting. It should either ask, be optional or have some sort of intelligence about how offen it is reconnecting.
The disconnecting when you log in twice is just a bug, that should be fixed.
The problem is the '<stream:error>Disconnected</stream:error></stream:stream>' information that is received does not say WHY. If it said duplicate login or something to that effect, then we could ignore it and not ask to reconnect.
Sorry, I mixed this up with the bug where gossip crashed on multiple logins. Don't mind me ;)
*** Bug 138850 has been marked as a duplicate of this bug. ***
It also happens when the connection to the server is lost. I don't want to be asked every time whether I want to reconnect when the disconnect was caused by a network failure.
maybe a compromise could be to reconnect automatically just one time. only when gossip is disconnect in a defined time again, gossip should ask for a reconnect. this could be an automatically reconnect on little network errors, but prevent endless reconnections on bigger problems..
Yeah, we definitely should do some checking before ending up in an endless loop of trying...
*** Bug 157489 has been marked as a duplicate of this bug. ***
Perhaps an exponential backoff for reconnection is in order? First auto-reconnect is at 5 seconds, next at 10, next at 20, next at 40.. so so forth?
Moved 0.7 milestone bugs to 0.10
What's so bad about endlessly trying to reconnect?
Just didn't get time to do it before I did this release. It was long over due.
Moving to 0.11 milestone.
Markus, if you read the comments above, it becomes pretty clear. You are thrown out if you log in twice with the same resource, and if we keep reconnecting when that happens, the two clients will keep doing a ping-pong reconnecting dance forever. Therefore we need some kind of solution before this can be solved. Waiting for a minute etc really help, it will just do the same but slower.
What about the following: If you're kicked from the server after a small amount of time, and this happens n times, then connect with another resource (like foo@bar.org/Home2 instead of Home). And _maybe_ inform the User about that. But when the connection times out (for example after suspend-to-ram), gossip should more or less instantly try to reconnect. This is the concrete issue I have. I think this case should be distinguishable from the logged in twice case.
As Markus Bertheau has pointed out, Gossip can distinguish between the situations when the server intentionally *disconnected* and when the connection *timed out*. In the former case we can have all sorts of discussions about reasonable behaviour, but in the latter case I think that Gossip should just keep reconnecting at regular (short!) intervals. Skype is user friendly in this regard: at any moment "green" means "I have a connection to the internet" and "grey" means "no connection", without any manual work. Please fix this bug, or at least let me know that you approve this approach and maybe I'll whip up a patch myself. This bug is also in Ubuntu Malone: https://launchpad.net/products/gossip/+bug/38844
A patch would be most welcome! Yes this sounds like the way forward. I think a short interval should be every 5 minutes.
I have looked a bit into this, found this to be a bit tricky. I am quite short on time and not familiar with gossip code. Just wanted to notify anyone else who was thinking of writing a patch that they should go ahead. Sorry for the fuss.
With NetworkManager and a little D-Bus magic we can know if we have a network connection and we can be notified for modifications. If network manager says we are online and we get a disconnection it means we have multiple clients using the same ressource, we can append something to the ressource like HOME -> HOME2 and retry to connect. If Network manager isn't available we can try by appening something to the ressource, if it works that's great, if not we know that's network problem and we can retry every n minutes. We already have NM support in gossip-dbus.[ch], and I see we already abord reconnecting when NM says we are offline. I think gmail always append a hash to the ressource to be sure there is no collision, we should maybe do it too. Is it possible, before connecting, to get a list of ressource already connected for an account ? If so we can directly know if there is a collision. One problem is to append something useful. If we just change to HOME2 and both clients are running gossip they will both make the same change and retry to connect... Maybe a random number is good enough ?
Just FYI, other clients do that to, for example the Mac OS X client AdiumX (it just adds a random hex number (6 chars or so). I wouldn't mind doing it way, especially since we don't really make the resource that visible in Gossip anyway.
I made some tests with other clients. When I'm connected with gaim and I try to connect to the same account with the same resource in another client, gaim disconnect with an error message saying there is a resource conflict ! So it is possible to know the reason why we are disconnected, I checked in loudmounth and there is no detailed reasons available, maybe that's something that can be improved in loudmounth so ? I'll try the make a patch that does that: - When connecting, append a random number to the resource. - When disconnected and NM manager says we are offline, connect to a signal emitted by NM which says when network is accessible again and the retry only then. - When disconnected and NM manager says we are online or if there is no NM available, retry every n minutes with another random number append to the resource. That seems to be a good way, no ?
Ooh, interesting about the resource conflict message. Is there a way to look at the jabber xml that gaim receives when that happens? Your plan sounds good to me at least.
Don't know but in gaim 2.0beta3 You can see the code at src/protocols/jabber/jabber.c:1489 if(xmlnode_get_child(packet, "conflict")) { js->gc->wants_to_die = TRUE; text = _("Resource Conflict"); }
> - When disconnected and NM manager says we are offline, connect to a signal > emitted by NM which says when network is accessible again and the retry only > then. We actually already do this. See src/gossip-dbus.c. Personally, I don't like using the resource for random numbers, I would rather the string represents something meaningful, since that is used by other clients and by yourself in some instances (for other clients). If we get "conflict" in the xml returned, then there should be no need for the whole random number thing, but if we don't then I guess we should do it. However, some servers may not support that xml error, since we have had the problem in the past of not knowing why we were disconnected :/ Random number it is I guess! :) Thank goodness for Network Manager :D
Created attachment 71217 [details] [review] proof of concept I post that just to demonstrate what's possible. With some tweak it can do the job I think. Is there some libloudmouth developers here who can see if it's possible to have something like LM_DISCONNECT_REASON_RESOURCE_CONFLICT ?!? Maybe by reading how gaim does. Comments are welcome :-)
Created attachment 71218 [details] [review] fix a little bug in condition with NM
Created attachment 71219 [details] [review] really fix it, this time I should sleep !! :D
Hi :) Good work so far, just a few comments: 1. I would rather the "reason" variable went after all the GossipSession and GossipProtocol, etc types (i.e. at the end of the signal variables), I am pinikity about these things, since logically, it is the least important piece of information. 2. I know you have iterated this, but it still looks like it will fail :) > #ifdef HAVE_DBUS > /* If NM says we are offline that's useless to retry to connect, > * NM will tell us when network is up again. */ > if (!gossip_dbus_nm_get_state (&nm_connected)) { > nm_connected = TRUE; > } > #endif > if (reason != GOSSIP_DISCONNECT_ASKED && nm_connected) { > /* unexpected disconnection, try to reconnect */ > priv->retry_connect_id = g_timeout_add (RETRY_CONNECT_TIMEOUT * 1000, > app_retry_connect_cb, > NULL); > } i.e. If gossip_dbus_nm_get_state (&nm_connected) actually fails, then you overwrite the &nm_connected value with nm_connected = TRUE. It should really be: I would do: should_reconnect = reason != GOSSIP_DISCONNECT_ASKED; if (gossip_dbus_nm_get_state (&nm_connected)) { should_reconnect &= nm_connected; } if (should_reconnect) { ... } This way, if it can't talk to NM, then it will still attempt to reconnect. It is always better to make something more readable and maintainable than compressed and complicated. About Loudmouth hackers, Richard and I are those too :) It should be fairly quick and simple to knock up a patch to return the resource conflict error. About errors, I forgot about all those disconnection conditions (unless they were added recently), but work adding those in, not sure why we didn't before :)
Created attachment 71237 [details] [review] proposed patch Here is an updated patch: - appen 6 random chars within 'A'->'Z' or '0'->'9' to the resource. - the reason is now the last arg of signals. - new condition proposed by martyn. I think that's all we can do until loudmouth gives more detailed disconnect error.
Thanks for the patch, I don't have time to check it before releasing 0.15 and my vacation, but I will when I return.
Sorry about this Xavier, but could you redo the patch for me with an up to date cvs HEAD, it doesn't patch for me now, I think because I did this after the other patches you submitted me this week :P
Created attachment 72163 [details] [review] updated updated to last CVS.
I have fixed a couple of things you missed. 1. You should only connect the account that was disconnected, not call gossip_app_connect (NULL, TRUE) which connects the auto-connect accounts. So I set up a hash table and did that properly. 2. You should return FALSE from the reconnect timeout since it will try to reconnect every <timeout value> (which was 10 seconds) and if you don't connect by then, then the client gets its tits in a twist. I set the reconnect timeout to 20 seconds instead of 10. Also, I have put #ifdefs round the resource randomiser code so we can use it later if we want, but it isn't being used now. Great work though, thanks.
Created attachment 95597 [details] Doesn't reconnect When the server ist stopped, Gossip gets a "connection refused", which it represents the user (see attached screenshot). But it seems to never reconnect autoatically.