After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 651378 - g-s hangs for 5-10s and emits tons of warnings about access points disappearing
g-s hangs for 5-10s and emits tons of warnings about access points disappearing
Status: RESOLVED FIXED
Product: gnome-shell
Classification: Core
Component: network-indicator
3.0.x
Other Linux
: Normal normal
: ---
Assigned To: gnome-shell-maint
gnome-shell-maint
: 654482 656812 662965 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2011-05-29 07:38 UTC by Martin Dengler
Modified: 2013-06-14 16:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
backport of 6709e5e45 to gnome-shell-3.0.2-4.fc15.x86_64 (79.45 KB, text/plain)
2011-08-31 04:11 UTC, Martin Dengler
  Details
Backport commit 6709e5e45 - don't show hidden access points (3.21 KB, patch)
2011-08-31 19:17 UTC, Federico Mena Quintero
rejected Details | Review
backport of 6709e5e45 to gnome-shell-3.0.2-4.fc15.x86_64 (2.61 KB, patch)
2011-09-02 16:04 UTC, Martin Dengler
rejected Details | Review
NetworkMenu: don't query DBus properties of removed objects (2.79 KB, patch)
2011-10-14 14:51 UTC, Giovanni Campagna
committed Details | Review

Description Martin Dengler 2011-05-29 07:38:01 UTC
I'm getting a LOT of these messages, so much so that gnome-shell hangs my laptop for about 5-10 seconds when the screen unblanks from idle:

Window manager warning: Log level 16: (nm-access-point.c:402):nm_access_point_connection_valid: runtime check failed: (ap_ssid != NULL)
Window manager warning: Log level 16: _nm_object_get_property: Error getting 'Ssid' for /org/freedesktop/NetworkManager/AccessPoint/8162: (19) Method "Get" with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't exist

The problem seems to be that js/ui/status/network.js's _createSection() iterates over NMDeviceWireless._networks each time an access point is added or removed, regardless of whether the list of access points is shown.  This iteration over _networks causes a comparison between the _networks[i].ssid member and the activeApSsid, which access (.ssid) can fail if the access point no longer exists (for example, a load of access points just went away, and g-s is in the first resulting call to _accessPointRemoved). 

Perhaps I am unusual in the number of access points that float by...at home I can see about 60 access points, and on a 1/2 hour bus ride to work the _networks list usually grows to about 250.
Comment 1 Martin Dengler 2011-05-29 07:45:04 UTC
If any of these suggestions sound reasonable I can take a stab at them:

1) _networkCompare(network, accessPoint) should check if its arguments are valid*

2) _accessPoint{Added,Removed} should check whether the access point is valid* before calling _findNetwork (since if it's not valid, it won't be found since the ssid can't be determined, which is what's causing all those error messages I'm seeing)

3) _accessPoint{Added,Removed} should not call _clearSection() and _createSection(); rather, when the network popup menu is shown, the list of access points should be fetched from NM with some kind of async throbber to indicate the list is being updated


* in the sense that nm_access_point_connection_valid will not complain as I'm seeing
Comment 2 Giovanni Campagna 2011-06-02 12:34:29 UTC
(In reply to comment #1)
> If any of these suggestions sound reasonable I can take a stab at them:
> 
> 1) _networkCompare(network, accessPoint) should check if its arguments are
> valid*

How? The only way to check is to fetch the ssid property, which in turns is fetched over DBus and fails with that cryptic error (generated by dbus-glib).

> 2) _accessPoint{Added,Removed} should check whether the access point is valid*
> before calling _findNetwork (since if it's not valid, it won't be found since
> the ssid can't be determined, which is what's causing all those error messages
> I'm seeing)

An access point is always valid, in the sense that it can always be used for a device, even if the ssid is not known. Nevertheless, if the ssid is not known, it makes sense to hide it from the UI and don't check _findNetwork. This is bug 646454.

> 3) _accessPoint{Added,Removed} should not call _clearSection() and
> _createSection(); rather, when the network popup menu is shown, the list of
> access points should be fetched from NM with some kind of async throbber to
> indicate the list is being updated

Why so? And how would it change the current situation?
(The problem is not fetching the SSID from NetworkManager, is that the ssid is not broadcasted by the AP)

> 
> * in the sense that nm_access_point_connection_valid will not complain as I'm
> seeing

I think nm_access_point_connection_valid should just return FALSE if the ssid is NULL, instead of emitting a warning.
Comment 3 Martin Dengler 2011-06-02 18:14:32 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > If any of these suggestions sound reasonable I can take a stab at them:
> > 
> > 1) _networkCompare(network, accessPoint) should check if its arguments are
> > valid*
> 
> How?

Perhaps calling something like connection_valid(...) on accessPoint?

> An access point is always valid, in the sense that it can always be used for a
> device, even if the ssid is not known.

Sorry, I confused "access point" with "access point's connection", since that's what the error message is complaining about.

I don't buy that I'm seeing so many hidden SSIDs, and that they also have an invalid "mode", but is that what you think these error messages mean?

$ grep _nm_object_get_property ~/.xsession-errors | cut -d/ -f 6 | cut -d: -f 1 | sort | uniq | wc
    459     459    1836
$ grep _nm_object_get_property ~/.xsession-errors | tail -3
Window manager warning: Log level 16: _nm_object_get_property: Error getting 'Ssid' for /org/freedesktop/NetworkManager/AccessPoint/803: (19) Method "Get" with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't exist
Window manager warning: Log level 16: _nm_object_get_property: Error getting 'Mode' for /org/freedesktop/NetworkManager/AccessPoint/803: (19) Method "Get" with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't exist
Window manager warning: Log level 16: _nm_object_get_property: Error getting 'Ssid' for /org/freedesktop/NetworkManager/AccessPoint/808: (19) Method "Get" with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't exist

> 
> > 3) _accessPoint{Added,Removed} should not call _clearSection() and
> > _createSection(); rather, when the network popup menu is shown, the list of
> > access points should be fetched from NM with some kind of async throbber to
> > indicate the list is being updated
> 
> Why so?

1) Because it's not necessary to update a GUI until the GUI is visible
2) Because it's a waste of CPU - the GUI being updated is visible a tiny fraction of the time

If n access points disappear, we now have a guaranteed n-squared cost to update the list of active networks _every_time_ those access points disappear, rather than a linear cost the TINY fraction of cases the GUI is shown.

> And how would it change the current situation?

1) It would reduce the spammy warnings.
2) It would lessen the CPU impact of the spammy warnings

$ grep _nm_object_get_property ~/.xsession-errors | wc
 455444 10475173 107490430
$ grep nm_access_point_connection_valid ~/.xsession-errors | wc
   5954   77402  809744

Not writing 455,000 lines to .xsession-errors would be nice.

> (The problem is not fetching the SSID from NetworkManager, is that the ssid is
> not broadcasted by the AP)

I'm not sure why you think this is the problem.  I don't understand how the errors are generated by hidden SSIDs.  If there is some test I could do to confirm / deny that I"m happy to do do.
Comment 4 Giovanni Campagna 2011-06-02 19:16:59 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > If any of these suggestions sound reasonable I can take a stab at them:
> > > 
> > > 1) _networkCompare(network, accessPoint) should check if its arguments are
> > > valid*
> > 
> > How?
> 
> Perhaps calling something like connection_valid(...) on accessPoint?

connection_valid, as the name implies, checks a connection, not the access point. And it assumes that the access point itself is valid (thus the warning).

> > An access point is always valid, in the sense that it can always be used for a
> > device, even if the ssid is not known.
> 
> Sorry, I confused "access point" with "access point's connection", since that's
> what the error message is complaining about.
> 
> I don't buy that I'm seeing so many hidden SSIDs, and that they also have an
> invalid "mode", but is that what you think these error messages mean?
> 
> $ grep _nm_object_get_property ~/.xsession-errors | cut -d/ -f 6 | cut -d: -f 1
> | sort | uniq | wc
>     459     459    1836
> $ grep _nm_object_get_property ~/.xsession-errors | tail -3
> Window manager warning: Log level 16: _nm_object_get_property: Error getting
> 'Ssid' for /org/freedesktop/NetworkManager/AccessPoint/803: (19) Method "Get"
> with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't
> exist
> Window manager warning: Log level 16: _nm_object_get_property: Error getting
> 'Mode' for /org/freedesktop/NetworkManager/AccessPoint/803: (19) Method "Get"
> with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't
> exist
> Window manager warning: Log level 16: _nm_object_get_property: Error getting
> 'Ssid' for /org/freedesktop/NetworkManager/AccessPoint/808: (19) Method "Get"
> with signature "ss" on interface "org.freedesktop.DBus.Properties" doesn't
> exist

These error messages are generated by a combination of libnm-glib and dbus-glib bugs, and happen when the access point doesn't have a SSID. In normal cases, this means the AP is hidden.

> > 
> > > 3) _accessPoint{Added,Removed} should not call _clearSection() and
> > > _createSection(); rather, when the network popup menu is shown, the list of
> > > access points should be fetched from NM with some kind of async throbber to
> > > indicate the list is being updated
> > 
> > Why so?
> 
> 1) Because it's not necessary to update a GUI until the GUI is visible
> 2) Because it's a waste of CPU - the GUI being updated is visible a tiny
> fraction of the time
> 
> If n access points disappear, we now have a guaranteed n-squared cost to update
> the list of active networks _every_time_ those access points disappear, rather
> than a linear cost the TINY fraction of cases the GUI is shown.

I see your points. But I would have to maintain two very different code paths, one for generating the list and one for interactive updates, connecting and disconnecting to signals all the time.
In any case GUI updates should normally be O(1), and rendering is of course skipped if the menu is hidden. Internal data structure updates are O(N), but they're required (you don't want to fetch data over dbus while the menu is opening). They could be reduced to O(logN) since networks are sorted, but I don't think it is worth it.

> > And how would it change the current situation?
> 
> 1) It would reduce the spammy warnings.

Spammy warnings are a bug. The complexity of updates does not affect that.

> 2) It would lessen the CPU impact of the spammy warnings
> 
> $ grep _nm_object_get_property ~/.xsession-errors | wc
>  455444 10475173 107490430
> $ grep nm_access_point_connection_valid ~/.xsession-errors | wc
>    5954   77402  809744
> 
> Not writing 455,000 lines to .xsession-errors would be nice.
>
> > (The problem is not fetching the SSID from NetworkManager, is that the ssid is
> > not broadcasted by the AP)
> 
> I'm not sure why you think this is the problem.  I don't understand how the
> errors are generated by hidden SSIDs.  If there is some test I could do to
> confirm / deny that I"m happy to do do.

The errors are generated by missing Ssid property in the access point dbus object. I'm told that, in the normal case, only hidden APs have these behavior.
Comment 5 Martin Dengler 2011-06-03 03:11:21 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > (In reply to comment #1)
> > > > If any of these suggestions sound reasonable I can take a stab at them:
> > > > 
> > > > 1) _networkCompare(network, accessPoint) should check if its arguments are
> > > > valid*
> > > 
> > > How?
> > 
> > Perhaps calling something like connection_valid(...) on accessPoint?
> 
> connection_valid, as the name implies, checks a connection, not the
> access point. And it assumes that the access point itself is valid
> (thus the warning).

The warning specifically states the problem is that the access point's
connection is not valid, and the code generating the warning is
nm_access_point_connection_valid.

$ grep nm_access_point_connection_valid ~/.xsession-errors | wc
 5954     77402   809744

I'll try checking connection_valid and report back.  It's a pain
writing the code because I lack the knowledge to test it from the gjs
REPL so I'm stuck restarting gnome-shell :(.

> These error messages are generated by a combination of libnm-glib
> and dbus-glib bugs, and happen when the access point doesn't have a
> SSID. In normal cases, this means the AP is hidden.

I am seeing literally hundreds - 459 I think I quoted - of different
access points with these errors.  Unless NM is creating new AP numbers
for existing hidden APs, I find it surprising that there would be so
many (my laptop has been sitting in a hotel room immobile for two
days...I'm travelling).

Whatever the situation, I'm going to take a stab at eliminating the
warnings because there're a real problem.  If you have any suggestions
or can guide me towards a patch that you would accept please let me
know.


> > > > 3) _accessPoint{Added,Removed} should not call _clearSection()
> > > > and _createSection(); rather, when the network popup menu is
> > > > shown, the list of access points should be fetched from NM
> > > > with some kind of async throbber to indicate the list is being
> > > > updated
[...]
> I see your points. But I would have to maintain two very different
> code paths, one for generating the list and one for interactive
> updates, connecting and disconnecting to signals all the time.

If you want to be perfect, perhaps, but right now just enumerating the
access points when the popup is shown and not updating it (having the
list change as you're looking at it might be confusing, anyway) seems
much better than updating a list of APs that are never used.

Let me give you a patch rather than keep whining.


> In any case GUI updates should normally be O(1), and rendering is of
> course skipped if the menu is hidden.

sure, but if the problem were actually the GUI updates then it would
almost never happen, since the GUI (list of APs) is, like I said,
almost never shown relative to how many times the AP list data
structures are actually updated.

> Internal data structure updates are O(N)

...for _each_ AP change.  I see AP changes in batches, I suppose as NM
scans.  So this O(n^2) actually happens.  Based on the errors in the
log, it happens a lot.  O(n^2) data structure updates that are
_almost_never_necessary_ is (IMHO) much worse than a delay in showing
the popup menu.

> [Internal data structure updates are] required (you don't want to
>  fetch data over dbus while the menu is opening).

If it's slow once while showing the menu, why do it even more:
_every_time_ an AP changes?

Again, I'll patch rather than whine.

> > > (The problem is not fetching the SSID from NetworkManager, is
> > > that the ssid is not broadcasted by the AP)
> > 
> > I'm not sure why you think this is the problem.  I don't
> > understand how the errors are generated by hidden SSIDs.  If there
> > is some test I could do to confirm / deny that I"m happy to do do.
> 
> The errors are generated by missing Ssid property in the access
> point dbus object. I'm told that, in the normal case, only hidden
> APs have these behavior.

I will check the network.ssid when ap.get_ssid() fails to confirm
this.  Like I said above, I'm surprised how many hidden ssids the
errors I'm seeing.
Comment 6 Giovanni Campagna 2011-06-03 19:11:15 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > (In reply to comment #2)
> > > > (In reply to comment #1)
> > > > > If any of these suggestions sound reasonable I can take a stab at them:
> > > > > 
> > > > > 1) _networkCompare(network, accessPoint) should check if its arguments are
> > > > > valid*
> > > > 
> > > > How?
> > > 
> > > Perhaps calling something like connection_valid(...) on accessPoint?
> > 
> > connection_valid, as the name implies, checks a connection, not the
> > access point. And it assumes that the access point itself is valid
> > (thus the warning).
> 
> The warning specifically states the problem is that the access point's
> connection is not valid, and the code generating the warning is
> nm_access_point_connection_valid.

And for that very reason you want to call nm_access_point_connection_valid less, not more. :)
 
> $ grep nm_access_point_connection_valid ~/.xsession-errors | wc
>  5954     77402   809744
> 
> I'll try checking connection_valid and report back.  It's a pain
> writing the code because I lack the knowledge to test it from the gjs
> REPL so I'm stuck restarting gnome-shell :(.

You can use the looking glass for your experiments.

> > These error messages are generated by a combination of libnm-glib
> > and dbus-glib bugs, and happen when the access point doesn't have a
> > SSID. In normal cases, this means the AP is hidden.
> 
> I am seeing literally hundreds - 459 I think I quoted - of different
> access points with these errors.  Unless NM is creating new AP numbers
> for existing hidden APs, I find it surprising that there would be so
> many (my laptop has been sitting in a hotel room immobile for two
> days...I'm travelling).

Again, one error is that nm_access_point_get_ssid() returns NULL, and nm_access_point_connection_valid() doesn't like that.
The other is that we're calling nm_access_point_get_ssid(), which will call out to DBus if the value is NULL causing the dbus error because at the server side, the object doesn't have a Ssid property.

> 
> Whatever the situation, I'm going to take a stab at eliminating the
> warnings because there're a real problem.  If you have any suggestions
> or can guide me towards a patch that you would accept please let me
> know.

It would certainly be helpful if you manage to fix libnm-glib caching for the ssid, so that we can call nm_access_point_get_ssid() how often we like (could involve porting to GDBus, and breaking API again). The other fix, that is not calling nm_access_point_connection_valid() when the ssid is missing, is in bug 646454.

> 
> > > > > 3) _accessPoint{Added,Removed} should not call _clearSection()
> > > > > and _createSection(); rather, when the network popup menu is
> > > > > shown, the list of access points should be fetched from NM
> > > > > with some kind of async throbber to indicate the list is being
> > > > > updated
> [...]
> > I see your points. But I would have to maintain two very different
> > code paths, one for generating the list and one for interactive
> > updates, connecting and disconnecting to signals all the time.
> 
> If you want to be perfect, perhaps, but right now just enumerating the
> access points when the popup is shown and not updating it (having the
> list change as you're looking at it might be confusing, anyway) seems
> much better than updating a list of APs that are never used.
> 
> Let me give you a patch rather than keep whining.

Let's see the patch then.

> 
> > In any case GUI updates should normally be O(1), and rendering is of
> > course skipped if the menu is hidden.
> 
> sure, but if the problem were actually the GUI updates then it would
> almost never happen, since the GUI (list of APs) is, like I said,
> almost never shown relative to how many times the AP list data
> structures are actually updated.
> 
> > Internal data structure updates are O(N)
> 
> ...for _each_ AP change.  I see AP changes in batches, I suppose as NM
> scans.  So this O(n^2) actually happens.  Based on the errors in the
> log, it happens a lot.  O(n^2) data structure updates that are
> _almost_never_necessary_ is (IMHO) much worse than a delay in showing
> the popup menu.

You want to show the current connection at all times, and to provide notifications. To have that information, you of course need to have all active connections. Available connections could be fetched as needed, but see below.

> > [Internal data structure updates are] required (you don't want to
> >  fetch data over dbus while the menu is opening).
> 
> If it's slow once while showing the menu, why do it even more:
> _every_time_ an AP changes?

Because we never call out to dbus, we use signals, that of course we get because the AddMatch is just "everything coming out from o.f.NetworkManager")
(Ok, that would be the ideal world with o.f.D.Properties.GetAll and o.f.D.ObjectManager, but that's a libnm-glib problem, not gnome-shell's)

> Again, I'll patch rather than whine.
> 
> > > > (The problem is not fetching the SSID from NetworkManager, is
> > > > that the ssid is not broadcasted by the AP)
> > > 
> > > I'm not sure why you think this is the problem.  I don't
> > > understand how the errors are generated by hidden SSIDs.  If there
> > > is some test I could do to confirm / deny that I"m happy to do do.
> > 
> > The errors are generated by missing Ssid property in the access
> > point dbus object. I'm told that, in the normal case, only hidden
> > APs have these behavior.
> 
> I will check the network.ssid when ap.get_ssid() fails to confirm
> this.  Like I said above, I'm surprised how many hidden ssids the
> errors I'm seeing.

NetworkManager creates and destroys dbus paths as it likes, and it never reuses the same object path, so errors with different paths can come from the same AP appearing and disappearing.
Comment 7 Milan Bouchet-Valat 2011-06-06 20:47:43 UTC
I've just understood that the freeze I'm getting each time I switch users comes from this bug. In my case, that's not even a freeze: I can't get the machine back to work. I've an i5 CPU and still, I can wait as much as I want, the machine won't respond. That's probably because I have about 50 <unknown> APs listed, which likely don't really exist (drivers artifact?).

Isn't there a simple workaround for the Shell? Even not removing the errors, but just ensuring within a few seconds the Shell is back to work. Else, that's just like a plain crash from a user POV.
Comment 8 Federico Mena Quintero 2011-08-02 15:29:20 UTC
Where do those <unknown> APs come from?  If I run "nm-tool" by hand, it only lists about four names; definitely not about the 40 or so <unknown> entries I get in gnome-shell's menu.
Comment 9 Federico Mena Quintero 2011-08-02 15:41:55 UTC
And those <unknown> entries are bogus; there are only two or three APs around my house, which show up with low strengths, and definitely not tens of them with full signal strength.
Comment 10 Milan Bouchet-Valat 2011-08-07 10:48:14 UTC
*** Bug 654482 has been marked as a duplicate of this bug. ***
Comment 11 Giovanni Campagna 2011-08-16 13:19:14 UTC
(In reply to comment #9)
> And those <unknown> entries are bogus; there are only two or three APs around
> my house, which show up with low strengths, and definitely not tens of them
> with full signal strength.

What is the output in lg of
Main.panel._statusArea['network']._devices.wireless.devices[0].device.get_access_points()
?

It could be a bug in gnome-shell or libnm-glib, or it could be that nmcli filters those entries (and thus the bug is in NetworkManager).
Comment 12 Federico Mena Quintero 2011-08-16 15:14:46 UTC
I'm on gnome-shell 3.0.2, so Main.panel._statusArea doesn't exist yet.

I'll log the result of get_access_points() when it gets called from NMDeviceWireless._init() and tell you the results.
Comment 13 Dave Airlie 2011-08-18 11:16:34 UTC
*** Bug 656812 has been marked as a duplicate of this bug. ***
Comment 14 Dave Airlie 2011-08-18 11:18:54 UTC
same here, fast-user-switch triggers it a lot, and it really really isn't helping my relationship :), on the shared laptop it hangs completely and never comes back.

On my other 2-3 laptops it happens usually when I'm unlocking the screensaver, I get a black screen and nothing is rendering, I type in the password blind and it unlocks but g-s never quite recovers..

I've got about 50 network access points in view from this place, and I also see the whole 30-40 unknown problems.

still think it was a good plan to shove all these applets into the rendering loop?
Comment 15 G K 2011-08-26 03:48:08 UTC
my feeling is that, the screen update or other gnome-shell processing shouldn't be held up by applets that not completed their task.  I would expect thinks to work as follows.
1) In the event that an applet/applets have not not completed its/their task, then those applets ought have their gui controls greyed out or be marked as updating. The rest of the independent gui elements that are ready ought to be functional.
2) The applets themselves should multitask/multi-thread and be able to generate a menu or user interface while its core-logic sorts itself out.
Comment 16 richard_s_burton@msn.com 2011-08-30 05:29:35 UTC
This issue is apparently causing up to 5 minute screen blanking when returning from notebook hibernating. Not just 5-10 seconds.

Countless <unknown> network access points build up and it takes up to 5 minutes for the screen and keyboard to come back to life.

Downstream bug report
https://bugzilla.redhat.com/show_bug.cgi?id=705609


What I gather from the comments:
1. The <unknown> wireless APs should not be accumulating.
2. The GUI functionality is contingent upon the applet looking through this countless list of <unknown> APs (cannot be real network points - just too many).

Do we know who can fix these two issues?

F14 did not have this issue. This is my only major complaint about F15.
Please, please prioritize a fix in Sept if possible.
Comment 17 Giovanni Campagna 2011-08-30 11:46:40 UTC
Uhm... this is supposed to be already fixed by 6709e5e458d7258d35ac4c8fac904542f2564001, as the shell no longer tracks unknown access point. Could you test 3.1.90 and see if it's still reproducible?

(For the record, the bug probably was that APs will null SSID compared equal to no other, so we'd never remove them when the underlying NM object disappeared, and continuously added them when a new NM object representing them appeared)
Comment 18 Dave Airlie 2011-08-30 13:57:54 UTC
Hi Giovanni, I don't suppose you give me a network.js patch based on 3.0.x to test? since this is a major problem in Fedora 15 and I'd like to fix it there as well.

Looking at the changes you made I made some similiar locally to 3.0.x and it seems to fix it, getting 3.1.x to jhbuild has just wasted my day and it won't even start due to some other bugs.
Comment 19 Christian Krause 2011-08-30 22:00:00 UTC
(In reply to comment #17)
> Uhm... this is supposed to be already fixed by
> 6709e5e458d7258d35ac4c8fac904542f2564001, as the shell no longer tracks unknown
> access point. Could you test 3.1.90 and see if it's still reproducible?

Sorry, I can't test 3.1.90, but I can confirm that the issue is still present in gnome-shell-3.0.2-4.fc15.i686 (most up-to-date gnome-shell package in Fedora 15).
Comment 20 Martin Dengler 2011-08-31 04:11:00 UTC
Created attachment 195267 [details]
backport of 6709e5e45 to gnome-shell-3.0.2-4.fc15.x86_64



(In reply to comment #19)
> (In reply to comment #17)
> > Uhm... this is supposed to be already fixed by
> > 6709e5e458d7258d35ac4c8fac904542f2564001, as the shell no longer tracks unknown
> > access point. Could you test 3.1.90 and see if it's still reproducible?
> 
> Sorry, I can't test 3.1.90, but I can confirm that the issue is still present
> in gnome-shell-3.0.2-4.fc15.i686 (most up-to-date gnome-shell package in Fedora
> 15).

I made an attempt to backport 6709e5e45 to F15's gnome-shell, and it seems to be helping.  If you drop it in to /usr/share/gnome-shell/js/ui/status/network.js and restart gnome-shell it might help.
Comment 21 Federico Mena Quintero 2011-08-31 19:17:41 UTC
Created attachment 195343 [details] [review]
Backport commit 6709e5e45 - don't show hidden access points

For clarity, this is the diff of Martin's backport against the
gnome-3-0 branch.
Comment 22 Federico Mena Quintero 2011-09-01 16:54:54 UTC
With that patch, I seem to get at most 0.5-sec delays  when unsuspending, or when coming back from the screensaver.  This is much, much better than before!

I *have* ocassional delays while in the middle of working (say, the shell freezes for a few seconds), but I have no idea if they are caused by the network status thingy as well.
Comment 23 Martin Dengler 2011-09-02 16:04:01 UTC
Created attachment 195498 [details] [review]
backport of 6709e5e45 to gnome-shell-3.0.2-4.fc15.x86_64

My backport backported a bit too much, and caused the list of APs to not be shown in the NMApplet drop-down menu.  This new backport patch fixes the problem, which was the last three lines that the patch added which sent the wrong type of "accessPoint" object to NMNetworkMenuItem._init(...).
Comment 24 Federico Mena Quintero 2011-09-02 21:48:08 UTC
This morning I pressed a key to get my machine back from the screensaver, and got a 30-second delay (eyeballed).  Meanwhile, the disk churned a *lot* - I don't know if gnome-shell was swapped out or something.
Comment 25 Martin Dengler 2011-09-03 01:15:45 UTC
I still get 0-5s hangs, but they're nothing like the long hangs I used to get before the patch.  I suspect it's swapping, as g-s takes up a gig of VM for me:

ps axuwww | grep /usr/bin/gnome-shell | tail -1
martin   16895  1.1  3.2 1360972 124904 ?      Sl   Sep02   6:56 /usr/bin/gnome-shell
Comment 26 Christian Krause 2011-09-05 00:16:07 UTC
I can confirm Martin's latest observations. When applying the 2nd attachment (backport of 6709e5e45 to gnome-shell-3.0.2-4.fc15.x86_64), the issue does not happen anymore. I had sometimes a delay of a couple of seconds, too, but for sure less than 10 seconds. Before it took sometimes several minutes until the login dialog appeared.

The "<unknown>" have vanished from the list as expected.
Comment 27 Dan Williams 2011-10-10 18:45:15 UTC
So one thing I just noticed.  The accessPointRemoved() handler in network.js calls findNetwork(), which calls the libnm-glib get_ssid() function, which can make a D-Bus property request to NM.

Now, of course if the access point is removed, it's gone; NM has already deleted that AP and you're just getting a notification about it.  libnm-glib preserves the AP object across the signal emission just so you can clean up anything you might be doing with the AP object, but by this time making property requests (like SSDI) is probably not a great thing to do.  The properties are usually cached, but there are circumstances in which they won't be cached and thus will trigger a D-Bus request to get the property.

So the main problem is essentially that calling get_ssid() on the AP at any point after receiving the access-point-removed signal may cause these issues.  Typically in C-based languages you'd just check your internal list for the address of the AP object that was passed to the signal handler, but it appears that's not what's happening here in the JavaScript.

Instead, what we probably want to do here is search through each element of this._networks, checking the 'accessPoints' property of each element for the access point object in question (maybe using its D-Bus path if we can't directly compare object addresses in JS) and if it's found, remove it from the 'accessPoints' property, and then if the accessPoints list length is 0, remove that element entirely from this._networks.
Comment 28 Dan Williams 2011-10-10 18:54:40 UTC
ie something like this, since the D-Bus path is a libnm-glib side thing and will *always* be valid until the GObject is destroyed:

diff --git a/js/ui/status/network.js b/js/ui/status/network.js
index 0dc7ee2..d84a915 100644
--- a/js/ui/status/network.js
+++ b/js/ui/status/network.js
@@ -1192,6 +1192,16 @@ NMDeviceWireless.prototype = {
         return -1;
     },
 
+    _findNetworkByPath: function(path) {
+        for (let i = 0; i < this._networks.length; i++) {
+            for (let j = 0; j < this._networks[i].accessPoints.length; j++) {
+                if (this._networks[i].accessPoints[j].get_path() == path)
+                    return i;
+            }
+        }
+        return -1;
+    },
+
     _accessPointAdded: function(device, accessPoint) {
         if (accessPoint.get_ssid() == null) {
             // This access point is not visible yet
@@ -1273,7 +1283,7 @@ NMDeviceWireless.prototype = {
     },
 
     _accessPointRemoved: function(device, accessPoint) {
-        let pos = this._findNetwork(accessPoint);
+        let pos = this._findNetworkByPath(accessPoint.get_path());
 
         if (pos == -1) {
             log('Removing an access point that was never added');
Comment 29 Giovanni Campagna 2011-10-14 14:51:56 UTC
Created attachment 199013 [details] [review]
NetworkMenu: don't query DBus properties of removed objects

Calling nm_access_point_get_ssid() in the handler of the
access-point-removed signal can result in DBus request, which will
then fail because the object was already removed at the server side.
Instead, use a difference function to retrieve the access point
object (the network), that compares directly by object identity.
Comment 30 Jasper St. Pierre (not reading bugmail) 2011-10-20 19:36:51 UTC
Review of attachment 195343 [details] [review]:

Marking "rejected" to get the patch off of the review list.
Comment 31 Jasper St. Pierre (not reading bugmail) 2011-10-20 19:36:59 UTC
Review of attachment 195498 [details] [review]:

Marking "rejected" to get the patch off of the review list.
Comment 32 Giovanni Campagna 2011-10-28 20:08:14 UTC
*** Bug 662965 has been marked as a duplicate of this bug. ***
Comment 33 Jasper St. Pierre (not reading bugmail) 2011-10-28 20:12:07 UTC
Review of attachment 199013 [details] [review]:

This makes sense to me. I don't think anyone but you knows the code enough in-depth to be able to review this properly. I've been testing it for a few days now and it hasn't blown up on me. ACN, feel free to commit, if it continues to blow up, we'll re-fix it.
Comment 34 Giovanni Campagna 2011-10-29 10:25:10 UTC
Attachment 199013 [details] pushed as 3d3c954 - NetworkMenu: don't query DBus properties of removed objects
Comment 35 Allison Karlitskaya (desrt) 2011-11-06 06:00:42 UTC
Should this also be on gnome-3-2?
Comment 36 Allison Karlitskaya (desrt) 2011-11-06 19:29:28 UTC
Reopening because the patch contains an error:

in _accessPointRemoved, 'pos' is eliminated by the patch, but (currently on master) we see further down:

            this._networks.splice(pos, 1);

resulting in:

    JS ERROR: !!!     message = '"pos is not defined"'
    JS ERROR: !!!   Exception was: ReferenceError: pos is not defined
    JS ERROR: !!!     lineNumber = '1325'
    JS ERROR: !!!     fileName = '"/usr/share/gnome-shell/js/ui/status/network.js"'
    JS ERROR: !!!     stack = '"([object _private_NMClient_DeviceWifi],[object _private_NMClient_AccessPoint])@/usr/share/gnome-shell/js/ui/status/network.js:1325
Comment 37 Allison Karlitskaya (desrt) 2011-11-06 19:34:55 UTC
Matthias just pointed me at bug 663278.

Still, this should probably be backported... I'm seeing the issue on my F16 box.
Comment 38 richard_s_burton@msn.com 2011-11-06 21:11:17 UTC
I'm still experiencing this issue in F15 after performing 'yum update'.
Comment 39 Florian Müllner 2013-06-14 16:41:41 UTC
The issue was fixed, closing.