After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 582865 - Gconf crashes Hardware-monitor applet
Gconf crashes Hardware-monitor applet
Status: RESOLVED WONTFIX
Product: GConf
Classification: Deprecated
Component: gconf
2.26.x
Other All
: Normal normal
: ---
Assigned To: GConf Maintainers
GConf Maintainers
gnome[unmaintained]
Depends on:
Blocks:
 
 
Reported: 2009-05-16 12:52 UTC by Francisco Pina Martins
Modified: 2018-08-17 13:58 UTC
See Also:
GNOME target: ---
GNOME version: 2.25/2.26


Attachments
hardware-monitor-bugreport.txt (18.29 KB, text/plain)
2009-05-20 13:28 UTC, Mike Auty
  Details
gconf-2.26.1-duffcache.patch (350 bytes, patch)
2009-06-23 22:45 UTC, Mike Auty
rejected Details | Review
hardware-monitor-1.4.2-create-viewer-type-key.patch (728 bytes, patch)
2009-11-04 13:22 UTC, Romain Perier
none Details | Review
New hardware-monitor-bugreport.txt (21.31 KB, text/plain)
2009-11-05 00:20 UTC, Mike Auty
  Details

Description Francisco Pina Martins 2009-05-16 12:52:23 UTC
Please describe the problem:
The title sums it up.
With the new Gconf (2.26.2 or 2.26.1) Hardware-monitor crashes when it is added to the panel.
Running hardware-monitor from the console gives the following message when crashing:

(hardware-monitor:3630): GConf-CRITICAL **: gconf_entry_get_value: assertion `entry != NULL' failed
Segmentation fault

Reverting back to Gconf 2.26.0 solves the problem

Steps to reproduce:
1. Install hardware-monitor (http://people.iola.dk/olau/hardware-monitor/)
2. Add it to panel
3. Watch it crash


Actual results:
Gnome panel will display the following popup message:
'The panel encountered a problem while loading "OAFIID:HardwareMonitor".
Do you want to delete the applet from your configuration?'

Expected results:
The applet is deleted fine from the panel if requested. It remains there and crashes at startup if asked not to delete.

Does this happen every time?
Yes

Other information:
I have tested this on Arch Linux (thus I found that downgrading to an earlier version of Gconf solved the problem. I have requested some testing on Ubuntu Forums (Karmic Koala Alpha uses Gconf 2.26.2) and I got these results:
http://ubuntuforums.org/showthread.php?p=7289359#post7289359
So it's not distribution specific.
Comment 1 ron accardy 2009-05-16 22:37:24 UTC
I have tested this on Ubuntu karmic Koala (9.10 alpha ) It segfaults as soon as started , the log message associated with the fault is segfault error 4 in libgconf-2.so.4.1.5
Comment 2 Mike Auty 2009-05-20 13:27:52 UTC
I can also confirm this happens.  I've got a bug-buddy output which I'll attach next that may help determine the problem.  The last few calls are:

  • #6 gconf_entry_get_value
    at gconf-value.c line 1577
  • #7 Gnome::Conf::Entry::get_value
    at entry.cc line 55
  • #8 Applet::viewer_type_listener
    at applet.cpp line 325

It seems that when the applet starts it registers with gconf to check some keys.  If it's the first run, or a new applet, those keys won't exist.  The code in question looks like:

  // circumvent GConf bug (FIXME: report it)
  gconf_client->set(gconf_dir + "/dummy", 0);
  gconf_client->set(gconf_dir + "/monitors/dummy", 0);
  
  // connect GConf
  gconf_client->notify_add(gconf_dir + "/viewer_type",
			   sigc::mem_fun(*this, &Applet::viewer_type_listener));

My guess is that the latest gconf doesn't like this notification on a non-existant entry, or fires the viewer_type_listener immediately whereas previously it wouldn't have.  Well, that's as far as I've managed to get to, the changelogs between 2.26.0 and 2.26.1 don't say anything particularly changed other than tracing functions.  I guess that either showed up a long time bug in the way hardware-monitor did it's stuff, or introduced a new bug...
Comment 3 Mike Auty 2009-05-20 13:28:25 UTC
Created attachment 135016 [details]
hardware-monitor-bugreport.txt
Comment 4 Mike Auty 2009-05-20 14:04:36 UTC
Sorry, the code section I quoted probably isn't the one that triggers it, it's probably the following section which comes immediately after the notify_add (there are three notify_adds):

  // start displaying something
  viewer_type_listener(0, gconf_client->get_entry(gconf_dir + "/viewer_type"));

What's really odd is that from the backtrace we can see a call to gconf_entry_get_value (entry=0x0), but the file that call's in hasn't been touched since 2008, save for changing the FSF address and using <config.h> rather than "config.h".  More confusing is that the function just calls glib:

{
  g_return_val_if_fail (entry != NULL, NULL);

  return REAL_ENTRY (entry)->value;
}

Which should be *fine* for entry == NULL, so I can't see where the signal's coming from, or why a downgrade should have any effect at all?
Comment 5 Francisco Pina Martins 2009-05-20 14:33:36 UTC
After reading your comment I have attempted a downgrade again, just to be on the safe side.

In fact, downgrading to 2.26.0 continues to solve the problem...

Would you like me to test anything else?
Comment 6 Francisco Pina Martins 2009-06-01 09:15:31 UTC
Any news on this one?

Does anyone know if any other applets crash due to this?
Comment 7 Mike Auty 2009-06-02 17:20:30 UTC
Looks like this might be related to bug 583326.
Comment 8 Mike Auty 2009-06-22 09:18:09 UTC
Ok, it looks as though I'm completely incapable of reading a simple backtrace.  It's neither of the code locations I was pointing out, it's immediately inside each of the listeners' code (ironically in a bit that checks that to make sure the entry is valid).

I've contacted the author and I'm going to work on a patch this next week when I get the chance...
Comment 9 Mike Auty 2009-06-23 22:18:02 UTC
Ok, after some git-bisection fun, I found the bad commit (e9489a1746b9df1983c8ae7331443d197e23d40b).  It has the following chunk change:

diff --git a/gconf/gconf-client.c b/gconf/gconf-client.c
index ed1ddfd..cb052cd 100644
--- a/gconf/gconf-client.c
+++ b/gconf/gconf-client.c
@@ -1964,7 +2006,26 @@ gconf_client_lookup (GConfClient *client,
   entry = g_hash_table_lookup (client->cache_hash, key);
 
   *entryp = entry;
-      
+
+  if (!entry)
+  {
+    char *dir, *last_slash;
+
+    dir = g_strdup (key);
+    last_slash = strrchr (dir, '/');
+    g_assert (last_slash != NULL);
+    *last_slash = 0;
+
+    if (g_hash_table_lookup (client->cache_dirs, dir))
+    {
+      g_free (dir);
+      trace ("Negative cache hit on %s", key);
+      return TRUE;
+    }
+
+    g_free (dir);
+  }
+
   return entry != NULL;
 }
 
Unfortunately it appears that even if entry is NULL (!entry), it can potentially return TRUE (my guess is only if the directory exists, but the key itself doesn't).  This then causes no error condition, but returns a NULL entry.

The documentation for gconf_client_get_entry seems to suggest that it will always return an entry (rather than NULL), but with this code here, it can return NULL.  This in turn causes gconfmm to think it has a valid entry (and allow get_value() calls on the entry) even though it doesn't.

Are there any devs looking at this issue?  Could someone please triage this so that it gets some attention?
Comment 10 Mike Auty 2009-06-23 22:45:04 UTC
Created attachment 137277 [details] [review]
gconf-2.26.1-duffcache.patch

Ok, I can't figure out what the cache is doing (or why the directory's being checked there), however, the following patch will fix the issue.  It leaves a completely pointless check to see whether the underlying gconf directory exists, but I don't have enough knowledge to figure out whether gconf_client_lookup is ok to return TRUE with a NULL entry or not, and therefore whether the check would be useful and I've patched the wrong bit.  If I have, then my guess is that "get" should ensure that error is set if it ever returns a NULL entry...
Comment 11 Francisco Pina Martins 2009-06-25 08:45:13 UTC
Ok, so here's what I did:
Downloaded the source code for GConf (ftp://ftp.gnome.org/pub/GNOME/sources/GConf/2.26/GConf-2.26.2.tar.gz), extracted it and applies the patch using "patch -p1 < gconf-2.26.1-duffcache.patch".
Then I ran "./configure", which outputted no errors.
Then I ran "make" which also completed with no errors.
Finally I ran "sudo make install" which also turned no errors.
I restarted the computer and... Hardware monitor still crashes. I've run "sudo make uninstall and reinstalled ArchLinux's package using the package manager and everything is back to the original state (still no hardware-monitor, of course).
This was the first time I have applied a patch to anything, and I'm also not very familiar with source compiling (but this seems to have turned out OK).
Is there something I am doing wrong? Why didn't the patch work for me?

Comment 12 Mike Auty 2009-06-25 08:58:56 UTC
Yep, you'd have to be careful that when calling ./configure you specify --prefix=/usr, otherwise it may have installed to /usr/local.  However, if you do install it to your main system removing it again may be a pain, depending on which files got put where...
Comment 13 Francisco Pina Martins 2009-06-25 09:21:13 UTC
OK, thanks for the tip. I guess I'll try it on a VM than. You know, just in case... =-)
I'll report back as soon as I get the chance to try it.
Comment 14 Francisco Pina Martins 2009-06-25 12:52:22 UTC
OK, so I finally had success with hardware-monitor.

Building directly from source and installing like mentioned above didn't work on Arch Linux. Gnome would crash with an error about power management.

However, using Arch's specific patches and applying them the patch provided by Mike did work, and hardware-monitor is back up and running on my VM.

Nothing seems to have been broken in the process. I'm guessing it's working for me then.

Thank you very much for this patch. How many more people need to test this before it can be made "official"?
Comment 15 Francisco Pina Martins 2009-06-25 20:44:24 UTC
I can confirm that the above approach is also working flawlessly in my main machine using x86_64.
I have also tested on 2 additional machines using i686 Arch Linux, and no problems there either.
Comment 16 Emilio Pozuelo Monfort 2009-07-11 22:35:34 UTC
I confirm changing the return value of TRUE to FALSE fixes loading of hardware-monitor.
Comment 17 Francisco Pina Martins 2009-08-11 21:42:32 UTC
It's been a while since I applied this patch and I don't seem to have suffered any adverse effects.

When will it get merged? I don't mean to sound rude or anything, but what's keeping this patch from reaching mainline?
Comment 18 Emilio Pozuelo Monfort 2009-08-11 21:56:13 UTC
BTW Debian ships that patch for gconf 2.26 for a while and we haven't seen any regression either.
Comment 19 Francisco Pina Martins 2009-09-22 14:55:44 UTC
Once again, not to sound rude or anything, but are there any news on this?

Thanks.
Comment 20 Mike Auty 2009-10-10 08:10:22 UTC
Hi,

Gnome 2.28.0 has come out, and it still features this bug.  There's a patch for this which has had testing by several different groups, including the Debian distribution (according to comment 18), so please could someone tell us what we haven't done yet to get this patch approved?  Does it require further testing?  Does it cause subtle problems we haven't figured out?  We really can't look for another solution unless we know what's wrong with this one.

Please let us know what's holding this bug up, so that we can do something about it and start making forward progress again.  Thanks...
Comment 21 Ray Strode [halfline] 2009-10-14 18:05:47 UTC
Review of attachment 137277 [details] [review]:

Hi Mike,

Thanks for looking into this.

So looking at the commit message for the commit you found by bisecting I see:

    Track fully-cached directories in the client
    
    This is used to do negative-hit caching, as well as listing all entries
    from the client cache.

and in that patch, I see:

-  /* We could just use the cache to get all the entries,
-   * iff we have previously done an all_entries and the
-   * cache hasn't since been tossed out, and if we are monitoring
-   * this directory.
-   * FIXME
-   */

So we can infer the purpose of the commit is to leverage the client cache more aggressively and avoid IPC to the daemon.

::: gconf/gconf-client.c
@@ +2025,3 @@
       g_free (dir);
       trace ("Negative cache hit on %s", key);
+      return FALSE;

This can't be right. If you return FALSE here, then there would be no point in running the code at all, since FALSE would get returned anyway.  Also, if you look at the callers of this function, they check for and handle the "entry == NULL, return TRUE" case.

It's probably a case of "return FALSE means 'I don't know if the key exists and return TRUE means 'I know about the key, if it's NULL it doesn't exist, if it's non-NULL then i'm giving it back to you".  a FALSE return means the caller needs to ask the daemon to find an authoratative answer.

From the commit message we know the motivation for the change is to leverage the cache more aggressively.  This hunk of code must be doing something to that end.

As you mentioned in comment 9, it's checking whether the directory exists in the cache but the key doesn't.  The logic is probably something like:

1) If the entry doesn't exist in the cache, then there are two possibilities
   a) the entry doesn't exist at all
   b) the entry does exist but hasn't made it to the cache yet
2) If the directory exists in the directory cache then the directory and it's keys must have already been processed by the caching code.  In that case, b) isn't applicable so the situation must be a).

So the bug isn't necessarily in this hunk of code.
Comment 22 Ray Strode [halfline] 2009-10-14 18:11:51 UTC
A few more comments:

1) gconf_client_get_entry is going to return NULL if the entry doesn't exist.
2) applications that are crashing are doing so because they assume the key does exist.  Presumably they are making this assumption because they just set the key.
3) If they just set the key and gconf-client subsequently incorrectly determines from the cache that the key doesn't exist, then there probably is a bug in the caching code, where it's not adding the key to the cache at the time it's set in some circumstances.
Comment 23 Mike Auty 2009-10-14 20:49:43 UTC
Hiya Ray, 

Thanks very much for investigating this!

I tried to go back to the beginning and see what's going on.  I still couldn't figure out why the thing segfaulted, but then I spotted this as the definition of the function causing the error:

gconf_entry_get_value (const GConfEntry *entry)
{
  g_return_val_if_fail (entry != NULL, NULL);
  ...
}

So (forgive me, my C's very rusty), but that suggests that it tries to dereferences *entry to get to entry.  The stack-trace shows that entry is handed in as 0x0, which presumably can't be dereferenced and hence the segfault?

If that's the case, should gconf_entry_get_value (and the others in gconf-value.c) have protection against being handed bad data?  Secondly, the question is why does gconfmm hand gconf a 0x0 *entry?  Thirdly, why does this happen only when the cache claims the directory exists but the key doesn't.

My guess is that your third extra comment is right.  There's definitely a bug here somewhere, the problem is locating it.  Perhaps approaching it from the gconfmm direction will tell how/when it arises?  It must be making an assumption somewhere that a pointer it receives back will never be NULL, and in this new instance, it is.

Unfortunately, I've no idea how to track it down?  I'm happy to run any tests and try out any patches you like.  I'm currently running gconf-2.28.0, so any patches that apply against that will be the easiest to test.  Just let me know how I can help, and I'll get on it...  5:)

Thanks again for your help Ray!

Mike  5:)
Comment 24 Ray Strode [halfline] 2009-10-16 18:25:41 UTC
Hi Mike,

entry isn't being dereferenced in the code snippet you tested.

g_return_val_if_fail is a way for library functions to ensure the caller isn't using the library in correction.  It's a way of enforcing the API contract between the caller and the function.

g_return_val_if_fail (entry != NULL, NULL);

means "It's against the rules for the caller to pass in a NULL entry.  If they do, write an error message to the console and bail out of the function immediately"

So passing in a NULL entry isn't allowed.  The caller presumably thinks the entry can't be NULL (probably because it just set the value).

The way you can track this down is

1) figure out what key is assocated with the entry.
2) figure out where in the code that key is set
3) verify that it gets set before the get_value call
4) trace into gconf and see if it's properly cached at the time of the key is set
Comment 25 Mike Auty 2009-10-16 21:52:49 UTC
Sorry Ray, I was referring to the actual backtrace in comment 3.  It shows the signal emitted right after a call to gconf_entry_get_value, which is a tiny function, and I was trying to figure out how it could possibly segfault.  The function takes in *entry, but then checks if entry != NULL (not *entry), so I figured there must've been some dereferrencing going on.  The backtrace clearly shows *entry set to NULL, and I figured that's what caused the segfault?

I've traced back a bit and found the section of code I believe is causing the problem (after a couple of false starts), which is the code mentioned in comment 8.  It occurs when the key doesn't exist, and as best I can tell the code is supposed to create the code, and then checks that it exists, and that somehow triggers the fault...

So the code goes something like this:

  gconf_client->notify_add(gconf_dir + "/viewer_type",
			   sigc::mem_fun(*this, &Applet::viewer_type_listener));

(I don't think the key exists at this point)

  viewer_type_listener(0, gconf_client->get_entry(gconf_dir + "/viewer_type"));

Then in viewer_type_listener:

void Applet::viewer_type_listener(unsigned int, Gnome::Conf::Entry entry)
{
  if (entry.get_value().get_type() != Gnome::Conf::VALUE_STRING) {
    // FIXME: use schema for this?
    Gnome::Conf::Value v(Gnome::Conf::VALUE_STRING);
    v.set(Glib::ustring("curve"));
    entry.set_value(v);

    gconf_client->set(entry.get_key(), Glib::ustring("curve"));
  }

And I believe it's in this code that it segfaults.  The code is supposed to check that the value exists and set it if not, but it segfaults on the entry.get_value in the if statement...

I'm not sure whether that should never have worked, but did by fluke, or whether it's supposed to work fine, and it's a problem in either gconfmm or gconf itself, but that's as far as I've been able to figure out.  I've no idea how to go about tracing gconf (or better yet gconfmm, which it all passes through) to figure out whether it's properly cached, or what's going on?  All I know is that whilst entry doesn't appear to be NULL (since get_value gets called), the underlying entry pointer in gconfmm must be NULL (based on the backtrace).  I've just no idea how it got that way, or what changed besides the gconf caching mechanism.

This may all work out to be a gconfmm issue (in which case I'm sorry to have wasted everybody's time), but if it is, it'd be really good to know exactly what's going wrong and get it fixed for everybody...

Hope that gives you an idea of where I've gotten to?

Mike  5:)
Comment 26 Romain Perier 2009-11-04 13:19:24 UTC
This is fixed on gentoo see https://bugs.gentoo.org/show_bug.cgi?id=288552, I wrote a patch for that :

- in Applet::Applet I just call gconf_client->set(gconf_dir + "/viewer_type", "curve"), in order to create this case if its does not exist yet.
It works perfectly now (2.28).

- Please get back me in case of problems, or if this patch is wrong.

see attachments.
Regards, 
Romain.
Comment 27 Romain Perier 2009-11-04 13:22:04 UTC
Created attachment 146909 [details] [review]
hardware-monitor-1.4.2-create-viewer-type-key.patch
Comment 28 Mike Auty 2009-11-04 13:25:00 UTC
This appears only to patch the symptoms in hardware-monitor, rather than the underlying problem (either in the gconf caching code, or assumptions made by gtkmm), so I'd request that this bug stay open until we can figure out exactly what causes the problem...
Comment 29 Romain Perier 2009-11-04 13:43:51 UTC
Applet::Applet ask to get an Entry for a key which does not exists yet, in this case gconf_client_get_entry() returns a NULL pointer... Qed...

I fixed that in hardware-monitor because it did that not correctly..
as Ray said.

I don't see the problem...
The problem for gconf caching code is an upstream issue, so it must be fixed on upstream and not on downstream.

btw,  gconf-value.c:gconf_entry_set_value() shouldn't crash when entry is NULL (for example add a g_return_if_fail )
Comment 30 Mike Auty 2009-11-05 00:20:43 UTC
Created attachment 146962 [details]
New hardware-monitor-bugreport.txt

The create-viewer-type-key patch solves that particular segfault, but it goes again during a set a little later on.  Here's the new bug buddy trace...
Comment 31 Francisco Pina Martins 2010-08-03 09:24:33 UTC
Is this still an issue?
The latest version of Hardware-monitor (what led me to submit this bug) works fine now without any patches.
The status is also still unconfirmed. Shouldn't it be changed?
Comment 32 Francisco Pina Martins 2011-05-20 14:20:59 UTC
I suppose this could be closed now. It's become irrelevant with the release of Gnome-shell and the deprecation of panel applets.
Comment 33 Paul Menzel 2012-10-22 12:07:15 UTC
(In reply to comment #18)
> BTW Debian ships that patch for gconf 2.26 for a while and we haven't seen any
> regression either.

This was assigned number #532119 in the Debian BTS [1].

[1] http://bugs.debian.org/532119
Comment 34 Paul Menzel 2012-10-22 12:16:40 UTC
(In reply to comment #33)
> (In reply to comment #18)
> > BTW Debian ships that patch for gconf 2.26 for a while and we haven't seen any
> > regression either.
> 
> This was assigned number #532119 in the Debian BTS [1].

Please note that Debian ships the patch, that was rejected by Ray Strode in comment 21.

> [1] http://bugs.debian.org/532119
Comment 35 Francisco Pina Martins 2014-05-29 10:23:16 UTC
I said it once and I say it again...
Is this still relevant? Or should it be closed?
Comment 36 André Klapper 2018-08-17 13:58:50 UTC
GConf has been deprecated since 2011.

GConf is not under active development anymore. Its codebase has been archived:
https://gitlab.gnome.org/Archive/gconf/commits/master

dconf and gsettings are its successors. See https://developer.gnome.org/gio/stable/ch34.html and https://developer.gnome.org/GSettings/ for porting info.

Closing this report as WONTFIX as part of Bugzilla Housekeeping to reflect
reality. Feel free to open a task in GNOME Gitlab if the issue described in this task still applies to a recent + supported version of dconf/gsettings. Thanks!