GNOME Bugzilla – Bug 342151
Odd crash with new info part creator
Last modified: 2006-05-26 18:54:07 UTC
All shall be revealed in a moment...
Created attachment 65708
New info part creator

This patch is a work in progress to try and fix emacs / xemacs info page loading. {x}emacs uses a peculiar loading scheme whereby it creates its own subdir in the standard info directory and references its docs in the info table as "(emacs-21/emacs-faq.info.gz)" or similar. Within the info files, any parts are referred to as "emacs-faq-1.info.gz" etc., which is
a) really annoying, but more importantly:
b) crashes yelp when trying to load it.

This patch should make loading of these abominations possible but...

When this patch is applied, the flex info page (which is a single part, in the standard directory) crashes yelp (or does some other weird things) when loaded through the TOC. I've been searching for this inexplicable bug for several days now, so I figured if anyone wants to help...

Steps:
1. Apply patch and rebuild.
2. From the command line, run
   $ ./src/yelp info:flex
   Watch how it doesn't crash and loads the page correctly.
3. Load up yelp to the TOC, navigate through to the flex info page (in the development section) and load it. Watch yelp die in a variety of bizarre and ugly ways depending on the phase of the moon.

Ways so far encountered:
*** glibc detected *** free(): invalid next size (normal): 0x08db8bc0 ***
*** glibc detected *** free(): invalid next size (fast): 0x08de9188 ***
Hanging (not redrawing the window) at various stages of the loading process
Proper crashing
Loading the page (inexplicably, it occasionally works and loads the page)

I've tried a variety of other info pages and (barring some totally unrelated problems that existed before applying the patch) there have been no problems. As loading the info page directly works, I suspect some deeper problem somewhere along the line (TOC processing code?), but after several days of searching I have come up with nothing. I would really like to get this problem fixed before committing the code, so does anyone have any ideas?
(Note, this patch is definitely a work in progress. There is a large chunk #if 0'd out and it needs a bit of clean up before committing.)
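To illustrate the naming problem described above: a part name such as "emacs-faq-1.info.gz" only makes sense relative to the directory of the main file that mentions it. The following is a minimal sketch in plain C, assuming the part lives next to its main file. The helper name resolve_info_part and the example paths are hypothetical and are not taken from the attached patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the yelp sources): build the path of an
 * info part that is assumed to live in the same directory as its main file.
 *
 * main_path: path of the main info file, e.g.
 *            "/usr/share/info/emacs-21/emacs-faq.info.gz"
 * part_name: bare part name as written inside that file, e.g.
 *            "emacs-faq-1.info.gz"
 *
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int
resolve_info_part (const char *main_path, const char *part_name,
                   char *out, size_t out_size)
{
    const char *slash;
    int dir_len;
    int written;

    assert (main_path != NULL && part_name != NULL);
    assert (out != NULL && out_size > 0);

    /* Keep everything up to and including the last '/' of the main path. */
    slash = strrchr (main_path, '/');
    dir_len = (slash != NULL) ? (int) (slash - main_path) + 1 : 0;

    /* "%.*s" copies only the directory prefix; the part name follows it. */
    written = snprintf (out, out_size, "%.*s%s",
                        dir_len, main_path, part_name);
    if (written < 0 || (size_t) written >= out_size)
        return -1;

    return 0;
}
```

With a main file in an emacs-21 subdirectory, the part is looked up in that same subdirectory rather than in the top-level info directory. A main path without any '/' yields the bare part name unchanged.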
Sorry, a minor correction: the flex page consists of 2 different pages, so the new code is getting checked out. It does work through the direct loading though, so some dark force is still at work :(
So, after another few hours of trying this, I can now report the following:
1. This bug does not appear to occur under valgrind, even after 14 attempts.
2. Disabling man page support makes this bug go away.

A workaround (that screws up man pages unfortunately) is at yelp-toc-pager.c, line 869 (around about), the section:

    if (sect) {
        gchar **sects = g_strsplit ((gchar *)sect, " ", 0);
        for (j = 0; sects[j] != NULL; j++)
            g_hash_table_insert (priv->man_secthash, g_strdup(sects[j]), node);
    }

Free sects using g_strfreev and the problem disappears. I'm not really sure whether it disappears or whether it's just masked again but... I also tried destroying the hash table when finished, but that didn't make it go away. I'll keep trying though.
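The ownership pattern behind that workaround can be sketched in plain C. This is only an illustration under stated assumptions: split_on_spaces, free_string_vector and copy_string are simplified stand-ins for g_strsplit, g_strfreev and g_strdup (the split here skips empty tokens), and struct section_table is a hypothetical replacement for the hash table. Because the table stores its own copy of each key, the temporary vector can be freed once the loop is done. Whether that leak has anything to do with the crash is not established here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 16

/* Hypothetical stand-in for the section hash table: it owns its keys. */
struct section_table {
    char *keys[MAX_KEYS];
    size_t n_keys;
};

/* Copy a string into freshly allocated memory (stand-in for g_strdup). */
static char *
copy_string (const char *s)
{
    size_t len = strlen (s) + 1;
    char *copy = malloc (len);
    if (copy != NULL)
        memcpy (copy, s, len);
    return copy;
}

/* Free a NULL-terminated vector and its strings (stand-in for g_strfreev). */
void
free_string_vector (char **vec)
{
    size_t i;
    if (vec == NULL)
        return;
    for (i = 0; vec[i] != NULL; i++)
        free (vec[i]);
    free (vec);
}

/* Split on single spaces; simplified stand-in for g_strsplit. */
char **
split_on_spaces (const char *text)
{
    /* At most every other character starts a token, plus the final NULL. */
    size_t capacity = strlen (text) / 2 + 2;
    char **vec = calloc (capacity, sizeof *vec);
    size_t count = 0;
    const char *p = text;

    if (vec == NULL)
        return NULL;
    while (*p != '\0') {
        size_t len = strcspn (p, " ");
        if (len > 0) {
            char *tok = malloc (len + 1);
            if (tok == NULL) {
                free_string_vector (vec);
                return NULL;
            }
            memcpy (tok, p, len);
            tok[len] = '\0';
            vec[count++] = tok;
        }
        p += len;
        if (*p == ' ')
            p++;
    }
    return vec; /* calloc already left the terminating NULL in place */
}

/* Store a copy of every section name, then release the temporary vector. */
int
register_sections (struct section_table *table, const char *sect)
{
    char **sects;
    size_t j;
    int added = 0;

    assert (table != NULL && sect != NULL);
    sects = split_on_spaces (sect);
    if (sects == NULL)
        return -1;
    for (j = 0; sects[j] != NULL && table->n_keys < MAX_KEYS; j++) {
        char *key = copy_string (sects[j]);
        if (key == NULL)
            break;
        table->keys[table->n_keys++] = key;
        added++;
    }
    free_string_vector (sects); /* the table keeps only its own copies */
    return added;
}

/* Release the keys owned by the table. */
void
section_table_clear (struct section_table *table)
{
    while (table->n_keys > 0)
        free (table->keys[--table->n_keys]);
}
```

Calling register_sections with a string like "1 1p 8" stores three keys, and nothing allocated by the split is left behind afterwards.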
Flex page loads fine either way:

/opt/gnome2/bin/yelp info:flex                           check
/opt/gnome2/bin/yelp and navigate to Flex info page      check

After browsing flex and then the nano page, I got this error:

smitten@home:/extra/cvs/gnome2/yelp-head$ ./src/yelp
doc_uri is file:///usr/share/info/flex.info.gz
doc_uri is file:///usr/share/info/nano.info.gz
*** glibc detected *** free(): invalid next size (normal): 0x0905a368 ***

If you run the process through gdb, it should break on double free()s or other errors. This will allow you to get a backtrace.

I don't know how this would be related to man pages. Are any of the structures shared - is this related to TOC processing?
(In reply to comment #4)
> Flex page loads fine either way:
>
> /opt/gnome2/bin/yelp info:flex check
> /opt/gnome2/bin/yelp and navigate to Flex info page check
>
> After browsing flex and then the nano page, I got this error
>
> smitten@home:/extra/cvs/gnome2/yelp-head$ ./src/yelp
> doc_uri is file:///usr/share/info/flex.info.gz
> doc_uri is file:///usr/share/info/nano.info.gz
> *** glibc detected *** free(): invalid next size (normal): 0x0905a368 ***

That looks similar to what I'm seeing on occasion. Although, as I said, I get it browsing to the flex page. nano loads fine (as does any other page).

> if you run the process through gdb, it should break on double free()s or other
> errors. This will allow you to get a backtrace

The problem is that the break comes almost at random. Mostly, it claims to be crashing in a totally different part of the code during a malloc or free call. And the stacktrace isn't the same when done twice in a row :(

> Don't know how this would be related to man pages. Are any of the structures
> shared - is this related to TOC processing?

I'm not sure it is; I think the man page stuff is just masking the problem. I think there's a possible invalid read/write or a rogue free-without-nulling-then-freeing-again somewhere. (Really, I don't know what's going on. I'm guessing these situations should be picked up by valgrind / glibc.) I've spent the last while trying to clean up the valgrind output as much as is humanly possible, but there is still no hint of what is happening.
Okay, another few hours of work seems to have eliminated the problem.

In the TOC code, the overridden xslt_yelp_document function returns (to libxml) a new document, new_doc (a pointer to it). At the end of the function, we free this new_doc. I have no idea why libxml doesn't complain or why this hasn't come to light before, but not freeing new_doc makes the problem go away.

I've committed the patch:

2006-05-23  Don Scorgie  <dscorgie@cvs.gnome.org>

    * src/yelp-toc-pager.c: Don't free the newly created doc
    while still in use (should fix bug #342151)

Closing.
Don, I'm not sure why this fixes the problem, but the free() of the new_doc object is intentional.

The xslt_yelp_document function is actually an extension function that is registered to handle the <yelp:document> element (you will find this in the yelp/stylesheets/toc2html.xsl stylesheet). Once libxslt reaches the <yelp:document> element in a stylesheet, it passes the context node of the XML file being transformed ("node") as well as a pointer to the yelp:document element itself ("inst") to the xslt_yelp_document function.

This function basically saves information about the current context and sets the ctxt->output parameter to new_doc. Then the context node ("node" in the function) is processed by the children of the yelp:document element:

    xsltApplyOneTemplate (ctxt, node, inst->children, NULL, NULL);

The results of performing the transform are actually stored in the page_buf variable, so there is no reason to keep the new_doc around:

    xsltSaveResultToString (&page_buf, &buf_size, new_doc, style);

The ctxt->output is restored to old_doc, so there shouldn't be any hanging references to new_doc.

I do see one thing that might be wrong looking at the function:

    new_doc->dict = ctxt->dict;
    xmlDictReference (new_doc->dict);

The new_doc steals a reference to the dict (not sure what this is), but then never unreferences it.

With the encoding issues I've been thinking about with man pages lately, I thought of another culprit: if the info page you are processing is not in UTF-8, then it could cause these kinds of strange errors. If it has any characters > 0x7F, then that may be why you are seeing these strange libxml2 errors and double frees. Just a thought.
BTW, did you follow Federico's instructions on how to run valgrind (or rather, how to set up for running valgrind)?
http://primates.ximian.com/~federico/news-2006-04.html#19
Sigh. Reopening until the real culprit can be ascertained.

>The ctxt->output is restored to old_doc, so there shouldn't be any hanging
>references to new_doc

Eep. Sorry, missed the reverting of the output.

>With the encoding issues I've been thinking about with man pages lately, I
>thought of another culprit: If the info page you are processing might is not in
>UTF-8, then it could cause these kind of strange errors. If it has any
>characters > 0x7F, then that may be why you are seeing these strange libxml2
>errors and double frees. Just a thought.

Doesn't make any difference which encoding I use. Still behaves badly when freeing new_doc, still works perfectly without freeing.

>I do see one thing that might be wrong looking at the function:
>new_doc->dict = ctxt->dict;
>xmlDictReference (new_doc->dict);

Strangely, there doesn't seem to be any method of unreffing a dictionary from libxml. Not adding a ref makes things go bang.

>BTW, did you follow Federico's instructions on how to run valgrind (or rather
>how to setup for running valgrind)
>http://primates.ximian.com/~federico/news-2006-04.html#19

Alas, not yet. I shall try that next. I'm currently trying to get jhbuild up and running to see if the problem persists in the latest dev versions of glib / gtk / whatever the hell else yelp links against.

There is a (tiny) invalid read / write pair reported by valgrind that I missed last time through and that seems to be coming from g_io_channel_read_until_end. I have no idea whether this is the culprit, but I'll investigate further.

Until this is nailed, if there are no objections, I'll leave new_doc as an unfreed object (which creates a small memory leak). Otherwise, I'll have to revert a few patches I've committed that depend on it.
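On the dictionary reference discussed above, a small reference-counting sketch may explain why "not adding a ref makes things go bang". It rests on an assumption that should be checked against the libxml2 documentation: that freeing a document releases one reference on the document's dictionary, which is my understanding of how xmlFreeDoc and xmlDictFree behave. The types and function names below are hypothetical and are not the libxml2 API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted dictionary; not the libxml2 type. */
struct dict {
    int ref_count;
};

/* Hypothetical document that may share a dictionary with its creator. */
struct document {
    struct dict *dict;
};

/* Bookkeeping for the example only: how many dictionaries were destroyed. */
int dicts_destroyed = 0;

struct dict *
dict_new (void)
{
    struct dict *d = calloc (1, sizeof *d);
    if (d != NULL)
        d->ref_count = 1; /* the creator holds the first reference */
    return d;
}

void
dict_ref (struct dict *d)
{
    assert (d != NULL);
    d->ref_count++;
}

/* Drop one reference; destroy the dictionary when the last one goes. */
void
dict_unref (struct dict *d)
{
    assert (d != NULL && d->ref_count > 0);
    if (--d->ref_count == 0) {
        free (d);
        dicts_destroyed++;
    }
}

/* Create a document that shares a dictionary and takes its own reference. */
struct document *
document_new_shared (struct dict *shared)
{
    struct document *doc = malloc (sizeof *doc);
    if (doc == NULL)
        return NULL;
    doc->dict = shared;
    dict_ref (shared);
    return doc;
}

/* Freeing the document releases the document's reference, nothing more. */
void
document_free (struct document *doc)
{
    if (doc == NULL)
        return;
    dict_unref (doc->dict);
    free (doc);
}
```

In this model the reference taken when the document is created is paired with the release inside document_free, so the owner's dictionary survives the temporary document. If document_new_shared skipped dict_ref, document_free would drop the owner's only reference and leave the owner with a dangling pointer.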
Sorry for the spam. I had an inspiration.

I seem to have fixed it (by fixing the invalid write mentioned above). I've reverted the change made previously (i.e. new_doc is once again freed) and committed the new fix. I also fixed a large number of memory leaks that I came across while looking for this bug.

Once again (hopefully for the last time), closing this bug.

2006-05-26  Don Scorgie  <dscorgie@cvs.gnome.org>

    * src/yelp-toc-pager.c: Free newly created doc
    when finished (revert of previous change)
    Fix some other memory leaks

    * src/yelp-info-parser.c: Fix an invalid write
    (fix bug #342151, again)
    Fix a bucket-load more memory leaks