GNOME Bugzilla – Bug 371405
Crashes when removing rows from header pane
Last modified: 2011-01-07 04:10:55 UTC
Hi, A Debian user has reported the following at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=397273 :
----8<----
I am a relatively new pan user, and have been using it since around version 0.113. I have been experiencing crashes when I try to delete a post for quite some time, but it's not very reliably reproducible, i.e., some other activities are needed before the deletion to trigger the crash. Now I've finally decided to use a script to run pan so that I can catch the debug runlog and core dump, and submit a useful bug report. Attached is the backtrace generated by "thread apply all backtrace" and the final 500 lines of the runlog. I still keep the core dump and the whole ~1MB runlog, so if more information is needed (such as a backtrace with -dbg packages installed), please ask.
----8<----
A backtrace is available at the URL mentioned above.
*** Bug 373184 has been marked as a duplicate of this bug. ***
373184 has an interesting backtrace:
+ Trace 84898
And from the original Debian report:
+ Trace 84899
An updated backtrace for 0.119 was submitted at http://bugs.debian.org/cgi-bin/bugreport.cgi/backtrace?bug=397273;msg=17;att=1
The associated run log is available at http://bugs.debian.org/cgi-bin/bugreport.cgi/runlog.gz?bug=397273;msg=17;att=2
I include the backtrace here for convenience:
+ Trace 87952
Thread 1 (process 2981)
I haven't been able to reproduce this for over a month, though I don't know whether it was due to a change in Pan or Gtk+. Is this still happening for anyone?
Yes, I still get segfaults when searching the groups for keywords (I'm using 0.121). The problem usually happens after two or three searches. I could try to get a backtrace, but I think it would be equal to the one I submitted before.
Bruno, what version of gtk+ do you have installed?
GTK+ version 2.8.20.
Original bug reporter was/is also using 2.8.20.
Created attachment 81380 [details] [review] test patch #1 This is a test patch based on the theory that there are live GtkTreeIters to some of the rows being deleted. The patch frees the row's memory only after the row_removed signals have been fired. Since I'm not able to reproduce this bug, I'd appreciate feedback from testers who give this patch a spin. Also, if it's possible, exact instructions on how to trigger this crash would be good too.
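The ordering that test patch #1 describes (notify first, free afterwards) can be illustrated without GTK. The following is only a minimal sketch of that idea, not the patch itself: `ToyStore`, `Row`, `on_row_removed` and `remove_if` are hypothetical names invented for the illustration, and Pan's real PanTreeStore is structured differently.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tree row; not Pan's actual row type.
struct Row {
    std::string subject;
};

// Toy store showing the order "detach, notify, then free".
class ToyStore {
public:
    // A listener receives the row being removed. The pointer is only
    // guaranteed valid for the duration of the callback.
    using RemovedHandler = std::function<void(const Row*)>;

    Row* append(std::string subject) {
        rows_.push_back(std::unique_ptr<Row>(new Row{std::move(subject)}));
        return rows_.back().get();
    }

    void on_row_removed(RemovedHandler handler) {
        handlers_.push_back(std::move(handler));
    }

    std::size_t size() const { return rows_.size(); }

    // Remove every row for which pred returns true.
    void remove_if(const std::function<bool(const Row&)>& pred) {
        // Step 1: detach the matching rows from the store, but keep them alive.
        std::vector<std::unique_ptr<Row>> doomed;
        std::vector<std::unique_ptr<Row>> kept;
        for (auto& row : rows_) {
            if (pred(*row)) {
                doomed.push_back(std::move(row));
            } else {
                kept.push_back(std::move(row));
            }
        }
        rows_ = std::move(kept);

        // Step 2: fire the notifications while the rows' memory is still valid,
        // so a listener holding a pointer to a removed row can still read it.
        for (const auto& row : doomed) {
            for (const auto& handler : handlers_) {
                handler(row.get());
            }
        }

        // Step 3: 'doomed' goes out of scope here, which frees the rows.
    }

private:
    std::vector<std::unique_ptr<Row>> rows_;
    std::vector<RemovedHandler> handlers_;
};
```

If the rows were freed in step 1 instead, any listener that dereferenced its pointer during the notification would be reading freed memory, which is the failure mode the patch's theory assumes.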
Soren, could you forward the patch to the original reporter, or ask him to cc himself on this ticket?
Done.
More questions for any and all comers: does it matter whether or not you have rows selected when you apply the filter? Can you make it crash when no rows are selected?
I'm running 0.121 with the patch above and still experienced the bug. It doesn't matter whether there are rows selected. The method I use to reproduce is simple: perform a few searches, open another group, perform a few other searches, open another group. It usually crashes after 10-15 searches. I'm compiling pan with -g right now, and I'll run it in gdb from now on. Is there anything besides a backtrace that would help you track this bug down?
Hm. Installing the glib and gtk debuginfo RPMs would be helpful so that the gtk+ part of the backtrace will also have -g info.
Also if you have valgrind installed, running Pan inside of valgrind can give very helpful information in cases of memory errors -- more helpful than gdb. It's kind of slow, and better if you compile pan _without_ optimizations before running it:
% export G_SLICE=always-malloc
% export G_DEBUG=gc-friendly
% export GLIBCXX_FORCE_NEW=1
% valgrind --tool=memcheck --leak-check=full --leak-resolution=high \
    --num-callers=64 --log-file=pan-valgrind --show-reachable=yes ./pan
Running pan with valgrind was *unbearably* slow. I'm attaching a new backtrace, hoping it will be useful. This happened after I clicked on another newsgroup and it started to download new headers, a different situation from my previous one. I wasn't watching pan, so I can't say at what point it happened. Program received signal SIGSEGV, Segmentation fault.
+ Trace 106535
Thread NaN (LWP 4502)
Could this be linked to the number of simultaneous connections to the news server, leading to some sort of race condition?
This backs up the backtrace in comment #2... No, it's not a race condition. The only other thread remaining in Pan is a worker thread that makes new server connections without blocking the main thread.
*** Bug 401398 has been marked as a duplicate of this bug. ***
All: can you reproduce it without reading any articles, without deleting any articles, and without fetching new articles when you enter the group?
Darren Albers reports on pan-users:
> I tried and no luck after ~30 searches between various groups.
>
> This is on my Feisty box which is running GTK 2.10.9
*** Bug 402678 has been marked as a duplicate of this bug. ***
David Shochat reports on pan-users:
> I just thought I'd mention that this all reminds me of the segfault
> (involving gtk_tree) problem I had a while back:
> http://bugzilla.gnome.org/show_bug.cgi?id=346588
> Nobody could reproduce it. Meanwhile I was getting it consistently.
> Then I discovered that the problem went away when I tried using a
> fresh home directory. So I assume it was one of my dot files or
> directories that was somehow corrupt. The reporter could just
> create a new temporary account with a new home and try there.
>
> -- David
I've now tried hundreds of search strings on four separate computers with varying architectures and Linux versions, and have yet to trigger this bug even once.
Created attachment 81592 [details] [review] Fix referencing condition when switching groups This may fix potential corruption issues that occur when changing groups.
I've already tried deleting .pan2 once, but to no avail. The bug seems to be more likely to appear in groups with a large number of articles.
Bruno, does the patch in Comment #25 help anything?
No, even with the patch in Comment #25 it still crashed here. The backtrace is the same as the last one, so I won't duplicate it: again, I was downloading new headers for a group. I had not opened the group in this session before, so I hadn't made any searches in it. Do you guys have a PanTreeStore with printfs() that I could run for you? It could help you to catch the bug.
Created attachment 81701 [details] [review] sterner assertions and tests in pan-tree Here's another patch that may shed light on the problem. This one checks the integrity of the tree before and after every call that changes it. This will make the tree run more slowly -- though still usable, not like valgrind -- but should let us know if there is any memory corruption taking place in pan-tree itself.
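For readers unfamiliar with the technique, checking a tree's integrity before and after every mutating call can be sketched as below. This is an illustration under assumptions, not the contents of the attachment: `Node`, `is_consistent`, `IntegrityGuard`, `append_child` and `remove_child` are hypothetical names, and the invariant checked here (each child points back at its parent and knows its own index) is only one plausible invariant for a tree store.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical node layout; Pan's real tree rows differ.
struct Node {
    Node* parent = nullptr;
    std::size_t child_index = 0;  // position within parent->children
    std::vector<std::unique_ptr<Node>> children;
};

// True if every child points back at its parent and records its own
// index correctly, checked recursively over the whole subtree.
inline bool is_consistent(const Node& node) {
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        const Node* child = node.children[i].get();
        if (child == nullptr || child->parent != &node || child->child_index != i) {
            return false;
        }
        if (!is_consistent(*child)) {
            return false;
        }
    }
    return true;
}

// RAII guard: asserts the invariant on entry to and on exit from a
// mutating call. Compiles to nothing when NDEBUG is defined.
class IntegrityGuard {
public:
    explicit IntegrityGuard(const Node& root) : root_(root) {
        assert(is_consistent(root_));
    }
    ~IntegrityGuard() { assert(is_consistent(root_)); }

private:
    const Node& root_;
};

inline Node* append_child(Node& root, Node& parent) {
    IntegrityGuard guard(root);
    std::unique_ptr<Node> child(new Node);
    child->parent = &parent;
    child->child_index = parent.children.size();
    parent.children.push_back(std::move(child));
    return parent.children.back().get();
}

inline void remove_child(Node& root, Node& parent, std::size_t index) {
    IntegrityGuard guard(root);
    parent.children.erase(parent.children.begin() +
                          static_cast<std::ptrdiff_t>(index));
    // Renumber the siblings that shifted left. Forgetting this step
    // would trip the guard's exit check.
    for (std::size_t i = index; i < parent.children.size(); ++i) {
        parent.children[i]->child_index = i;
    }
}
```

The full-tree walk on every call is what makes this kind of check slow on large groups; it is a debugging aid rather than something to leave enabled in a release build.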
Created attachment 81706 [details] [review] even harsher assertions
Hi, can anybody tell me how to get a backtrace for Pan on Windows? How can I help the programmers? Thanks.
I tested all day long with the assertions (and the two first patches) and couldn't reproduce the bug. I'll keep testing to see if I get something.
The first two patches do fix potential problems -- whether they fix this particular bug ticket or not -- so I'm going to check them in. :)
No, it's still there, and the asserts don't seem to have caught it:
(gdb) r
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1219508544 (LWP 23077)]
[New Thread -1226503248 (LWP 23078)]
[New Thread -1234891856 (LWP 23079)]
[New Thread -1243280464 (LWP 23080)]
[New Thread -1251669072 (LWP 23081)]
Program received signal SIGSEGV, Segmentation fault.
+ Trace 108286
Thread NaN (LWP 23077)
Bruno, I can't figure out why gtk_tree_view_set_model is being called from PanTreeStore::remove_siblings. It doesn't make sense, so I wonder if the backtrace is corrupted. The next easiest thing I can think of to try is for you to run 0.123 in valgrind, as described above, just long enough to make it crash (I know valgrind makes things run slowly), and attach valgrind's log file.
Running 0.123, backtrace, downloading new headers. I'm really sorry about the valgrind output, but it's *WAY* too slow to be usable. If I only could reproduce the bug consistently, I'd try it, but sometimes it takes more than an hour for pan to crash. I now noticed that the line numbers aren't showing up; I think make install was stripping the binary or something. Sorry, I'll get a backtrace from the non-stripped program. Program received signal SIGSEGV, Segmentation fault.
+ Trace 108596
Thread NaN (LWP 11065)
Bruno I appreciate your help, but this is just the same backtrace again and again. So, my comments about moving to the next level a la valgrind still stand. Yes, I understand that it is slow. What OS are you running, and what are you using to build Pan? Does it also crash on the stock install?
Okay, I'm trying valgrind, after having two crashes in ten minutes today. Unfortunately, in this first run I couldn't reproduce the bug. I have a long log, though, do you guys want it? Here's the summary:
==4070== LEAK SUMMARY:
==4070== definitely lost: 50,756 bytes in 207 blocks.
==4070== indirectly lost: 113,264 bytes in 5,584 blocks.
==4070== possibly lost: 223,194 bytes in 412 blocks.
==4070== still reachable: 2,174,065 bytes in 19,109 blocks.
==4070== suppressed: 0 bytes in 0 blocks.
I'm running Linux 2.6.17.13, gcc 3.4.6, glibc 2.3.6, glib 2.10.3 and gtk 2.8.20, in a slackware system. More news at six. Thanks for the patience and for pan.
Yep! just bzip the pan-valgrind file and attach it here. Thanks Bruno!
Created attachment 82183 [details] Valgrind log of running Pan for some time. No crash. There's *no* segfault in this log.
Bruno, could you also please attach the config.log file generated when you built Pan? It should be in the top-level of the source directory. Thanks again.
Created attachment 82196 [details] My config.log for 0.123
Created attachment 82251 [details] Valgrind log for segmentation fault So, here's the beast you asked. The full log for a session that crashed with a segmentation fault. I ran it with "valgrind --tool=memcheck --num-callers=64 --log-file=pan-valgrind --show-reachable=yes --leak-check=full --leak-resolution=high pan". I hope you can find this bug at last and squash it with a pleasant *PLOSH* noise. Once again, it segfaulted while downloading headers for a group.
Wow, this is a very helpful and interesting log file. There are multiple errors here, none of which I've ever seen. I'm going to try to dig into this over the weekend. If you happen to make any more attachments like this please send 'em along. :)
Just for cross-reference, bug #406284 has been opened as a result of the valgrind log in comment #43.
Created attachment 82263 [details] [review] test version of pan-tree.cc try this out on top of 0.123 and let me know. :)
Nope, it still segfaulted. I'll try to get a backtrace for you.
Damn! :) I am running out of suggestions, but we now know a couple of things we didn't know before: (1) valgrind indicates it's a null pointer dereference, (2) it seems to be happening directly inside a gtk+ signal handler for row-deleted, rather than in a function called by the signal handler, and (3) the internal state of the Pan Tree doesn't seem to matter, because the version in comment #46 is slower but tightens up the internal state before each call to row-deleted. This is the most frustrating bug I've had in a very long time.
Let's see. Now that we know these things, I think we can try out gdb again, so that things will be faster, but I have two requests:
(1) Most important: install gtk2-2.8.20-debuginfo and (less important) glib-debuginfo-2.10.3. This will show us the exact line number of the crash in gtk's row-deleted handler.
(2) Also helpful: don't do a `make install' on Pan, because that strips the debugging info from Pan. Just copy the file from pan-0.123/pan/gui/pan to /usr/bin (or wherever you keep it) by hand.
We're in the home stretch... I hope...
Backtrace below. I'm sorry, but I couldn't find a gtk-debug package for slackware and I don't really have time to recompile gtk, install, possibly mess some application and all that. Sorry. I hope this backtrace is enough somehow. Googling for "gtk_tree_view_set_model segmentation fault" returns a bunch of results, maybe some of them can help you to find the problem? (gdb) thread apply all bt
+ Trace 110098
Thread 1 (Thread -1219565888 (LWP 3563))
Just a note for Charles (Who is probably reading the bugs anyway ;) ) there is a discussion on bug 420618 about a crash that has some resemblance to the crash bug described here.
As Charles requested from bug #420618 (gdb) thread apply all bt
+ Trace 121524
Thread 5 (process 9160 thread 0x2903)
Thread 4 (process 9160 thread 0x2803)
Thread 3 (process 9160 thread 0x1903)
Thread 2 (process 9160 thread 0x1503)
FWIW, pan2 crashes on my machine stopped after I deleted my .pan2 dir and restarted pan2.
Version 126
Very reproducible crash. Try to open alt.binaries.amp
Version 125 works fine.
------------------------------------------------------------------------------
GNU DDD 3.3.11 (x86_64-suse-linux), by Dorothea L
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) set args
(gdb) run
[Thread debugging using libthread_db enabled]
[New Thread 47605559408048 (LWP 23528)]
[New Thread 1082132800 (LWP 23531)]
[New Thread 1090525504 (LWP 23532)]
[New Thread 1098918208 (LWP 23533)]
[New Thread 1107310912 (LWP 23534)]
** (pan:23528): CRITICAL **: static void PanTreeStore::sortable_set_sort_column_id(GtkTreeSortable*, gint, GtkSortType): assertion `tree->sort_info->count(sort_column_id) != 0' failed
Program received signal SIGSEGV, Segmentation fault.
+ Trace 123599
Thread 47605559408048 (LWP 23528)
Ubuntu 6.10 (Edgy Eft) AMD64
Linux 2.6.17-11-generic x86_64 GNU/Linux
libaspell15 0.60.4-4
libatk1.0-0 1.12.3-0ubuntu1
libc6 2.4-1ubuntu12.3
libglib2.0-0 2.12.4-0ubuntu1
libgtk2.0-0 2.10.6-0ubuntu3.1
libgtkspell0 2.0.10-3
libpango1.0-0 1.14.5-0ubuntu1
libpcre3 6.4-2ubuntu1
libxml2 2.6.26.dfsg-2ubuntu4
zlib1g 1:1.2.3-13ubuntu2
libpango1.0-common 1.14.5-0ubuntu1
libgmime2.1 2.1.19-0ubuntu2
libgnome2-0 2.16.0-0ubuntu1
+ Trace 123668
(In reply to comment #54) Additional information: Pan < 0.125 worked fine. Deleting .pan2 didn't help me.
I think I was wrong before in assuming these were the same bug. I think that the insert_sorted() bug is separate, so I'm moving discussion of it to bug #425993 . The good news is I think I've got it fixed.
This bug was reported against a version which is not supported any more. Developers are no longer working on this version so there will not be any bug fixes for it. Can you please check again if the issue you reported here still happens in a recent version of GNOME and update this report by adding a comment and adjusting the 'Version' field? Again thank you for reporting this and sorry that it could not be fixed for the version you originally used here. Without feedback this report will be closed as INCOMPLETE after 6 weeks.