Bug 87927 - Make script to find similar stack traces
Status: RESOLVED FIXED
Product: bugzilla.gnome.org
Classification: Infrastructure
Component: [obsolete] simple-dup-finder
Version: unspecified
Hardware: Other other
Importance: High enhancement
Target Milestone: ---
Assigned To: Bugzilla Maintainers
QA Contact: Bugzilla Maintainers
Duplicates: 327034
Depends on:
Blocks:
Reported: 2002-07-11 14:38 UTC by Ben FrantzDale
Modified: 2006-01-16 08:30 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Ben FrantzDale 2002-07-11 14:38:09 UTC
It would be good to have bugzilla automatically look for duplicate bugs by
comparing stack traces.

IMO:

First, stack traces would have to be identified; this shouldn't be too hard.
Next, the traces need to be identified in some way; two traces of the same
bug aren't going to be word-for-word identical.
Comment 1 Luis Villa 2002-07-11 15:50:42 UTC
Right. So- thoughts [feedback from the rest of bugmaster@ very welcome.]
*Doing this correctly and automatically will be Hard. I'd like to
start with a proof-of-concept CGI page, where people enter a bug
number or a stack trace and get back a list of potential duplicates.
*do we want to special case ()?? -type traces?
*how do we decide whether or not to 'test' something for duplication?
Just bug-buddy? bug-buddy + simple-bug-guide?
*Questions: if the algorithm gets 'sophisticated' enough to do
'probable dup' vs. 'really damn certain it is a dup' how do we want to
deal with those separate cases?

Thoughts on very, very sketchy regexp+algorithm:
1)strip out all lines not beginning with #
2)strip all lines before <signal handler called> [this is not the only
'keyword' here; I need to search/recall what the other flavors of this
are.]
3)maybe strip all #[*] 0x[*] leaving only function names? [Seems
useful- maybe we can store the stripped down, function-name only
version in the DB for quicker searching?][Maybe also strip everything
/past/ the first ( ?]
4) so now we've got only function names. We pull the first one and
search for it.[maybe we also strip/ignore really common top function
names?]
5) If we get a hit on the first one, compare second/third function
names. Match on all three is very likely a dup. [again, with exception
like gdk_x_error stuff.]
6) If any of the matches are still open/unconfirmed, we display only
those that are open, possibly 'weighting' for bugs with multiple
duplicates already added. If no open matches are found, we list
matches that were marked 'FIXED' first, then (maybe?) matches that
were closed as RESOLVED.

Flaws in the algorithm, so far:
*completely ignores/munges multi-threaded stack traces.
*obviously not robust to traces that are off-by-one, or cases where
very common functions are called near the top of a trace.

Other random thoughts:
*we'll definitely have to special case some things, like gdk_x_error,
that can occur across many apps.

cc'ing Ben Liblit on the off chance he has any insight he might want
to share; Ben, please feel free to ignore me :) 

Anyway, these ramblings are obviously incoherent/incomplete, but I
thought I'd get them down for the record quickly. If someone can cook
up a regexp for steps 1-4, I can whip up a test web page for steps 5
and 6 quickly. Otherwise it'll have to wait until I sit down with the
Camel Book, which might be a few days.
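
A minimal sketch of steps 1-3 above, whose output is the name list that step 4 searches on. It is not the actual simple-dup-finder code. The helper name extract_functions and the sample trace are made up for illustration, and only the "<signal handler called>" marker is handled, not the other flavors mentioned in step 2.

```perl
use strict;
use warnings;

# Hypothetical helper (not the actual simple-dup-finder code): reduce a raw
# gdb trace to an ordered list of function names.
sub extract_functions {
    my ($text) = @_;

    # Step 1: keep only frame lines, i.e. lines beginning with "#<number>".
    my @frames = grep { m/^\#\d+/ } split /\n/, $text;

    # Step 2: drop everything up to and including the first
    # "<signal handler called>" frame, if there is one.
    my ($marker) = grep { $frames[$_] =~ /<signal handler called>/ } 0 .. $#frames;
    @frames = @frames[$marker + 1 .. $#frames] if defined $marker;

    # Step 3: skip the frame number and the optional address, and stop at the
    # first "(", leaving only the function name.
    my @functions;
    for my $frame (@frames) {
        push @functions, $1
            if $frame =~ /^\#\d+\s+(?:0x[0-9A-Fa-f]+\s+in\s+)?([\w:]+)\s*\(/;
    }
    return @functions;
}

# Made-up sample trace in the usual gdb layout.
my $trace = <<'END_TRACE';
#0  0x40a3b0c9 in wait4 () from /lib/libc.so.6
#1  0x4022ab12 in libgnomeui_segv_handle () from /usr/lib/libgnomeui-2.so.0
#2  <signal handler called>
#3  0x0805a3b1 in nautilus_file_ref (file=0x0) at nautilus-file.c:100
#4  0x0805b000 in main (argc=1, argv=0xbffff7a4) at main.c:20
END_TRACE

print join("\n", extract_functions($trace)), "\n";
```

Storing the resulting list per bug, as suggested in step 3, would avoid re-parsing comments on every search.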
Comment 2 Ben FrantzDale 2002-07-11 16:39:24 UTC
Not knowing the internals of bugzilla, I'm going to assume that we
have the text of a bug stored in an array, @theBug.


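# Collect the function name and library from each gdb frame line, e.g.
#   #3  0x40123abc in foo () from /usr/lib/libbar.so
my @functions = ();
my @files = ();
foreach my $line (@theBug) {
  # 0x[0-9a-fA-F]+ : the address is one or more hex digits
  if ($line =~ /^\#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(.*\(\))\s+from\s+(.*)$/) {
    push(@functions, $2);
    push(@files, $3);
  }
}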

Given @functions from the above, and @functions from some other bug
report, one could diff the two to find similar bugs.
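
One rough way to "diff" two such lists, as a sketch only: count how many leading frames agree, and how many names are shared at all. Both helper names (common_prefix_length, shared_names) and the sample names are hypothetical, and no threshold for "similar enough" is implied.

```perl
use strict;
use warnings;

# Hypothetical scoring helpers; both take two array references of function names.

# Number of leading frames that are identical in both traces.
sub common_prefix_length {
    my ($first, $second) = @_;
    my $n = 0;
    $n++ while $n < @$first && $n < @$second && $first->[$n] eq $second->[$n];
    return $n;
}

# Number of names in the second trace that also occur anywhere in the first.
sub shared_names {
    my ($first, $second) = @_;
    my %seen = map { ($_ => 1) } @$first;
    return scalar grep { $seen{$_} } @$second;
}

# Illustrative names only.
my @left  = qw(crash_here caller_one caller_two main);
my @right = qw(crash_here caller_one other_path main);
printf "prefix=%d shared=%d\n",
    common_prefix_length(\@left, \@right), shared_names(\@left, \@right);
```

The prefix count is strict about order, while the shared count tolerates traces that are off by a frame or two.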
Comment 3 Ben Liblit 2002-07-11 17:51:12 UTC
The regular expression proposed by Ben FrantzDale will fail if GDB
wraps its output.  I've used the following pattern to good effect:

	/^\#\d+ 0x[0-9A-Fa-f]+ in (\w+) \(/

It only matches a prefix of each frame's description.  The prefix is
short enough to be unlikely to wrap, but long enough to capture the
function name and be unlikely to match anything that *isn't* a stack
frame.
Comment 4 Luis Villa 2002-07-11 21:38:52 UTC
killpg() is the other one I wanted to break/strip at, just to leave
this as a note for myself.
Comment 5 Luis Villa 2002-07-11 22:47:57 UTC
FWIW, I'm using Ben Liblit's expression for right now, except with 

#\d+ +0x

instead of
#\d+ 0x 

since it was ignoring #0...#9 as they have two spaces after the \d.

Work in progress (right now only working on perfecting the stripping)
at http://bugzilla.gnome.org/simple-dup-finder.cgi

Thanks a bunch for kicking me in the ass, Ben :) 
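
For the record, a small sketch of the adjusted pattern. The helper name frame_function and both sample lines are made up; the pattern is the one from comment 3 with " +" after the frame number.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the frame pattern. " +" accepts both
# "#0  0x..." (two spaces) and "#10 0x..." (one space).
sub frame_function {
    my ($line) = @_;
    return $line =~ /^\#\d+ +0x[0-9A-Fa-f]+ in (\w+) \(/ ? $1 : undef;
}

# Made-up frame lines in the usual gdb layout.
for my $line ('#0  0x40a3b0c9 in wait4 () from /lib/libc.so.6',
              '#10 0x0805b000 in main (argc=1, argv=0xbffff7a4) at main.c:20') {
    my $name = frame_function($line);
    print defined $name ? "$name\n" : "no match\n";
}
```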
Comment 6 Luis Villa 2002-07-12 03:12:47 UTC
Eck. Bad, bad things. So, simple-dup-finder fairly robustly gets the
last five functions in a stack. All well and good. Search for the key
phrases, and you get a result. Depending on the exact magic SQL
incantation, said result takes about 4 minutes :/ So... I'm going to
work on the SQL, but I doubt I can make it much better. 

Options:
*have bug-buddy create/use other hidden fields to cache the results of
the stack trace parsing. Advantage: very, very fast. Disad: makes
upgrading more sucky :/
*Dump the data into the whiteboard instead of a custom set of fields?
This makes things very fast, right now, when almost all whiteboards
are empty. Disad: whiteboard is basically useless and off limits.
*Live with some very big query times, that would presumably be
happening fairly often.
*queue the parsing, and do it on a daily basis from cron, at some hour
where the fewest possible people will be inconvenienced.
*Someone with more SQL knowledge than me helps me speed up the query
some other way. I'm afraid ATM it's pretty much as straightforward and
simple as I can make it. [I'll commit it soon, in current form.]
Comment 7 Ben FrantzDale 2002-07-12 03:28:48 UTC
As for how often things get run, I was thinking that it would be
something run every day or so rather than at the time of creation of
each bug.

As for speeding up the searching, you could probably narrow the search
space dramatically by searching for all bugs that (1) might have a
stack trace in them (i.e., contain "#0") (2) match the right set of
function names and/or match the right set of filenames.


BTW: how do you use the duplicate finder you linked above?
Comment 8 Luis Villa 2002-07-12 03:59:57 UTC
I'm already searching on the function names; that's what takes so long :) 

Right now, all that does is generate a list of function names; I'm
then plugging those into a comment search on query.cgi, which works
reliably. I'm right now incorporating the query directly into that
page, but it'll be slow :) 
Comment 9 Luis Villa 2002-07-12 04:14:43 UTC
Oh! I realize why it didn't make sense to you :) Try this:
http://bugzilla.gnome.org/simple-bug-finder?bug_id=86839
Comment 11 Luis Villa 2002-07-12 16:59:00 UTC
Comment from chris lahey:
'might want to distinguish between no symbols and no bug found'
Probably also should distinguish between no symbols in trace and no trace.
Comment 12 Luis Villa 2002-07-12 17:19:59 UTC
Jody suggests doing version-based checking as well.
Comment 13 Ben FrantzDale 2002-07-12 19:50:08 UTC
Once this is working reasonably well, someone should do a quick script
(maybe better done at a shell than over the web) to run this against
every bug in the database. I'm imagining output like:

bug xxxx has possible duplicates:
yyyy
zzzz
...

You'd probably want to search in increasing order.
Once a bug has been found as a possible duplicate of another bug, it
probably shouldn't be searched against itself.


With that output, people could go through and (hopefully) do some mass
duplication marking.
Comment 14 Ben FrantzDale 2002-07-12 21:29:55 UTC
It breaks for bug 88015. That bug has what appears to be a normal
bug-buddy--generated trace, but the script doesn't see it.
Comment 15 Ben FrantzDale 2002-07-12 21:51:29 UTC
Here's a thought: If any of the duplicates found have known
duplicates, include the duplicate too. This would be particularly
useful for TRACKER bugs. TRACKER bugs don't have stack traces
generally, but the person running a dupe search should be aware that
there is a TRACKER for their bug.
Comment 16 Luis Villa 2002-07-12 22:08:07 UTC
The listing/tracking of duplicate numbers really can't be done in our
current DB[1]; it becomes much saner in 2.16 so I'll definitely want
to have that added when we upgrade.

[1] believe it or not, there is no field that keeps track of
duplicates in bugzilla pre-2.14. The only way you know X is a
duplicate of Y is by parsing all the text comments, which makes the
query really nasty.

Looking at 88015 right now.
Comment 17 Luis Villa 2002-07-12 22:16:52 UTC
Ah. 88015 gets ignored because it wasn't called from libgnomeui
handler- it's just a 'normal' back trace. I'm not entirely sure I want
to handle that case- we won't be 'automatically' parsing those from
bug-buddy anyway. At any rate... I'll look at some special case code
in there, but given the (crappy) way I wrote the code in the first
place, this'll be a little ugly. So it isn't a high priority.

Oh, and BTW, about parsing the DB, I definitely want to do that, if
for no other reason than to get some stats on things, figure out what
I might be missing, etc., etc.
Comment 18 Ben FrantzDale 2002-07-12 22:20:11 UTC
Here's a false positive, I think:
http://bugzilla.gnome.org/simple-dup-finder.cgi?bug_id=83738
finds both bug 71509 and bug 47920.

In general, that search looks like it finds a few different families of
duplicates. Some for nautilus and some for the CD player, among others.
Comment 19 Luis Villa 2002-07-12 23:07:32 UTC
Yeah, 83738 has a lot of bogus junk at the top that needs to be
filtered out; that's why it is catching all the bogus dups. Thanks for
recording that example, though; I'll need to test on things like that.
Comment 20 Luis Villa 2002-07-12 23:32:46 UTC
http://bugzilla.ximian.com/simple-dup-finder.cgi?bug_id=26103

18428 is a bogus dup of 26103.

So... I should look into making these an ordered regexp instead of a
series of SUBSTRs. It would probably be faster to boot.
Comment 21 Ben FrantzDale 2002-07-13 05:07:49 UTC
You should just be able to do "$f1.*$f2.*$f3.*$f4.*$f5", right? (where
that's the perl substring of the sql query.)
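
A sketch of building that ordered pattern on the Perl side. The helper name ordered_pattern and the sample comment text are assumptions, and the match below is done in Perl only; how MySQL's REGEXP treats the escapes and newlines is not checked here.

```perl
use strict;
use warnings;

# Hypothetical helper: join function names into one ordered pattern,
# escaping anything that would otherwise be a regex metacharacter.
sub ordered_pattern {
    my @functions = @_;
    return join '.*', map { quotemeta($_) } @functions;
}

my $pattern = ordered_pattern(qw(uri_matches_as_parent gnome_vfs_uri_is_parent));

# Made-up comment text with one frame per line.
my $comment = "#6  0x1 in uri_matches_as_parent ()\n"
            . "#7  0x2 in gnome_vfs_uri_is_parent ()\n";

# /s lets "." cross line boundaries, since frames sit on separate lines.
print "ordered match\n" if $comment =~ /$pattern/s;
```

Unlike a series of independent substring tests, this only matches when the names appear in the given order.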
Comment 22 Luis Villa 2002-07-13 20:51:27 UTC
Ben: that's basically what I did last night before dinner; turns out
mysql regexp is /abysmally/ slow. :/ So that's not going to be a
usable solution.
Comment 23 John Fleck 2002-07-14 16:52:32 UTC
http://bugzilla.gnome.org/show_bug.cgi?id=88137
Has a stack trace, but gets the "no stack trace" error.
Comment 24 Luis Villa 2002-07-16 14:31:13 UTC
John: that works here; did you mistype the bug number (either into the
bug report or into the dup finder page?)
Comment 25 Ben FrantzDale 2002-07-16 17:17:04 UTC
It would be good to check attachments for stack traces as in bug
88362. At least this could be done for the bug we are checking
against, but ideally it would be done for the bugs that are searched.
Comment 26 Ben Liblit 2002-07-17 09:24:16 UTC
Ben FrantzDale: any particular reason you removed me from the Cc: list?
Comment 27 Luis Villa 2002-07-17 15:21:13 UTC
Oh, hrm... I should have checked that... I assumed you did, Ben L. I
assume it was just an error on Ben F.'s part?

FWIW, re: checking attachments: I'm really unlikely to do that; this
is going to mainly be for bug-buddy, first off, and secondly,
checking/querying attachments requires loading them into memory and
doing parsing of them- it can't be done directly in SQL. It would just
be very irritating to implement and not at all worth the slowdown and
waste involved.
Comment 28 Ben FrantzDale 2002-07-17 16:00:20 UTC
Ben L: That's odd. I certainly didn't mean to remove you from the CC
list. Looking at my emails, I can't even find the update when it happened.

As for attachments, If they are stored on disk as files rather than in
the DB, then yea, it wouldn't be worth it.
Comment 29 Ben FrantzDale 2002-07-17 19:36:22 UTC
The title of the search page should include the bug number we searched
for.

(Personally, I'd prefer if the number came as early as possible in the
title so as to fit in my galeon tabs easily. Perhaps "12345: possible
dups" would be clear enough? If not, no matter.)
Comment 30 Andrew Sobala 2002-07-17 19:45:43 UTC
If you search for dups of a bug, sometimes you get 99,000 "RESOLVED"
bugs that are all (or mostly) duplicates of a single one. It would be
nice if the page was aware of this and highlighted the results like this:

UNCO The_bug_I_searched_for
UNCO }
UNCO } useful duplicates :-)
UNCO }
RESO Yada yada yada (400 duplicates found!)
RESO Blada blada blada (5 duplicates found!)
Comment 31 Ben FrantzDale 2002-07-21 22:07:28 UTC
It doesn't find the stack trace in bug 84528.
Comment 32 Andrew Sobala 2002-07-24 11:23:55 UTC
It doesn't find the stack trace in bug 88931
Comment 33 Luis Villa 2002-07-29 17:10:33 UTC
Lots of false gtk crap shows 88077 as a dup of 57250. Maybe all the
gtk_* and g stuff needs to be filtered.
Comment 34 Luis Villa 2002-07-29 17:15:19 UTC
As part of the 'confidence' score it should do a component/product check.
Comment 35 Dave Camp 2002-07-31 18:01:13 UTC
You should put a "Search for duplicates" link somewhere in the bug page 
Comment 36 Dave Camp 2002-07-31 18:48:39 UTC
while you're at it, the dup finder should allow you to enter an
arbitrary stack trace and search for existing dups.
Comment 37 Ben FrantzDale 2002-08-01 20:32:55 UTC
False positive: bug 89638.
Comment 38 Ben FrantzDale 2002-08-02 21:57:43 UTC
Broken:
http://bugzilla.gnome.org/simple-dup-finder.cgi?bug_id=89744

It finds these functions:
   1. PL_HandleEvent
   2. PL_ProcessPendingEvents
   3. event_processor_callback
   4. our_gdk_io_invoke
   5. g_io_unix_dispatch
yet that's starting at line #38.
Comment 39 Ben FrantzDale 2002-08-05 21:05:11 UTC
Bug 86746 confuses this because its first function appears to be
named ".div".
Comment 40 Ben FrantzDale 2002-08-06 22:39:00 UTC
This doesn't see the trace in bug 87710. It appears to have "sigsuspend" .
Comment 41 Kjartan Maraas 2002-08-13 15:41:53 UTC
http://bugzilla.gnome.org/show_bug.cgi?id=59699

No symbols found.
Comment 42 Dave Camp 2002-08-16 14:41:05 UTC
change ?bug_id= to ?id= for consistency with show_bug.cgi.  that would
make it easier to switch between a bug and searching for a dup of it
by replacing show_bug with simple-dup-finder
Comment 43 Vincent Untz 2002-08-16 20:45:29 UTC
No stack symbols were found.
Bug #87927
Comment 44 Vincent Untz 2002-08-16 20:46:34 UTC
Sorry: I meant bug #87894 :)
Comment 45 Vincent Untz 2002-08-16 20:58:09 UTC
Ok. Really sorry : I forgot bug_id=.
Comment 46 Vincent Untz 2002-08-19 23:41:34 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?bug_id=65516
No stack symbols were found in bug 65516.
Comment 47 Andrew Sobala 2002-08-27 22:07:11 UTC
Doesn't find the trace in bug 91819
Comment 48 Andrew Sobala 2002-08-27 22:13:49 UTC
Doesn't find the trace in bug 91822
Comment 49 Luis Villa 2002-09-13 05:51:04 UTC
*I've fixed bug 88015 and family (we search for '(gdb) bt')
*I've fixed bug 88137 and family ('Backtrace was generated from %')
*not sure how to handle bug 84528; I guess I'll add in 'Debugging
Information'.
*88931 has nothing other than 'lots of pound signs'; not sure how to
handle that. Of course, maybe that's the best solution- shouldn't be
that time consuming to just do the regexp on all fields in the bug and
see what comes back.
*I've resolved dave's request (IRL) to use id= instead of bug_id=
*89744 seems to be broken because of the :: in the functions it
ignores. Ben, you wrote the regexp; think you can take a look at why
it is ignoring those?
*all the rest should be caught by earlier fixes in this list.

I'll probably open up a successor bug to deal with remaining issues.
Comment 50 Ben Liblit 2002-09-13 06:14:51 UTC
Assuming the regexp being used now is similar to the one I originally 
suggested, then "::" can be allowed by finding the part of the regexp 
that looks like this:

	(\w+)

and changing just that one part to this instead:

	((\w|::)+)

Note the additional set of parens, which might require adjustments to 
$2, $3, etc. depending on how the regular expression is being used.  
(I don't know where the script lives, so I can't check that myself.)
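
If renumbering $2, $3, etc. is a concern, a non-capturing group is one alternative. This is a sketch under the same assumption about the pattern in use; the helper name cxx_frame_function and the sample frame are made up.

```perl
use strict;
use warnings;

# Hypothetical variant of the frame pattern: (?:...) groups without
# capturing, so the whole name stays in $1 and no later group is renumbered.
sub cxx_frame_function {
    my ($line) = @_;
    return $line =~ /^\#\d+ +0x[0-9A-Fa-f]+ in ((?:\w|::)+) \(/ ? $1 : undef;
}

# Made-up C++ frame line.
print cxx_frame_function('#1  0x08123456 in GtkPromptService::Confirm (this=0x0) at prompt.cpp:1'), "\n";
```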
Comment 51 Luis Villa 2002-09-13 06:44:55 UTC
Wow, thanks, Ben, you rule :) That works like a charm.

I'm trying out the more intelligent approach to seeing if a bug has a
trace, but so far, it only works on some cases; I think I'm doing
something wrong with the perl.
Comment 52 Luis Villa 2002-09-13 07:25:03 UTC
Ah, figured it out; there was a second failure point I wasn't thinking
about. I've caught that now. So, basically, everything with a stack
trace, except attachments, should now be caught in some form or
another. There is still the problem of bogus information in those
traces, of course. I'll probably poke at that for a few more hours
before heading home.
Comment 53 Luis Villa 2002-09-13 09:35:45 UTC
87710 still badly false-positives, as does 91822 to a lesser extent.
Everything else 'works' in the sense that we get reasonable meaningful
function names from them. Big thanks to all the people who helped
collect these examples, and who kept doing so after I'd neglected the
code for a month :)

So, the next step is robustness of the algorithm used to find and
identify duplicates. The current situation (where zillions of
'resolved' things can clutter the list) is... icky. Ben's first
proposed partial solution doesn't actually work with the current DB
(though it would with 2.16).

I need to stare at the code and experiment a bit, I think; I was going
to try to write out a plan but my brain is fried.
Comment 54 Luis Villa 2002-10-11 20:12:39 UTC

*** This bug has been marked as a duplicate of 95490 ***
Comment 55 Luis Villa 2002-10-11 20:13:05 UTC
Sigh. First change I make in bugzilla in like three weeks and it's
/wrong/.
Comment 56 Ben FrantzDale 2002-10-16 05:22:29 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=94495

This shows 94495 twice.
Comment 57 David Kennedy 2002-10-19 03:13:21 UTC
96177 appears to have a stack trace but simple_dup_finder doesn't
catch any.
Comment 58 Andrew Sobala 2002-10-23 22:41:43 UTC
No trace found in bug 60406
Comment 59 Ben FrantzDale 2002-10-24 22:10:13 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=61235 produces lots
of hits. The trace reads as:
   1. gtk_widget_event
   2. gtk_main_do_event
   3. gdk_event_dispatch
   4. g_main_dispatch
   5. g_main_iterate
Which isn't very helpful or unique.
Comment 60 David Fallon 2002-10-29 18:35:59 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=95402
results in 
97086, 94226, 95402

http://bugzilla.gnome.org/simple-dup-finder.cgi?id=97086
results in
97086, 94226

missing 95402.

http://bugzilla.gnome.org/simple-dup-finder.cgi?id=94226
results in
97086, 94226, 95402, 93830

http://bugzilla.gnome.org/simple-dup-finder.cgi?id=93830
results in hundreds of other random bugs, thus suggesting the function
calls are garbage.

 The function calls are different, as well:
95402:
   1. uri_matches_as_parent
   2. gnome_vfs_uri_is_parent
   3. nautilus_file_operations_copy_move
   4. icon_view_handle_uri_list
   5. nautilus_marshal_VOID__POINTER_INT_INT_INT

97086:
   1. uri_matches_as_parent
   2. gnome_vfs_uri_is_parent
   3. fm_directory_view_move_copy_items
   4. icon_view_handle_uri_list
   5. nautilus_marshal_VOID__POINTER_INT_INT_INT

94226:
   1. gnome_vfs_uri_is_parent
   2. icon_view_handle_uri_list
   3. nautilus_marshal_VOID__POINTER_INT_INT_INT
   4. g_closure_invoke
   5. signal_emit_unlocked_R

93830:
   1. __pthread_wait_for_restart_signal
   2. pthread_cond_wait
   3. poll
   4. __pthread_manager
   5. wait4

So, very confusing. these four all look to be duplicates by the
summaries/stack traces, but simple-dup-finder doesn't think so (some
of the time). :)
Comment 61 Andrew Sobala 2002-10-29 18:56:11 UTC
Now to come to think of it, there are a few situations where s-d-f
works one way but not the other.
Comment 62 David Fallon 2002-10-29 19:02:29 UTC
:) Yeah, and that's no good. A being a duplicate of B definitely
implies B is a duplicate of A.
Comment 63 Andrew Sobala 2002-10-31 18:41:53 UTC
A text box. Wouldn't that be nice.

To have a text box to paste a stack trace into to do a quick dup-check
without first having to file a bug then run s-d-f on it. It also means
you can point people towards it on #gnome if they ask if their bug's
already filed - in fact, someone just asked that ;-)
Comment 64 David Fallon 2002-10-31 19:14:46 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=97358

shows 73963 twice in the list.
Comment 65 Andrew Sobala 2002-11-03 15:37:05 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=97526 does not
report bug 97165.
Comment 66 Andrew Sobala 2002-11-17 19:16:00 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=98794 lists bug
97883 twice (presumably because it has 2 stack traces)
Comment 67 David Kennedy 2002-12-04 05:16:27 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=100291 results in a
massive number of duplicates (if no good function names are found,
perhaps we shouldn't search?)
Comment 68 Andrew Sobala 2002-12-30 22:59:57 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=102245

The stack frame starting gnome_window_manager_get_settings is ignored
Comment 69 Ben FrantzDale 2003-01-15 06:52:24 UTC
This should at least hit itself: bug 10191.
Comment 70 Vincent Untz 2003-03-15 13:43:38 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=108417

Bug 106166 is found twice.
Comment 71 Crispin Flowerday (not receiving bugmail) 2003-08-27 07:58:40 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=120771

The first 2 stack frames are ignored:

GtkPromptService::GetGtkWindowForDOMWindow(nsIDOMWindow*)
GtkPromptService::Confirm(nsIDOMWindow*, unsigned
short const*, unsigned short const*, int*)
Comment 72 Andrew Sobala 2003-09-08 21:32:34 UTC
Doesn't see any frames in bug 121719
Comment 73 Martin Wehner 2004-01-12 20:05:25 UTC
No stack symbols were found in bug 131243.
Comment 74 Arvind S N 2004-01-23 12:24:33 UTC
Apologies if it's been already mentioned here, have not read it
closely. :)

When we look for dups, would it be possible to get only the bug which
has been marked Resolved and Fixed (i.e, if one exists) instead of 
listing all the bugs. Or we could have "Bugid" "Status" "Resolution"
for all the bugs. At least that would help in finding the right one
faster. :)
Comment 75 Andrew Sobala 2004-01-23 13:18:41 UTC
Well, some bugs accumulate a lot of dups when they are still open.

In 2.16 (cough) I believe it is quite trivial to sort by the number of
duplicates. (Yippee.) Of course, we are not quite running 2.16 yet :)
Comment 76 Luis Villa 2004-02-12 18:50:56 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=121734 <- this
really sucks, we should fix it. [Whenever I quit my job to become the
bugzilla guy again ;)]
Comment 77 Luis Villa 2004-02-12 22:43:58 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=125523 also broken
Comment 78 Luis Villa 2004-02-24 18:38:19 UTC
http://bugzilla.gnome.org/simple-dup-finder.cgi?id=135288

This misses the first function in the trace
(nautilus_desktop_link_get_link_type), possibly because it has no
memory address before it?
Comment 79 Luis Villa 2004-02-25 22:36:51 UTC
Bug 135416 should find 128424, not sure why it doesn't.
Comment 80 Christian Kirbach 2005-03-06 18:02:51 UTC
Bug 169409 does not find itself.

http://bugzilla.gnome.org/dupfinder/simple-dup-finder.cgi?id=169409
Comment 81 Elijah Newren 2005-03-06 21:31:57 UTC
That's because it hits the limit of 100 I added. For some bugs it would show so
many possible duplicates (useless stack trace) that it would take several
minutes to show the report, so I artificially cut it off at 100. We should add a
comment about that...
Comment 82 Kjartan Maraas 2005-04-17 19:41:56 UTC
bug 300983 doesn't find itself
Comment 83 Olav Vitters 2005-04-17 20:05:08 UTC
Kjartan: That is because simple-dup-finder has been limited to 100 results
maximum (see comment 81). I added a warning when it returns 100 results.
Comment 84 Sebastien Bacher 2005-06-11 20:32:51 UTC
Bug #303466 does not list itself
Comment 85 Elijah Newren 2005-06-11 20:45:40 UTC
It appears the problem is that the simple dup finder is trying to get functions
from two separate stack traces for 303466...interesting.  Too bad I don't have
much time to look further at the moment--someone please ping me in two to three
weeks if no one else has taken a look.
Comment 86 Olav Vitters 2005-06-11 20:52:03 UTC
Comment 84: Cause is simple-dup-finder taking the functions from multiple
comments and matching them per comment. It should take them from the comment
with the best stacktrace and not multiple ones.

Easy fix is to limit it to the first comment with functions in it. The SQL
already has a regex that selects those. Adding a 'limit 1' would fix it.

I think I'll add a "ORDER BY bug_when DESC LIMIT 1". That will fetch the
functions from the newest stacktrace comment (that should be the best one...
providing the newest comment always has the best stacktrace).

grmbl.. Elijah is too fast
Comment 87 Olav Vitters 2005-06-11 21:11:09 UTC
Made the change, bug #303466 finds itself again.
Comment 88 Christian Kirbach 2005-06-19 23:02:44 UTC
the function calls from Bug 306582 are not extracted properly.
every odd function name is emited.

http://bugzilla.gnome.org/dupfinder/simple-dup-finder.cgi?id=306582
Comment 89 Hans Breuer 2005-06-25 11:01:47 UTC
One of the very familiar dups of Dia does not get found due
to the generated IA__ prefix. Removing that manually from
the stack trace makes it work again. 
Maybe that prefix should be stripped by the dup-finder script?

Dup: bug #308678
Orig: bug #161603
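
Stripping the prefix could look roughly like this. It is a sketch only: the helper name strip_alias_prefix is hypothetical, the sample symbol is just an example, and where this would hook into the dup-finder script is not known here.

```perl
use strict;
use warnings;

# Hypothetical normalisation step: drop a leading "IA__" from a function
# name so that aliased and plain symbols compare equal.
sub strip_alias_prefix {
    my ($name) = @_;
    $name =~ s/^IA__//;
    return $name;
}

print strip_alias_prefix('IA__gtk_widget_show'), "\n";
```

Applying it to both the searched trace and the stored names would keep the comparison symmetric.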

Comment 90 Elijah Newren 2005-06-25 16:52:33 UTC
Yeah, it really should.  I tend to use boogle to manually do that right now, and
that was the main reason for adding the link to the boogle search on the
simple-dup-finder page (though I use it for other things as well...)
Comment 91 Christian Kirbach 2005-07-17 15:31:00 UTC
Consider Bug #169193 .

the trace contains
================
  • #2 <signal handler called>
  • #3 gtk_window_move
    from /usr/lib/libgtk-x11-2.0.so.0
  • #4 terminal_screen_get_text_selected
  • #5 terminal_screen_get_text_selected
  • #6 gtk_list_store_remove
    from /usr/lib/libgtk-x11-2.0.so.0
  • #7 ??
  • #8 ??
  • #9 ??
  • #10 g_type_check_class_cast
    from /usr/lib/libgobject-2.0.so.0

I believe simple-dup-finder should
a) warn that we have missing symbols within the first 5 frames
b) abort collecting function names as soon as it hits missing symbols, i.e. only
  extract four function names from the above trace. What it currently does is 
  pick up five function names, regardless of unknown function names it 
  encounters on its way.
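
Point (b) could be sketched as follows, assuming the names have already been extracted in frame order with "??" standing for a missing symbol. The helper name leading_known_functions is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: take at most $limit names, stopping at the first
# missing symbol ("??").
sub leading_known_functions {
    my ($limit, @names) = @_;
    my @kept;
    for my $name (@names) {
        last if $name eq '??' || @kept >= $limit;
        push @kept, $name;
    }
    return @kept;
}

# Frames #3 to #10 of the trace above, in order.
my @names = qw(gtk_window_move terminal_screen_get_text_selected
               terminal_screen_get_text_selected gtk_list_store_remove
               ?? ?? ?? g_type_check_class_cast);

my @kept = leading_known_functions(5, @names);
print scalar(@kept), " names kept\n";
print "Warning: missing symbols within the first 5 frames\n" if @kept < 5;
```

The warning in point (a) falls out of the same check: fewer names than the limit means a "??" was hit early (or the trace was short).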
Comment 92 Christian Kirbach 2005-07-18 11:13:27 UTC
Suggestion: Put the warning

"Warning: Number of bugs has been limited to 100."

above the duplicate listing.
Comment 93 Elijah Newren 2005-07-18 16:39:05 UTC
It's not too infrequent that we get crap stack traces and get the same ones over
and over--and the dupfinder helpfully points out the potential duplicates. 
There are a number of traces like this where if we only extract functions from the
beginning of the trace then we won't get any and won't be notified about
duplicates.  But I put the boogle link inside the simple-dup-finder output for
exactly this reason (though it also allows refining the search in other ways too).

Putting the warning at the top shouldn't be real hard, though it means slurping
in the outputs of the SQL query into memory, counting them, then doing output,
instead of outputting things on the fly, counting them as we go, and then
displaying a warning if the count happens to be 100.  If you'd like to look into
fixing this, just look at bugzilla-new/dupfinder/simple-dup-finder.cgi and
bugzilla-new/dupfinder/find-traces.pl.  Neither is very long.
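
The "slurp, count, then output" approach could look roughly like this. It is a sketch that does not touch the real scripts: the helper name render_report, the RESULT_LIMIT constant and the plain "bug N" output format are assumptions.

```perl
use strict;
use warnings;
use constant RESULT_LIMIT => 100;

# Hypothetical renderer: takes all result rows (already fetched into memory)
# and puts the warning first when the limit was reached.
sub render_report {
    my (@bug_ids) = @_;
    my @lines;
    push @lines, 'Warning: Number of bugs has been limited to ' . RESULT_LIMIT . '.'
        if @bug_ids >= RESULT_LIMIT;
    for my $id (@bug_ids) {
        push @lines, "bug $id";
    }
    return join "\n", @lines;
}

print render_report(87927, 95490), "\n";
```

With at most 100 rows the memory cost of buffering is small; the trade-off is that nothing is shown until the query has finished.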
Comment 94 Christian Kirbach 2005-07-18 20:26:30 UTC
Agreed - we get many useless traces.

I still suggest missing symbols should not be ignored but taken into account,
i.e. it should extract "??" as function names.
This way it is still possible to detect duplicates of useless traces.
Additionally, if time and manpower permit, we could extend s-d-f to scan for 
duplicates in a smart way so that any function passes the match test if the
template trace has a missing symbol. For instance, the example in Comment #91 

s-d-f should IMHO extract

1 gtk_window_move 
2 terminal_screen_get_text_selected
3 terminal_screen_get_text_selected
4 gtk_list_store_remove
5 ??

when scanning for duplicates any function name should pass the matching test for 
the missing symbol. This way crap trace _and_ eventually non-missing-symbols 
traces will be detected.
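
A sketch of that matching rule, assuming frames are compared position by position (the current tool instead looks for all names within one comment, so this is not how it works today). The helper name frames_match is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical matcher: $template is the trace being searched for, and a
# "??" entry in it accepts any function name at that position.
sub frames_match {
    my ($template, $candidate) = @_;    # array references
    return 0 if @$candidate < @$template;
    for my $i (0 .. $#{$template}) {
        next if $template->[$i] eq '??';
        return 0 if $template->[$i] ne $candidate->[$i];
    }
    return 1;
}

my @template  = qw(gtk_window_move terminal_screen_get_text_selected
                   terminal_screen_get_text_selected gtk_list_store_remove ??);
# The last name here is made up to show the wildcard.
my @candidate = qw(gtk_window_move terminal_screen_get_text_selected
                   terminal_screen_get_text_selected gtk_list_store_remove some_function);

print frames_match(\@template, \@candidate) ? "match\n" : "no match\n";
```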
Comment 95 Elijah Newren 2005-07-18 20:47:13 UTC
Your suggestion definitely has merit and I can see that it'd be useful in many
cases, but it's also an example where we are just trading off which cases s-d-f
is most useful in.  With your scheme, if one stack trace of a bug is missing
some symbols and another isn't then you can't detect they are duplicates with
s-d-f (sometimes you may not be able to anyway, but I've found on many occasions
that I can).  Also, I have found dupes with s-d-f where your scheme would just
extract 5 ??'s, which isn't useful (yes, such stack traces usually aren't at all
trustworthy but if there are enough functions then s-d-f can sometimes give a
small number of bugs to check and I can verify that they're dups by quickly
reading the descriptions of each).

It's hard to tell which choice will provide the best productivity with the tool.
 Both are still possible regardless of the choice because of the boogle link,
and my basic feeling right now is that it's easier to delete function names
you don't want to search on from a boogle query than to try and add them.
Comment 96 Christian Kirbach 2005-07-19 13:21:51 UTC
I can stick with boogle for the time being.
Comment 97 Christian Kirbach 2005-10-05 08:41:37 UTC
bug #317935 : function names get collected across threads.
Is that intended?
Comment 98 Olav Vitters 2005-11-30 17:48:52 UTC
We should blacklist the function:
  libgnomeui_segv_handle

possibly others as well
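
A blacklist filter could be as small as the sketch below. The helper name drop_blacklisted is hypothetical, and only libgnomeui_segv_handle is taken from this comment; any further entries would need to be decided separately.

```perl
use strict;
use warnings;

# Hypothetical filter: remove names that say nothing about the crash itself.
sub drop_blacklisted {
    my @names = @_;
    my %blacklist = map { ($_ => 1) } qw(libgnomeui_segv_handle);
    return grep { !$blacklist{$_} } @names;
}

# The second name is made up for illustration.
print join(' ', drop_blacklisted(qw(libgnomeui_segv_handle nautilus_file_ref))), "\n";
```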
Comment 99 Christian Kirbach 2005-12-20 14:30:41 UTC
Consider http://bugzilla.gnome.org/show_bug.cgi?id=324448

two frames bear the same name, but s-d-f extracts e_cal_backend_http_get_type
only once...
Comment 100 Olav Vitters 2005-12-20 17:55:26 UTC
Yeah, simple-dup-finder is not smart, it just lists bugs which have all these 'words' in one comment. It avoids the same function twice on purpose.
Comment 101 André Klapper 2006-01-10 12:15:36 UTC
bug 326345 got a stacktrace, but simple-dup-finder claims that there is none.
Comment 102 Olav Vitters 2006-01-10 12:45:29 UTC
(In reply to comment #101)
> bug 326345 got a stacktrace, but simple-dup-finder claims that there is none.

Simple-dup-finder tries to get the newest stacktrace using SQL. After that it uses  perl to actually parse the comment. Because the user quoted the *entire* description (aargh!!), the SQL sees the stacktrace but the perl code doesn't accept it (because of the '> ' at the beginning of the lines). Have to enhance the  SQL to find the correct one. Plus make the reply option not quote anything if the comment is over a certain length (grr).
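
On the perl side, one possible workaround is to strip reply markers before parsing. This is a sketch, not the fix described above; the helper name strip_quote_markers is hypothetical, and it would also strip a legitimate leading ">" from unquoted text.

```perl
use strict;
use warnings;

# Hypothetical pre-processing step: remove leading "> " quote markers
# (possibly nested) from every line of a comment.
sub strip_quote_markers {
    my ($text) = @_;
    $text =~ s/^(?:> ?)+//mg;
    return $text;
}

# Made-up quoted frame line.
print strip_quote_markers("> #0  0x40a3b0c9 in wait4 () from /lib/libc.so.6\n");
```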
Comment 103 Elijah Newren 2006-01-15 04:26:32 UTC
*** Bug 327034 has been marked as a duplicate of this bug. ***
Comment 104 Luis Villa 2006-01-15 18:22:51 UTC
Should we break this bug up? It has outlived the original purpose, probably...
Comment 105 Karsten Bräckelmann 2006-01-15 19:09:44 UTC
No Simple Dup Finder available for bug 326478, although it has a stacktrace.
Comment 106 Elijah Newren 2006-01-15 23:55:32 UTC
+1 from me for closing this bug and just opening new ones for any new (or remaining) issues.  

Karsten: That's because bugzilla currently only shows helpful stuff (triage links, simple-dup-finder link) when bugs are unconfirmed; it's a separate issue from this bug anyway (since the simple-dup-finder does find the stack trace in that bug).
Comment 107 Olav Vitters 2006-01-16 08:30:56 UTC
Closing. Created simple-dup-finder component. Changed the report to mention a new bugreport has to be created.