GNOME Bugzilla – Bug 342038
aisleriot: crash in four seasons
Last modified: 2012-01-31 23:20:36 UTC
Distribution: Ubuntu 6.06 (dapper) Package: gnome-games Severity: Normal Version: GNOME2.14.1 unspecified Gnome-Distributor: Ubuntu Synopsis: aisleriot: crash in four seasons Bugzilla-Product: gnome-games Bugzilla-Component: aisleriot Bugzilla-Version: unspecified Description: I was trying to drag something and it crashed. Bug buddy opened and it has some debug info. Included file "/tmp/arcrashZbCQG0": Variation: Royal East Seed: 2725617782 Scheme error: (car Wrong type argument in position ~A: ~S (1 #<freed cell 0x76f36c98; GC missed a reference>) #f) Scheme tag: wrong-type-arg Backtrace: In unknown file: ?: 0* [droppable? 6 #<freed cell 0x76f36c98; GC missed a reference> 3] In /home/meh/opt/share/sol-games/royal_east.scm: 99: 1* (cond ((or # # # ...) (and #)) ((> end-slot 2) (and # #)) (#t #f)) 113: 2 (and (not #) (or # # #)) ... 117: 3 [= 2 ... 118: 4* [+ 1 ... 118: 5* [get-value ... 118: 6* [car #<freed cell 0x76f36c98; GC missed a reference>] Deck State: Slot 0 (0 12 #t), (3 12 #t) Slot 1 (1 9 #f), (3 1 #f), (3 10 #f), (1 10 #f), (2 2 #f) (2 6 #f), (0 10 #f), (1 7 #f), (2 4 #f), (3 6 #f) (2 8 #f), (1 8 #f), (0 2 #f), (2 10 #f), (2 9 #f) (0 9 #f), (3 7 #f), (2 5 #f), (0 7 #f), (3 3 #f) (0 8 #f), (3 9 #f), (3 8 #f) Slot 2 (2 11 #f), (2 12 #f), (2 13 #f), (2 1 #f) Slot 3 (2 7 #f), (0 6 #f), (3 5 #f), (3 4 #f), (2 3 #f) (3 2 #f) Slot 4 (0 11 #f) Slot 5 (0 5 #f), (0 4 #f), (0 3 #f) Slot 6 (Empty) Slot 7 (3 13 #f) Slot 8 (1 11 #f), (1 12 #f), (1 13 #f), (1 1 #f), (1 2 #f) (1 3 #f), (1 4 #f), (1 5 #f), (1 6 #f) Slot 9 (0 13 #f) Slot 10 (3 11 #f) ------- Bug created by bug-buddy at 2006-05-16 21:40 -------
I can definitely confirm this bug. I had thought that it was being caused by some changes I'd made in 2.15, but obviously this isn't the case. Still no solution yet, but I suspect it is because we aren't referencing the card drag data on the C side of the code and it's getting garbage-collected prematurely.
I think I have fixed this. The problem can be traced back to the ratehr cavalier attitude we have to throwing cards back and forth between scheme and C data structures. Rather than keeping all the cards in lists solely on the scheme side (we currently keep parallel lists on both) I have explicitly protected the cards against garbage collection on the scheme side. This should work, but the bug is very hard to trigger (and to an extent compiler dependant) so I'm not absolutely sure. The fix has been applied to both CVS HEAD and 2.14 branches.
Sorry, it's still crashing. Included file "/tmp/arcrashy0IppQ": Variation: Royal East Seed: 1470519357 Scheme error: (car Wrong type argument in position ~A: ~S (1 #<freed cell 0x76eb1250; GC missed a reference>) #f) Scheme tag: wrong-type-arg Backtrace: In unknown file: ?: 0* [droppable? 1 #<freed cell 0x76eb1250; GC missed a reference> 2] In /home/meh/opt/share/sol-games/royal_east.scm: 99: 1* (cond ((or # # # ...) (and #)) ((> end-slot 2) (and # #)) (#t #f)) 103: 2 (and (or (and # #) (and # # #))) ... 105: 3 (and (not (empty-slot? end-slot)) (eq? (get-suit #) (get-suit #)) ...) 106: 4* [eq? 1 ... 107: 5* [get-suit ... 107: 6* [car #<freed cell 0x76eb1250; GC missed a reference>] Deck State: Slot 0 (2 11 #t), (2 1 #t), (0 6 #t), (3 5 #t), (1 4 #t) (3 12 #t), (0 5 #t), (2 12 #t), (3 6 #t), (1 11 #t) (0 3 #t), (3 9 #t), (2 4 #t), (3 3 #t), (0 8 #t) (0 9 #t), (0 2 #t), (1 3 #t), (2 2 #t), (1 5 #t) (1 1 #t), (3 11 #t), (1 13 #t) Slot 1 (3 4 #f), (3 8 #f), (3 10 #f), (3 2 #f), (3 7 #f) (0 4 #f), (0 13 #f), (2 13 #f), (2 5 #f) Slot 2 (1 6 #f), (1 7 #f), (1 8 #f), (1 9 #f), (1 10 #f) Slot 3 (Empty) Slot 4 (2 6 #f), (2 7 #f), (2 8 #f), (2 9 #f), (2 10 #f) Slot 5 (2 3 #f), (1 2 #f), (3 1 #f), (3 13 #f), (0 12 #f) (0 11 #f), (0 10 #f) Slot 6 (0 1 #f) Slot 7 (Empty) Slot 8 (Empty) Slot 9 (0 7 #f) Slot 10 (Empty)
Created attachment 65953 [details] screenshot of crash The queen looks like it's exactly on the edge of the slot at the moment of the crash. That might mean something..
Are you absolutely certain you are running the new sol.scm (i.e. a make install has been done and all that) ? If that is the case then I don't understand the guile documentation properly because that data is meant to be explicitly protected against garbage collection. The edge-of-slot thing does mean something, but nothing more than the stack trace has already told us: The crash occurs when we try and reference the dragged cards to see if they can be dropped on a slot: before we're over a slot we don't bother looking at them because we know they can't be dropped onto an empty space.
Yes, but I was having some problems with my clock at about the time I was rebuilding gnome-games. I don't think it could cause a problem, but I'll rebuild everything (make clean && make && make install) to be sure.
Never mind, it seems fine now.
Wait, no, it crashed again.
The phrase I use in situations like this is "bugger". I'll have a look at what I wrote and see if it is working the way I think it is. Failing that I'll move all the list handling over to the scheme side where it belongs.
I see you've committed something recently that looks like it might be related. FWIW, it hasn't crashed since that change.
Now it has.
*** Bug 345215 has been marked as a duplicate of this bug. ***
Vincent: The "edge of slot" thing just means that bug is getting triggered whenever droppable is true. :)
Vincent: The first stack trace... Was that a CVS head or unstable (2.15) version of gnome-games you were running at that time? (I'm now thinking this may be rooted in changes brought through #338386, though there's still a lot of code and changes for me to dig though.)
That was cvs head.
*** Bug 345916 has been marked as a duplicate of this bug. ***
To anyone wondering, just a note: Callum said on the mailing list that he would be able to get to this in about a week from now. Should be well in time for the beta 1 freeze.
What's kinda wierd about this bug is that, according to bug #347679, the parameters we are passing to bug-buddy shouldn't have generated the stack trace from the reporter. I wonder if Ubuntu has patched our bug-buddy usage and not told us ...
They haven't.
I should mention that I have a different version of bug-buddy than the reporter of bug 347679. I'm using 2.14.0; he's using 2.15.0.
I think I'm using a guardian for something which it isn't intended for, which might explain why the code keeps crsshing. As an alternative, I've written a new patch which explicitly saves a copy of the cards in an auxilary list. This approach has some deficiencies: one being that it leaks about 1k each time the game is played (the guardian approach probably did too) and is less than elegant. On the other hand it doesn't reply on my dodgy understanding of obscure guile functions. The patch will be coming shortly.
Created attachment 69406 [details] [review] The new and improved patch Like the previous code I can't actually say whether this fixes the problem since I'm not seeing it. However it has been tested to make sure it doesn't cause any obvious crashes. This time I even checked that you could change the game without crashing.
Here is a crash I got when playing Klondike with the latest patch from Callum: Included file "/tmp/arcrashvskrCC": Variation: Klondike Seed: 228694687 Scheme error: (reverse Wrong type argument in position ~A: ~S (1 #<freed cell 0xb6cc48f0; GC missed a reference>) #f) Scheme tag: wrong-type-arg Backtrace: In unknown file: ?: 0* [droppable? 1 #<freed cell 0xb6cc48f0; GC missed a reference> 9] In /usr/local/share/sol-games/klondike.scm: 111: 1* (and (not (= start-slot end-slot)) (or (and # #) (and # # #))) 112: 2 (or (and # #) (and # # #)) 112: 3* (and (member end-slot tableau) (if (empty-slot? end-slot) (= king #) ...)) 113: 4 (if (empty-slot? end-slot) (= king (get-value (car #))) ...) 115: 5 (and (not (eq? # #)) (= (get-value #) (+ # 1))) 115: 6* [not ... 115: 7* [eq? #f ... 116: 8* [is-red? ... 116: 9* [car ... 116: 10* [reverse #<freed cell 0xb6cc48f0; GC missed a reference>] Deck State: Slot 0 (3 10 #t), (2 12 #t), (1 9 #t), (1 10 #t), (2 6 #t) (2 8 #t), (0 9 #t), (1 6 #t), (1 7 #t), (0 7 #t) (2 13 #t), (1 8 #t), (0 6 #t), (0 10 #t), (2 3 #t) (2 5 #t), (3 6 #t), (1 12 #t) Slot 1 (0 5 #f), (2 2 #f), (1 1 #f), (0 3 #f) Slot 2 (Empty) Slot 3 (Empty) Slot 4 (2 1 #f) Slot 5 (Empty) Slot 6 (1 2 #f) Slot 7 (3 8 #f), (2 7 #f) Slot 8 (0 13 #t), (2 4 #t), (3 5 #f) Slot 9 (3 4 #t), (3 1 #t), (1 4 #t), (0 4 #f) Slot 10 (2 10 #t), (2 9 #t), (3 2 #t), (3 3 #t), (3 9 #f) Slot 11 (3 13 #t), (0 2 #t), (1 5 #t), (0 12 #t), (3 11 #f) Slot 12 (0 1 #t), (2 11 #t), (1 3 #t), (0 11 #t), (0 8 #t) (1 13 #f), (3 12 #f), (1 11 #f)
Never mind, I forgot to make install, the patch works. I will committ it to CVS. Thanks Callum.
Could https://launchpad.net/distros/ubuntu/+source/gnome-games/+bug/54151 be the same issue?
Daniel: Yes, I'd mark that as a duplicate.
Thanks.
*** Bug 349034 has been marked as a duplicate of this bug. ***
Ok, it seems that the patch didn't fix this bug, after all. I'm able to reproduce the crash in Bug #349034, which is the same as this one. Fixing this should be the #1 priority for the next version, and at this time I think we should revert the code.
I am hereby totally mystified about why the cards are getting garbage-collected. They _should_ all be referenced by a top-level variable when they are created. Having said that, I have had a great deal of difficulty reproducing this bug, so I've never been certain of my fixes. As for reverting code, I'm not sure there is any code to revert (I know I've mentioned reverting code in the past, but I was getting confused with anoher bug). I think that this bug is appearing now for one of two reasons: changes in compilers or guile mean that guile isn't tracking the ownership of the cards between scheme and C. Alternatively some optimisation work in cscmi.c made earlier this cycle has triggered the bug. The bug however has always been there. When the cards get dragged the list of cards is kept purely on the C side of the code - hence there is an opportunity for them to get garbage collected. guile has some mechanisms for tracking its objects on the C side, but our code doesn't fall into that category (since the objects aren't kept on the stack, they're put in a list). There are several solutions: a) change the dragging code so a list of the dragged cards is kept on the scheme side. b) change the C code so that the cells get found by guile's garbage collector and counted as references. c) Refence every card in a separate place so we don't lose them. a) is feasible. b) is prone to breaking again in the future. c) should be trivial - which is why I implemented it. I will look at this again, but I'm really running out of ideas.
Thanks for looking at this again, Callum. Since I am able to reproduce the crash, then perhaps you could try to make a patch with either of the solutions you prefer, and I will give it a thoughough test.
Callum, do you think that you will be able to post a solution before 2.16.0 Beta 2? I have been looking at this bug, but not been able to fix it yet. I've been thinking about using some of the garbage collection functions for Guile here: http://www.gnu.org/software/guile/docs/docs-1.8/guile-ref/Garbage-Collection-Functions.html It should be possible to protect the cards from garbage collection when they are dragged, using those functions.
I think I've located the problem: C code that is creating cards for the scheme side. I haven't got a fix yet, but it shouldn't be too far away. If it is the problem, Beta 2 looks quite doable.
Created attachment 70240 [details] [review] A new approach to the problem. Here is the new patch. I have a lot more confidence in it since I now understand what I'm seeing in the stack traces. I looked at every point where we call scheme code and provide a C generated list and explicitly protect that for the duration of the call. All other objects we're passing from the C code should be automatically seen as referenced by the scheme code (I hope). In other changes: I removed the explicit protection of the cards on the scheme side since that should be unecessary now and it leaks memory. I also rearranged some of the code, this isn't strictly necessary but makes things cleaner (the rearrangement was necessary for my first cut of the patch). See if this fixes things. I still can't duplicate the crash easily, so I'm just making educated guesses.
Great work on putting this bug to an end, Callum. I have tested Aisleriot pretty heavily, and not been able to reproduce a crash. Therefore I committed the patch. Now, let's hope that Aisleriot has stabilized for a while...
*** Bug 436345 has been marked as a duplicate of this bug. ***
This bug is being reassigned to the "general" component so we can close the aisleriot bugzilla component. Apologies for the mass email!