Bug 340827 – crash when switching images with mouse wheel

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 340827 - crash when switching images with mouse wheel


Summary:	crash when switching images with mouse wheel


Status:	RESOLVED FIXED

Product:	eog
Classification:	Core
Component:	image viewer
Version:	2.14.x
Hardware:	Other All

Importance:	Normal critical
Target Milestone:	---
Assigned To:	EOG Maintainers
QA Contact:	EOG Maintainers

URL:
Whiteboard:

Duplicates:	325237 344166 345167 441586 (view as bug list)
Depends on:
Blocks:

Reported:	2006-05-06 13:06 UTC by Emmanuel Touzery
Modified:	2007-05-27 14:30 UTC

See Also:
GNOME target:	---
GNOME version:	2.13/2.14

Attachments
Patch to fix the image loading race condition (1.37 KB, patch) 2006-06-14 09:07 UTC, Callum McKenzie	none	Details \| Review
A complete fix for the race conditions in image-loading (1.33 KB, patch) 2006-06-15 07:25 UTC, Callum McKenzie	none	Details \| Review
The third version of the race-condition patch. (5.83 KB, patch) 2006-07-14 17:48 UTC, Callum McKenzie	none	Details \| Review
Yet another version. Fixes some extra problems. (7.01 KB, patch) 2006-07-17 15:49 UTC, Callum McKenzie	committed	Details \| Review

Description Emmanuel Touzery 2006-05-06 13:06:02 UTC

Steps to reproduce:
1. open image in a folder
2. switch images quickly with mouse wheel
3. crash after a while


Stack trace:
Backtrace was generated from '/usr/bin/eog'

(no debugging symbols found)
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
[New Thread -1224742144 (LWP 8392)]
[New Thread -1258984528 (LWP 8397)]
[New Thread -1226155088 (LWP 8393)]
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
0xffffe410 in __kernel_vsyscall ()

+ Trace 68057

Thread 1 (Thread -1224742144 (LWP 8392))

#0 __kernel_vsyscall
#1 __waitpid_nocancel
from /lib/tls/i686/cmov/libpthread.so.0
#2 libgnomeui_module_info_get
from /usr/lib/libgnomeui-2.so.0
#3 <signal handler called>
#4 eog_image_cache_add
#5 eog_full_screen_get_type
#6 eog_job_call_finished
#7 eog_image_list_iter_equal
#8 g_child_watch_add
from /usr/lib/libglib-2.0.so.0
#9 g_main_context_dispatch
from /usr/lib/libglib-2.0.so.0
#10 g_main_context_check
from /usr/lib/libglib-2.0.so.0
#11 g_main_loop_run
from /usr/lib/libglib-2.0.so.0
#12 gtk_main
from /usr/lib/libgtk-x11-2.0.so.0
#13 main
#0 __kernel_vsyscall



Other information:
i don't remember if i had the list of images in the folder (F9) open. i just
played with it before but i think it was closed at the time of the crash.
this is ubuntu dapper up-to-date as of may 6, EOG 2.14.1.

Comment 1 Emmanuel Touzery 2006-05-06 13:39:51 UTC

OK, i checked more, and it's like that.
in EOG if you focus on the image and the "image collection" pane is not open, mouse wheel will zoom in/out.
however if you open the "image collection" pane and have the mouse on it and mousewheel then you'll switch images when mousewheeling.
that way you can switch images VERY fast. in a big directory with big images if you just mousewheel a second or so, EOG will be busy for a while (but i must say it's impressively fast, congrats for that!).

but after little time of this little game, it will crash. i already reproduced this crash 3 times, it's easy to trigger this way.

what's funny is that when the bug-buddy window about the crash appears, EOG still changes pictures in the background for maybe a second before the EOG window pops. surely due to threading.

Comment 2 Emmanuel Touzery 2006-05-06 13:58:07 UTC

ugh... i get a crash with the exact same stack if i run a slideshow with 1 second between images and exit it (before it would normally escape) with the ESCAPE key.
already got that other crash three times, i reproduce it like the first one 100% of the time.

i report it in this bug, since it's the same stack and it also relates to changing images quickly, it's obviously the same bug.

i hope the fix for those bugs will be released on time to be fixed for dapper, but the new EOG is otherwise great! the new UI is simply perfect.

Comment 3 Emmanuel Touzery 2006-05-06 14:06:15 UTC

related to slideshow exit, i can also reproduce with 2 and 3 and 4 seconds between images. i didn't try to go over 4 seconds, as it's really a lot then.
this bug is very bad... slideshow is unusable for me without crashing EOG... it's so bad, i even wonder, maybe it's just a crappy ubuntu package and upstream EOG is not affected!?

Comment 4 Fabio Bonelli 2006-05-07 09:43:45 UTC

Thanks for the bug report. This particular bug has already been reported into our bug tracking system, but please feel free to report any further bugs you find.


*** This bug has been marked as a duplicate of 320206 ***

Comment 5 Emmanuel Touzery 2006-05-07 09:47:46 UTC

well now that i looked a bit more at the code i'm not sure anymore it's a duplicate. what i saw by looking at the code is that the stack is completely wrong anyway (surely because it's multithread). so the fact that the stack is similar doesn't mean for sure it's the same bug.

but let's fix #320206 and if that doesn't fix that bug, we can reopen it. if that fixes it, then it was the same bug. but #320206 in the code is about full screen view (crash in ego-full-screen.c), and in this bug there is no full screen. so now that i looked at the code a bit more... i think they are probably different crashes. we'll see.

Comment 6 Claudio Saavedra 2006-05-19 11:34:16 UTC

As it is mentioned in comment #5, this bug is in no way related to bug #320206. That bug occurrs inside the EogFullScreen widget, and there are no instances of it during the segmentation fault.

An updated backtrace:

+ Trace 68313

Thread 3 (Thread -1508238416 (LWP 10158))

#0 _mcleanup
from /lib/tls/i686/cmov/libc.so.6
#1 jinit_d_main_controller
from /usr/lib/libjpeg.so.62
#2 jinit_phuff_decoder
from /usr/lib/libjpeg.so.62
#3 jpeg_read_scanlines
from /usr/lib/libjpeg.so.62
#4 gdk_pixbuf__jpeg_image_load_lines
at io-jpeg.c line 568
#5 gdk_pixbuf__jpeg_image_load_increment
#6 IA__gdk_pixbuf_loader_write
#7 eog_image_real_load
#8 eog_image_load
#9 job_image_load_action
#10 eog_job_call_action
#11 thread_start_func
#12 g_thread_create_proxy
at gthread.c line 582
#13 start_thread
from /lib/tls/i686/cmov/libpthread.so.0
#14 clone
from /lib/tls/i686/cmov/libc.so.6

Thread 2 (Thread -1497924688 (LWP 10157))

#0 _mcleanup
from /lib/tls/i686/cmov/libc.so.6
#1 jinit_d_main_controller
from /usr/lib/libjpeg.so.62
#2 jinit_phuff_decoder
from /usr/lib/libjpeg.so.62
#3 jpeg_read_scanlines
from /usr/lib/libjpeg.so.62
#4 gdk_pixbuf__jpeg_image_load_lines
at io-jpeg.c line 568
#5 gdk_pixbuf__jpeg_image_load_increment
at io-jpeg.c line 771
#6 IA__gdk_pixbuf_loader_write
at gdk-pixbuf-loader.c line 503
#7 eog_image_real_load
#8 eog_image_load
#9 job_image_load_action
#10 eog_job_call_action
#11 thread_start_func
#12 g_thread_create_proxy
at gthread.c line 582
#13 start_thread
from /lib/tls/i686/cmov/libpthread.so.0
#14 clone
from /lib/tls/i686/cmov/libc.so.6

Comment 7 Sebastien Bacher 2006-05-31 19:54:57 UTC

Ubuntu bug about that: https://launchpad.net/distros/ubuntu/+source/eog/+bug/44689

error and backtrace:

"** ERROR **: file eog-window.c: line 3061 (display_image_data): assertion failed : (eog_image_has_data (image, EOG_IMAGE_DATA_ALL))
aborting...

Program received signal SIGABRT, Aborted.

+ Trace 68568

Thread NaN (LWP 17451)

#0 __kernel_vsyscall
#1 raise
from /lib/tls/i686/cmov/libc.so.6
#2 abort
from /lib/tls/i686/cmov/libc.so.6
#3 IA__g_logv
#4 IA__g_log
at gmessages.c line 517
#5 IA__g_assert_warning
#6 display_image_data
at eog-window.c line 3061
#7 job_image_load_finished
at eog-window.c line 3120
#8 eog_job_call_finished
at eog-job.c line 441
#9 job_finished_cb
at eog-job-manager.c line 53
#10 g_idle_dispatch
at gmain.c line 3796
#11 IA__g_main_context_dispatch
at gmain.c line 1916
#12 g_main_context_iterate
at gmain.c line 2547
#13 IA__g_main_loop_run
at gmain.c line 2751
#14 IA__gtk_main
at gtkmain.c line 1026
#15 main
at main.c line 636

Comment 8 Claudio Saavedra 2006-06-07 15:30:39 UTC

*** Bug 344166 has been marked as a duplicate of this bug. ***

Comment 9 Callum McKenzie 2006-06-12 10:36:02 UTC

I can reproduce the stack trace in comments #6 and #7 with a directory of about 1000 files and scrolling in the collection pane using either the mouse or the keyboard (going back and forth seems to be helpful in triggering it). However I can also get this stack trace:

+ Trace 68788

#0 __kernel_vsyscall
#1 raise
from /lib/tls/i686/cmov/libc.so.6
#2 abort
from /lib/tls/i686/cmov/libc.so.6
#3 IA__g_logv
#4 IA__g_log
at gmessages.c line 517
#5 IA__g_assert_warning
at gmessages.c line 552
#6 eog_image_load
at eog-image.c line 727
#7 job_image_load_action
at eog-window.c line 3137
#8 eog_job_call_action
at eog-job.c line 386
#9 thread_start_func
at eog-job-manager.c line 85
#10 g_thread_create_proxy
at gthread.c line 582
#11 start_thread
from /lib/tls/i686/cmov/libpthread.so.0
#12 clone
from /lib/tls/i686/cmov/libc.so.6


Obviously a similar part of the code. In my case this appears more frequently by about a factor of five.

I am using CVS HEAD as of 2006/06/12. A similar crash seems to happen in 2.14 as well.

Comment 10 Callum McKenzie 2006-06-14 07:43:17 UTC

The crash in comments 6,7, and 9 is a race condition.

As you scroll through quickly you get ahead of the drawing and jobs back up in the job queue. Then if you start scrolling the other way (or click on an image that hasn't been rendered yet) you also add it to the queue (since the only check is to see if the image data is available, which it isn't yet because the first job hasn't executed yet). Now the first job executes, rendering the image and filling the priv->image structure. When the second job executes it finds the image already rendered and triggers an assertion (since priv->image != NULL). That's the cause of the trace in comment #9. I think 6 and 7 are the same cause, but a slightly different order of events.

Curiously, the race condition isn't between the threads running the jobs. It is within the job queue itself.

Hopefully I'll have a patch later tonight.

Comment 11 Callum McKenzie 2006-06-14 08:59:27 UTC

The stack trace in comments #6 and #7 is also due to the race condition: but in this case I suspect threads are implicated. One thread does the checks on an image then, before it can display it, the other thread deallocates the image (while running the other job).

Comment 12 Callum McKenzie 2006-06-14 09:07:08 UTC

Created attachment 67329 [details] [review]
Patch to fix the image loading race condition

This patch fixes the crash seen in comment #9, but not the crash caused by premature deallocation. I am not completely happy with it since I believe that there is still room for two threads to collide and cause the crash. However, the window of opportunity is a lot narrower and I am only getting the second class of race-condition crash now.

I think there needs to be some sort of image-global locking, but I'm not sure of the best way to do this.

Comment 13 Callum McKenzie 2006-06-15 07:25:28 UTC

Created attachment 67385 [details] [review]
A complete fix for the race conditions in image-loading

This patch fixes both cases. I am fairly confident that the code is also thread-safe.

Comment 14 Emmanuel Touzery 2006-07-08 14:32:33 UTC

it is a bit discouraging that after three weeks, on a crash bug marked critical, there is still no apparent reaction from maintainers from a patch by a GNOME developer :-(
but of course if it's fixed on time for 2.16 it doesn't matter exactly how long before. although it would be nice to have the fix in 2.14.x too.

PS: can someone mark this bug NEW? it's still UNCONFIRMED for now although many people confirmed it, and i don't have the privileges to mark it "NEW".

Comment 15 Sebastien Bacher 2006-07-08 14:43:19 UTC

what different does NEW make to you? Nobody seems to contest that's a bug ...

Comment 16 Emmanuel Touzery 2006-07-08 14:48:27 UTC

about NEW: i thought maybe (at least some) maintainers only check bugzilla for "NEW" bugs, ignoring "UNCONFIRMED" bugs. that seems sensitive for maintainers with lots of work. but i don't know, never been a maintainer of free software. from your comment, obviously i overestimated the important of that status.

Comment 17 Claudio Saavedra 2006-07-08 15:53:48 UTC

Callum,

(In reply to comment #13)
> Created an attachment (id=67385) [edit]
> A complete fix for the race conditions in image-loading
> 
> This patch fixes both cases. I am fairly confident that the code is also
> thread-safe.

With this patch I can still reproduce easily the crash and backtrace of comment #7, but only that one. The ones in comment #6 and #9 seem to be gone. Should this patch fix the crash in comment #7 too?

Comment 18 Callum McKenzie 2006-07-08 16:16:26 UTC

Claudio: It should stop the cause of #7. The problem went from easily caused (with the trace from #7) to un-reproducible on my machine and with my test set (a large group of images).

I suspect it can still be triggered when two threads try rendering the same image, but that case should be rare though since the window of opportunity is a lot smaller. I would think that you aren't seeing that.

What exactly are you doing to cause it? In my case it was scrolling forward with either the arrow keys or the mouse-wheel until the rendering lagged and then going back (sometimes going back and forward a few times was necessary).

I'll look into it.

Comment 19 Claudio Saavedra 2006-07-08 16:25:57 UTC

I scroll very fast between images, back and forth, in a very stressing way. If you use your mouse-wheel this is easily achievable. I think that doing that, it is probably easy to have two threads trying to render the same image.

Doing your test (scrolling forward until the rendering lagged and then going back)
I couldn't reproduce the bug. I suggest you to use the mouse wheel as fast as possible. 

I am sorry not to have much time to look deeply into this bug atm.

Comment 20 Callum McKenzie 2006-07-08 20:49:26 UTC

Claudio: Some quick questions: what is the typical number of images in the directory you are testing with? Also, are they all images, or are there other files in the mix?

Comment 21 Claudio Saavedra 2006-07-08 22:52:48 UTC

I have tested with several different use cases, including some images in a directory with many other files, a directory with only a few pictures, and a big collection. The result is always the same.

Try scrolling back and forth in a reduced range very fast. The goal should be to scroll between 2 or 3 images and not more, several times very fast. You will see the crash then.

(man, it would have been nice to talk about this in guadec)

Comment 22 Callum McKenzie 2006-07-09 09:18:27 UTC

I've finally got it to reproduce (but it takes a lot longer than it used to for me). Still trying to figure out whats wrong, but initial inspection shows all three threads working on different images so it probably isn't that.

Comment 23 Lucas Rocha 2006-07-13 17:47:58 UTC

*** Bug 345167 has been marked as a duplicate of this bug. ***

Comment 24 Lucas Rocha 2006-07-13 18:00:51 UTC

*** Bug 325237 has been marked as a duplicate of this bug. ***

Comment 25 Callum McKenzie 2006-07-14 17:48:35 UTC

Created attachment 68938 [details] [review]
The third version of the race-condition patch.

This new patch fixes several more problems. The main remaining bug (which happens a lot more frequently with optimisation turned on, this was why I wasn't seeing it initially) is thread-related and has been fixed by adding a mutex to the image. The remaining problem is a lot rarer and I didn't track down the exact cause (it involves the image data being freed too early). Instead I just put a check for the error and recovered gracefully if it ever occurred.

Ultimately the solution is to rewrite the image display code since currently it is a set of asynchronous jobs with the following bad properties:
 a) It doesn't guarantee the order the images will be displayed (i.e. sometimes
    the image displayed isn't the one selected because the threads finished
    in a different order to the way they started).
 b) It can't easily skip over images if the user has moved the selection 
    through an image quickly and doesn't actually want it displayed. This
    chews a *lot* of CPU time.
This is my justification for taking a short-cut on the final fix: the code is broken anyway. At least it shouldn't crash now (it goes at least a factor of 100 times longer without crashing, I can't guarantee another bug isn't lurking in there).

Comment 26 Callum McKenzie 2006-07-17 15:49:37 UTC

Created attachment 69051 [details] [review]
Yet another version. Fixes some extra problems.

This is a fourth version of the patch which also fixes some of the criticisms I outlined in the previous comment. It doesn't actually improve the solution, but it does solve the excessive CPU use, responsiveness and makes sure that the last image selected is the one actually shown.

It is an alternative to the third fix which is why I haven't marked it as obsolete.

Comment 27 Ernst Sjöstrand 2006-07-19 08:43:29 UTC

This patch seems to work very well for me, I haven't been able to make EOG crash with it!

Comment 28 Claudio Saavedra 2006-07-19 14:43:27 UTC

Attachment #69051 [details] by Callum fixes the issues. We coincide in the fact that the there is a broken design and the code should be rewritten anyway[1], therefore I am on the opinion of commiting it. Lucas, what do you think?

(I even think that the image locking API could help to fix bug #320206, but I am not really sure, I should take a look at the problem again).

[1] Actually, the job management was already rewrited by Lucas in the eog-ng branch, but this branch still needs some (a lot of) love before being good enough to merge it into HEAD.

Comment 29 Claudio Saavedra 2006-07-24 19:09:21 UTC

I have commited the attachment #69051 [details] to HEAD in order to make a release for GNOME 2.15.90. Will do the same for branch gnome-2-14 before the next stable release.

2006-07-24  Claudio Saavedra  <csaavedra@alumnos.utalca.cl>

        * libeog/eog-image-private.h:
        * libeog/eog-image.c: (eog_image_dispose), (eog_image_init),
        (eog_image_lock), (eog_image_unlock), (eog_image_cancel_load):
        * libeog/eog-image.h:
        * libeog/eog-job-manager.c: (thread_start_func):
        * libeog/eog-job.c: (eog_job_call_action):
        * shell/eog-window.c: (eog_window_init), (job_image_load_action),
        (job_image_load_finished), (handle_image_selection_changed):

        Add image_lock/unlock methods to ensure that there are no
        race conditions when loading and unloading images (Fixes bug
        #340827). Patch from Callum McKenzie <callum@spooky-possum.org>.

Comment 30 Lucas Rocha 2006-07-31 21:50:41 UTC

Applied with one fix: the error dialog for corrupted images (for example) wasn't being shown with this patch. Fixed this in HEAD too. Thanks!

Comment 31 palfrey 2007-05-27 14:30:51 UTC

*** Bug 441586 has been marked as a duplicate of this bug. ***