After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 719880 - feather selection causes segmentation fault (SIGSEGV)
feather selection causes segmentation fault (SIGSEGV)
Status: RESOLVED FIXED
Product: GEGL
Classification: Other
Component: GeglBuffer
git master
Other All
: Normal critical
: ---
Assigned To: Default Gegl Component Owner
Default Gegl Component Owner
Depends on:
Blocks:
 
 
Reported: 2013-12-05 05:30 UTC by liam
Modified: 2014-03-17 02:05 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch to fix segmentation fault on a feather selection on a huge canvas. (5.78 KB, patch)
2013-12-06 13:05 UTC, Jehan
none Details | Review
New patch (5.13 KB, patch)
2013-12-07 08:48 UTC, Jehan
none Details | Review

Description liam 2013-12-05 05:30:43 UTC
To reproduce:

1. create a new image,
   8-bit colour
   4928x27213 pixels in size
   (I experimented a bit with other sizes, but I think it
    might be the image size modulo the tile size or something)

2. use the rectangle tool to make a selection

3. feather the selection (I used a radius of 100 but I think any value does it).



/home/lee/opt/bin/gimp-2.9: fatal error: Segmentation fault
/home/lee/opt/bin/gimp-2.9 (pid:14678): [E]xit, [H]alt, show [S]tack trace or [P]roceed: s
  • #0 waitpid
    from /lib64/libpthread.so.0
  • #1 g_on_error_stack_trace
  • #2 g_on_error_query
    from /lib64/libglib-2.0.so.0
  • #3 gimp_eek
  • #4 gimp_fatal_error
  • #5 gimp_sigfatal_handler
  • #6 <signal handler called>
  • #7 conv_rgbaF_linear_rgbAF_linear
  • #8 babl_conversion_linear_process
  • #9 babl_conversion_process
  • #10 process_conversion_path
  • #11 babl_fish_path_process
  • #12 babl_fish_process
  • #13 babl_process
  • #14 gegl_buffer_iterate_read_simple
  • #15 gegl_buffer_iterate_read_abyss_color
  • #16 gegl_buffer_iterate_read_dispatch
  • #17 gegl_buffer_get_unlocked
  • #18 iir_young_hor_blur
  • #19 process
  • #20 gegl_operation_filter_process
  • #21 gegl_graph_process
  • #22 gegl_eval_manager_apply
  • #23 gegl_node_apply_roi
  • #24 gegl_node_blit
  • #25 gimp_gegl_apply_operation
  • #26 gimp_gegl_apply_gaussian_blur
  • #27 gimp_channel_real_feather
  • #28 select_feather_callback
  • #29 size_query_box_response
  • #30 g_closure_invoke
    from /lib64/libgobject-2.0.so.0
  • #31 signal_emit_unlocked_R
  • #32 g_signal_emit_valist
  • #33 g_signal_emit
    from /lib64/libgobject-2.0.so.0
  • #34 _g_closure_invoke_va
  • #35 g_signal_emit_valist
  • #36 g_signal_emit
    from /lib64/libgobject-2.0.so.0
  • #37 gtk_real_button_released
  • #38 g_closure_invoke
    from /lib64/libgobject-2.0.so.0
  • #39 signal_emit_unlocked_R
  • #40 g_signal_emit_valist
  • #41 g_signal_emit
    from /lib64/libgobject-2.0.so.0
  • #42 gtk_button_button_release
  • #43 _gtk_marshal_BOOLEAN__BOXED
  • #44 g_closure_invoke
    from /lib64/libgobject-2.0.so.0
  • #45 signal_emit_unlocked_R
  • #46 g_signal_emit_valist
  • #47 g_signal_emit
    from /lib64/libgobject-2.0.so.0
  • #48 gtk_widget_event_internal
  • #49 gtk_propagate_event
  • #50 gtk_main_do_event
    from /lib64/libgtk-x11-2.0.so.0
  • #51 gdk_event_dispatch
    from /lib64/libgdk-x11-2.0.so.0
  • #52 g_main_context_dispatch
  • #53 g_main_context_iterate.isra.24
  • #54 g_main_loop_run
    from /lib64/libglib-2.0.so.0
  • #55 app_run
  • #56 main
    at main.c line 438

(script-fu:15328): LibGimpBase-WARNING **: script-fu: gimp_wire_read(): error
Comment 1 Jehan 2013-12-05 09:15:01 UTC
Just saying I can reproduce it every time by following the above procedure.
I'll see if I can find the issue.
Comment 2 Jehan 2013-12-05 12:21:00 UTC
It seems I've found the issue: an integer overflow.

I have a first easy fix already, but it probably would not handle all cases and could still overflow with even bigger images. I'll go to bed now and make a fix tomorrow!
Comment 3 Jehan 2013-12-06 13:05:38 UTC
Created attachment 263661 [details] [review]
Patch to fix segmentation fault on a feather selection on a huge canvas.

The issue is the automatic type promotion rules on arithmetic operations. "unsigned long int" has precedence over "long int", which has precedence over "unsigned int", which finally defaults to "int".

Problem is when you are doing arithmetic with pointers and int only. Intermediate computations with big integers may overflow MAXINT on too big numbers, then back to pointer types => pointers end up wrong. So we must convert int values to unsigned long before arithmetic operations.

That's what was happening here. In the gdb backtrace:

  • #0 conv_gaF_rgbaF
    at gggl-lies.c line 708
  • #1 babl_conversion_linear_process
    at babl-conversion.c line 302
  • #2 babl_conversion_process
    at babl-conversion.c line 413

The problem here was obviously the <Address 0x7ffeaa6dacf0 out of bounds>.
The temporarily allocated buffer for making a gaussian blur on our big canvas is so huge that it is bigger than G_MAXINT (2147483647 on my machine):
(gdb) p (gsize) src_rect->height * src_rect->width * 4 * 4
$11 = 2151864000
So obviously if we use int values to browse across this buffer, at some point, it breaks.
(gdb) p (int) 2151864000
$15 = -2143103296

So I fixed what I could see, in particular multiplication operands must be converted prior to the multiplication (because they are run before additions)! I assume that would do it for this specific issue (at least here I could rerun the procedure many times without a crash). But it raises questions on other pieces of code where we would similarly be using arithmetics with pointers and huge numbers with int. That would probably often happen on huge canvas processing.
When we do this, we should either use gulong from the start, or think about converting our int as soon as an operation may result in a big value when we have a big canvas.
Comment 4 Daniel Sabo 2013-12-06 19:04:13 UTC
Casting a pointer to an int and back is illegal; and this will still probably crash on 32 bit systems because allocating that much memory is problematic. I'm not sure if gimp should have chunked this by calling process() instead of node_blit() or if gaussian-blur should have some internal chunking.
Comment 5 Téo Mazars 2013-12-06 19:18:01 UTC
As said on IRC, my opinion is that gaussian-blur will handle large images flawlessly soon and that generally speaking, operations should not allocate too much memory. Some operations have still that flaw (like all blurs) and have to be fixed.
Comment 6 Jehan 2013-12-07 08:48:14 UTC
Created attachment 263705 [details] [review]
New patch

I went over the top with casting pointers. I removed all pointer conversions. As said before, the real problem was on intermediate int arithmetics anyway. See attached new patch.

Now you are right that it would probably still crash on 32 bits. But we should still do ulong computations instead of int when the finale goal is pointer arithmetics.

And I also agree that we allocate too much memory. I did not try to change the current logics, just fix existing code. Do we indeed really need to allocate a buffer the size of the canvas for bluring a selection (which may be very small!)? Currently the computation takes dozens of seconds on my computer!

And finally gaussian blur can definitely have some internal chunking. I was thinking it would be actually better when checking the code.
In operations/common/gaussian-blur.c, iir_young_hor_blur() will:
1/ allocate the buffer;
2/ gegl_buffer_get();
3/ do some buffer computation;
4/ gegl_buffer_set()
5/ free the temporary buffer.

We could definitely allocate a smaller buffer and get/mod/set piece by piece in a loop instead of all at once.
All this said, I believe the 3 changes should be done if possible: blur only the necessary part + chunk the processing + make intermediate computation with unsigned long integers.
They are not exclusive. This is why I propose this new patch as a first step.
Comment 7 Daniel Sabo 2013-12-07 09:40:17 UTC
I am not comfortable taking this patch; just going around casting things is not healthy and it ignores similar logic in buffer_get and the iterate_read functions. While I agree that accepting egregiously large buffers to buffer_set like this is a desirable feature, properly dealing with this needs more thought and is unrelated to this bug. Egregiously large buffers are WONTFIX for now.

The issue causing the crash here is that gaussian blur is trying to bite off a much larger chunk than GEGL can safely chew, even if you get it to safely allocate that g_new0 call it's probably hitting swap.
Comment 8 Téo Mazars 2013-12-07 11:43:58 UTC
> And I also agree that we allocate too much memory. I did not try to change the
> current logics, just fix existing code. Do we indeed really need to allocate a
> buffer the size of the canvas for bluring a selection (which may be very
> small!)? Currently the computation takes dozens of seconds on my computer!
> 
> And finally gaussian blur can definitely have some internal chunking. I was
> thinking it would be actually better when checking the code.
> In operations/common/gaussian-blur.c, iir_young_hor_blur() will:
> 1/ allocate the buffer;
> 2/ gegl_buffer_get();
> 3/ do some buffer computation;
> 4/ gegl_buffer_set()
> 5/ free the temporary buffer.
> 
> We could definitely allocate a smaller buffer and get/mod/set piece by piece in
> a loop instead of all at once.

As said, gaussian-blur is currently on flux, see Massimo's work in /operations/workshop/gaussian-blur-iir.c for details. The fir case is missing for small std-dev, and I am working on it. It will replace the current gaussian-blur in the next days if everything goes well. Fixing performances, memory allocation and artifacts issues.
Comment 9 Massimo 2013-12-07 14:34:34 UTC
diff --git a/gegl/buffer/gegl-buffer-access.c b/gegl/buffer/gegl-buffer-access.c
index cb90999..da2f10b 100644
--- a/gegl/buffer/gegl-buffer-access.c
+++ b/gegl/buffer/gegl-buffer-access.c
@@ -409,7 +409,7 @@ gegl_buffer_iterate_write (GeglBuffer          *buffer,
           guchar   *bp, *tile_base, *tp;
           GeglTile *tile;
 
-          bp = buf + bufy * buf_stride + bufx * bpx_size;
+          bp = buf + (gulong) bufy * buf_stride + (gulong) bufx * bpx_size;
 
On Win64 gulong (unsigned long) is a 32 bit unsigned integer,
so it would not fix the issue on Win64.

http://stackoverflow.com/questions/384502/what-is-the-bit-size-of-long-on-64-bit-windows

http://stackoverflow.com/questions/13335587/sizeof-unsigned-int-in-microsoft-x64-compiler
Comment 10 Daniel Sabo 2014-01-03 14:54:34 UTC

*** This bug has been marked as a duplicate of bug 721396 ***
Comment 11 Daniel Sabo 2014-03-16 22:31:58 UTC
commit 6d2312518a4624c5d72d2c33606afaf46beb660f
Author: Daniel Sabo <DanielSabo@gmail.com>
Date:   Sun Mar 16 00:00:20 2014 -0700

    gaussian-blur: Read one row/col at a time
    
    Only linearize one row/col of the buffer at a time, this should
    avoid the huge allocations that made it so unstable.
Comment 12 liam 2014-03-17 02:05:18 UTC
Confirmed that the test case I gave now works for me. Thanks!