Bug 644505 – gfvsd-smb dies when fetching many small files from remote [patch]

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 644505 - gfvsd-smb dies when fetching many small files from remote [patch]


Summary:	gfvsd-smb dies when fetching many small files from remote [patch]


Status:	RESOLVED FIXED

Product:	gvfs
Classification:	Core
Component:	smb backend
Version:	1.6.x
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	---
Assigned To:	Tomas Bzatek
QA Contact:	gvfs-maint

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2011-03-11 15:09 UTC by Nikolay Martynov
Modified:	2011-04-15 10:20 UTC

See Also:
GNOME target:	3.2
GNOME version:	---

Attachments
Patch which fixes the problem (689 bytes, patch) 2011-03-11 15:09 UTC, Nikolay Martynov	none	Details \| Review
Problem description (6.24 KB, text/plain) 2011-03-11 15:14 UTC, Nikolay Martynov		Details
Patch with cancellable (5.82 KB, patch) 2011-03-28 16:30 UTC, Tomas Bzatek	none	Details \| Review
New patch with cancellable (5.25 KB, patch) 2011-04-05 09:40 UTC, Alexander Larsson	none	Details \| Review

Description Nikolay Martynov 2011-03-11 15:09:52 UTC

Created attachment 183150 [details] [review]
Patch which fixes the problem

When coping many small files from remote machine gvfsd-smb eventually dies with SIGSEGV:

+ Trace 226272

Thread 3049257840 (LWP 20101)

#0 g_vfs_channel_finalize
at gvfschannel.c line 125
#1 g_vfs_read_channel_finalize
at gvfsreadchannel.c line 70
#2 g_object_unref
at /build/buildd/glib2.0-2.26.1/gobject/gobject.c line 2712
#3 g_vfs_job_close_read_finalize
at gvfsjobcloseread.c line 48
#4 g_object_unref
at /build/buildd/glib2.0-2.26.1/gobject/gobject.c line 2712
#5 g_vfs_job_run
at gvfsjob.c line 200
#6 job_handler_callback
at gvfsdaemon.c line 144
#7 g_thread_pool_thread_proxy
at /build/buildd/glib2.0-2.26.1/glib/gthreadpool.c line 319
#8 g_thread_create_proxy
at /build/buildd/glib2.0-2.26.1/glib/gthread.c line 1897
#9 start_thread
at pthread_create.c line 304
#10 clone
at ../sysdeps/unix/sysv/linux/i386/clone.S line 130

Thread 3062855472 (LWP 20079)

#0 command_read_cb
at gvfschannel.c line 487
#1 async_ready_callback_wrapper
at /build/buildd/glib2.0-2.26.1/gio/ginputstream.c line 470
#2 g_simple_async_result_complete
at /build/buildd/glib2.0-2.26.1/gio/gsimpleasyncresult.c line 692
#3 read_async_cb
at /build/buildd/glib2.0-2.26.1/gio/gunixinputstream.c line 471
#4 fd_source_dispatch
at /build/buildd/glib2.0-2.26.1/gio/gasynchelper.c line 81
#5 g_main_dispatch
at /build/buildd/glib2.0-2.26.1/glib/gmain.c line 2149
#6 g_main_context_dispatch
at /build/buildd/glib2.0-2.26.1/glib/gmain.c line 2702
#7 g_main_context_iterate
at /build/buildd/glib2.0-2.26.1/glib/gmain.c line 2780
#8 g_main_loop_run
at /build/buildd/glib2.0-2.26.1/glib/gmain.c line 2988
#9 daemon_main
at daemon-main.c line 294
#10 main
at daemon-main-generic.c line 39


Breakpoint 5, command_read_cb (source_object=0x9a0d4a0 [GUnixInputStream], res=0x9994d58, user_data=0x9a555e0) at gvfschannel.c:498
498	      reader->channel->priv->request_reader = NULL;
$397 = (RequestReader *) 0x9a555e0
----------------------------------------

This shows that in one thread channel is being unrefed in 'g_vfs_job_close_read_finalize' while in another thread it is still being used by 'command_read_cb. And although breakpoint 2 happens before breakpoint 4 we still get to breakpoint 5 and NULL pointer dereference. I'd guess this is because two core system or the way gdb handles several threads.

I think the problem occurs because RequestReader holds reference to channel but never calls g_object_ref/g_object_unref on it.

I'll attach patch which fixes problem for me. It can be applied to 1.6.4 as well as current git.

Please let me know if you need any more information or if this patch has any problems.

Thanks.
Nikolay.

Comment 1 Nikolay Martynov 2011-03-11 15:14:31 UTC

Created attachment 183151 [details]
Problem description

It seems that bugzilla have eaten my two stack traces onto one. Attaching problem description with all the information I have as a text file.

Sorry about this.

Comment 2 Alexander Larsson 2011-03-17 18:50:43 UTC

It seems like we're getting an async data read (probably a readahead since the app didn't initiate it) and at the same time (slightly later) the app closes the channel which happens sync on another thread. So the close->finalize races with command_read_cb() wrt accessing the channel.

Comment 3 Nikolay Martynov 2011-03-17 19:06:08 UTC

(In reply to comment #2)
> It seems like we're getting an async data read (probably a readahead since the
> app didn't initiate it) and at the same time (slightly later) the app closes
> the channel which happens sync on another thread. So the close->finalize races
> with command_read_cb() wrt accessing the channel.

Yes, probably something like this happens.
I have been using patch I've posted since I had it posted. It solves problem for me (and I was able to successfully reproduce problem 100% when downloading photos from XP machine). I didn't notice any side effects too.

Comment 4 Alexander Larsson 2011-03-18 09:22:32 UTC

That patch causes leaks though, doesn't it. With the ref we never finalize the channel, and thus the reader->channel will not be NULLed and the reader async cycle will never end.

Comment 5 Nikolay Martynov 2011-03-18 15:44:34 UTC

To the best of my understanding  '+  g_object_unref (reader->channel);' in 'request_reader_free' is supposed to prevent memory leaks.
Am I wrong?

Thanks!

Comment 6 Tomas Bzatek 2011-03-18 16:27:37 UTC

(In reply to comment #5)
> To the best of my understanding  '+  g_object_unref (reader->channel);' in
> 'request_reader_free' is supposed to prevent memory leaks.
yeah, but the first test (reader->channel == NULL) is always false and we need to wait until we get zero data from the input stream (i.e. complete the read operation) to have request_reader_free() actually called.

So I was thinking about adding some kind of cancel call in g_vfs_job_close_read_finalize().

Comment 7 Nikolay Martynov 2011-03-19 04:03:52 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > To the best of my understanding  '+  g_object_unref (reader->channel);' in
> > 'request_reader_free' is supposed to prevent memory leaks.
> yeah, but the first test (reader->channel == NULL) is always false and we need
> to wait until we get zero data from the input stream (i.e. complete the read
> operation) to have request_reader_free() actually called.
  Correct me if I'm wrong, please. But if we got to 'command_read_cb' this means that there was some event on descriptor and 'g_input_stream_read_finish' will return fast with data or without. And if there is no data we release reference on channel.
  This also means that 'if (reader->channel == NULL)' clause is not required in 'data_read_cb' and 'command_read_cb'.

> 
> So I was thinking about adding some kind of cancel call in
> g_vfs_job_close_read_finalize().
  I do not really understand how this supposed to work but I'd really appreciate if you could tell me.

  Also, I'd really appreciate if you could describe scenario when my patch can cause a leak (or any other problem). To the best of my knowledge my patch can only delay destruction of channel for the time reader references it. But reader is being destroyed when connection is closed.

Comment 8 Alexander Larsson 2011-03-21 09:23:30 UTC

Hmm, yeah, i guess it would free the reader if it got an error reading from the command stream. 

This will only happen if we're guaranteed to get an error or EOF from reading the command stream though. Since we ref the channel we won't call g_vfs_channel_finalize(), so  we won't close the local side of the command channel socket. This means we rely on the remote end to close the socket to avoid the leak.

I do think that this is fine in the current code (the remote will normally close the socket), but it seems a bit risky. I think it would be safer if we ensured we had some local way to make sure we return out of the command stream read. 

One way would be to call g_object_run_dispose() in g_vfs_job_close_read_finalize() and cancel the reader in the dispose handler. That will guarantee we get an error on the command channel.

Comment 9 Alexander Larsson 2011-03-21 09:24:49 UTC

And, like you said, all the reader->channel == NULL stuff can go away with this approach.

Comment 10 Tomas Bzatek 2011-03-28 16:30:57 UTC

Created attachment 184480 [details] [review]
Patch with cancellable

This patch adds a cancellable so that we could easily finalize from a dispose event.

Perhaps holding a reference to cancellable in RequestReader is not necessary as long as we hold reference to the channel, but...

Comment 11 Alexander Larsson 2011-04-05 09:40:27 UTC

Created attachment 185180 [details] [review]
New patch with cancellable

Actually, we already have a place where we "close" the channel, in send_reply_cb() where we call g_vfs_job_source_closed. We can drop the whole dispose thing if we just cancel from there.

Also, we can totally remove the channel->request_reader field.

New, untested patch.

Comment 12 Tomas Bzatek 2011-04-05 09:44:51 UTC

Nikolay, could you please test the last patch? I'm not able to hit the original bug.

Comment 13 Nikolay Martynov 2011-04-05 20:06:18 UTC

I cannot reproduce original bug with this patch applied.

Comment 14 Alexander Larsson 2011-04-15 10:20:45 UTC

Pushed to master and gnome-3-0