GNOME Bugzilla – Bug 644505
gfvsd-smb dies when fetching many small files from remote [patch]
Last modified: 2011-04-15 10:20:45 UTC
Created attachment 183150 [details] [review] Patch which fixes the problem When coping many small files from remote machine gvfsd-smb eventually dies with SIGSEGV:
+ Trace 226272
Thread 3049257840 (LWP 20101)
Thread 3062855472 (LWP 20079)
Breakpoint 5, command_read_cb (source_object=0x9a0d4a0 [GUnixInputStream], res=0x9994d58, user_data=0x9a555e0) at gvfschannel.c:498 498 reader->channel->priv->request_reader = NULL; $397 = (RequestReader *) 0x9a555e0 ---------------------------------------- This shows that in one thread channel is being unrefed in 'g_vfs_job_close_read_finalize' while in another thread it is still being used by 'command_read_cb. And although breakpoint 2 happens before breakpoint 4 we still get to breakpoint 5 and NULL pointer dereference. I'd guess this is because two core system or the way gdb handles several threads. I think the problem occurs because RequestReader holds reference to channel but never calls g_object_ref/g_object_unref on it. I'll attach patch which fixes problem for me. It can be applied to 1.6.4 as well as current git. Please let me know if you need any more information or if this patch has any problems. Thanks. Nikolay.
Created attachment 183151 [details] Problem description It seems that bugzilla have eaten my two stack traces onto one. Attaching problem description with all the information I have as a text file. Sorry about this.
It seems like we're getting an async data read (probably a readahead since the app didn't initiate it) and at the same time (slightly later) the app closes the channel which happens sync on another thread. So the close->finalize races with command_read_cb() wrt accessing the channel.
(In reply to comment #2) > It seems like we're getting an async data read (probably a readahead since the > app didn't initiate it) and at the same time (slightly later) the app closes > the channel which happens sync on another thread. So the close->finalize races > with command_read_cb() wrt accessing the channel. Yes, probably something like this happens. I have been using patch I've posted since I had it posted. It solves problem for me (and I was able to successfully reproduce problem 100% when downloading photos from XP machine). I didn't notice any side effects too.
That patch causes leaks though, doesn't it. With the ref we never finalize the channel, and thus the reader->channel will not be NULLed and the reader async cycle will never end.
To the best of my understanding '+ g_object_unref (reader->channel);' in 'request_reader_free' is supposed to prevent memory leaks. Am I wrong? Thanks!
(In reply to comment #5) > To the best of my understanding '+ g_object_unref (reader->channel);' in > 'request_reader_free' is supposed to prevent memory leaks. yeah, but the first test (reader->channel == NULL) is always false and we need to wait until we get zero data from the input stream (i.e. complete the read operation) to have request_reader_free() actually called. So I was thinking about adding some kind of cancel call in g_vfs_job_close_read_finalize().
(In reply to comment #6) > (In reply to comment #5) > > To the best of my understanding '+ g_object_unref (reader->channel);' in > > 'request_reader_free' is supposed to prevent memory leaks. > yeah, but the first test (reader->channel == NULL) is always false and we need > to wait until we get zero data from the input stream (i.e. complete the read > operation) to have request_reader_free() actually called. Correct me if I'm wrong, please. But if we got to 'command_read_cb' this means that there was some event on descriptor and 'g_input_stream_read_finish' will return fast with data or without. And if there is no data we release reference on channel. This also means that 'if (reader->channel == NULL)' clause is not required in 'data_read_cb' and 'command_read_cb'. > > So I was thinking about adding some kind of cancel call in > g_vfs_job_close_read_finalize(). I do not really understand how this supposed to work but I'd really appreciate if you could tell me. Also, I'd really appreciate if you could describe scenario when my patch can cause a leak (or any other problem). To the best of my knowledge my patch can only delay destruction of channel for the time reader references it. But reader is being destroyed when connection is closed.
Hmm, yeah, i guess it would free the reader if it got an error reading from the command stream. This will only happen if we're guaranteed to get an error or EOF from reading the command stream though. Since we ref the channel we won't call g_vfs_channel_finalize(), so we won't close the local side of the command channel socket. This means we rely on the remote end to close the socket to avoid the leak. I do think that this is fine in the current code (the remote will normally close the socket), but it seems a bit risky. I think it would be safer if we ensured we had some local way to make sure we return out of the command stream read. One way would be to call g_object_run_dispose() in g_vfs_job_close_read_finalize() and cancel the reader in the dispose handler. That will guarantee we get an error on the command channel.
And, like you said, all the reader->channel == NULL stuff can go away with this approach.
Created attachment 184480 [details] [review] Patch with cancellable This patch adds a cancellable so that we could easily finalize from a dispose event. Perhaps holding a reference to cancellable in RequestReader is not necessary as long as we hold reference to the channel, but...
Created attachment 185180 [details] [review] New patch with cancellable Actually, we already have a place where we "close" the channel, in send_reply_cb() where we call g_vfs_job_source_closed. We can drop the whole dispose thing if we just cancel from there. Also, we can totally remove the channel->request_reader field. New, untested patch.
Nikolay, could you please test the last patch? I'm not able to hit the original bug.
I cannot reproduce original bug with this patch applied.
Pushed to master and gnome-3-0