After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 710691 - glib-networking: locking during implicit handshake
glib-networking: locking during implicit handshake
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: network
2.38.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2013-10-23 02:05 UTC by Aleix Conchillo Flaqué
Modified: 2013-11-11 15:37 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
keep dispatching while in implicit handshaking (2.27 KB, patch)
2013-10-30 20:57 UTC, Aleix Conchillo Flaqué
needs-work Details | Review
do not cancel waiting_for_op at creation time (5.88 KB, patch)
2013-10-31 07:23 UTC, Aleix Conchillo Flaqué
reviewed Details | Review

Description Aleix Conchillo Flaqué 2013-10-23 02:05:55 UTC
I'm trying the new SRTP support in GStreamer 1.2. I am using gst-rtsp-server as the component (examples/test-video.c with WITH_AUTH and WITH_TLS defined).

I first tried with the packages that come with Debian unstable but resulted in a crash (as of today it comes with glib 2.36.4 and glib-networking 2.36.1).

I then updated glib to 2.38.1 and glib-networking to git branch 2.38.

It didn't crash but it locks. I looked at it a little bit and the problem seems to be in tls/gnutls/gtlsconnection-gnutls.c (claim_op function).

My understanding is that it performs non-blocking read operations while still handshaking and the following condition seems not happy, as the op is (op != G_TLS_CONNECTION_GNUTLS_OP_HANDSHAKE) because we are in OP_READ and gnutls->priv->handshaking is true because we are still handshaking.

  if ((op != G_TLS_CONNECTION_GNUTLS_OP_WRITE && gnutls->priv->reading) ||
      (op != G_TLS_CONNECTION_GNUTLS_OP_READ && gnutls->priv->writing) ||
      (op != G_TLS_CONNECTION_GNUTLS_OP_HANDSHAKE && gnutls->priv->handshaking))

Then, as the read is happening in non-blocking mode we return an error.

      if (!blocking)
       {
         g_set_error_literal (error, G_IO_ERROR, G_IO_ERROR_WOULD_BLOCK,
                              _("Operation would block"));
         return FALSE;
       }

The following patch solves the problem, but I still don't have enough background to know if it's the right way to solve the problem. Basically, I am more permissive if handshaking.

-----------------

diff --git a/tls/gnutls/gtlsconnection-gnutls.c b/tls/gnutls/gtlsconnection-gnutls.c
index 35bcaad..ab24ae2 100644
--- a/tls/gnutls/gtlsconnection-gnutls.c
+++ b/tls/gnutls/gtlsconnection-gnutls.c
@@ -587,12 +587,12 @@ claim_op (GTlsConnectionGnutls    *gnutls,
 
       g_mutex_unlock (&gnutls->priv->op_mutex);
 
-      if (!blocking)
-       {
-         g_set_error_literal (error, G_IO_ERROR, G_IO_ERROR_WOULD_BLOCK,
-                              _("Operation would block"));
-         return FALSE;
-       }
+      if (!blocking && !gnutls->priv->handshaking)
+        {
+          g_set_error_literal (error, G_IO_ERROR, G_IO_ERROR_WOULD_BLOCK,
+                               _("Operation would block"));
+          return FALSE;
+        }
 
       g_cancellable_make_pollfd (gnutls->priv->waiting_for_op, &fds[0]);
       if (g_cancellable_make_pollfd (cancellable, &fds[1]))
Comment 1 Dan Winship 2013-10-24 02:04:17 UTC
(In reply to comment #0)
> The following patch solves the problem, but I still don't have enough
> background to know if it's the right way to solve the problem.

It's not. When a handshake is occurring, no one else can be reading or writing, since the handshake operation needs to be able to do both. (And particularly in the case of the first handshake, you can't read or write until the handshake is complete anyway.)

If it fixes your problem, then either (a) the hanging is occurring after the I/O part of the handshake finishes, but before glib-networking's internal handshaking bookkeeping is finished, or (b) the handshake really is completely finished but the bookkeeping is getting messed up somewhere.

Can you print out gnutls->priv when the connection is hung?

Also, is the code using sync or async I/O? Multiple threads? Explicit g_tls_connection_handshake() calls or implicit auto-handshaking?
Comment 2 Aleix Conchillo Flaqué 2013-10-25 00:08:28 UTC
The gst-rtsp-server (which internally uses a gstrtspconnection) creates a GSource with a pollable input stream.

The first time the gsource callback is called, g_pollable_input_stream_read_nonblocking is used. This returns G_IO_ERROR_WOULD_BLOCK because we are still handshaking.

After that, the gsource callback is not called anymore.

I don't think this is a glib-networking issue, it seems to do the right thing.

Marking as INVALID.
Comment 3 Aleix Conchillo Flaqué 2013-10-25 01:14:07 UTC
See bug 710850
Comment 4 Aleix Conchillo Flaqué 2013-10-30 14:44:21 UTC
The solution for bug 710850 is not the way this locking should be fixed. So, I am reopening this one until I figure out what's wrong.

Also, changing the description. This happens in asynchronous implicit handshaking.
Comment 5 Dan Winship 2013-10-30 20:46:20 UTC
ok, as per comment 1, seeing the contents of gnutls->priv when it's hung would be useful
Comment 6 Aleix Conchillo Flaqué 2013-10-30 20:57:50 UTC
Created attachment 258613 [details] [review]
keep dispatching while in implicit handshaking

I think I'm getting closer to something. This seems to work, but not 100% sure if it makes sense, I'm trying to figure out how g_source and streams work internally. First time with this.

I will get the gnutls->priv as you suggested.

From the commit log:

    * tls/gnutls/gtlsconnection-gnutls.c (gnutls_source_prepare): we need to
      tell the source that while implicit handshaking (actually asynchronous
      implicit handshaking) the stream is always ready, otherwise if we
      perform a read while in handshaking and it has returned WOULD_BLOCK
      the source dispatch would not be called again if we always return
      FALSE in this function.
Comment 7 Dan Winship 2013-10-30 21:48:09 UTC
Comment on attachment 258613 [details] [review]
keep dispatching while in implicit handshaking

This would make it spin and suck up CPU until the handshake thread completed...

How GTlsConnectionGnutlsSource is supposed to work is that gnutls_source_sync() figures out what condition we're waiting for, and adds an appropriate child source to itself that will trigger when that condition occurs. (Because of the way that child sources work, when the child source becomes ready, the GnutlsSource will get dispatched as well, even though gnutls_source_check() always returns FALSE.)

So in your case, gnutls_source_sync() should notice that we're waiting for a handshake to complete, and then create a cancellable source on priv->waiting_for_op. And then, when the handshake completes, the handshake thread will call yield_op(), which will trigger priv->waiting_for_op, which should cause your GnutlsSource to become ready and get dispatched.

So the first thing to check would be that gnutls_source_sync() is actually handling this case correctly, and setting up the right child source. And then if it is, figure out why either the source isn't dispatching, or else it is dispatching, but not having the expected effect for whatever reason.
Comment 8 Aleix Conchillo Flaqué 2013-10-30 22:10:51 UTC
(In reply to comment #7)
> (From update of attachment 258613 [details] [review])
> This would make it spin and suck up CPU until the handshake thread completed...
> 

Yep, that's what it does :-).

> How GTlsConnectionGnutlsSource is supposed to work is that gnutls_source_sync()
> figures out what condition we're waiting for, and adds an appropriate child
> source to itself that will trigger when that condition occurs. (Because of the
> way that child sources work, when the child source becomes ready, the
> GnutlsSource will get dispatched as well, even though gnutls_source_check()
> always returns FALSE.)
> 
> So in your case, gnutls_source_sync() should notice that we're waiting for a
> handshake to complete, and then create a cancellable source on
> priv->waiting_for_op. And then, when the handshake completes, the handshake
> thread will call yield_op(), which will trigger priv->waiting_for_op, which
> should cause your GnutlsSource to become ready and get dispatched.
> 
> So the first thing to check would be that gnutls_source_sync() is actually
> handling this case correctly, and setting up the right child source. And then
> if it is, figure out why either the source isn't dispatching, or else it is
> dispatching, but not having the expected effect for whatever reason.

Great, thanks for the explanation. I think I needed it.

Right now, I'm trying to add a test to cause the problem.
Comment 9 Aleix Conchillo Flaqué 2013-10-30 22:11:45 UTC
Comment on attachment 258613 [details] [review]
keep dispatching while in implicit handshaking

Marking as obsolete. See comment 7.
Comment 10 Aleix Conchillo Flaqué 2013-10-31 07:23:07 UTC
Created attachment 258633 [details] [review]
do not cancel waiting_for_op at creation time

Let's see now. I have also added a test case that blocks without the fix, so you can try it.

The problem was that waiting_for_op is canceled at creation time which causes an unwanted dispatch.

I think these are the events:

- gnutls_source_sync:

    - create child source on pollable input stream

- dispatch (something in the fd happens):

    - non-blocking read:
        -> handshaking = TRUE
        -> calls do_implicit_handshake
        -> returns WOULD_BLOCK

    - gnutls_source_sync:
         -> op_waiting = TRUE (because we are handshaking)
         -> create child source on cancellable waiting_for_op

Here we should already stop until yield_op is called for OP_HANDSHAKE so waiting_for_op is triggered. However, because waiting_for_op is initially canceled, a dispatch occurs.

There's still one thing that I don't understand. I think the original code should still work, as there is the g_cancellable_reset (gnutls->priv->waiting_for_op) when claim_op fails. And then, the yield_op for OP_HANDSHAKE happens and cancels waiting_for_op. But the dispatch is not triggered. Any ideas why this could be?
Comment 11 Aleix Conchillo Flaqué 2013-10-31 14:57:20 UTC
(In reply to comment #10)

> - gnutls_source_sync:
> 
>     - create child source on pollable input stream
> 

First one is a timeout source, not a pollable input stream source.
Comment 12 Aleix Conchillo Flaqué 2013-10-31 18:32:02 UTC
(In reply to comment #10)
> 
> There's still one thing that I don't understand. I think the original code
> should still work, as there is the g_cancellable_reset
> (gnutls->priv->waiting_for_op) when claim_op fails. And then, the yield_op for
> OP_HANDSHAKE happens and cancels waiting_for_op. But the dispatch is not
> triggered. Any ideas why this could be?

The problem is the cancellable source. According to g_cancellable_connect (which is used in g_cancellable_source_new):

 * @callback is called at most once, either directly at the
 * time of the connect if @cancellable is already cancelled,
 * or when @cancellable is cancelled in some thread.

Initially the cancellable was cancelled, that's why it was called. However, in this case, g_cancellable_connect does not connect to the cancelled signal.

So, when yield_op calls g_cancellable_cancel, the cancelled signal is emitted but there is now handler for it.
Comment 13 Aleix Conchillo Flaqué 2013-11-01 19:11:17 UTC
The fix works, but I was wondering why everything worked in other test cases before the fix, like test_basic.

test_basic is very similar to the new test_async_implict_handshake except that in the latest, a pollable input stream source is created first.

So, I printed what was going on in gnutls_source_sync for both tests with some annotations (see at the end).

This is the difference:

***** /tls/connection/basic *****

Here we use g_data_input_stream_read_line_async which internally will end up in 
read_async_pollable (in ginputstream.c). This functions performs nonblocking reads and if the result is WOULD_BLOCK it creates a new source.

***** /tls/connection/async_implicit_handshake *****

Here we create a pollable input stream source ourselves and we call g_pollable_input_stream_read_nonblocking. If we get a WOULD_BLOCK we do not create a new source.

So, in this case we expect waiting_for_op to work as expected.

--------------------------

***** /tls/connection/basic *****

-----
### this one is created because async read (async handshake) starts
read: gnutls_source_sync
      - op waiting 1 (old -1) io waiting 0 (old -1)
      - create waiting for op source
-----
### this one is created because async write (async handshake) starts
write: gnutls_source_sync
       - op waiting 1 (old -1) io waiting 0 (old -1)
       - create waiting for op source
-----
### this one is created because read_async_pollable got WOULD_BLOCK
read: gnutls_source_sync
      - op waiting 1 (old -1) io waiting 0 (old -1)
      - create waiting for op source
-----
write: gnutls_source_sync
       - op waiting 1 (old -1) io waiting 0 (old -1)
       - create waiting for op source
OK

***** /tls/connection/async-implicit-handshake *****

-----
### this one is the pollable input stream source, no reading is happening.
read: gnutls_source_sync    
      - op waiting 0 (old -1) io waiting 0 (old -1)
      - create time out source
-----
### this one is the asynchronous write in the test
write: gnutls_source_sync
       - op waiting 1 (old -1) io waiting 0 (old -1)
       - create waiting for op source
-----
### this one is because the timeout child source
read: gnutls_source_sync
      - op waiting 1 (old 0) io waiting 0 (old 0)
      - create waiting for op source
-----
### this one is because waiting_for_op is cancelled at creation time.
### NO MORE CHILD SOURCES CREATED!
read: gnutls_source_sync
      - op waiting 1 (old 1) io waiting 0 (old 0)
      - returning, no child source will be created
-----
write: gnutls_source_sync
       - op waiting 1 (old -1) io waiting 0 (old -1)
       - create waiting for op source

hangs...
Comment 14 Dan Winship 2013-11-11 14:31:15 UTC
Comment on attachment 258633 [details] [review]
do not cancel waiting_for_op at creation time

I've committed a patch to glib to fix this; the behavior of GCancellableSource accidentally changed in glib 2.38.

I'm committing the tls/tests/connection.c part of this patch though. Thanks for tracking this down.
Comment 15 Aleix Conchillo Flaqué 2013-11-11 15:37:24 UTC
Great. Thanks!

This is the patch for future references:

https://git.gnome.org/browse/glib/commit/?h=glib-2-38&id=bbf7b493918dde279e1c2db53bd5f2375fafa43e