Bug 526654 – Call does not reach established state

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 526654 - Call does not reach established state


Summary:	Call does not reach established state


Status:	RESOLVED FIXED

Product:	ekiga
Classification:	Applications
Component:	general
Version:	GIT master
Hardware:	Other Linux

Importance:	Normal blocker
Target Milestone:	---
Assigned To:	Ekiga maintainers
QA Contact:	Ekiga maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2008-04-07 06:24 UTC by Matthias Schneider
Modified:	2008-05-20 07:25 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Matthias Schneider 2008-04-07 06:24:28 UTC

Sometimes Ekiga does not seem to reach the established call state, meaning that the connection timer will not be runninng and hangup will be impossible.

A -d 4 was provided by Torsten at:
http://www.unixarea.de/Ekiga/ekigaLogD4.txt

In his case the problem appears with V4L, but not with the Moving Logo (H.261)

The video stream works in both directions however.

Comment 1 Matthias Schneider 2008-04-07 06:24:55 UTC

Some more output provided by Torsten:
$ egrep 'PCSS|connection established' d3log-for-MovingLogo
2008/04/04 14:31:45.557      2:41.862    GMURLHandler:0x2be77f00    PCSS    Outgoing call routed to sip:500@ekiga.net for Call[1]-EP<pc>[/dev/dsp0]
2008/04/04 14:31:46.346      2:42.650      Aggregator:0x29b03900    PCSS    Adding filters to patch
2008/04/04 14:31:46.355      2:42.659      Aggregator:0x29b03900    PCSS    Adding filters to patch
2008/04/04 14:31:46.401      2:42.705      Aggregator:0x29b03900    PCSS    SetConnected()
2008/04/04 14:31:46.401      2:42.705      Aggregator:0x29b03900    PCSS    SetConnected() mediaStreams.IsEmpty? --> phase=EstablishedPhase
2008/04/04 14:31:46.401      2:42.705      Aggregator:0x29b03900    PCSS    SetConnected() calling OnEstablished()
2008/04/04 14:31:46.419      2:42.723      Aggregator:0x29b03900    GMPCSSEndpoint     PCSS connection established
2008/04/04 14:31:46.420      2:42.724      Aggregator:0x29b03900    GMSIPEndpoint     SIP connection established


$ egrep 'PCSS|connection established' d3log-for-V4L
2008/04/04 14:32:27.649      0:21.351    GMURLHandler:0x29bd6300    PCSS    Outgoing call routed to sip:500@ekiga.net for Call[1]-EP<pc>[/dev/dsp0]
2008/04/04 14:32:28.162      0:21.865      Aggregator:0x2bc67b00    PCSS    Adding filters to patch
2008/04/04 14:32:28.170      0:21.872      Aggregator:0x2bc67b00    PCSS    Adding filters to patch
2008/04/04 14:32:29.503      0:23.205      Aggregator:0x2bc67b00    PCSS    SetConnected()
2008/04/04 14:32:29.503      0:23.205      Aggregator:0x2bc67b00    PCSS    SetConnected() mediaStreams.IsEmpty? --> phase=EstablishedPhase
2008/04/04 14:32:29.503      0:23.205      Aggregator:0x2bc67b00    PCSS    SetConnected() calling OnEstablished()

Comment 2 Matthias Schneider 2008-04-07 06:25:45 UTC

Some more:
Debug:lib/engine/protocol/skel/call-core.cpp CallCore::dial()
Debug:src/endpoints/opal-call.cpp Opal::Call::Call()
Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream()
Debug:StreamType Audio
Debug:stream_name: PCMU
Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream()
Debug:StreamType Audio
Debug:stream_name: PCMU
Debug:src/endpoints/opal-call.cpp Opal::Call::OnEstablished()
Debug:src/endpoints/opal-call.cpp Opal::Call::OnEstablished()
Debug:on_established_call_cb
Debug:setting mw->current_call = &call;

i.e. no video is involved (which explains as well that switching off V4L
to Moving Logo works well);

while for 500@ekiga.net it is:

Debug:lib/engine/protocol/skel/call-core.cpp CallCore::dial()
Debug:src/endpoints/opal-call.cpp Opal::Call::Call()
Debug:lib/engine/protocol/skel/call-core.cpp CallCore::dial()
Debug:src/endpoints/opal-call.cpp Opal::Call::Call()
Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream()
Debug:StreamType Audio
Debug:stream_name: PCMU
Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream()
Debug:StreamType Audio
Debug:stream_name: PCMU
Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream()
Debug:StreamType Video
Debug:stream_name: h261
Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream()
Debug:StreamType Video
Debug:stream_name: h261

and OnEstablished() is not passed...

Comment 3 Matthias Schneider 2008-04-07 06:29:14 UTC

I have also seen this problem from time to time, sometimes depending on the selected codec. Theora seemed to have worked most of the time. A guess is that we leave some mutex protected region too short for other threads to take the mutex when using a higher complexity image (real image vs. logo) or when using a a codec that needs more processing time.

This is a major blocker for 3.00beta.

Comment 4 Matthias Schneider 2008-04-07 06:30:37 UTC

Sorry, s/Torsten/Matthias Apitz|Emilio/

Comment 5 Snark 2008-04-07 06:54:41 UTC

I don't get what you mean in comment #3 : what do you mean by "we leave some mutex protected region too short for other threads to take the
mutex when ..." ?

Comment 6 Damien Sandras 2008-04-07 07:36:12 UTC

I suspect he is right. He means there is thread starvation (I think it is the correct term).

Comment 7 Snark 2008-04-07 07:52:03 UTC

You mean some mutex is kept too long, so another thread has to wait too much?

Comment 8 Damien Sandras 2008-04-07 08:02:35 UTC

No, you have :
- mutex.Wait()
- mutex.Signal()
in a loop.

If it goes very quickly, another thread doing mutex.Wait() will never have the opportunity to take the mutex. So only the main thread (with the loop) will take and release the mutex.

Comment 9 Matthias Schneider 2008-04-07 08:27:52 UTC

Yes, thats exactly what I meant. If you have 

while (x) {
  mutex.Wait()
  mutex.Signal()
  do_something_very_quickly();
}

A mutex.Wait() in another thread might never get a chance except if the OS scheduler interrupts the loop-thread during do_something_very_quickly(), with a probability that is lower when do_something_very_quickly()'s executiong time is less. I had that similar issue when trying to set the audio device volumes from another thread than the audio stream thread, which was mostly waiting in a mutex-protected region (I have changed that code).

Comment 10 Damien Sandras 2008-04-07 08:32:26 UTC

That would explain why it works for everybody and not for Emilio. He is running FreeBSD, which probably has a different behavior than the Linux kernel.

Comment 11 Matthias Schneider 2008-04-07 08:37:00 UTC

However I have also run into this problem sometimes... On linux 2.6.1x...

Comment 12 Matthias Schneider 2008-04-07 08:47:11 UTC

Maybe this has to do with the removed 5ms sleep in patch.cxx ? However that sleep lead to other problems, mainly with UDP input queue overflows at high bitrates/packet rates...

Comment 13 Snark 2008-04-07 08:49:24 UTC

If the problem is highly scheduler-bound, it's highly probable that some schedulers show the issue often and some almost never... and a same scheduler with a slight parameter change will go from the "works" to the "never works" category easily!

Comment 14 Snark 2008-04-07 08:55:18 UTC

Couldn't that be done in something like a glib mainloop, where other threads
could register to get their turn?

Comment 15 Matthias Schneider 2008-04-07 09:00:03 UTC

OPAL has no glib main loop...

Comment 16 Snark 2008-04-07 09:05:39 UTC

I said "like" ;-)

Comment 17 Matthias Schneider 2008-04-07 09:46:12 UTC

No, not really. We need a thread for video transmission and we need one for video recption.

transmission loop

- wait for get_frame_data to unblock
- encode and pass to display thread (preview)
- encapsulate
- send out udp packe

reception loop

- wait for incoming packets
- check if an entire frame was received (e.g via marker bit)
- deencapsulate
- decode
- pass to display thread

So I dont see anything that can be done in a "main loop".

However probably some actions requiring mutexes can be changed to operate mutex-less. However I have to go into this before commenting it more...

Comment 18 Matthias Schneider 2008-05-20 07:25:45 UTC

The issue does not seem to occur anymor. Possibly it was due to flooding with debug output from a high priority media thread.