GNOME Bugzilla – Bug 526654
Call does not reach established state
Last modified: 2008-05-20 07:25:45 UTC
Sometimes Ekiga does not seem to reach the established call state, meaning that the connection timer will not be runninng and hangup will be impossible. A -d 4 was provided by Torsten at: http://www.unixarea.de/Ekiga/ekigaLogD4.txt In his case the problem appears with V4L, but not with the Moving Logo (H.261) The video stream works in both directions however.
Some more output provided by Torsten: $ egrep 'PCSS|connection established' d3log-for-MovingLogo 2008/04/04 14:31:45.557 2:41.862 GMURLHandler:0x2be77f00 PCSS Outgoing call routed to sip:500@ekiga.net for Call[1]-EP<pc>[/dev/dsp0] 2008/04/04 14:31:46.346 2:42.650 Aggregator:0x29b03900 PCSS Adding filters to patch 2008/04/04 14:31:46.355 2:42.659 Aggregator:0x29b03900 PCSS Adding filters to patch 2008/04/04 14:31:46.401 2:42.705 Aggregator:0x29b03900 PCSS SetConnected() 2008/04/04 14:31:46.401 2:42.705 Aggregator:0x29b03900 PCSS SetConnected() mediaStreams.IsEmpty? --> phase=EstablishedPhase 2008/04/04 14:31:46.401 2:42.705 Aggregator:0x29b03900 PCSS SetConnected() calling OnEstablished() 2008/04/04 14:31:46.419 2:42.723 Aggregator:0x29b03900 GMPCSSEndpoint PCSS connection established 2008/04/04 14:31:46.420 2:42.724 Aggregator:0x29b03900 GMSIPEndpoint SIP connection established $ egrep 'PCSS|connection established' d3log-for-V4L 2008/04/04 14:32:27.649 0:21.351 GMURLHandler:0x29bd6300 PCSS Outgoing call routed to sip:500@ekiga.net for Call[1]-EP<pc>[/dev/dsp0] 2008/04/04 14:32:28.162 0:21.865 Aggregator:0x2bc67b00 PCSS Adding filters to patch 2008/04/04 14:32:28.170 0:21.872 Aggregator:0x2bc67b00 PCSS Adding filters to patch 2008/04/04 14:32:29.503 0:23.205 Aggregator:0x2bc67b00 PCSS SetConnected() 2008/04/04 14:32:29.503 0:23.205 Aggregator:0x2bc67b00 PCSS SetConnected() mediaStreams.IsEmpty? --> phase=EstablishedPhase 2008/04/04 14:32:29.503 0:23.205 Aggregator:0x2bc67b00 PCSS SetConnected() calling OnEstablished()
Some more: Debug:lib/engine/protocol/skel/call-core.cpp CallCore::dial() Debug:src/endpoints/opal-call.cpp Opal::Call::Call() Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream() Debug:StreamType Audio Debug:stream_name: PCMU Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream() Debug:StreamType Audio Debug:stream_name: PCMU Debug:src/endpoints/opal-call.cpp Opal::Call::OnEstablished() Debug:src/endpoints/opal-call.cpp Opal::Call::OnEstablished() Debug:on_established_call_cb Debug:setting mw->current_call = &call; i.e. no video is involved (which explains as well that switching off V4L to Moving Logo works well); while for 500@ekiga.net it is: Debug:lib/engine/protocol/skel/call-core.cpp CallCore::dial() Debug:src/endpoints/opal-call.cpp Opal::Call::Call() Debug:lib/engine/protocol/skel/call-core.cpp CallCore::dial() Debug:src/endpoints/opal-call.cpp Opal::Call::Call() Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream() Debug:StreamType Audio Debug:stream_name: PCMU Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream() Debug:StreamType Audio Debug:stream_name: PCMU Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream() Debug:StreamType Video Debug:stream_name: h261 Debug:src/endpoints/opal-call.cpp Opal::Call::OnOpenMediaStream() Debug:StreamType Video Debug:stream_name: h261 and OnEstablished() is not passed...
I have also seen this problem from time to time, sometimes depending on the selected codec. Theora seemed to have worked most of the time. A guess is that we leave some mutex protected region too short for other threads to take the mutex when using a higher complexity image (real image vs. logo) or when using a a codec that needs more processing time. This is a major blocker for 3.00beta.
Sorry, s/Torsten/Matthias Apitz|Emilio/
I don't get what you mean in comment #3 : what do you mean by "we leave some mutex protected region too short for other threads to take the mutex when ..." ?
I suspect he is right. He means there is thread starvation (I think it is the correct term).
You mean some mutex is kept too long, so another thread has to wait too much?
No, you have : - mutex.Wait() - mutex.Signal() in a loop. If it goes very quickly, another thread doing mutex.Wait() will never have the opportunity to take the mutex. So only the main thread (with the loop) will take and release the mutex.
Yes, thats exactly what I meant. If you have while (x) { mutex.Wait() mutex.Signal() do_something_very_quickly(); } A mutex.Wait() in another thread might never get a chance except if the OS scheduler interrupts the loop-thread during do_something_very_quickly(), with a probability that is lower when do_something_very_quickly()'s executiong time is less. I had that similar issue when trying to set the audio device volumes from another thread than the audio stream thread, which was mostly waiting in a mutex-protected region (I have changed that code).
That would explain why it works for everybody and not for Emilio. He is running FreeBSD, which probably has a different behavior than the Linux kernel.
However I have also run into this problem sometimes... On linux 2.6.1x...
Maybe this has to do with the removed 5ms sleep in patch.cxx ? However that sleep lead to other problems, mainly with UDP input queue overflows at high bitrates/packet rates...
If the problem is highly scheduler-bound, it's highly probable that some schedulers show the issue often and some almost never... and a same scheduler with a slight parameter change will go from the "works" to the "never works" category easily!
Couldn't that be done in something like a glib mainloop, where other threads could register to get their turn?
OPAL has no glib main loop...
I said "like" ;-)
No, not really. We need a thread for video transmission and we need one for video recption. transmission loop - wait for get_frame_data to unblock - encode and pass to display thread (preview) - encapsulate - send out udp packe reception loop - wait for incoming packets - check if an entire frame was received (e.g via marker bit) - deencapsulate - decode - pass to display thread So I dont see anything that can be done in a "main loop". However probably some actions requiring mutexes can be changed to operate mutex-less. However I have to go into this before commenting it more...
The issue does not seem to occur anymor. Possibly it was due to flooding with debug output from a high priority media thread.