GNOME Bugzilla – Bug 134127
Method invocation on non-existing object hangs
Last modified: 2007-07-16 21:31:02 UTC
We are using the following components to implement the client process:
- glib 2.2.3
- libIDL 0.7.4
- ORBit2 2.9.7
- orbitcpp 1.3.7

We connect over the network to a server implemented in ORBit 0.5.17. The server publishes its object reference for the client in a text file and listens on a fixed port (1025). We have no influence on the server implementation.

We have found the following nasty problem. When the client tries to connect to the server through an invalid reference (e.g. after a server reboot), the client hangs forever. We tried the non_existent() method, but it returns OK (false), so we cannot detect an invalid object that way. Could you please advise how to detect an invalid object and avoid the client hang?
OK, this is really bad. First, assuming everything is going well, the client _has_ to wait until the server has responded before it can return from your call, so this may well be a server bug; however, we need to debug it.

Can you run ior-decode-2 (part of the ORBit2 package) on the IOR you have? (Failing that, get the string with CORBA_ORB_object_to_string, printf it, and then decode that dump.) That should tell us exactly what is in the reference.

Next, reconfigure ORBit2 with --enable-debug=yes, then run make clean; make install. Before re-running your hanging code, do:

    export ORBIT2_DEBUG=giop:traces

(unless it hangs very early, in which case use 'all'). This prints all the data going to and from the remote server so we can analyse it; possibly there is some GIOP version problem that leaves the client waiting for more data than is coming. HTH.
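For readers unfamiliar with what ior-decode-2 reports: a stringified IOR is the text "IOR:" followed by the hex-encoded CDR encapsulation of the IOR structure, which is a byte-order flag, the type_id string, and a sequence of tagged profiles. An IIOP profile (tag 0) carries the IIOP version, host, port and object key. The C sketch below is a minimal illustration of that layout under stated assumptions: it decodes only the first IIOP profile and ignores tagged components. It is not ORBit2's decoder, and the names iiop_info, cdr_* and ior_decode are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields of the first IIOP profile in a stringified IOR (hypothetical struct). */
struct iiop_info {
    char type_id[128];   /* repository ID, e.g. "IDL:Foo:1.0" */
    char host[128];
    unsigned port;
    size_t key_len;      /* object key length; the key itself is not copied */
};

/* Cursor over one CDR encapsulation; alignment is relative to its start. */
struct cdr {
    const unsigned char *buf;
    size_t len, pos;
    int little;          /* first octet: 0 = big endian, 1 = little endian */
};

static int hexval(int c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

static int cdr_begin(struct cdr *c, const unsigned char *buf, size_t len) {
    if (len < 1) return 0;
    c->buf = buf; c->len = len; c->pos = 1;   /* pos 1: skip byte-order octet */
    c->little = buf[0] & 1;
    return 1;
}
/* Align pos to n and check that `need` more bytes are available. */
static int cdr_take(struct cdr *c, size_t n, size_t need) {
    c->pos = (c->pos + n - 1) / n * n;
    return c->pos <= c->len && c->len - c->pos >= need;
}

static int cdr_ushort(struct cdr *c, unsigned *out) {
    if (!cdr_take(c, 2, 2)) return 0;
    const unsigned char *p = c->buf + c->pos;
    *out = c->little ? (unsigned)(p[0] | p[1] << 8) : (unsigned)(p[1] | p[0] << 8);
    c->pos += 2;
    return 1;
}

static int cdr_ulong(struct cdr *c, uint32_t *out) {
    if (!cdr_take(c, 4, 4)) return 0;
    const unsigned char *p = c->buf + c->pos;
    uint32_t b0 = p[0], b1 = p[1], b2 = p[2], b3 = p[3];
    *out = c->little ? (b0 | b1 << 8 | b2 << 16 | b3 << 24)
                     : (b3 | b2 << 8 | b1 << 16 | b0 << 24);
    c->pos += 4;
    return 1;
}

/* CDR string: ulong length (counting the trailing NUL), then the bytes. */
static int cdr_string(struct cdr *c, char *dst, size_t cap) {
    uint32_t n;
    if (!cdr_ulong(c, &n) || n == 0 || n > cap || !cdr_take(c, 1, n)) return 0;
    memcpy(dst, c->buf + c->pos, n);
    dst[n - 1] = '\0';
    c->pos += n;
    return 1;
}

/* Returns 1 and fills *info on success; 0 if malformed or no IIOP profile.
 * Uses a static buffer, so it is not thread-safe. */
int ior_decode(const char *ior, struct iiop_info *info) {
    static unsigned char bytes[4096];
    if (strncmp(ior, "IOR:", 4) != 0) return 0;
    const char *hex = ior + 4;
    size_t hexlen = strlen(hex), n = hexlen / 2;
    if (hexlen % 2 != 0 || n > sizeof bytes) return 0;
    for (size_t i = 0; i < n; i++) {
        int hi = hexval(hex[2 * i]), lo = hexval(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) return 0;
        bytes[i] = (unsigned char)(hi << 4 | lo);
    }
    struct cdr c; uint32_t nprofiles, tag, plen, klen;
    if (!cdr_begin(&c, bytes, n) || !cdr_string(&c, info->type_id, sizeof info->type_id)
        || !cdr_ulong(&c, &nprofiles)) return 0;
    for (uint32_t k = 0; k < nprofiles; k++) {
        if (!cdr_ulong(&c, &tag) || !cdr_ulong(&c, &plen) || !cdr_take(&c, 1, plen)) return 0;
        if (tag == 0) {                 /* TAG_INTERNET_IOP: a nested encapsulation */
            struct cdr p;
            if (!cdr_begin(&p, c.buf + c.pos, plen)) return 0;
            p.pos = 3;                  /* skip byte order + IIOP version (major, minor) */
            if (!cdr_string(&p, info->host, sizeof info->host) || !cdr_ushort(&p, &info->port)
                || !cdr_ulong(&p, &klen) || !cdr_take(&p, 1, klen)) return 0;
            info->key_len = klen;
            return 1;
        }
        c.pos += plen;                  /* skip other profile types */
    }
    return 0;
}
```

Decoding the "ok" and "hang" IORs side by side in this way would show whether they point at the same host and port (1025) and differ only in the object key. If so, that would explain why the client can still connect to the restarted server even though the object the old reference names no longer exists.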
Created attachment 24390: Log from hang-up, ORBIT2_DEBUG=all
Thank you for the advice. Following your instructions, we have collected:
1. Log from the hang-up, attached as dm_osa_cham_hang.log
2. IOR that causes the hang-up, attached as ipChargingManagerIOR_hang.txt
3. Log from a successful execution, attached as dm_osa_cham_ok.log
4. IOR from a successful execution, attached as ipChargingManagerIOR_ok.txt

The flow of execution is as follows:
1. string_to_object()   // create the object from the IOR
2. _narrow()            // narrow to the target interface, i.e. IpChargingManager
3. _non_existent()      // check that the object exists
4. setCallback()        // call an interface method on IpChargingManager

With the "bad" IOR, the flow hangs at step 4. The "bad" IOR comes from an old server process. The server always listens on the same port (1025) and is written in ORBit 0.5.17.

Looking forward to your comments,
Artur
Created attachment 24391: IOR that causes the hang-up
Created attachment 24392: Log from successful execution, ORBIT2_DEBUG=all
Created attachment 24393: IOR from successful execution
Based on the information provided, could you please advise whether there is any solution or workaround for the problem?
Well, the problem is that it's really not instantly clear what is going on. Without digging through all the GIOP marshalling, it _looks_ like the ORB is sending a correct message to the remote server and is simply blocked waiting for a reply.

My guess is that the remote end, which is ORBit 0.5.17, is receiving an incoming invocation on what is now a dead/invalid object reference, and is simply dropping the request without responding at all. This is pretty painful, since the client ORB cannot know that is going to happen. The remote end really should close the connection in this case, so this is a bug in ORBit 0.5.17.

Of course, if you can't change that end, you will have to take a different approach. How did you get the IOR for the remote end? If you can detect that the other end is taking a long time to respond, can you re-fetch it?

It should also be possible to use the asynchronous method invocation support to invoke a method and then time out if it is taking too long; see ORBit_small_invoke_async (a C function, I'm afraid). Unfortunately you have to hand-code some of the marshalling, but this shouldn't be too painful for a single method that you use to determine 'aliveness'. Some example usage is in ORBit2/test/everything/client.c (test_BasicServer_opStringA), although the test code does some mangled things in general.
I believe the routine clipped below from ORBit2-2.12.4 is the problem, based on an Evolution bug I was hunting. The system hangs at the g_cond_wait below, probably because no data is ever going to come in on the stream, since the object was invalid. I'm a multi-threaded C guy, not a CORBA guy, so I'm not sure of the context of this code; all I can say is that this lock is where it hung in my bug. I think CORBA stands for Completely Obfuscated Broker Request Architecture.

    GIOPRecvBuffer *
    giop_recv_buffer_get (GIOPMessageQueueEntry *ent)
    {
        GIOPThread *tdata = giop_thread_self ();

     thread_switch:
        if (giop_thread_io ()) {
            ent_lock (ent);

            for (; !check_got (ent); ) {
                if (!giop_thread_queue_empty_T (tdata)) {
                    ent_unlock (ent);
                    giop_thread_queue_process (tdata);
                    ent_lock (ent);
                } else
                    /* blocks until signalled; there is no timeout here */
                    g_cond_wait (tdata->incoming, tdata->lock);
            }

            ent_unlock (ent);

        } else { /* non-threaded */
            /* iterate the main loop until a reply arrives or the link disconnects */
            while (!ent->buffer && ent->cnx &&
                   (ent->cnx->parent.status != LINK_DISCONNECTED) &&
                   !giop_thread_io ())
                link_main_iteration (TRUE);

            if (giop_thread_io ())
                goto thread_switch;
        }

        giop_thread_queue_tail_wakeup (tdata);
        giop_recv_list_destroy_queue_entry (ent);

        return ent->buffer;
    }
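The threaded branch of that routine waits on tdata->incoming with no deadline, so if neither a reply nor a disconnect ever arrives, nothing wakes it. Purely as an illustration, the sketch below shows the same wait-loop shape with a deadline. It swaps in POSIX pthread_cond_timedwait for GLib. The names reply_slot, wait_reply and deliver_reply are hypothetical and not ORBit2 API. A real change inside an ORB would also have to report the timeout to the caller (for example as a CORBA exception) and clean up the pending request, which this sketch does not attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a GIOP queue entry: a "reply arrived" flag
 * protected by a mutex, plus a condition variable the I/O side signals. */
struct reply_slot {
    pthread_mutex_t lock;
    pthread_cond_t incoming;
    bool got;
};

/* Same shape as the "for (; !check_got (ent); ) g_cond_wait (...)" loop,
 * but gives up once timeout_ms has elapsed. Returns true if a reply arrived. */
bool wait_reply(struct reply_slot *s, long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* timedwait's default clock */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&s->lock);
    int rc = 0;
    /* rc == 0 covers both real and spurious wakeups, so re-check the flag;
     * ETIMEDOUT (or any other error) ends the wait. */
    while (!s->got && rc == 0)
        rc = pthread_cond_timedwait(&s->incoming, &s->lock, &deadline);
    bool got = s->got;
    pthread_mutex_unlock(&s->lock);
    return got;
}

/* Called by the I/O side once a reply (or a connection close) is seen. */
void deliver_reply(struct reply_slot *s) {
    pthread_mutex_lock(&s->lock);
    s->got = true;
    pthread_cond_broadcast(&s->incoming);
    pthread_mutex_unlock(&s->lock);
}
```

If nobody calls deliver_reply(), wait_reply(&s, 50) returns false after roughly 50 ms instead of hanging. Once the I/O side has delivered, it returns true straight away. Whether to time out at all is an application-level policy. From the ORB's point of view, as the maintainer notes in this thread, waiting for a reply or a close is the correct behaviour.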
Cameron: if the remote end does not reply, the client cannot just give up waiting. We block on that condition until either an error reply (an 'invalid object'-style exception) comes back or the connection is closed, at which point we continue. This behaviour is completely correct.

This bug is a *very* old one concerning ORBit2 talking to ORBit 0.5.17, where the latter does not respond correctly to a message it doesn't like. That is, we have:

    Process A            Process B
    ORBit2   -> doIt ->  ORBit 0.5.17

Unfortunately, due to a bug in B, A blocks forever waiting for a response: an exception reply or a connection close, neither of which ever happens, because B just silently drops the data.

Ultimately, since we are no longer maintaining or supporting ORBit 0.5.17, I'm closing this as WONTFIX.
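The outcomes described here are a reply, a connection close, or silence. They look different at the socket level, and only the first two ever wake a blocked reader. The POSIX sketch below shows how a client could tell them apart using poll() with a timeout. It is not ORBit2 code, and the names peer_event and wait_for_peer are hypothetical. A single byte stands in for a full GIOP reply message.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* What the client can observe on its connection while awaiting a reply. */
enum peer_event { PEER_REPLIED, PEER_CLOSED, PEER_SILENT, PEER_ERROR };

/* Wait up to timeout_ms for the peer on fd to either send data or close.
 * With poll(..., -1) instead, the PEER_SILENT case would block forever. */
enum peer_event wait_for_peer(int fd, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n == 0) return PEER_SILENT;       /* request silently dropped */
    if (n < 0) return PEER_ERROR;

    char byte;
    /* MSG_PEEK leaves any reply bytes queued for the real reader. */
    ssize_t r = recv(fd, &byte, 1, MSG_PEEK);
    if (r > 0) return PEER_REPLIED;
    if (r == 0) return PEER_CLOSED;       /* orderly shutdown: EOF */
    return PEER_ERROR;
}
```

Over a socketpair(), for example, a peer that never writes yields PEER_SILENT once the timeout expires, a peer that writes yields PEER_REPLIED, and a peer that closes yields PEER_CLOSED. The PEER_SILENT case corresponds to the server behaviour described above: without a deadline, the client has no event to react to.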