Bug 134127 – Method invocation on non-existing object hangs


Summary:	Method invocation on non-existing object hangs


Status:	RESOLVED WONTFIX

Product:	ORBit2
Classification:	Deprecated
Component:	general
Version:	2.8.x
Hardware:	Other Solaris

Importance:	Normal normal
Target Milestone:	---
Assigned To:	ORBit maintainers
QA Contact:	ORBit maintainers

URL:
Whiteboard:

Depends on:
Blocks:	330161

Reported:	2004-02-11 16:50 UTC by Artur Pawluc
Modified:	2007-07-16 21:31 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Log from hang-up, ORBIT2_DEBUG=all (2.96 KB, text/plain) 2004-02-13 20:09 UTC, Artur Pawluc
IOR that makes hangup (238 bytes, text/plain) 2004-02-13 20:15 UTC, Artur Pawluc
Log from successful execution, ORBIT2_DEBUG=all (3.25 KB, text/plain) 2004-02-13 20:17 UTC, Artur Pawluc
IOR from successful execution (236 bytes, text/plain) 2004-02-13 20:17 UTC, Artur Pawluc

Description Artur Pawluc 2004-02-11 16:50:25 UTC

We are using the following components to implement the client process:
-- glib v. 2.2.3
-- libIDL v. 0.7.4
-- ORBit2 v. 2.9.7
-- orbitcpp v. 1.3.7

We are connecting over the network to a server implemented in ORBit
0.5.17.
The server publishes its object reference (for the client) in a text file.
The server listens on a fixed port (1025).
We have no influence on the server implementation.

We have found the following nasty problem.
When the client tries to connect to the server via an invalid reference
(e.g. after a server reboot), the client hangs forever.
We have tried the non_existent() method; however, it returns an OK
(false) status, so we can't detect the invalid object that way.
Would you please advise how to detect an invalid object and avoid the
client hang?

Comment 1 Michael Meeks 2004-02-13 10:28:33 UTC

Ok; this is really bad.
Firstly - assuming everything is going well - the client _has_ to wait
until the server has responded before it can return to your call, so
this may well be a server bug; however - we need to debug it.

Can you use:

ior-decode-2 (part of the ORBit2 package) on the IOR you have (failing
that, do a CORBA_ORB_object_to_string, printf it, and then do the
dump). That should tell us exactly what's happening there.

Then - you need to re-configure ORBit2 with --enable-debug=yes, make
clean ; make install.

Then re-run your hanging code, but first do:

export ORBIT2_DEBUG=giop:traces

[ unless it hangs very early in which case use 'all' ].

This will churn out all the data flying to and fro to the remote
server, so we can analyse it - possibly there is some GIOP/version
problem leading to it waiting for more data than is coming.

HTH.
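Since the trace is meant to reveal whether the client is "waiting for more data than is coming", it helps to know how GIOP frames a message: every message begins with a fixed 12-byte header whose last field is the body length, and the receiver keeps reading until that many body bytes have arrived. The sketch below decodes that header in plain C. It follows the GIOP 1.0 to 1.2 header layout from the CORBA specification; it is not ORBit2 code, and the names giop_header, parse_giop_header and giop_bytes_missing are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GIOP message types (GIOP 1.0 defines 0-6, GIOP 1.1 adds Fragment). */
enum {
    GIOP_REQUEST = 0, GIOP_REPLY = 1, GIOP_CANCEL_REQUEST = 2,
    GIOP_LOCATE_REQUEST = 3, GIOP_LOCATE_REPLY = 4,
    GIOP_CLOSE_CONNECTION = 5, GIOP_MESSAGE_ERROR = 6, GIOP_FRAGMENT = 7
};

/* Decoded form of the fixed 12-byte header that starts every GIOP message. */
typedef struct {
    uint8_t  major, minor;   /* GIOP version, e.g. 1.0 or 1.2 */
    int      little_endian;  /* byte_order (1.0) or flags bit 0 (1.1+) */
    uint8_t  msg_type;       /* one of the GIOP_* values above */
    uint32_t body_size;      /* body length in bytes, excluding the header */
} giop_header;

/* Returns 0 on success, -1 if the buffer is short or the magic is not "GIOP". */
static int parse_giop_header(const uint8_t *buf, size_t len, giop_header *out)
{
    if (len < 12 || memcmp(buf, "GIOP", 4) != 0)
        return -1;
    out->major = buf[4];
    out->minor = buf[5];
    out->little_endian = buf[6] & 0x01;
    out->msg_type = buf[7];
    /* message_size is encoded in the sender's byte order. */
    if (out->little_endian)
        out->body_size = (uint32_t)buf[8] | ((uint32_t)buf[9] << 8) |
                         ((uint32_t)buf[10] << 16) | ((uint32_t)buf[11] << 24);
    else
        out->body_size = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
                         ((uint32_t)buf[10] << 8) | (uint32_t)buf[11];
    return 0;
}

/* How many more body bytes a reader will block for, given what has arrived. */
static uint32_t giop_bytes_missing(const giop_header *h, uint32_t body_received)
{
    return body_received >= h->body_size ? 0 : h->body_size - body_received;
}
```

For example, a big-endian GIOP 1.0 Request header ending in the bytes 00 00 00 10 announces a 16-byte body; a reader that has received only 4 of those bytes keeps blocking for the remaining 12.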

Comment 2 Artur Pawluc 2004-02-13 20:09:58 UTC

Created attachment 24390 [details]
Log from hang-up, ORBIT2_DEBUG=all

Comment 3 Artur Pawluc 2004-02-13 20:14:00 UTC

Thank you for the advice.

Following your instructions, we have collected:
1. Log from hang-up, attached dm_osa_cham_hang.log
2. IOR that makes hangup, attached ipChargingManagerIOR_hang.txt
3. Log from successful execution, attached dm_osa_cham_ok.log
4. IOR from successful execution, attached ipChargingManagerIOR_ok.txt

The flow of execution is as follows:
1. string_to_object() // create object from IOR
2. _narrow() // narrow to target object, i.e. IpChargingManager
3. _non_existent() // check for object existence
4. setCallback() // call interface method on IpChargingManager

The flow with the "bad" IOR hangs at step 4.
The "bad" IOR comes from an old server process.
The server always listens on the same port, 1025.
The server is written in ORBit 0.5.17.

Looking forward to your comments,
Artur

Comment 4 Artur Pawluc 2004-02-13 20:15:11 UTC

Created attachment 24391 [details]
IOR that makes hangup

Comment 5 Artur Pawluc 2004-02-13 20:17:04 UTC

Created attachment 24392 [details]
Log from successful execution, ORBIT2_DEBUG=all

Comment 6 Artur Pawluc 2004-02-13 20:17:45 UTC

Created attachment 24393 [details]
IOR from succesfull execution

Comment 7 Artur Pawluc 2004-02-23 14:42:04 UTC

Based on the information provided, could you please advise whether there
is any solution or workaround for the problem?

Comment 8 Michael Meeks 2004-02-24 10:58:27 UTC

Well - the problem is - it's really not instantly clear what is going
on; without digging out all the GIOP marshalling stuff it _looks_ like
the ORB is sending a correct message to the remote server and it's
just blocking waiting for it to reply.

Ah - and ... I'm guessing that the remote end - which is ORBit-0.5.7
is getting an incoming invocation on what is (now) a dead/invalid
object reference - and it is simply dropping the request and not
responding at all.

This is pretty painful since the ORB can't know that's going to
happen. Really the remote end should close the connection in this
case: in which case it's a bug in ORBit-0.5.7.

Of course - if you can't affect that end you're going to have to take
a different approach. However - how did you get the IOR for the remote
end? If you can detect that the other end is taking a long time to
respond, can you re-fetch it?

It should be possible to use the asynchronous method invocation stuff to
invoke a method, and then time out if it's taking too long; see
ORBit_small_invoke_async [ a C method I'm afraid ] - you have to
(unfortunately) do some hand-coding mess for marshalling stuff - but
this shouldn't be too painful for an individual method you can use to
determine 'aliveness' at some stage (perhaps).

Some example usage is in ORBit2/test/everything/client.c
(test_BasicServer_opStringA) - although the test code does some
mangled stuff in general.
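The pattern described here is a bounded wait: send the request, then stop waiting after a deadline instead of blocking indefinitely. ORBit_small_invoke_async is the ORBit2 route pointed to above, and its signature is not reproduced here. As a swapped-in, generic illustration of the same idea, the sketch below waits on a plain connected socket with POSIX poll() and tells apart the outcomes discussed in this bug: a reply arrives, the peer closes the connection, or nothing happens before the timeout (the silent-drop case). wait_for_reply and wait_result are hypothetical names; this is not how ORBit2 waits internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

typedef enum { REPLY_READY, PEER_CLOSED, TIMED_OUT, WAIT_ERROR } wait_result;

/* Wait at most timeout_ms for the peer on a connected stream socket to
 * send data or close the connection. Nothing is consumed from the socket. */
static wait_result wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        /* Simplification: an EINTR restarts the full timeout. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return WAIT_ERROR;
    if (rc == 0)
        return TIMED_OUT;   /* peer silently dropped the request, or is just slow */

    /* Readability is also reported at end-of-file, so peek one byte to
     * tell real data from an orderly close. */
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK);
    if (n > 0)
        return REPLY_READY;
    if (n == 0)
        return PEER_CLOSED;
    return WAIT_ERROR;
}
```

In a client, the TIMED_OUT branch is where one might re-fetch the IOR or report the server as unreachable. Timing out only stops the client from waiting; it does not cancel a request the server may still process later.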

Comment 9 Cameron Kellough 2006-04-21 00:20:26 UTC

I believe the routine clipped below from ORBit2-2.12.4 is the problem, based on an Evolution bug I was hunting:

The system hangs at the g_cond_wait below, probably because no data is ever going to arrive on the stream, since the object was invalid. I'm a multi-threaded C guy, not a CORBA guy, so I'm not sure what the context of this code is; all I can say is that this lock is where it hung in my bug. I think CORBA stands for Completely Obfuscated Broker Request Architecture.

GIOPRecvBuffer *
giop_recv_buffer_get (GIOPMessageQueueEntry *ent)
{
	GIOPThread *tdata = giop_thread_self ();

 thread_switch:
	if (giop_thread_io ()) {
		ent_lock (ent);

		for (; !check_got (ent); ) {
			if (!giop_thread_queue_empty_T (tdata)) {
				ent_unlock (ent);
				giop_thread_queue_process (tdata);
				ent_lock (ent);
			} else
				g_cond_wait (tdata->incoming, tdata->lock);
		}
		
		ent_unlock (ent);

	} else { /* non-threaded */

		while (!ent->buffer && ent->cnx &&
		       (ent->cnx->parent.status != LINK_DISCONNECTED) &&
		       !giop_thread_io())
			link_main_iteration (TRUE);

		if (giop_thread_io())
			goto thread_switch;
	}

	giop_thread_queue_tail_wakeup (tdata);
	giop_recv_list_destroy_queue_entry (ent);

	return ent->buffer;
}
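The loop quoted above has no deadline: if neither a reply nor a disconnect ever arrives, g_cond_wait() never returns. For contrast, here is a generic POSIX sketch of the same wait-for-a-flag pattern with a deadline, using pthread_cond_timedwait(). It is not a proposed ORBit2 patch (Michael's reply explains why the ORB keeps waiting), and reply_slot, wait_reply_until and deliver_reply are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for a message queue entry: a flag guarded by a
 * mutex, plus a condition variable signalled when a reply is stored. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  incoming;
    int             got_reply;
} reply_slot;

/* Same shape as the g_cond_wait() loop quoted above, but bounded.
 * Returns 1 if a reply arrived within timeout_ms, 0 otherwise. */
static int wait_reply_until(reply_slot *s, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* default clock of timedwait */
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {      /* normalise the timespec */
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&s->lock);
    int rc = 0;
    /* Re-check the flag after every wakeup (spurious wakeups are allowed);
     * stop on a reply, on ETIMEDOUT, or on any other error. */
    while (!s->got_reply && rc == 0)
        rc = pthread_cond_timedwait(&s->incoming, &s->lock, &deadline);
    int got = s->got_reply;
    pthread_mutex_unlock(&s->lock);
    return got;
}

/* Called by whichever thread receives the reply. */
static void deliver_reply(reply_slot *s)
{
    pthread_mutex_lock(&s->lock);
    s->got_reply = 1;
    pthread_cond_signal(&s->incoming);
    pthread_mutex_unlock(&s->lock);
}
```

The deadline here is computed on CLOCK_REALTIME, which pthread_cond_timedwait() uses by default; a production version would rather select CLOCK_MONOTONIC with pthread_condattr_setclock() so that wall-clock adjustments cannot stretch or shorten the timeout.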

Comment 10 Michael Meeks 2006-04-24 08:51:32 UTC

Cameron - if the remote end does not reply, then the client cannot just give up waiting; we block forever on that condition until we get an 'invalid foo' back and/or a connection close, at which point we continue.

This behavior is completely correct.

Now - this is a *very* old bug relating to ORBit2 <-> ORBit 0.5.7, where the latter does not respond correctly to a message it doesn't like, i.e. we do:

Process A          Process B
ORBit2  -> doIt -> ORBit 0.5.7

and unfortunately due to a bug in B, A blocks forever waiting for a response: an exception return, or a connection close that never happens: B just silently drops the data.

Ultimately, since we're not maintaining / supporting 0.5.7 any more, I'm closing this wontfix.