GNOME Bugzilla – Bug 139486
Full introspection information
Last modified: 2015-02-07 16:59:42 UTC
Right now, introspection through libgobject is limited to those aspects of an object that go through GObject: signals, properties and inheritance. However, for language bindings it would be very useful to have information about, for instance:
- Methods and other functions in the library
- Virtual function slots in class structures
The right way to do this is to attach a binary blob to the library, a "type library". Generally, you want a single blob rather than a set of C structures, to avoid relocations that have to be processed when the library is loaded. It may be interesting to expose introspection as a set of GInterfaces for navigating and invoking rather than exposing the type library directly. This would have the advantage of letting objects implemented in something other than C avoid having to create the type library and C function stubs to call into the library. However, it has potential performance downsides when trying to create a really small, fast invocation core for a particular language. This was brought up on the language-bindings@gnome.org mailing list, in the thread starting at: http://mail.gnome.org/archives/language-bindings/2004-March/msg00089.html
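The "single blob without relocations" point can be made concrete with a minimal C sketch. The layout below (`TypelibHeader`, `FunctionEntry`, `typelib_write_one()` and `typelib_find_symbol()`) is hypothetical and invented for illustration, not an actual typelib format: records refer to strings by byte offset from the start of the blob instead of by pointer, so the bytes could be mapped read-only and shared between processes without the dynamic linker having to patch anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: every reference is an offset from the start of
 * the blob, never a pointer, so the blob itself needs no relocations. */
typedef struct {
    uint32_t n_functions;      /* number of FunctionEntry records */
    uint32_t functions_offset; /* offset of the first FunctionEntry */
} TypelibHeader;

typedef struct {
    uint32_t name_offset;      /* offset of the NUL-terminated method name */
    uint32_t symbol_offset;    /* offset of the C symbol implementing it */
} FunctionEntry;

/* Offsets are turned into pointers only at lookup time. */
static const char *blob_string(const unsigned char *blob, uint32_t offset)
{
    return (const char *)(blob + offset);
}

/* Writer side (what a typelib compiler would do): lay out a blob with a
 * single function entry. Returns the number of bytes used. */
size_t typelib_write_one(unsigned char *buf, size_t size,
                         const char *name, const char *symbol)
{
    TypelibHeader header = { 1, (uint32_t)sizeof(TypelibHeader) };
    FunctionEntry entry;
    size_t name_len = strlen(name) + 1;
    size_t symbol_len = strlen(symbol) + 1;
    size_t total = sizeof header + sizeof entry + name_len + symbol_len;

    assert(total <= size);
    entry.name_offset = (uint32_t)(sizeof header + sizeof entry);
    entry.symbol_offset = (uint32_t)(entry.name_offset + name_len);
    memcpy(buf, &header, sizeof header);
    memcpy(buf + sizeof header, &entry, sizeof entry);
    memcpy(buf + entry.name_offset, name, name_len);
    memcpy(buf + entry.symbol_offset, symbol, symbol_len);
    return total;
}

/* Reader side: find the C symbol for a method name, or NULL. */
const char *typelib_find_symbol(const unsigned char *blob, const char *name)
{
    TypelibHeader header;

    memcpy(&header, blob, sizeof header);
    for (uint32_t i = 0; i < header.n_functions; i++) {
        FunctionEntry entry;
        memcpy(&entry, blob + header.functions_offset + i * sizeof entry,
               sizeof entry);
        if (strcmp(blob_string(blob, entry.name_offset), name) == 0)
            return blob_string(blob, entry.symbol_offset);
    }
    return NULL;
}
```

A binding would only ever read such a blob through offset-based lookups like `typelib_find_symbol()`, and could then resolve the returned symbol name with something like dlsym().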
For comparison, here are some notes on how Microsoft does this with COM/OLE:
* Type libraries can either be separate .tlb files or be embedded as a binary resource in EXEs and DLLs. ELF has no explicit resource mechanism; however, the equivalent can be achieved with a magic symbol name pointing to a binary blob, or alternatively with a large set of statically initialized structures (generated by a type library compiler).
* Typelibs are generated by a dedicated compiler which parses IDL, but that could be avoided here by using certain marker comments, coding styles, etc. There are free parsers for C available, and if the format were documented, GObjects implemented in languages other than C could be exposed via the typelib mechanism too. If the magic symbol were a function rather than a variable, typelibs could be generated on the fly.
* In COM a type library is represented by the ITypeLib and ITypeLib2 interfaces, which allow traversal and reflection of the types inside the library. ITypeInfo allows you to examine a particular type. The actual file format is an internal detail; the only public way to work with typelibs is via these interfaces.
* A given type library can be used to generate IDispatch implementations, which basically means "given a set of strings and variants representing a method call, invoke this method". This could be short-circuited by having a g_object_get_method() method, I guess.
* Typelibs can also contain documentation. Major misfeature there, IMHO.
It's worth noting that while the ITypeInfo interfaces allow you to invoke methods/properties, in practice this interface is never used (except internally by COM itself). In the case of C++ objects being exposed via COM to scripting engines (the most common case), typelibs are generated from IDL files; however, they are not exposed directly. Instead, the system uses them to generate IDispatch implementations for each object on the fly, which the scripting runtime then uses directly.
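The IDispatch idea ("given strings and variants, invoke this method") boils down to a name-keyed dispatch table over a variant type. Below is a minimal C sketch under that assumption. `Value`, `DispatchEntry`, `dispatch_invoke()` and `example_add()` are hypothetical names, and the real COM interface differs: IDispatch maps names to DISPIDs with GetIDsOfNames before calling Invoke.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A tiny tagged union standing in for a COM VARIANT or a GValue. */
typedef enum { VAL_INT, VAL_STRING } ValueType;

typedef struct {
    ValueType type;
    union { int i; const char *s; } u;
} Value;

/* Every dispatchable method takes an argument array and fills a result. */
typedef int (*DispatchFunc)(void *self, const Value *args, size_t n_args,
                            Value *result);

typedef struct {
    const char *name;
    size_t n_args;
    DispatchFunc func;
} DispatchEntry;

/* Look a method up by name and invoke it. Returns the method's status,
 * or -1 if the name is unknown or the argument count does not match. */
int dispatch_invoke(const DispatchEntry *table, size_t n_entries, void *self,
                    const char *name, const Value *args, size_t n_args,
                    Value *result)
{
    for (size_t i = 0; i < n_entries; i++) {
        if (strcmp(table[i].name, name) != 0)
            continue;
        if (table[i].n_args != n_args)
            return -1;
        return table[i].func(self, args, n_args, result);
    }
    return -1;
}

/* Example method: "add" sums two integer arguments. */
int example_add(void *self, const Value *args, size_t n_args, Value *result)
{
    (void)self;
    assert(n_args == 2);
    if (args[0].type != VAL_INT || args[1].type != VAL_INT)
        return -1;
    result->type = VAL_INT;
    result->u.i = args[0].u.i + args[1].u.i;
    return 0;
}
```

A scripting runtime would fill the `Value` array from its own objects, call `dispatch_invoke()` with the method name it saw in the script, and convert the result back.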
In the case of objects implemented in scripting languages being called from native code, I think either you have to use IDispatch directly, OR the system generates proxy implementations of the required interfaces on the fly using DCOM (the CORBA/Bonobo equivalent), so the method calls are first translated into RPC packets and then demarshalled into IDispatch invocations behind the scenes.
To be frank, I would vote +1 both for providing a stable typelib format that can be embedded into DSOs and for providing GInterfaces for navigating (but not invoking upon) that data. Scripting engines can then generate their own glue on the fly from this data.
For the case of objects, say, written in Python which want to be accessed from C, on-the-fly construction of marshal thunks which do the correct argument conversion is probably the best approach. The C programmer can either get the method function pointers using a dlsym-type interface, or, for extra style, you can use a trick similar to relaytool's, with dummy symbols and GOT rewriting, to provide totally natural code. By that I mean you generate a file that contains code like this (i386 PIC, stub body elided):
    asm(".globl my_python_object_say_hello\n"
        ".type my_python_object_say_hello, @function\n"
        /* ... stub to make the linker happy ... */);
and then, in the code which generates the marshal thunks, have:
    asm("movl %0, my_python_object_say_hello@GOT(%%ebx)" : : "r"(thunk_ptr));
That way you can link in a minimalist .o file which lets you write code which *looks* like it is accessing a standard C GObject, but the function calls are redirected at runtime to the Python implementation of that object.
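The GOT-rewriting trick above is specific to ELF on i386, but its core idea can be shown portably with an ordinary function pointer: C callers go through an indirection slot that the binding fills in at runtime. This is a swapped-in, portable analogue, not the relaytool technique itself. Apart from `my_python_object_say_hello`, all names are hypothetical, and `fake_python_thunk()` merely stands in for a real marshalling thunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Signature the C caller expects for the (hypothetical) method. */
typedef int (*SayHelloFunc)(const char *who, char *out, size_t out_size);

/* Placeholder used until a real implementation is installed; it plays
 * the role of the stub that keeps the linker happy. */
static int say_hello_unbound(const char *who, char *out, size_t out_size)
{
    (void)who;
    (void)out;
    (void)out_size;
    return -1;                  /* nothing bound yet */
}

/* The indirection slot, analogous to the GOT entry that gets patched. */
static SayHelloFunc say_hello_slot = say_hello_unbound;

/* What C code calls; it looks like an ordinary C function. */
int my_python_object_say_hello(const char *who, char *out, size_t out_size)
{
    return say_hello_slot(who, out, out_size);
}

/* Called by the binding at runtime to redirect calls, e.g. to a thunk
 * that converts the arguments and calls into the interpreter. */
void install_say_hello_thunk(SayHelloFunc thunk)
{
    assert(thunk != NULL);
    say_hello_slot = thunk;
}

/* Stand-in thunk for illustration; it does not call Python. */
int fake_python_thunk(const char *who, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "Hello, %s!", who);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Unlike the GOT trick, this requires the wrapper functions to be written (or generated) explicitly, but it needs no platform-specific assembly.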
All that's totally blue-sky, optional stuff, though. One thing working with COM has taught me is that it's worth keeping the core minimalist and then allowing people to write whatever magic glue they want on top to exploit the properties of their platform (ELF linker tricks for C/C++, metaclasses in Python, whatever), rather than trying to make the core too generic and ending up with a jack of all trades and master of none. I hope this provided food for thought.
Thanks,
-mike
Are there major objections to using an IDL and compiler? It seems necessary if you want a blob that needs no relocations, and of course C signatures don't carry enough information for specifying interfaces. I favor it over comments as being cleaner. It could be made more palatable by having the IDL compiler spit out the boilerplate for the C implementation; I think most people would see this as a win. Are there preferences among implementations? The choices I've seen are XPCOM, COM, and whatever ORBit uses for its typelib format (but of course without visible structs). I don't know how much these differ; can anyone else compare/contrast? Finally, are there any suggestions on interface repositories? (Which I guess would be the way one would get to all this information.)
In my opinion:
* IDL compilers are out. They don't fit into the way we do things and would require massive code changes to GObject-based libraries. We already extract all this information by parsing header files for gtk-doc, as a proof of concept.
* No existing typelib is going to work without modification. Perhaps the XPCOM format could be a starting point, but it needs to be extended to cover GLib/GObject types and concepts.
* An out-of-process server (like the CORBA IR) doesn't make sense; just a simple in-process API. Someone suggested it should be an interface so that (e.g.) the Python bindings could avoid generating a typelib on the fly. That might be interesting, though *fast* access is also important in some applications.
There is information that would be nice to have in a type library that isn't present in the headers but would be in the IDL (default values for function arguments come to mind; relationships between objects and the functions that act as their methods are another). If some form of IDL is out of the question, what would be the best way to represent this extra data?
As for typelibs, it'd definitely be good to have something that can be shared fully between processes. The typelibs ORBit2 generates are shared libraries, but they are full of pointers, which I'd imagine leads to a lot of relocations. In this respect, the XPCOM format is a lot better. For reference, XPCOM's format is documented at: http://www.mozilla.org/scriptable/typelib_file.html
For interoperation with dynamic scripting languages, it would be nice if an IDispatch-style interface could be implemented generically using the typelib information and libffi. That might provide a nice way to do this without requiring heaps of generated stub code to be compiled into the library.
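One possible answer for data such as default argument values is to record it per argument in the typelib and let a generic invoker apply it. Here is a minimal C sketch that assumes hypothetical `ArgInfo`/`FunctionInfo` records, not any existing format. To keep it short, all arguments are ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-argument metadata that C headers cannot express. */
typedef struct {
    const char *name;
    bool has_default;
    int default_value;          /* only meaningful if has_default */
} ArgInfo;

typedef struct {
    const char *name;
    size_t n_args;
    const ArgInfo *args;
} FunctionInfo;

/* Copy the caller's arguments into out[] and fill any trailing ones
 * from the recorded defaults. Returns false if an argument without a
 * default was omitted. out must have room for info->n_args values. */
bool fill_default_args(const FunctionInfo *info,
                       const int *given, size_t n_given, int *out)
{
    assert(n_given <= info->n_args);
    for (size_t i = 0; i < info->n_args; i++) {
        if (i < n_given)
            out[i] = given[i];
        else if (info->args[i].has_default)
            out[i] = info->args[i].default_value;
        else
            return false;
    }
    return true;
}
```

A libffi-based invoker, as suggested above, could run this step before building its call, so a script could omit trailing arguments without any per-function stub code.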
1. Mike Hearn suggested adding a GObject call which would get a list of all the _get_type() functions in a library. He notes that the functions would be relocated on demand rather than at dynamic link time. The GObject class can then carry all its metadata in the same fashion that it now carries the signal and property information. Does anyone see a downside to this? (E.g., would it cause a large and unnecessary burden for programs like GUI builders that might want to look at every class in a library but not necessarily instantiate them?)
2. Are there any ideas how inheritance and interfaces would work? Assume we find a class in libfoo which is derived from a widget in libbar. Who carries what metadata? And how does the get_type function know how to find its parent?
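Here is a minimal C sketch of the "list of _get_type() functions" idea, with no libgobject involved. `TypeId` stands in for GType, and the `foo_*_get_type()` functions, the exported `library_get_type_funcs` table and the registration counter are all hypothetical. Enumerating the table only reads function pointers. A type is registered the first time its _get_type() function is called, following the usual GObject pattern of a static type ID that is filled in once.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long TypeId;   /* hypothetical stand-in for GType */
typedef TypeId (*GetTypeFunc)(void);

static TypeId next_id = 1;
static int registrations = 0;   /* how many types have been registered */

static TypeId register_type(void)
{
    registrations++;
    return next_id++;
}

/* Each _get_type() registers its type on first call only. Not
 * thread-safe; real GObject code uses g_once_init_enter() for this. */
TypeId foo_widget_get_type(void)
{
    static TypeId id = 0;
    if (id == 0)
        id = register_type();
    return id;
}

TypeId foo_button_get_type(void)
{
    static TypeId id = 0;
    if (id == 0)
        id = register_type();
    return id;
}

/* Hypothetical per-library table that tools could enumerate. */
const GetTypeFunc library_get_type_funcs[] = {
    foo_widget_get_type,
    foo_button_get_type,
};
const size_t library_n_get_type_funcs =
    sizeof library_get_type_funcs / sizeof library_get_type_funcs[0];

/* Call the i-th _get_type() function, registering that type if needed. */
TypeId library_get_type_at(size_t i)
{
    assert(i < library_n_get_type_funcs);
    return library_get_type_funcs[i]();
}

int library_registration_count(void)
{
    return registrations;
}
```

Under this scheme a GUI builder can count the entries cheaply. However, reading any class metadata still means calling the function and therefore registering the type, which is the cost question raised in point 1.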
On the question of IDL, my suggestion is to take the approach we've been discussing on the D-BUS mailing list. First, have an XML "IDL" format that is a representation of the typelib data; write a converter from the XML format to the typelib itself; then write a separate "scan header files and generate the XML" piece of code.
If you have a parse tree data structure (like dbus-gidl.h), you can convert that data structure to/from XML and to/from the typelib. That lets you easily support:
- scanning header files into a parse tree and generating XML
- scanning header files into a parse tree and generating a typelib
- loading XML into a parse tree and generating a typelib
The advantages of having the XML intermediate representation rather than just the in-process parse tree data structure are:
1) you can get everything going without writing the "hard part" (the heuristic C parser)
2) you can easily debug the heuristic C parser by having human-readable output available from it
3) you could allow "merging" some XML from the C parser with some override XML that corrects ambiguities or remaps some names
Anyway, however you go about it, D-BUS is a pretty good use case to look at; the two main use cases would be remoting (as in D-BUS) and language bindings (as in gnome-python). Some patches for dbus-glib have recently been posted that implement some of the introspection stuff.
Of course, the D-BUS case is simplified by the assumption that each object will be written explicitly for D-BUS, so we don't try to handle any random C function signature that might exist. Something like foo (MyObject *obj, RandomStruct **bar, OtherRandomStruct baz) would just not be introspectable with the way we're doing D-BUS. Probably that's true for the general GLib case too, but I guess in the general case one might want to try a little harder, to support a language binding for all of GTK, which includes many legacy interfaces.
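The parse-tree-to-XML direction can be sketched in a few lines of C. The `ArgNode`/`MethodNode` structures and the element names are hypothetical and only loosely in the spirit of dbus-gidl.h, not its actual API. The sketch also assumes names never need XML escaping: it asserts that instead of escaping.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *type;
} ArgNode;

typedef struct {
    const char *name;
    size_t n_args;
    const ArgNode *args;
} MethodNode;

/* Append formatted text to buf at *len. Returns -1 if it would not fit. */
static int append(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(*len < size);
    va_start(ap, fmt);
    n = vsnprintf(buf + *len, size - *len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= size - *len)
        return -1;
    *len += (size_t)n;
    return 0;
}

static int needs_escaping(const char *s)
{
    return strpbrk(s, "<>&\"") != NULL;
}

/* Serialize one method node into buf (size > 0). Returns 0 on success,
 * -1 if the buffer is too small. */
int method_to_xml(const MethodNode *m, char *buf, size_t size)
{
    size_t len = 0;

    assert(!needs_escaping(m->name));
    if (append(buf, size, &len, "<method name=\"%s\">", m->name) < 0)
        return -1;
    for (size_t i = 0; i < m->n_args; i++) {
        assert(!needs_escaping(m->args[i].name));
        if (append(buf, size, &len, "<arg name=\"%s\" type=\"%s\"/>",
                   m->args[i].name, m->args[i].type) < 0)
            return -1;
    }
    return append(buf, size, &len, "</method>");
}
```

Keeping the emitter this simple matches the argument above. The XML stays human-readable for debugging the header scanner, and override data could be merged at the parse-tree level before either XML or a typelib is written.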
I'm in the process of learning Mono/C#, and it has one concept I like more the more I see it used: attributes. Attributes are kinda like properties, but are meant specifically for introspection and don't directly affect the "normal" running of an object. All attributes inherit from System.Attribute, so they are proper objects themselves, and you can create arbitrary custom attribute classes. The CLI allows marking anything with attributes (where anything means "whatever you can introspect"): classes, methods, fields, etc. Attributes can also take construct-time keyword parameters. What is really cool about attributes is that they neatly solve what was traditionally done by various hacks à la magic method names. For example, JUnit tests have to be named like testSomething, whereas in NUnit you mark them with NUnit.Framework.TestAttribute instead. Similarly, you mark classes meant to be serializable by the runtime with System.SerializableAttribute, and nothing else is needed. Having attributes available would be perfect for things like designating D-BUS methods, and would also allow incorporating some of the manual overrides and corrections directly into the typelib.
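C has no attributes, but the concept maps naturally onto typelib data: a table of (target, attribute, value) records that tools and bindings query and that the runtime otherwise ignores. Here is a minimal C sketch in which all names (`Annotation`, `annotation_lookup()`, and keys such as `DBus.Method`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One attribute attached to one introspectable item. */
typedef struct {
    const char *target;   /* e.g. "FooObject.say_hello" */
    const char *name;     /* attribute name, e.g. "DBus.Method" */
    const char *value;    /* attribute argument, "" if none */
} Annotation;

/* Return the value of attribute `name` on `target`, or NULL if absent. */
const char *annotation_lookup(const Annotation *table, size_t n,
                              const char *target, const char *name)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].target, target) == 0 &&
            strcmp(table[i].name, name) == 0)
            return table[i].value;
    }
    return NULL;
}
```

Because unknown attribute names are simply never looked up, new kinds of annotation (D-BUS export markers, binding overrides) could be added without changing the record layout.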
Discussing that further with Rob on IRC, we came to the problem of how to get metadata for classes that are not instantiated, particularly in more convoluted cases like a class G defined in one place that references its parent class F, which is defined somewhere else and not (yet) registered. One solution is of course to leave things as they are now and only support introspection of registered classes. But we could also try to enhance the contract and provide that additional info in the form of packages/modules/jars/assemblies (pick your favorite term). This way we _could_ give that info, provided all requisites are in place, without breaking the existing contract for old users. The exact representation of that info is probably implementation-dependent, but it can be embedded directly in the DSO: ELF can use some magic sections, a.out a magic variable, and for PE I have no clue. Someone more knowledgeable should comment here (Mike?).
On a semi-related note, the metadata should probably contain info about namespaces, i.e. that GtkWidget really lives in Gtk.Widget. That's important for autogenerating correct bindings in languages that have real OO support. And it could be implemented using attributes :).
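If the metadata records the namespace prefix, deriving the qualified name is simple string work. Here is a minimal C sketch; the function `qualified_name()` is hypothetical. A real type such as GtkIMContext (which should become Gtk.IMContext) shows why the prefix should come from the metadata rather than from guessing at capital letters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Turn a C type name such as "GtkWidget" into "Gtk.Widget", given the
 * namespace prefix recorded in the metadata. Returns 0 on success, or
 * -1 if the name does not start with the prefix (or is only the prefix)
 * or the output buffer is too small. */
int qualified_name(const char *c_name, const char *ns, char *out, size_t size)
{
    size_t ns_len = strlen(ns);
    int n;

    assert(ns_len > 0);
    if (strncmp(c_name, ns, ns_len) != 0 || c_name[ns_len] == '\0')
        return -1;
    n = snprintf(out, size, "%s.%s", ns, c_name + ns_len);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```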
Any introspection system is going to need to work with the existing introspection capabilities in libgobject. If the introspection info is hung off the GType for the class, then we can already find the parent class with g_type_parent(). If the introspection info is separate, then we'd still need some way to find the associated GType in order to make use of the information (if we can't even do that, then the info isn't very useful ...). Similarly we'd need some way to find the information associated with an arbitrary GType. So again we would be able to find information about the parent using g_type_parent().
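Here is a self-contained C model of that lookup, without libgobject. `type_parent()` stands in for g_type_parent() and returns 0 for a type with no parent, as the real function does for fundamental types. The registry, type IDs and metadata strings are made up. Metadata is found by walking up the parent chain until some type carries it.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long TypeId;   /* stand-in for GType; 0 means "none" */

typedef struct {
    TypeId type;
    TypeId parent;
    const char *introspection_info;   /* hypothetical metadata, may be NULL */
} TypeRecord;

/* Made-up registry: 1 = GObject, 2 = GtkWidget, 3 = GtkButton. */
static const TypeRecord registry[] = {
    { 1, 0, "GObject metadata" },
    { 2, 1, "GtkWidget metadata" },
    { 3, 2, NULL },                   /* no metadata attached */
};
static const size_t n_registry = sizeof registry / sizeof registry[0];

static const TypeRecord *find_record(TypeId type)
{
    for (size_t i = 0; i < n_registry; i++)
        if (registry[i].type == type)
            return &registry[i];
    return NULL;
}

/* Model of g_type_parent(): the parent ID, or 0 for a root type. */
TypeId type_parent(TypeId type)
{
    const TypeRecord *r = find_record(type);
    return r ? r->parent : 0;
}

/* Walk from `type` towards the root and return the first metadata found,
 * or NULL if no type on the chain has any. Every type on the chain is
 * assumed to be registered. */
const char *find_nearest_info(TypeId type)
{
    for (TypeId t = type; t != 0; t = type_parent(t)) {
        const TypeRecord *r = find_record(t);
        assert(r != NULL);
        if (r->introspection_info != NULL)
            return r->introspection_info;
    }
    return NULL;
}
```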
Yes. Assemblies are complementary to the existing system: they provide a means to give some info (like where a given class is implemented), to allow metadata retrieval without registering types. Old methods obviously need to (and will) keep working; that's what I meant by enhancing the contract without breaking it for existing clients.
As Havoc suggested, I wrote a front-end to eat XML descriptions (http://arcsin.org/temp/metadata/ if you'd like to see it). It's a straight translation of the .defs format, though there are still a few limitations. I'm currently working on getting it to spit out the actual binary data. The type information will probably be split up into a couple of pieces: a blob existing in the library (which will give the _get_type functions for classes, all the type information for non-GType types, etc.), and the GObject introspection info, which will stay in the class structure alongside the properties and signals that are there currently. A set of APIs will abstract all of this so we can play with the format as necessary. I'm not exactly sure how this will fit into the build process, since some of the information goes inside the GObject and some outside.
Created attachment 45175: my prototype
Here is my current prototype. It is rather unfinished and needs a lot more work, but I need to get it out here so that interested people can look at it and maybe help complete it.
Created attachment 46000: Supplemental patch to let boxed types have a constructor.
OK, I have imported the prototype into CVS now, module gobject-introspection.
Created attachment 46552: Patch to make it compile with restrictive compiler options
Here's a patch that removes unused variables and such to make it compile when using restrictive compiler flags.
Comment on attachment 46552: Patch to make it compile with restrictive compiler options
Patch looks fine. Can you commit it?
No, sorry, I don't have commit rights.
I have committed a proposal for typed annotations, metadata-annotations-proposal.txt. Please read it and comment. There are a bunch of FIXMEs too, which will need addressing before it can go in.
Was Torsten's patch ever committed?
Not completely it seems. I do have commit rights now, so I'll look into it and commit the approved parts.
I've picked up Matthias' prototype and finished it with plenty of help from others. More information can be found at http://live.gnome.org/GObjectIntrospection and in the tarball we released. We should either close this bug or change its description to integrating GObjectIntrospection into GLib itself.
I'm closing this bug now as we have a mechanism in gobject-introspection which provides a (more) complete API description. For inclusion of g-i in glib, another bug should be filed.
[Mass-moving gobject-introspection tickets to its own Bugzilla product - see bug 708029. Mass-filter your bugmail for this message: introspection20150207 ]