GNOME Bugzilla – Bug 581525
Struct members not detected with #typedef after struct declaration
Last modified: 2015-02-07 16:47:05 UTC
I've a little piece of code where members of struct ShellActivity were not detected. Moving the > #typedef ShellActivity ShellActivity line above the struct declaration solved this (see attached files to test). You can check that the files in tests/everything/ always put the #typedef first, so they don't check that detail. Now, you must acknowledge that's a really vicious bug: how many hours of my life it made me spend in debugging! ;-)
Created attachment 134070 [details] Header file, with relevant parts Don't bother about ShellActivities (plural).
Created attachment 134071 [details] Source file coming with it (only if you want to build them and test)
This issue applies to atk as well (specifically, AtkPropertyValues and AtkRegistry). I've also seen it in thrift_c_glib.
Many hours spent here too ... struct is also not detected if the typedef surrounds the structure declaration, ala: typedef struct _SpaceMan { gchar *string; gint value; } SpaceMan; Attaching a test case for this. In the test case output gir, you'll see that the structure contains the wrong name (the one with the underscore prefix).
Created attachment 197680 [details] Test case
I believe the same issue also appears in GIRepository. _GIArgument appears in the gir (as _Argument) instead of GIArgument (as Argument), although it is a union not a struct.
Also lost many hours on this one with a slightly different arrangement. With a boxed type, having the typedef declared after the struct gives a GIR result using "glib:boxed" whereas if it is before, we get a "record". Using Stef's example: typedef struct _SpaceWoman SpaceWoman; struct _SpaceWoman { gint value; }; Gives a GIR result of something like: <record name="Woman" c:type="SpaceWoman" glib:type-name="SpaceWoman" glib:get-type="space_woman_get_type" c:symbol-prefix="woman"> <field name="value" writable="1"> ... Whereas defining the typedef later as with either of the following: typedef struct _SpaceMan { gint value; } SpaceMan; or struct _SpaceMan { gint value; }; typedef struct _SpaceMan SpaceMan; Yields the following GIR: <glib:boxed glib:name="Man" c:symbol-prefix="space" glib:type-name="SpaceMan" glib:get-type="space_man_get_type"> <!-- No field are given in this case! --> I'm not really sure which is "correct" but glib:boxed gives us a GI_INFO_TYPE_BOXED which is not really supported in PyGObject. The results with "record" give us GI_INFO_TYPE_STRUCT which works well and looks like it figures out the boxedness from type checking with something like "..._get_type().is_kind(G_TYPE_BOXED)". https://git.gnome.org/browse/pygobject/tree/gi/module.py?id=3.11.2#n186
Actually in the later example we get two different things in the GIR: <glib:boxed glib:name="Man" c:symbol-prefix="space" glib:type-name="SpaceMan" glib:get-type="space_man_get_type"> <!-- No field are given in this case! --> ... <record name="_Man" c:type="_SpaceMan"> <field name="value" writable="1"> ...
Created attachment 264461 [details] [review] g-ir-scanner: Add tests for various struct typedef combinations Add functional blackbox tests typedefs.[h|c] for struct typedefs. Include currently failing structs surrounded with __GI_SCANNER__ check and a pre-processor warning. This gives a "target" for functionality that should work since there is no "expected failure" concept for the g-ir-scanner diff tests.
Created attachment 264462 [details] [review] SourceScanner: Fix get_symbols/comments to maintain the scanner lists Use g_slist_copy prior to returning the lists run through g_slist_reverse. This preserves the source scanners internally held lists where previously they would only point to a single element after a call, leaking memory and breaking subsequent calls. Note the functions as (transfer container) and use g_slist_free after calls in the Python bindings. Add new unittest file: test_sourcescanner.py for isolated unittesting of the SourceScanner.
(In reply to comment #10) > Created an attachment (id=264462) [details] [review] > SourceScanner: Fix get_symbols/comments to maintain the scanner lists Note that this patch doesn't fix anything in a normal scanner run (except a memory leak) since get_symbols and get_comments only seem to be called once. The problem was found debugging the SourceScanner using dump() in various places which would then cause the rest of the run to behave unexpectedly.
Created attachment 264465 [details] [review] SourceScanner: Fix get_symbols/comments to maintain the scanner lists Removed debug printing
Created attachment 264793 [details] [review] tests: Add tests for various struct typedef combinations Rebased on patches from bug 720713. This gives the ability to use XFAIL_TESTS.
Created attachment 264794 [details] [review] scanner: Fix get_symbols/comments to maintain the scanner lists Rebased on patches from bug 720713. This allows for isolated testing of the Python tests: cd tests/scanner make check TESTS=test_sourcescanner TESTARGS=Test.test_get_symbols_length_consistency Use g_slist_copy prior to returning the lists run through g_slist_reverse. This preserves the source scanners internally held lists where previously they would only point to a single element after a call, leaking memory and breaking subsequent calls. Note the functions as (transfer container) and use g_slist_free after calls in the Python bindings. Add new unittest file: test_sourcescanner.py for isolated unittesting of the SourceScanner.
Created attachment 264962 [details] [review] tests: Add tests for various struct typedef combinations Rebased
Created attachment 264963 [details] [review] scanner: Fix get_symbols/comments to maintain the scanner lists Rebased
Created attachment 264964 [details] [review] tests: Add transformer typedef struct tests Add tests for various combinations of struct tags and typedefs. This includes tests for orderings of a typedef prior to struct definition, reversed, all in one, and multiple typdefs. The currently failing tests have been marked as "expectedFailure" so the test suite doesn't bail out.
Created attachment 264965 [details] [review] scanner: Fix parsing for various typedef struct orderings Add structs parsed from C's "tag namespace" into their own cache for lookup by typdef parsing. This fixes issues where a typedef declared after a struct would not have a correct name (name pased on the struct tag not the typdef). This also cleans up the need for special casing struct tags prefixed with an underscore.
Created attachment 264966 [details] [review] tests: Add nested struct tests for transformer Add tests for nested structs as they pass through the transformer.
Created attachment 264967 [details] [review] scanner: Use struct tag namespace parsing for nested structs Re-use _create_tag_ns_struct for parsing of nested structs. This adds a "member" argument for specialized behaviour.
Created attachment 264968 [details] [review] tests: Add nested union unittests Add a sub-set of tests found in struct testing as only basic validation is needed due to the shared code paths.
Created attachment 264969 [details] [review] scanner: Use tag namespace for parsing unions Generalize _create_tag_ns_struct for both structs and unions.
Created attachment 264970 [details] [review] tests: Add transformer tests for callback typedefs Add a basic transformer unittest for callback typedefs.
Created attachment 264971 [details] [review] scanner: Remove typedef namespace cache Remove the caching of typedefs in Transformer._typedef_ns. This is no longer used due to the added _tag_ns cache which store tags rather than typedefs. Remove adding of callback typdefs to the typedef_ns since these were not being used anyhow.
Summary: The patches use a testing methodology which isolate the Transformer postconditions in an effort to avoid conflation by other parts of the system. This also has the benefit of easier debugging because a breakpoint can be set in a specific test or a specific test can be run in isolation using TESTS and TEST_ARGS. The solution I used was to attempt to follow the C tag namespacing semantics. Instead of adding typedefs into a special cache (_typedef_ns) we instead always add typedefs to the main namespace and create an empty shared Record which is also added to the tag namespace (_tag_ns) keyed by the tag name (in situations like "typedef struct _Foo Foo"). Later when the struct is parsed ("struct _Foo {...}") we fill out the struct found in the tag namespace and referenced by the typedef in the main namespace. The reverse ordering of this parsing also works which is what fixes the problems described in this bug. Furthermore, this technique allows us to remove special checking of underscore prefixed struct tags because struct tags can now have any name. Beyond making sure distcheck passes, I ran diffs on GLib and GTK+ generated GIRs with the following results: Changes in GIRepository-2.0.gir: * _Argument renamed to Argument * BaseInfo now has all of its fields exposed * Functions updated with introspectable=1 - callable_info_invoke - constant_info_get_value - field_info_get_field Changes in Gtk-2.0.gir: * _RcProperty renamed to RcProperty with added methods: - parse_border - parse_color - parse_enum - parse_flags - parse_requisition * global functions like gtk_rc_property_parse_border still exist but have an additional annotation: moved-to="RcProperty.parse_border"
Adding this as dependent upon bug 720713 due to tests using the testing harness changes over there.
Review of attachment 264962 [details] [review]: Looks good. ::: tests/scanner/typedefs.c @@ +10,3 @@ +typedefs_boxed_with_typedef_before_unref(TypedefsBoxedWithTypedefBefore *self) +{ +} Too bad we don't have https://bugzilla.gnome.org/show_bug.cgi?id=678619 =/
Review of attachment 264963 [details] [review]: Oh wow, that's an evil bug. Patch looks correct.
Review of attachment 264964 [details] [review]: This is OK, although I'm uncertain of the value of this style of testing versus something a bit more like the "expected gir" approach. One thing we used to have is "xpath assertions" tests, where instead of storing the entire .gir in git, we just had like: /namespace/struct/@name="TestStruct"/@disguised=1 (Or something like that, my xpath is rusty). Anyways, I'm certainly not going to bounce a lot of carefully written tests, but the above is just something to think about.
Review of attachment 264965 [details] [review]: ::: giscanner/transformer.py @@ +94,3 @@ continue node = self._traverse_one(symbol) + if node and node.name and node.name not in self._namespace.names: With this we will no longer get "Namespace conflict" assertions; instead we will just silently ignore the duplicate names. There seem to be two changes here: 1) Ignore nodes with name = None 2) Silently drop duplicates Do you only need 1) ? If you need 2), I think we should change the specific calling sites to say e.g.: self._append_new_node(node, ignore_duplicates=True) @@ -749,3 @@ message.warn_symbol(symbol, e) return None - struct = ast.Record(name, symbol.ident, disguised=disguised) All of this additional new code applies to unions as well, right? I guess unions are used far less frequently, but still, this code should be a common helper logic shared between the two. (You don't have to do that in this patch series, but if you choose not to, please add a # FIXME or something to do the code) @@ +768,3 @@ + else: + tag_name = None + Why the use of the term "tag" here? That's a term strongly associated with the git usage for me. How about "struct_lookaside"? Or "pending_structs"? (If you unify this code with the union case, then "compound_lookaside") @@ +779,3 @@ + # primary typedef that has been promoted, we create a new Record + # which is forced as a disguised struct. + struct = ast.Record(name, symbol.ident, disguised=True) Hmmm...is this disguised struct useful for anything? Perhaps we should just ignore the second typedef? I.e. struct = None, then below: if struct is not None: ... @@ +785,3 @@ + # promote it to the main namespace. Essentially the first + # typedef for a struct clobbers its name and ctype which is what + # will be visible to GI. Right, this is the core fix; makes sense to me. @@ +799,3 @@ + # Force the struct as disguised for now since we do not yet know + # if it has a body. + struct.disguised = True Mmm...that's not what disguised is supposed to be, although I think it is likely broken in g-i now. What you're saying is that we don't know if the struct has any fields - but that is best represented simply by struct.fields == [], right? "disguised" was supposed to be for things like GdkAtom where the typedef includes the '*' so it's not obviously a structure. Why are you setting disguised here? @@ +805,3 @@ + # we need to parse the fields because we never get a struct + # in the tag namespace which is normally where fields are parsed. + self._parse_fields(symbol, struct) ...instead of parsing the fields like in this case? @@ +816,3 @@ + else: + struct = ast.Record(None, symbol.ident, disguised=False) + self._tag_ns[symbol.ident] = struct So basically the first time we see a struct, we put it in the lookaside. Ok. But it seems to me this should *only* happen for toplevel structures. This function can also be called for nested structures, and we'll end up having the Record object be both in the fields[] of another Record, and in the lookaside. Maybe that's not a real-world issue.
Review of attachment 264966 [details] [review]: Sure.
Review of attachment 264967 [details] [review]: Ah hah, this patch addresses my concerns about the nested behavior of the previous patch. Ok, just one cosmetic comment: ::: giscanner/transformer.py @@ +820,3 @@ + else: + # This will be an anonymous struct if there is no ident + struct = ast.Record(None, ident, disguised=False) Since disguised=False is the default, let's just drop that keyword?
Review of attachment 264968 [details] [review]: Ok.
Review of attachment 264969 [details] [review]: Ah, this patch addresses my comments about unions. Can you just squash this into the previous patch? I'd do so before looking at any of my other comments so as to avoid a conflict-fest.
Review of attachment 264970 [details] [review]: ::: tests/scanner/test_transformer.py @@ +431,3 @@ + self.namespace = ast.Namespace('Test', '1.0') + logger = MessageLogger.get(namespace=self.namespace) + logger.enable_warnings((WARNING, ERROR, FATAL)) Should consider having a superclass with this code at this point.
Review of attachment 264971 [details] [review]: Hm, but do we lose support for the "disguised" bits now? I guess that's handled by your previous patches. So I perhaps retract my earlier comments about disguised. Looking through our -expected.girs, the usage of it there seems mostly bogus.
Attachment 264962 [details] pushed as 4a64ab0 - tests: Add tests for various struct typedef combinations Attachment 264963 [details] pushed as e6249ad - scanner: Fix get_symbols/comments to maintain the scanner lists Attachment 264964 [details] pushed as 4800d30 - tests: Add transformer typedef struct tests
Attachment 264966 [details] pushed as a36a218 - tests: Add nested struct tests for transformer Attachment 264968 [details] pushed as 3189b3c - tests: Add nested union unittests Attachment 264970 [details] pushed as 802a048 - tests: Add transformer tests for callback typedefs
Review of attachment 264965 [details] [review]: ::: giscanner/transformer.py @@ +94,3 @@ continue node = self._traverse_one(symbol) + if node and node.name and node.name not in self._namespace.names: 2) was an attempt to avoid doing this: def _create_typedef_struct(self, symbol): ... self._tag_ns[symbol.ident] = struct return None Where the create function has the side effect of adding the object to an alternate namespace and then returning None as a clue it shouldn't be added to the main namespace. In the new code, the idea was to avoid this knowledge in the individual create functions. Leaving the decision as to what namespace the object is added to up to the outer level parse function. An alternative would be to do an identity check within _append_new_node (ignore nodes of the same name AND identity: original is node). The real issue this solves is that _create_typedef_struct cannot really know if it should be added to the main namespace because the struct may have been created previously by a pure struct prior to the typedef (the problem this ticket describes): struct _Foo {...}; typedef struct _Foo Foo; It could be changed to have those smarts within the create function: def _create_typedef_struct(self, symbol): ... self._tag_ns[symbol.ident] = struct if struct.name not in self._namespace: return struct else: return None But to me, it is even more fragile and I would advocate for dumbing down the create functions as much as possible. Taking this further, the addition of objects to the tag namespace should also follow suite: def parse(self, symbols): for symbol in symbols: try: node = self._traverse_one(symbol) except TransformerError as e: ... if node.name: self._append_new_node(node) if node.ident and isinstance(node, ast.Compound): self._tag_ns[node.ident] = node This way _traverse_one should always return a valid object and raise an exception if an error occurred. Leaving "parse" as the orchestrator of what goes in the various namespaces. I'm also not particularly fond of any of this as it is only a first pass attempt to structure the code in a less fragile way (and fix a bug at the same time). More importantly we now have more tests so it can continually be refactored with a bit more confidence. @@ +768,3 @@ + else: + tag_name = None + The term came from reading a bit about C namespaces. There were enough articles which used it that I started using it. References: http://www.dmst.aueb.gr/dds/cscout/doc/name.html http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Fnamspac.htm The problem is "struct name" is overloaded because we use that to mean the stripped typedef name in GI terms. To be clear we have this: typedef struct _StructTagNsName StructTypedefName; Symbol.ident == StructTypedefName Symbol.base_type.name == _StructTagNsName Record.ident == _StructTagNsName Record.name == TypedefName and: struct _StructTagNsName {...}; Symbol.ident == _StructTagNsName Record.name == None Record.ident == _StructTagNsName If that isn't convincing, I'm happy to go with "lookaside". In any case, I think all of this warrants a bit of documentation at the top of the Python modules describing the meaning of the nomenclature as it relates to an example like above. Because it seems almost anything we use is going to be ambiguous or overloaded to a new reader. @@ +779,3 @@ + # primary typedef that has been promoted, we create a new Record + # which is forced as a disguised struct. + if tag_name in self._tag_ns: Yep, in the early implementations I played with using an Alias instead of a new empty struct. But found out it would change things in various GIRs. For example GInitiallyUnowned is declared as: typedef struct _GObject GObject; typedef struct _GObject GInitiallyUnowned; This is defined as a separate struct in GObject-2.0.gir. In fact, its fields are copied later as a special case in the dump parser: https://git.gnome.org/browse/gobject-introspection/tree/giscanner/gdumpparser.py?id=1.39.0#n330 Another option I played with was adding field sharing code in addition to creating a new struct: newstruct.fields = struct.fields This way, regardless of when or where fields are parsed, both Records would share them (this would also allow us to remove the GInitallyUnowned special case). But that changes too much stuff for these patches. I really wanted to isolate only what should be fixed and keep everything else exactly the same, and perhaps change it later if it makes sense. Perhaps I can just amend the comment noting that this could be different in the future, or at least reference the gdumpparser.py example. @@ +799,3 @@ + # Force the struct as disguised for now since we do not yet know + # if it has a body. + else: Yep, I was going by the definition of disguised based on the current output of GIRs. So regardless of whether it is correct or not, we shouldn't change those results just yet. @@ +816,3 @@ + else: + struct = ast.Record(None, symbol.ident, disguised=False) + else: This patch only addresses toplevel structures (_create_tag_ns_struct is only called for toplevel structs), nested are addressed in the patch: "Use struct tag namespace parsing for nested structs" Nested struct parsing always goes through _traverse_one->_create_member->_create_struct The above mentioned patch adds the "member" argument and nested struct parsing then becomes: _traverse_one->_create_member->_create_tag_ns_struct(..., member=True) Looking at this again, I don't think any of it is particularly clear in terms of naming or functional breakdown. Perhaps nested structs should have their own isolated function "_create_nested_struct"... So then we'd have: _create_typedef_struct _create_tag_ns_struct _create_nested_struct
Ugh, that last comment was supposed to be replies to the review and only make sense when looking at it with the review tool: https://bugzilla.gnome.org/review?bug=581525&attachment=264965
Correction, we don't actually store the struct tag names on the Record, we only push them into _tag_ns as the key... so we actually have this: typedef struct _StructTagNsName StructTypedefName; Symbol.ident == StructTypedefName Symbol.base_type.name == _StructTagNsName Record.name == TypedefName _tag_ns[_StructTagNsName] == Record and: struct _StructTagNsName {...}; Symbol.ident == _StructTagNsName Record.name == None _tag_ns[_StructTagNsName] == Record So we should probably store that on the Record as: Record.tag_name (or Record.lookaside_name)
(In reply to comment #39) > An alternative would be to do an identity > check within _append_new_node If that works, it sounds quite simple. > But to me, it is even more fragile and I would advocate for dumbing down the > create functions as much as possible. At a high level, I *do* care that we continue to signal namespace conflicts. If say you declare: typedef struct _MyFoo MyFoo; typedef union _MyFoo MyFoo; Ok, the C compiler would disallow that one, but there are cases where it's allowed in C, but not in GIR. > The term came from reading a bit about C namespaces. There were enough articles > which used it that I started using it. References: > http://www.dmst.aueb.gr/dds/cscout/doc/name.html > http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Fnamspac.htm Interesting. I was not aware of that. Well...I have a mild preference for "lookaside", but it's not a strong feeling. Please feel free to keep it or change as you see fit. > In any case, I > think all of this warrants a bit of documentation at the top of the Python > modules Yeah, just a comment above _tag_ns would be good. Doesn't have to be long. Could even just be a sentence + link to this Bugzilla. (Yes, "git annotate" can find this type of thing, but it can be annoying to do after refactoring). So executive summary: Let's just figure out some way to continue to signal namespace conflicts, and I'm OK with the patch.
Created attachment 265268 [details] [review] scanner: Fix parsing for various typedef struct orderings Add structs parsed from C's "tag namespace" into their own cache for lookup by typdef parsing. This fixes issues where a typedef declared after a struct would not have a correct name. This also cleans up the need for special casing struct tags prefixed with an underscore.
Created attachment 265269 [details] [review] scanner: Add simplified parsing for nested structs Add _create_member_struct for the parsing of nested structs. This is precursory work to remove the member/anonymous flag from other struct/union creation code and allow simplification of those code paths.
Created attachment 265270 [details] [review] scanner: Use tag namespace for parsing unions Generalize _create_tag_ns_struct for both structs and unions. Remove _create_compound newer struct parsing code has completely replaced it.
Created attachment 265271 [details] [review] scanner: Remove typedef namespace cache Remove the caching of typedefs in Transformer._typedef_ns. This is no longer used due to the added _tag_ns cache which store tags rather than typedefs. Remove adding of callback typdefs to the typedef_ns since these were not being used anyhow.
scanner: Fix parsing for various typedef struct orderings https://bugzilla.gnome.org/review?bug=581525&attachment=265268 This patch no longer ignores duplicate named nodes added to the namespace and instead adds an identity check directly in _append_new_node. This still gives an error when there are different nodes with the same name. Additionally cleaned out the struct creation side effect of adding themselves to the tag namespace by moving the logic to the outer level parse function. This uses a new "tag_name" attribute on ast.Compound and cleans up the create functions. scanner: Add simplified parsing for nested structs https://bugzilla.gnome.org/review?bug=581525&attachment=265269 This is a little bit different than the last set of patches in that it adds a more explicit "_create_member_struct" rather than using a parameter to the other struct creation functions which makes logic much simpler. The remaining patches are just rebases which were already accepted.
Created attachment 265275 [details] [review] scanner: Cleanup misuse of disguised attribute to mean private Remove logic dealing with a misuse of the disguised Node attribute to mean private in various struct creation functions. Only apply the disguised attribute directly when parsing a typedef in the form of: typedef struct _Foo* Foo; Note: This changes GIRs a bit and also adds a few new pages of mallard docs. So it is unclear if this is ok.
Created attachment 265280 [details] [review] scanner: Replace GInitiallyUnowned field sharing with generic solution Remove GInitiallyUnowned special case in gdumpparser where fields are copied from GObject. Add generic solution where anytime we have multiple typedef structs, the fields become shared: typedef struct _Foo Foo; typedef struct _Foo Bar; struct _Foo {...}; Notes: I ran through all of the important GIRs (GLib, GObject, Gio, Gtk, Gdk) and noticed no changes (GInitiallyUnowned still contains all the fields from GObject)
Review of attachment 265268 [details] [review]: Looks good.
Review of attachment 265269 [details] [review]: Okay.
Review of attachment 265270 [details] [review]: Looks right at a quick glance.
Review of attachment 265271 [details] [review]: Glad this code will be gone.
Review of attachment 265275 [details] [review]: Can we do this as a separate bug? I need to have time to dig through the history of disguised and figure out what we should do here.
Review of attachment 265280 [details] [review]: Looks good.
Attachment 265268 [details] pushed as 9d8d159 - scanner: Fix parsing for various typedef struct orderings Attachment 265269 [details] pushed as dae7e62 - scanner: Add simplified parsing for nested structs Attachment 265270 [details] pushed as 70eb2cf - scanner: Use tag namespace for parsing unions Attachment 265271 [details] pushed as e9e74c1 - scanner: Remove typedef namespace cache Attachment 265280 [details] pushed as e6fc4c1 - scanner: Replace GInitiallyUnowned field sharing with generic solution
Comment on attachment 265275 [details] [review] scanner: Cleanup misuse of disguised attribute to mean private Moved patch over to bug 721481
*** Bug 667186 has been marked as a duplicate of this bug. ***
[Mass-moving gobject-introspection tickets to its own Bugzilla product - see bug 708029. Mass-filter your bugmail for this message: introspection20150207 ]