GNOME Bugzilla – Bug 554854
The --typelib-xml option should reuse giscanner parser/writer
Last modified: 2015-02-07 16:49:41 UTC
This is currently the code we use for doing the typelib stripping: def typelib_xml_strip(path): from giscanner.girparser import C_NS c_ns_key = '{%s}' % (C_NS, ) from xml.etree.cElementTree import parse doc = parse(open(path)) for node in doc.getiterator(): for attrib in list(node.attrib): if attrib.startswith(c_ns_key): del node.attrib[attrib] doc.write(sys.stdout) return 0 While simple, it has a number of problems: - formatting is different from normal output - namespacing is wrong ns0 is inserted everywhere. Instead the girparser and girwriter should be reused to do that. girparser.py will need quite a bit of love before a a parser+writer roundtrip will produce somewhat identical markup.
Lucas tells me that this also affects the --inject option.
Created attachment 119902 [details] [review] patch to use ElementPath instead of minixpath Ok, the patch to replace minixpath with ElementPath is trivial.
(In reply to comment #2) > Created an attachment (id=119902) [edit] > patch to use ElementPath instead of minixpath > > Ok, the patch to replace minixpath with ElementPath is trivial. > I guess this patch should go in bug #554857?
Indeed, rejecting formally.
Created attachment 120225 [details] [review] GIRParser+GIRWriter love Here's an initial patch to make the use of GIRParser+GIRWriter actually possible. I could successfully make injections with this patch.
Comment on attachment 120225 [details] [review] GIRParser+GIRWriter love Looks good! Thanks for doing this work. Just a couple of minor nitpicks: >diff --git a/giscanner/girwriter.py b/giscanner/girwriter.py > def _write_enum(self, enum): [..] >+ attrs.extend([('c:type', enum.symbol)]) attrs.append() would be better here. > def _write_field(self, field): > # FIXME: Just function >- if isinstance(field, (Callback, Function)): >+ if isinstance(field, Function): The FIXME can be removed now right? >diff --git a/giscanner/xmlwriter.py b/giscanner/xmlwriter.py >- raise ValueError( >- "value for attribute %r cannot be None" % (attr, )) >+ continue > for attr, value in attributes: >+ if value is None: >+ continue >- if value is None: >- raise ValueError( >- "value for attribute %r cannot be None" % (attr, )) Can you explain why this is needed? None shouldn't be passed in and it can be argued that they are bugs. You might be right that None should just be silently ignored, but I just like to know :-) >diff --git a/tools/g-ir-scanner b/tools/g-ir-scanner > def typelib_xml_strip(path): >+ from giscanner.girparser import GIRParser as Parser >+ from giscanner.girwriter import GIRWriter as Writer No need to do the as part here, it's only used in the other part of the code since we might support different writers. > for node in doc.getiterator(): > for attrib in list(node.attrib): > if attrib.startswith(c_ns_key): > del node.attrib[attrib] Can this transformation be done using the nodes in the GIRParser tree? It would be a bit faster to avoid having to reparse, but perhaps it's not a very big deal since it's not a common operation. > def inject(path, additions, outpath): >+ from giscanner.girparser import GIRParser as Parser >+ from giscanner.girwriter import GIRWriter as Writer Ditto > injectDoc = parse(open(additions)) > for node in injectDoc.getroot(): > injectPath = node.attrib['path'] >@@ -128,8 +139,14 @@ def inject(path, additions, outpath): > raise ValueError("Couldn't find path %r" % (injectPath, )) > for child in node: > target.append(child) Ditto.
(In reply to comment #6) > (From update of attachment 120225 [details] [review] [edit]) > Looks good! Thanks for doing this work. > > Just a couple of minor nitpicks: > > >diff --git a/giscanner/girwriter.py b/giscanner/girwriter.py > > def _write_enum(self, enum): > [..] > >+ attrs.extend([('c:type', enum.symbol)]) > > attrs.append() would be better here. Fixed. > > def _write_field(self, field): > > # FIXME: Just function > >- if isinstance(field, (Callback, Function)): > >+ if isinstance(field, Function): > > The FIXME can be removed now right? Yeah, removed. > >diff --git a/giscanner/xmlwriter.py b/giscanner/xmlwriter.py > > >- raise ValueError( > >- "value for attribute %r cannot be None" % (attr, )) > >+ continue > > > for attr, value in attributes: > >+ if value is None: > >+ continue > > >- if value is None: > >- raise ValueError( > >- "value for attribute %r cannot be None" % (attr, )) > > Can you explain why this is needed? > None shouldn't be passed in and it can be argued that they are bugs. > You might be right that None should just be silently ignored, but I just like > to > know :-) Basically, there were some enum and members with no type-name or get-type defined in the xml so when we would write from namespace to xml, there were some attributes with None as value. For that problem, the reason is that GIRParser is unconditionally parsing all nodes as glib ast nodes (GLib) when in some cases they should be just ast nodes (this is a separate bug maybe?). Another problem is when using --typelib-xml, we strip some attributes from nodes that are expected by GIRWriter. The way I see it, the XMLWriter shouldn't care about the "meaning" of the xml nodes. It should just write them and ignore empty attributes. GIRWriter is the layer that should care if there's something important missing. For example, if GIRWriter was to write a GLibEnum that is missing a type-name for some reason, we should raise an error in that case. When GIRWriter if fully working in that way, we can probably put back the empty attribute errors in XMLWriter as a way to make sure we're doing the right thing. > >diff --git a/tools/g-ir-scanner b/tools/g-ir-scanner > > > def typelib_xml_strip(path): > >+ from giscanner.girparser import GIRParser as Parser > >+ from giscanner.girwriter import GIRWriter as Writer > > No need to do the as part here, it's only used in the other > part of the code since we might support different writers. Ok, fixed. > > for node in doc.getiterator(): > > for attrib in list(node.attrib): > > if attrib.startswith(c_ns_key): > > del node.attrib[attrib] > > Can this transformation be done using the nodes in the GIRParser tree? > It would be a bit faster to avoid having to reparse, but perhaps > it's not a very big deal since it's not a common operation. Yeah, this is why I didn't care too much about performance here. Transforming the nodes GIRParser would add some complexity (especially in the inject operation) because we don't have any easy way to map xml nodes to namespace nodes at this moment. One much simpler solution would be to have a constructor parameter that tells GIRParser to not parse the xml at construction time. We could then change the xml tree and call parse only once later. This parameter would be True by default so that we keep the current expected behavior. See updated patch. > > def inject(path, additions, outpath): > >+ from giscanner.girparser import GIRParser as Parser > >+ from giscanner.girwriter import GIRWriter as Writer > > Ditto Fixed.
Created attachment 120280 [details] [review] Update patch
(In reply to comment #8) > Created an attachment (id=120280) [edit] > Update patch > This looks good with one addition: - add a comment to the added "if value is None" checks which explains my observation and it should eventually be removed once we fix up the parser properly.
Ok, done. Commited.
[Mass-moving gobject-introspection tickets to its own Bugzilla product - see bug 708029. Mass-filter your bugmail for this message: introspection20150207 ]