Bug 554854 – The --typelib-xml option should reuse giscanner parser/writer

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 554854 - The --typelib-xml option should reuse giscanner parser/writer


Summary:	The --typelib-xml option should reuse giscanner parser/writer


Status:	RESOLVED FIXED

Product:	gobject-introspection
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gobject-introspection Maintainer(s)
QA Contact:	gobject-introspection Maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:	555296 555602

Reported:	2008-10-03 13:13 UTC by Johan (not receiving bugmail) Dahlin
Modified:	2015-02-07 16:49 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
patch to use ElementPath instead of minixpath (6.31 KB, patch) 2008-10-04 04:27 UTC, Colin Walters	rejected	Details \| Review
GIRParser+GIRWriter love (14.33 KB, patch) 2008-10-08 19:54 UTC, Lucas Rocha	none	Details \| Review
Update patch (14.34 KB, patch) 2008-10-09 15:23 UTC, Lucas Rocha	committed	Details \| Review

Description Johan (not receiving bugmail) Dahlin 2008-10-03 13:13:07 UTC

This is currently the code we use for doing the typelib stripping:

def typelib_xml_strip(path):
    from giscanner.girparser import C_NS
    c_ns_key = '{%s}' % (C_NS, )

    from xml.etree.cElementTree import parse
    doc = parse(open(path))
    for node in doc.getiterator():
        for attrib in list(node.attrib):
            if attrib.startswith(c_ns_key):
                del node.attrib[attrib]
    doc.write(sys.stdout)
    return 0

While simple, it has a number of problems:
- formatting is different from normal output
- namespacing is wrong ns0 is inserted everywhere.

Instead the girparser and girwriter should be reused to do that.
girparser.py will need quite a bit of love before a a parser+writer roundtrip will produce somewhat identical markup.

Comment 1 Johan (not receiving bugmail) Dahlin 2008-10-03 13:13:52 UTC

Lucas tells me that this also affects the --inject option.

Comment 2 Colin Walters 2008-10-04 04:27:44 UTC

Created attachment 119902 [details] [review]
patch to use ElementPath instead of minixpath

Ok, the patch to replace minixpath with ElementPath is trivial.

Comment 3 Lucas Rocha 2008-10-06 20:56:57 UTC

(In reply to comment #2)
> Created an attachment (id=119902) [edit]
> patch to use ElementPath instead of minixpath
> 
> Ok, the patch to replace minixpath with ElementPath is trivial.
> 

I guess this patch should go in bug #554857?

Comment 4 Johan (not receiving bugmail) Dahlin 2008-10-07 10:59:43 UTC

Indeed, rejecting formally.

Comment 5 Lucas Rocha 2008-10-08 19:54:03 UTC

Created attachment 120225 [details] [review]
GIRParser+GIRWriter love

Here's an initial patch to make the use of GIRParser+GIRWriter actually possible. I could successfully make injections with this patch.

Comment 6 Johan (not receiving bugmail) Dahlin 2008-10-08 20:05:23 UTC

Comment on attachment 120225 [details] [review]
GIRParser+GIRWriter love

Looks good! Thanks for doing this work.

Just a couple of minor nitpicks:

>diff --git a/giscanner/girwriter.py b/giscanner/girwriter.py
>     def _write_enum(self, enum):
[..]
>+            attrs.extend([('c:type', enum.symbol)])

attrs.append() would be better here.

>     def _write_field(self, field):
>         # FIXME: Just function
>-        if isinstance(field, (Callback, Function)):
>+        if isinstance(field, Function):

The FIXME can be removed now right?

>diff --git a/giscanner/xmlwriter.py b/giscanner/xmlwriter.py

>-                raise ValueError(
>-                    "value for attribute %r cannot be None" % (attr, ))
>+                continue

>         for attr, value in attributes:
>+            if value is None:
>+                continue

>-            if value is None:
>-                raise ValueError(
>-                    "value for attribute %r cannot be None" % (attr, ))

Can you explain why this is needed?
None shouldn't be passed in and it can be argued that they are bugs.
You might be right that None should just be silently ignored, but I just like to
know :-)

>diff --git a/tools/g-ir-scanner b/tools/g-ir-scanner

> def typelib_xml_strip(path):
>+    from giscanner.girparser import GIRParser as Parser
>+    from giscanner.girwriter import GIRWriter as Writer

No need to do the as part here, it's only used in the other
part of the code since we might support different writers.

>     for node in doc.getiterator():
>         for attrib in list(node.attrib):
>             if attrib.startswith(c_ns_key):
>                 del node.attrib[attrib]

Can this transformation be done using the nodes in the GIRParser tree? 
It would be a bit faster to avoid having to reparse, but perhaps
it's not a very big deal since it's not a common operation.

> def inject(path, additions, outpath):
>+    from giscanner.girparser import GIRParser as Parser
>+    from giscanner.girwriter import GIRWriter as Writer

Ditto

>     injectDoc = parse(open(additions))
>     for node in injectDoc.getroot():
>         injectPath = node.attrib['path']
>@@ -128,8 +139,14 @@ def inject(path, additions, outpath):
>             raise ValueError("Couldn't find path %r" % (injectPath, ))
>         for child in node:
>             target.append(child)

Ditto.

Comment 7 Lucas Rocha 2008-10-09 15:22:06 UTC

(In reply to comment #6)
> (From update of attachment 120225 [details] [review] [edit])
> Looks good! Thanks for doing this work.
> 
> Just a couple of minor nitpicks:
> 
> >diff --git a/giscanner/girwriter.py b/giscanner/girwriter.py
> >     def _write_enum(self, enum):
> [..]
> >+            attrs.extend([('c:type', enum.symbol)])
> 
> attrs.append() would be better here.

Fixed.

> >     def _write_field(self, field):
> >         # FIXME: Just function
> >-        if isinstance(field, (Callback, Function)):
> >+        if isinstance(field, Function):
> 
> The FIXME can be removed now right?

Yeah, removed.

> >diff --git a/giscanner/xmlwriter.py b/giscanner/xmlwriter.py
> 
> >-                raise ValueError(
> >-                    "value for attribute %r cannot be None" % (attr, ))
> >+                continue
> 
> >         for attr, value in attributes:
> >+            if value is None:
> >+                continue
> 
> >-            if value is None:
> >-                raise ValueError(
> >-                    "value for attribute %r cannot be None" % (attr, ))
> 
> Can you explain why this is needed?
> None shouldn't be passed in and it can be argued that they are bugs.
> You might be right that None should just be silently ignored, but I just like
> to
> know :-)

Basically, there were some enum and members with no type-name or get-type defined in the xml so when we would write from namespace to xml, there were some attributes with None as value. For that problem, the reason is that GIRParser is unconditionally parsing all nodes as glib ast nodes (GLib) when in some cases they should be just ast nodes (this is a separate bug maybe?). Another problem is when using --typelib-xml, we strip some attributes from nodes that are expected by GIRWriter.

The way I see it, the XMLWriter shouldn't care about the "meaning" of the xml nodes. It should just write them and ignore empty attributes. GIRWriter is the layer that should care if there's something important missing. For example, if GIRWriter was to write a GLibEnum that is missing a type-name for some reason, we should raise an error in that case. When GIRWriter if fully working in that way, we can probably put back the empty attribute errors in XMLWriter as a way to make sure we're doing the right thing.
 
> >diff --git a/tools/g-ir-scanner b/tools/g-ir-scanner
> 
> > def typelib_xml_strip(path):
> >+    from giscanner.girparser import GIRParser as Parser
> >+    from giscanner.girwriter import GIRWriter as Writer
> 
> No need to do the as part here, it's only used in the other
> part of the code since we might support different writers.

Ok, fixed.

> >     for node in doc.getiterator():
> >         for attrib in list(node.attrib):
> >             if attrib.startswith(c_ns_key):
> >                 del node.attrib[attrib]
> 
> Can this transformation be done using the nodes in the GIRParser tree? 
> It would be a bit faster to avoid having to reparse, but perhaps
> it's not a very big deal since it's not a common operation.

Yeah, this is why I didn't care too much about performance here. Transforming the nodes GIRParser would add some complexity (especially in the inject operation) because we don't have any easy way to map xml nodes to namespace nodes at this moment. One much simpler solution would be to have a constructor parameter that tells GIRParser to not parse the xml at construction time. We could then change the xml tree and call parse only once later. This parameter would be True by default so that we keep the current expected behavior. See updated patch.

> > def inject(path, additions, outpath):
> >+    from giscanner.girparser import GIRParser as Parser
> >+    from giscanner.girwriter import GIRWriter as Writer
> 
> Ditto

Fixed.

Comment 8 Lucas Rocha 2008-10-09 15:23:32 UTC

Created attachment 120280 [details] [review]
Update patch

Comment 9 Johan (not receiving bugmail) Dahlin 2008-10-11 18:16:36 UTC

(In reply to comment #8)
> Created an attachment (id=120280) [edit]
> Update patch
> 

This looks good with one addition:
 - add a comment to the added "if value is None" checks which explains
   my observation and it should eventually be removed once we fix
   up the parser properly.

Comment 10 Lucas Rocha 2008-10-11 18:34:33 UTC

Ok, done. Commited.

Comment 11 André Klapper 2015-02-07 16:49:41 UTC

[Mass-moving gobject-introspection tickets to its own Bugzilla product - see bug 708029. Mass-filter your bugmail for this message: introspection20150207 ]