GNOME Bugzilla – Bug 688897
Fix GTK-Doc comment block parsing.
Last modified: 2015-02-07 16:56:47 UTC
Current annotationparser.py gets close, but not close enough to correctly parsing (or rather scanning?) the GTK-Doc comment block syntax. Including, but not limited to, deprecated GTK-Doc and g-i specific syntax. Patches (and a whole lot of tests) will be attached shortly in an attempt to fix this.
Created attachment 229659 [details] [review] 0001-Update-GLib-annotations-to-master.patch
Created attachment 229660 [details] [review] 0002-tests-add-g-ir-doc-tool-generated-output-to-.gitigno.patch
Created attachment 229661 [details] [review] 0003-giscanner-improve-wording-of-inline-documentation.patch
Created attachment 229662 [details] [review] 0004-giscanner-use-if-in-a-b-instead-of-if-a-or-if-b.patch
Created attachment 229663 [details] [review] 0005-giscanner-use-dict.items.patch
Created attachment 229664 [details] [review] 0006-giscanner-use-dict.values-in-favor-of-dict.itervalue.patch
Created attachment 229665 [details] [review] 0007-giscanner-use-collections.OrderedDict-when-available.patch
Created attachment 229666 [details] [review] 0008-giscanner-remove-duplicate-os-import.patch
Created attachment 229667 [details] [review] 0009-giscanner-remove-unused-variables.patch
Created attachment 229668 [details] [review] 0010-giscanner-implement-DocOption-in-terms-of-odict-inst.patch
Created attachment 229669 [details] [review] 0011-giscanner-make-it-clear-DocOptions-also-has-a-positi.patch
Created attachment 229670 [details] [review] 0012-giscanner-drop-dead-code.patch
Created attachment 229671 [details] [review] 0013-giscanner-use-re.match-instead-of-re.search.patch
Created attachment 229672 [details] [review] 0014-giscanner-add-AnnotationParser-tests.patch
Created attachment 229673 [details] [review] 0015-giscanner-move-unit-tests-from-giscanner-annotationp.patch
Created attachment 229674 [details] [review] 0016-giscanner-construct-list-of-possible-tag-names-for-T.patch
Created attachment 229675 [details] [review] 0017-giscanner-treat-the-GTK-Doc-Description-tag-like-any.patch
Created attachment 229676 [details] [review] 0018-giscanner-Correctly-detect-invalid-GTK-Doc-comment-b.patch
Created attachment 229677 [details] [review] 0019-giscanner-remove-re.MULTILINE-usage-from-annotationp.patch
Created attachment 229678 [details] [review] 0020-giscanner-update-annotationparser-to-most-recent-gtk.patch
Created attachment 229679 [details] [review] 0021-giscanner-fix-DocBlock-.comment.patch
Created attachment 229680 [details] [review] 0022-giscanner-don-t-continue-parsing-after-multiline-des.patch
This concludes the first series of patches setting the stage for the more serious work in what will become a future patch series[1]. To ease development, I've edited my ~/.jhbuildrc as follows: module_autogenargs['gobject-introspection'] = autogenargs + ' --enable-gtk-doc --enable-doctool' branches['gobject-introspection'] = ('http://github.com/dieterv/gobject-introspection.git', 'annotationparser-wave2') Step 1: Testing procedure ran for each single one of these: $ git clean -e .cproject -e .project -e .pydevproject -e .settings -fdx $ jhbuild buildone gobject-introspection $ jhbuild make --check $ jhbuild shell $ (rm -f giscanner/*.pyc && git checkout -- gir/ && cd misc && ./update-glib-annotations.py /home/dieterv/gnome.org/checkout/glib/) $ git diff # ensure glib annotations in gir/*.c have not changed, meaning # if a patch has caused changes to DockBlock.to_gtk_doc() output, # the changes in questio are already included with said patch. $ exit 0001-Update-GLib-annotations-to-master.patch 0002-tests-add-g-ir-doc-tool-generated-output-to-.gitigno.patch 0003-giscanner-improve-wording-of-inline-documentation.patch 0004-giscanner-use-if-in-a-b-instead-of-if-a-or-if-b.patch 0005-giscanner-use-dict.items.patch 0006-giscanner-use-dict.values-in-favor-of-dict.itervalue.patch 0007-giscanner-use-collections.OrderedDict-when-available.patch 0008-giscanner-remove-duplicate-os-import.patch 0009-giscanner-remove-unused-variables.patch 0010-giscanner-implement-DocOption-in-terms-of-odict-inst.patch 0011-giscanner-make-it-clear-DocOptions-also-has-a-positi.patch 0012-giscanner-drop-dead-code.patch 0013-giscanner-use-re.match-instead-of-re.search.patch Step 2: Starting with this patch I'm also testing against Python 3 in addition to Python 2. The Testing procedure now looks like this: $ git clean -e .cproject -e .project -e .pydevproject -e .settings -fdx $ jhbuild buildone gobject-introspection $ jhbuild make --check $ PYTHONPATH=. python3 tests/scanner/annotationparser/test_parser.py $ jhbuild shell $ (rm -f giscanner/*.pyc && git checkout -- gir/ && cd misc && ./update-glib-annotations.py /home/dieterv/gnome.org/checkout/glib/) $ git diff # ensure glib annotations in gir/*.c have not changed, meaning # if a patch has caused changes to DockBlock.to_gtk_doc() output, # the changes in questio are already included with said patch. $ exit 0014-giscanner-add-AnnotationParser-tests.patch Part 3: Pattern tests have moved so the procedure now looks like this (this continues to apply for the next series of patches that will be posted in the coming days): $ git clean -e .cproject -e .project -e .pydevproject -e .settings -fdx $ jhbuild buildone gobject-introspection $ jhbuild make --check $ PYTHONPATH=. python3 tests/scanner/annotationparser/test_patterns.py && PYTHONPATH=. python3 tests/scanner/annotationparser/test_parser.py $ jhbuild shell $ (rm -f giscanner/*.pyc && git checkout -- gir/ && cd misc && ./update-glib-annotations.py /home/dieterv/gnome.org/checkout/glib/) $ git diff # ensure glib annotations in gir/*.c have not changed, meaning # if a patch has caused changes to DockBlock.to_gtk_doc() output, # the changes in questio are already included with said patch. $ exit 0015-giscanner-move-unit-tests-from-giscanner-annotationp.patch 0016-giscanner-construct-list-of-possible-tag-names-for-T.patch 0017-giscanner-treat-the-GTK-Doc-Description-tag-like-any.patch 0018-giscanner-Correctly-detect-invalid-GTK-Doc-comment-b.patch 0019-giscanner-remove-re.MULTILINE-usage-from-annotationp.patch 0020-giscanner-update-annotationparser-to-most-recent-gtk.patch 0021-giscanner-fix-DocBlock-.comment.patch 0022-giscanner-don-t-continue-parsing-after-multiline-des.patch [1] if you're interested, you can track development of the rest on: https://github.com/dieterv/gobject-introspection/commits/annotationparser-wave2
Also note this series is somewhat time sensitive. There is a chance that if somebody updates GLib annotations after g-i commit 17fc978c081195dad1f6a66d69cc6e18423e6db5, the following will need to be redone via interactive rebase as those have an effect on the output of misc/update-glib-annotations.py: 0001-Update-GLib-annotations-to-master.patch 0010-giscanner-implement-DocOption-in-terms-of-odict-inst.patch 0021-giscanner-fix-DocBlock-.comment.patch
Review of attachment 229659 [details] [review]: OK.
Review of attachment 229660 [details] [review]: OK.
Review of attachment 229661 [details] [review]: One minor typo: ::: giscanner/annotationparser.py @@ +599,3 @@ def _parse_comment_block(self, comment_lines, filename, lineno): """ + Parses a single GTK-Doc comment block already stripped from it's "its" here
Review of attachment 229662 [details] [review]: Sure.
Review of attachment 229663 [details] [review]: The commit message should say *why*. Is this to improve Python 3 compatibility? If it is, say so. On this particular topic, it's annoying that being 2 and 3 compatible by using .items() causes a full list copy on Python 2, but oh well...we have bigger performance problems.
Review of attachment 229664 [details] [review]: If this is for Python 3, say so. Otherwise, OK.
Review of attachment 229665 [details] [review]: Ok.
Review of attachment 229666 [details] [review]: Ok...if we're going for "pylint" style tools we probably have a lot of this.
Review of attachment 229667 [details] [review]: Ok.
Review of attachment 229668 [details] [review]: Makes sense.
Review of attachment 229669 [details] [review]: Ok.
Review of attachment 229670 [details] [review]: Ok.
Review of attachment 229671 [details] [review]: Agreed on this one.
Review of attachment 229672 [details] [review]: This patch is useful, but I personally tend to much prefer "black box" testing. These tests are "white box" because they are making assertions about the current internal structure of g-ir-scanner. That's not to say white box tests are universally bad; some things are hard to test in a black box fashion. The specific tradeoff with these tests is that: 1) There are a lot of comment blocks that we could just as well run through the whole g-ir-scanner, and use them to test *other* things too. 2) If we want to change the internal structure of the annotation parser, it will involve a lot of tedious changes to the XML. It needs to be documented how to make such a change. How did you generate the XML in the first place? Just manually run test_parser.py with an empty XML and extract it from the diff? I'm marking this as accepted-commit-now, since I don't want to block progress on further work.
Review of attachment 229673 [details] [review]: Again, commit message should say *why*. I'm guessing it's just clearer that it's a test case? If so, say "Make it clearer it's a test" or something.
Review of attachment 229674 [details] [review]: Nice. I hit this duplicatation a while ago and was very confused why my new tag wasn't being parsed.
Review of attachment 229675 [details] [review]: Ok, I couldn't find any docs about the Description tag in a quick search, so trusting you on this one.
Review of attachment 229676 [details] [review]: What do we do in this case? Just ignore the whole thing? Seems like it could use a warning?
Review of attachment 229677 [details] [review]: Ok.
Review of attachment 229678 [details] [review]: Makes sense.
Review of attachment 229679 [details] [review]: Ok.
Review of attachment 229680 [details] [review]: No tests for this one...can you at least give an example case that was broken?
Created attachment 230118 [details] [review] 0003-giscanner-improve-wording-of-inline-documentation.patch Finished one other sentence with a ".".
(In reply to comment #47) > Created an attachment (id=230118) [details] [review] > 0003-giscanner-improve-wording-of-inline-documentation.patch > > Finished one other sentence with a ".". And fixes comment #27.
Created attachment 230119 [details] [review] 0005-giscanner-use-dict.items.patch Adds the *why* as remarked in comment #29.
Created attachment 230120 [details] [review] 0006-giscanner-use-dict.values-in-favor-of-dict.itervalue.patch Adds the *why* as remarked in comment #30.
(In reply to comment #38) > 1) There are a lot of comment blocks that we could just as well run through the > whole g-ir-scanner, and use them to test *other* things too. Most are non-sense comment blocks designed to test the output of annotationparser. The tests don't care how we arrived at said output, as long as it's correct. From a certain point of view, these are blackbox test if you consider annotationparser.py to be a standalone thing exporting a kind of API contract to it's users, which happen to be other modules of the giscanner package ;) > 2) If we want to change the internal structure of the annotation parser, it > will involve a lot of tedious changes to the XML. True, but that case implies huge changes in the GTK-Doc comment block format, so you're already in a world of pain anyway. > It needs to be > documented how to make such a change. The patch adds tests/scanner/annotationparser/README. Suggestions on how to improve it most appreciated :) > How did you generate the XML in the > first place? Just manually run test_parser.py with an empty XML > and extract it from the diff? All of it hand written. I do admit that the diffing was *extremely* useful in finding bugs, both in annotationparser.py and in the hand-written tests. > I'm marking this as accepted-commit-now, since I don't want to block progress > on further work. Thanks :)
Created attachment 230125 [details] [review] 0015-giscanner-move-unit-tests-from-giscanner-annotationp.patch Adds the *why* as remarked in comment #39.
(In reply to comment #40) > Review of attachment 229674 [details] [review]: > > Nice. I hit this duplicatation a while ago and was very confused why my new > tag wasn't being parsed. Thanks. Do note the goal of all this is for g-i *not* to add Tags: ever again but (annotations) on the identifier instead.
(In reply to comment #41) > Review of attachment 229675 [details] [review]: > > Ok, I couldn't find any docs about the Description tag in a quick search, so > trusting you on this one. Thanks. For histories' sake and just to have it on record, here it is in all it's glory: http://git.gnome.org/browse/gtk-doc/tree/gtkdoc-mkdb.in?id=f6fe57bfe4c9751cd307c37d306cac42432d1431#n3920
(In reply to comment #42) > Review of attachment 229676 [details] [review]: > > What do we do in this case? Just ignore the whole thing? > > Seems like it could use a warning? Not really. Previously we'd get an empty DocBlock instance for those comment blocks as added in the tests, which implies a GTK-Doc block must exist that prompted the creation of that instance. This makes sure that we behave in a consistent way with what happens for COMMENT_START_RE, ie return None...
(In reply to comment #46) > Review of attachment 229680 [details] [review]: > > No tests for this one...can you at least give an example case that was broken? It doesn't change anything, really, except that you don't have to read a whole page of code just to realize that nothing else happens with that line and we go on processing the next. Putting the continue statements there makes it a bit more readable.
21:22 <walters> dieterv: if you've addressed the comments, the whole branch is accepted-commit-now then as far as i'm concerned 21:27 <dieterv> walters: done 21:28 <dieterv> walters: comments #55 en #56 need your reply because you set 0018-giscanner-Correctly-detect-invalid-GTK-Doc-comment-b.patch and 0022-giscanner-don-t-continue-parsing-after-multiline-des.patch to reviewed only 21:29 <dieterv> walters: the other comments have been addressed and I'll add the bugzilla link once everything is good to go 21:30 <walters> dieterv: those comments are fine by me 21:30 <dieterv> walters: ok, thanks :)
Created attachment 243125 [details] [review] scanner: fix GTK-Doc comment block parsing - GTK-Doc and current g-ir-scanner do not allow tags to be written before the comment block description but old version of g-ir-scanner parsed that just file. Fix this regression, but emit when tags are encountered at the wrong location. #674658 - During this analysis, a secondary bug report was filed with the GTK-Doc project because it was found that the same malformed tags did not result in the expected html output. After discussing this with the GTK-Doc maintainers, we learned that our g-i specific top level tags should never have existed in the first place. The prefered notation for annotations that apply to the identifier should be written on the identifier line, for example like what we already do with (skip). As a result, this patch deprecates g-i specific top level tags and implements them as annotations on the identifier instead and at the same time adds support for malformed comment blocks still using g-i specific top level tags. This means that all annotated code "out there" will continue to work just fine with this version of g-i, but when a developer decides to fix deprecation warnings in his/her comment blocks, the dependency on g-i needs to be raised to a version that contains at least this patch. #676133 - I previously (and foolishly) removed parentheses matching with a regex, which off course is not possible when there are multiple nested levels of parens (which we will now see more with annotations on the identifier, for example with the type annotation). Remove the "annotation parens matching" regexes again and add in more warnings on unbalanced/unexpected parentheses. #688897 https://bugzilla.gnome.org/show_bug.cgi?id=674658 https://bugzilla.gnome.org/show_bug.cgi?id=676133
(In reply to comment #58) > Created an attachment (id=243125) [details] [review] > scanner: fix GTK-Doc comment block parsing 1) Together with this patch, https://live.gnome.org/DieterVerfaillie/Annotations is intended to replace https://live.gnome.org/GObjectIntrospection/Annotations possibly with a link to the old version of the page for g-i versions < 1.37.0 2) This patch changes how Deprecated: and Since: tag GTK-Doc description fields are stored in .gir files, without changing the .gir version (as requested by Colin Walters). This broke vala during development, which has since been hacked up to work. We need to ping ensonic so he can fix this properly if still needed and maybe look if something needs to be done to valadoc too. 3) Modules that have choosen to pass --warn-error to g-ir-scanner during their build process have a chance of no longer building due to the new warings that have been introduced. I have not yet found the time to jhbuild everything from scratch yet again and generate patches for those modules though... So, the deprecation of g-i specific Tags: and the newly introduced warnings probably make this a GNOME 3.10 only patch. I don't plan to backport this to whatever g-i versions come with GNOME 3.8.
Created attachment 243126 [details] [review] tests: validate GTK-Doc test files This is more of a handy developer tool than something that is always expected to be executed as part of the test suite, so we don't add a hard dependency on xmllint.
Created attachment 243127 [details] [review] docs: document some annotationparser.py hacking hints
Created attachment 243128 [details] [review] Add giscanner package documentation Using Sphinx [1] markup for now. We eventually might want to add GTK-Doc support for python and dogfood g-ir-doctool on this package. [1] De facto standard Python documentaton tool: http://sphinx-doc.org
(In reply to comment #62) > Created an attachment (id=243128) [details] [review] > Add giscanner package documentation This has helped me keep my sanity while working on this the past year or so. Maybe it can help a future contributor find his or her way too.
Review of attachment 243126 [details] [review]: OK.
Review of attachment 243127 [details] [review]: ::: HACKING @@ +34,3 @@ + When satisfied, commit these files adding "Update GLib annotations to master" as the + commit message. + 4) Only now start editing annotatinoparser.py. typo
Review of attachment 243128 [details] [review]: ++
Review of attachment 243125 [details] [review]: There are a lot of unrelated changes in here that I would prefer to split out. Can you split the annotation changes in the regression tests into another patch? Can you split the options => annotations rename into another patch? ::: giscanner/collections/counter.py @@ +33,3 @@ + + class Counter(dict): + '''Dict subclass for counting hashable items. Sometimes called a bag If you copied this from somewhere (Python sources?) it would be nice to have an attribution at the top. ::: giscanner/introspectablepass.py @@ -60,3 @@ context = "return value: " if block: - return_tag = block.get_tag(TAG_RETURNS) I have the get_tag removal around here somewhere. I wrote it for some docs work. I'll attach it here. ::: giscanner/message.py @@ +196,1 @@ +def error(text, positions=None, prefix=None): So, uh, just curious. Can we use the standard Python logging module?
Created attachment 243260 [details] [review] annotationparser: Remove get_tag/get_param They're useless if we can just access the dict directly.
Created attachment 243511 [details] [review] annotationparser: Remove get_tag/get_param They're useless if we can just access the dict directly.
Review of attachment 243511 [details] [review]: Seems reasonable to me.
Comment on attachment 243511 [details] [review] annotationparser: Remove get_tag/get_param Attachment 243511 [details] pushed as a018552 - annotationparser: Remove get_tag/get_param
(In reply to comment #67) > Review of attachment 243125 [details] [review]: > > There are a lot of unrelated changes in here that I would prefer to split out. branch with split out commits was merged yesterday, thanks Colin and Jasper for putting up with my ramblings for so long :)
[Mass-moving gobject-introspection tickets to its own Bugzilla product - see bug 708029. Mass-filter your bugmail for this message: introspection20150207 ]