GNOME Bugzilla – Bug 784177
gtk-doc produces indeterministic output
Last modified: 2017-06-28 19:22:57 UTC
Created attachment 354408: example gstreamer package diff
While working on reproducible builds for openSUSE, I found that the gstreamer and vte packages built in openSUSE contained unexplained diffs in the .html pages that gtk-doc generates during the build process.
I did a dozen such builds of gstreamer, and there seem to be 2 bits of entropy. That means it is sometimes possible to get the same result twice, but only with a 25% chance:
- 1 bit is whether gst_byte_reader_peek_data_unchecked has the const qualifier or not
- 1 bit is whether gstreamer-libs-GstByteWriter.html contains a gst-byte-writer-put-int8.returns section
By default we use make -j$NUMPROCESSORS, but this bug also occurs without parallel builds.
Looking at diffs of unpublished files, I found that docs/gst/gstreamer-decl.txt, docs/libs/gstreamer-libs-decl.txt and docs/plugins/gstreamer-plugins-decl.txt had a lot of randomly ordered entries, probably because they are generated from input read in (random) filesystem order. Within that mess, I found this in gstreamer-libs-decl.txt:

     <FUNCTION>
    -<NAME>gst_byte_reader_peek_data_unchecked</NAME>
    -<RETURNS>const guint8 * </RETURNS>
    -GstByteReader * reader
    +<NAME>gst_byte_reader_peek_data_unchecked</NAME>
    +<RETURNS>const guint8 *</RETURNS>
    +const GstByteReader * reader
    ...
     <FUNCTION>
    -<NAME>gst_byte_reader_peek_data_unchecked</NAME>
    -<RETURNS>const guint8 *</RETURNS>
    -const GstByteReader * reader
    +<NAME>gst_byte_reader_peek_data_unchecked</NAME>
    +<RETURNS>const guint8 * </RETURNS>
    +GstByteReader * reader

So this entry seems to be there twice, the two copies differing only in a trailing space after the return type and in the const on the reader parameter. Probably depending on which one comes first, there are 0 or 1 'const' in the output HTML.
There are also plenty of warnings from gtk-doc in the build log: https://build.opensuse.org/public/build/openSUSE:Factory/standard/x86_64/gstreamer/_log
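Conflicting pairs like the two gst_byte_reader_peek_data_unchecked entries can be spotted mechanically. The sketch below is a hypothetical helper (find_duplicate_decls is not part of gtk-doc) that assumes each entry's <NAME>...</NAME> tag sits on one line, as in the excerpt. A repeated name is not necessarily a bug, so treat the output as a list of candidates to inspect.

```bash
#!/bin/bash
# Hypothetical helper, not part of gtk-doc: print every symbol name that has
# more than one <NAME> entry in a gtk-doc *-decl.txt file.
# Assumes each <NAME>...</NAME> tag is on a single line.
find_duplicate_decls() {
    local decl_file="$1"
    # Extract the tags, strip them down to the bare name, then keep only
    # names that occur more than once (uniq -d needs sorted input).
    grep -o '<NAME>[^<]*</NAME>' "$decl_file" \
        | sed -e 's/^<NAME>//' -e 's/<\/NAME>$//' \
        | LC_ALL=C sort \
        | uniq -d
}
```

For example, find_duplicate_decls docs/libs/gstreamer-libs-decl.txt run inside the build tree would list the names worth a closer look.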
gtk-doc actually has a test that verifies bit-exact HTML output, but it does not cover the intermediate files. Are you packaging files like gstreamer-libs-decl.txt? Anyway, I'll make sure the intermediate files are covered by tests too.
No, we do not package gstreamer-libs-decl.txt. I was just looking through the build tree for possible reasons why the final HTML differed in such interesting ways, and thereby found the two conflicting entries of gst_byte_reader_peek_data_unchecked, one with const and one without, which seem to be selected depending on the input order.
    commit 5a1cbafa5d41d55baafb1b0611c58bd8bf1c66e3 (HEAD -> master)
    Author: Stefan Sauer <ensonic@users.sf.net>
    Date:   Tue Jun 27 07:44:17 2017 +0200

        docs: add a missing const in bytereader docs

        This syncs the prototype with gstbytereader.h

I did a few builds and ./docs/libs/gstreamer-libs-decl.txt seems stable. I will check ~20 builds now and update the ticket afterwards.
I am running this script with gtk-doc from git (ported to python) inside gstreamer/docs and get stable docs for 10 tries. I am closing the bug now; please reopen if needed.

    #!/bin/bash
    # Rebuild the docs num+1 times and keep a copy of each decl file.
    num=10
    mkdir -p ~/temp
    for x in $(seq 0 "${num}"); do
      echo "== build $x =="
      make clean all
      cp ./gst/gstreamer-decl.txt ~/temp/gstreamer-decl.txt.$x
      cp ./libs/gstreamer-libs-decl.txt ~/temp/gstreamer-libs-decl.txt.$x
      cp ./plugins/gstreamer-plugins-decl.txt ~/temp/gstreamer-plugins-decl.txt.$x
    done
    # Compare every later build against build 0; diff -s reports identical files.
    for x in $(seq 1 "${num}"); do
      echo "== check $x =="
      diff -s ~/temp/gstreamer-decl.txt.0 ~/temp/gstreamer-decl.txt.$x
      diff -s ~/temp/gstreamer-libs-decl.txt.0 ~/temp/gstreamer-libs-decl.txt.$x
      diff -s ~/temp/gstreamer-plugins-decl.txt.0 ~/temp/gstreamer-plugins-decl.txt.$x
    done
Did you try using a fresh copy of the gstreamer dir for each try? When you reuse the existing one, the readdir order may stay the same, because the input files are not re-created in the filesystem.
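A minimal sketch of that suggestion, assuming a source tree that can be built in place by a single command. The helper name fresh_copy_build is hypothetical. It copies the tree into a brand-new temporary directory, runs the build there, and prints a checksum of one output file. Re-creating the files gives the filesystem a chance to return directory entries in a different order; it does not guarantee it.

```bash
#!/bin/bash
# Hypothetical helper: copy SRC_DIR into a fresh temporary directory,
# run the given build command there, and print the sha256 of OUT_FILE.
# Usage: fresh_copy_build SRC_DIR OUT_FILE COMMAND [ARGS...]
fresh_copy_build() {
    local src_dir="$1" out_file="$2"
    shift 2
    local work
    work=$(mktemp -d)
    # Copy the tree, so every input file is newly created on disk.
    cp -a "$src_dir/." "$work/"
    # Build inside the copy; the subshell keeps the caller's cwd intact.
    ( cd "$work" && "$@" ) >/dev/null
    sha256sum "$work/$out_file" | cut -d' ' -f1
    rm -rf "$work"
}
```

Calling it several times with the same arguments and comparing the printed hashes gives a stricter version of the rebuild loop above, because each build starts from newly created files instead of the same directory.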
Created attachment 354613: patch to fix this
I must admit that it was a lot harder to reproduce this than I thought. In the end, the output only differed when building on different machines (which we do for openSUSE). strace showed that gtkdoc-scan opened .h files in varying order, which is fixed by the attached patch, though maybe only parts of those changes are needed.
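The patch makes gtkdoc-scan read headers in a sorted order instead of raw readdir order. The sketch below shows the same principle in shell form only; it is not the patch, which changes gtk-doc's own scripts. list_headers_sorted is a hypothetical name, and the sketch assumes file names contain no newlines. LC_ALL=C makes the sort byte-wise, so the order also does not depend on the build machine's locale.

```bash
#!/bin/bash
# Hypothetical helper: list all .h files under DIR in a fixed order.
# find returns entries in filesystem (readdir) order, which can vary
# between machines; sorting with LC_ALL=C makes the order byte-wise
# and locale-independent. Assumes no newlines in file names.
list_headers_sorted() {
    local dir="$1"
    find "$dir" -type f -name '*.h' | LC_ALL=C sort
}
```

Any tool that feeds such a list to a scanner then sees the same input sequence on every machine, regardless of how the files were laid out on disk.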
Thanks, that makes a lot of sense. Are you able to provide a git-formatted patch (make changes; git add <changed-files>; git commit; git format-patch origin/master)? The patch does not apply for me and I would otherwise recreate it and attribute it to "Bernhard M. Wiedemann <gnomebmw@lsmod.de>".
Created attachment 354626: fixing patch
It seems those scripts have been rewritten from Perl to Python since our gtk-doc-1.25, but the same logic applies.
The following fixes have been pushed:
81d8e36 tools: sort directory listings
bb30dcd test: remove duplicated doc blob
Created attachment 354641: tools: sort directory listings
The ordering matters to be able to generate reproducible results.
See also https://reproducible-builds.org/docs/stable-inputs/ on that topic.
Fixes:
Created attachment 354642: test: remove duplicated doc blob