GNOME Bugzilla – Bug 172459
xml2po should mark not-translated passages
Last modified: 2019-03-25 23:13:37 UTC
xml2po should mark passages that are not (yet) translated, for example: <para>Toller deutscher Text</para> <para lang="C">Untranslated Passage</para>. This way, XSL stylesheets can hide these passages or display them in a different color.
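To make the request concrete, here is a minimal Python sketch of how a downstream tool could pick out the marked passages once they carry lang="C". It is only an illustration: the <article> wrapper, the sample document, and the helper name untranslated_elements are assumptions made up for this example, not part of xml2po.

```python
import xml.etree.ElementTree as ET

# Hypothetical merged document: one translated paragraph, one still in English.
doc = ET.fromstring(
    '<article lang="de">'
    '<para>Toller deutscher Text</para>'
    '<para lang="C">Untranslated Passage</para>'
    '</article>'
)


def untranslated_elements(root):
    """Return every element explicitly marked with lang="C"."""
    return [el for el in root.iter() if el.get("lang") == "C"]


pending = untranslated_elements(doc)
for el in pending:
    # A site generator could hide these or attach a "translate this" link.
    print(el.tag, "->", el.text)
```

A stylesheet would do the same selection declaratively; the point is only that the marker makes untranslated passages trivially addressable.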
This would also cause auto-generated text to be generated in English in those blocks. So if, in the middle of a German document, you had <figure lang="C"> </figure>, the label on the output would be "Figure 2", not "Abbildung 2". This is the generally accepted behavior for DocBook processing tools in mixed-language documents.
I requested this feature to be able to build a site similar to http://www.apple.com/support/keynote/ from the DocBook. In this situation we have a tradeoff between having "Figure 2" in German texts as long as <figure> is not translated, and being able to add a "translate this paragraph" link to untranslated sections. My personal preference is to be able to distinguish between translated and untranslated passages.
Created attachment 39623
Handle simpler cases of untranslated strings (incorrect patch)

This is not very easy to do with the gettext library: when there is no translation, the gettext() call returns the untranslated string, but *without* any indication that it is untranslated. Since some strings legitimately stay the same in the translation as in the original, simply checking whether translation == original doesn't work.

This patch does exactly that check (translation == original), but as I said, it is incorrect, and I'm not planning on applying it as it is. If you run it on, e.g., the Greek release notes translation (try "cd gnome-doc-utils/xml2po/tests/relnotes/ && ./test.sh"), you'll see that even application names which are intentionally kept the same in translation get "lang='C'".

Another problem with this patch is that it's highly DocBook-specific (it uses the "lang" attribute instead of, e.g., "xml:lang"). Also note that it's very hard to choose which tag should actually be marked with "lang". Specifically, I doubt <figure> ever will be, since its content is never going to be translated directly as one message; more likely, a <phrase> tag inside <textobject> would get marked. <para>, on the other hand, will be marked appropriately.

However, I'm planning on switching xml2po to my own PO file handling library (to add many more features in fuzzy matching, such as better algorithms and word-by-word diffs), so this will become possible to detect as soon as I make the switch.
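To illustrate why the translation == original check misfires, here is a minimal Python sketch. It is not the attached patch and not xml2po's code: DictTranslations is a hypothetical in-memory stand-in for a compiled catalog, and the sample strings are invented. It relies only on the standard gettext module's NullTranslations base class, whose fallback behavior (return the msgid unchanged) matches what is described above.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = catalog

    def gettext(self, message):
        # Like real gettext: fall back to the original, with no signal
        # that the lookup failed.
        return self._catalog.get(message, message)


def naive_is_untranslated(translations, msgid):
    """The flawed heuristic: translation == original."""
    return translations.gettext(msgid) == msgid


def is_untranslated(catalog, msgid):
    """Direct catalog lookup: missing or empty msgstr means untranslated."""
    return not catalog.get(msgid)


# Invented sample data: "GNOME" is deliberately translated as itself.
catalog = {"Great text": "Toller Text", "GNOME": "GNOME"}
t = DictTranslations(catalog)

for msgid in ("Great text", "GNOME", "Untranslated Passage"):
    print(msgid, naive_is_untranslated(t, msgid), is_untranslated(catalog, msgid))
```

The two checks agree on "Great text" and "Untranslated Passage" but disagree on "GNOME": the naive check flags it as untranslated even though the catalog holds an intentional identical translation. Telling the two cases apart needs access to the catalog itself, which is what a dedicated PO handling layer provides.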
Sven, it took me less than a year to implement this: am I fast or what? ;) [and it turned out to be much easier than I anticipated] Try the "--mark-untranslated" option in g-d-u/xml2po CVS HEAD.