GNOME Bugzilla – Bug 97061
Glade shouldn't put unnecessary markup in messages for translation
Last modified: 2009-10-02 16:39:58 UTC
Glade shouldn't put unnecessary markup in the messages that are marked for translation. This is one case of "don't mark things for translation that shouldn't be translated".

The cases where markup occurs in messages can be divided into two types. The first type is messages similar to this example:

    msgid "This text is <b>bold</b>."

In this case, the markup contains important positional information. Only the text "bold" should be bold, and a translation needs to take that into account, so the markup actually carries important information to the translator here. This use of markup is important and a necessity.

The other case is where the entire message, or entire paragraphs or headings, are surrounded with markup, as in these examples:

    msgid "<b>Home Page Preferences</b>"

    msgid "<span size=\"medium\"><b>No file</b></span>"

    msgid ""
    "<span weight=\"bold\" size=\"larger\">What do you want to do with this "
    "file?\n"
    "</span>\n"
    "It's not possible to view this file type directly in the browser:"

In this type of message, the markup contains no relevant information for the translator, since all the translatable content is embedded in the markup (in the last example above, the message could just as well have been split into two separate messages, so this still applies). Instead, these messages are just a nuisance and create lots of extra and totally unnecessary work for the translators. Whenever the markup for a message changes in the slightest, all translations have to be updated, even though no translatable content changed. Whenever a new message is added that has surrounding markup, even if the exact same message without this exact markup was translated before, the message has to be "translated" again.
Usually all this adds up, and it's not uncommon to have a situation like this, with all these examples occurring in the same po file:

    msgid "<b>Home Page Preferences</b>"
    msgid "<i>Home Page Preferences</i>"
    msgid "<span size=\"larger\"><b>Home Page Preferences</b></span>"

In short, every possible combination of the same actual message but with different and irrelevant markup surrounding it has to be "translated" separately, instead of just one "Home Page Preferences" message.

It's not just a nuisance and a lot of unnecessary work that slows down the translation process. Sometimes the surrounding markup adds much more text than the actual message, and that confuses gettext's fuzzy-matching, so that it either considers the message entirely new and doesn't fuzzy-mark it with a previous similar translation, or it fuzzy-matches on the markup instead of on the actual message. This can cause consistency problems in the translation, where the same terminology won't be used because fuzzy-matching didn't work properly. Fuzzy-matching is an important time-saver and important for consistent use of terminology across translations.

The solution to these problems is to try to separate markup from gettext calls, so that the markup isn't passed through _(). This doesn't apply to the first type of message mentioned above, but it certainly applies to the second. The examples above should be rewritten so that they appear in the po file like this:

    msgid "Home Page Preferences"
    msgid "No file"
    msgid "What do you want to do with this file?"
    msgid "It's not possible to view this file type directly in the browser:"

Perhaps Glade could solve this by allowing the author to use some message attributes rather than including the markup in the actual message.
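To illustrate the fuzzy-matching point, here is a rough Python sketch. It uses difflib's SequenceMatcher as a stand-in for gettext's own fuzzy matcher (an assumption: gettext uses its own algorithm, so real scores will differ), and the helper name similarity is made up for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score.

    difflib is only a stand-in here; gettext's fuzzy matcher is a
    different implementation and will produce different numbers.
    """
    return SequenceMatcher(None, a, b).ratio()


# An already translated msgid and two new candidates.
old_msgid = '<span size="medium"><b>No file</b></span>'
same_text_new_markup = '<i>No file</i>'
other_text_same_markup = '<span size="medium"><b>Open</b></span>'

print("same text, new markup: ", similarity(old_msgid, same_text_new_markup))
print("other text, same markup:", similarity(old_msgid, other_text_same_markup))
```

With these inputs the unrelated message that shares the markup scores higher than the identical text with different markup, which is the failure mode described above. Without the markup, the two "No file" strings would be an exact match.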
*** Bug 73339 has been marked as a duplicate of this bug. ***
Is there a similar bug for libglade/xml-i18n-tools? For apps using libglade, this bug needs to be handled by those 2, and they both need to agree on what to do, so it is a bit tricky. See also bug 100012, which may be a better way to solve this issue.
Kenneth, how's the intltool situation on this?
*** Bug 100012 has been marked as a duplicate of this bug. ***
I thought a bit about supporting the attributes property, but I don't think that will work very well. Setting a PangoAttrList is mutually exclusive with using markup or underlines in the string, so I don't think we can use it. It would also make the properties page more complicated.
Besides the problems that Damon mentioned, how is a label supposed to "fit" the attributes to the translated string? Let's say that you have in English: "How are <b>you</b> doing?" And in Spanish: "Como esta <b>usted</b>?" If we use the attributes instead of embedding the markup, somebody has to figure out that the indexes of the attribute b have changed, and somebody has to say what the new values are. That seems harder to solve from an i18n point of view than just exposing the marked-up string in the po file.
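A small Python sketch of the index problem, using the two example strings above. The helper bold_byte_range is hypothetical; it only handles a single <b>...</b> span and reports UTF-8 byte offsets, which is how Pango attribute ranges are expressed.

```python
import re
from typing import Tuple

BOLD = re.compile(r"<b>(.*?)</b>")


def bold_byte_range(markup: str) -> Tuple[str, int, int]:
    """Return (plain_text, start, end) for the single <b>...</b> span.

    start and end are UTF-8 byte offsets into the plain text.
    """
    match = BOLD.search(markup)
    if match is None:
        raise ValueError("no <b>...</b> span in markup")
    before = markup[: match.start()]
    inner = match.group(1)
    after = markup[match.end():]
    start = len(before.encode("utf-8"))
    end = start + len(inner.encode("utf-8"))
    return before + inner + after, start, end


print(bold_byte_range("How are <b>you</b> doing?"))
print(bold_byte_range("Como esta <b>usted</b>?"))
```

The bold range is 8-11 in the English string and 10-15 in the Spanish one, so a range stored next to the untranslated string cannot simply be reused for the translation.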
The idea is to use markup only when a substring is marked up, and use attributes when the entire label is bold/italic. Rather than generic PangoAttrList support, what I'd suggest in glade would be just "bold" and "italic" checkboxes, perhaps also a "size" option menu that lets you choose an xx-small -> xx-large scale factor.
This bug is one of the most serious issues w.r.t. GNOME translation right now, as discussed at GUADEC and in other forums, so it should probably have higher priority.
Is this bug even more difficult to fix in libglade? Should there be a separate issue opened there?
OK, so richard@imendio.com pointed out that vicious-extensions has this in glade-helper.[ch]:

    /* Make label surrounded by tag (if tag = "tag" add <tag>text</tag>) */
    void glade_helper_tagify_label (GladeXML *xml, const char *name, const char *tag);

Perhaps usable as a temporary workaround for this glade/libglade problem.
I think there are 2 approaches here:

1) Use special properties for bold/italic/size, both in the XML file and probably in the Glade property editor.

   Advantages:
   - It doesn't require that libglade and intltool are updated in sync. intltool wouldn't need to be changed at all.
   - It doesn't break current translated strings, but they will need to be converted to use the new properties, and then the translations will need to be updated.

   Disadvantages:
   - It uses non-standard properties. We have been trying to move to GObject properties as much as possible, so this is a step backwards.
   - It only supports a limited number of attributes: bold, italic and size. It won't help when others are used.

2) Leave the XML file as it is and get Glade/libglade/intltool to strip tags from the start and end of strings before passing them to gettext.

   Advantages:
   - No non-standard properties.
   - Supports all attributes.

   Disadvantages:
   - libglade and intltool must be updated at the same time to pass the stripped strings to gettext.
   - It loses a lot of translations, as the strings will change.

I'm not sure which I prefer. I'm leaning towards (1) now, but I don't really like the use of non-standard properties. Actually (2) may have a problem with backwards compatibility. If we update libglade to strip tags, old apps won't find their translated strings (since they will have used the old intltool that didn't strip tags).
If it is worth having bold, italic, etc. properties, then I really think it is worth adding them as real properties in the underlying libraries (surely they would be useful outside of glade-constructed dialogs, right?). Whatever the solution is, it probably should directly set the attribute list rather than setting the string to use markup. I've seen many bits of code like:

    markup = g_strconcat("<b>", _("foo"), "</b>", NULL);

or:

    markup = g_strdup_printf("<b>%s</b>", something);

which work well most of the time, but fail if the substituted string isn't valid markup. Adding more ways to make this sort of error doesn't sound like a good thing.
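For code that does build markup by hand, the usual guard is to escape the substituted string first. In C, GLib provides g_markup_escape_text() and g_markup_printf_escaped() for this. The following is only a Python sketch of the same idea, using xml.sax.saxutils.escape as a stand-in; the helper bold and the sample string are made up for illustration.

```python
from xml.sax.saxutils import escape


def bold(text: str) -> str:
    """Wrap text in <b> tags after escaping '&', '<' and '>'."""
    return "<b>" + escape(text) + "</b>"


# Hypothetical translated string containing markup-significant characters.
translated = "Open & <Save>"

# Naive concatenation: the result is not valid markup.
print("<b>" + translated + "</b>")

# Escaped first: the result is valid markup.
print(bold(translated))
```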
I have started to look into this and have a libglade patch to support the "attributes" property on GtkLabel. Attributes are deserialized from a Pango markup string, with the special feature that the attributes are taken to be global (i.e. apply to the range 0-MAXINT) if the text in the markup string is just a single space, e.g.

    <property name="attributes"><b><i> </i></b></property>
    <property name="label" translatable="yes">I'm bold and italic</property>

My plan to support this in glade would be to leave the UI unchanged and, when saving the "label" property, parse the markup, determine whether it is only global, and if yes, save the markup separately. When loading, the "attributes" and "label" properties will have to be merged again.

Open questions:
- This approach only deals with labels. Is this enough, or are there other common offenders which would need to be fixed? (Message dialogs come to mind.)
- Do you think the outlined approach of handling this in glade is reasonable?
- We probably need a project flag to enable this attribute splitting, otherwise opening and saving a file in glade could inadvertently cause massive string changes.
I don't really like the format of your attributes property. It looks like a bit too much of a hack. I'd rather have fake properties like this:

    <property name="font_weight">bold</property>
    <property name="font_style">italic</property>
    <property name="font_size">large</property>

Also beware of markup like:

    "<b>this is bold</b> this is plain <b>this is also bold</b>"

If you don't parse the entire markup you may wrongly assume that the entire string is bold.

The advantage of the fake properties above is that we don't need to parse the markup and we don't need the flag to switch over to the new format. Users will have to manually switch over the strings, but I don't think there are too many.

Regarding updating the Glade code, I think you only need to worry about the GtkLabel and GtkAccelLabel widgets for now.
Created attachment 23456: attributes property for libglade
The plan for glade would have been to use pango_parse_markup, and analyse the resulting PangoAttrList, so that it would have handled your example correctly. Regarding "bold", etc pseudoproperties, I believe there is a little problem here: - the glade maintainer prefers pseudo-property "bold" - the libglade maintainer says: if a "bold" property, it should be a real GTK property - GTK already has a general "attributes" property, so "bold" would be a hard sell Would it be acceptable to use "weight"/"style"/"variant" etc pseudoproperties in the glade ui, but serialize/deserialize them into an "attributes" string when loading/saving a glade file ?
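A rough sketch of the "is the markup only global?" check described above. This is not the libglade patch and does not call pango_parse_markup; it is a Python approximation that treats the markup as an XML fragment, and the function name split_global_markup is hypothetical. Input that is not well-formed raises xml.etree.ElementTree.ParseError.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

Tag = Tuple[str, Dict[str, str]]


def split_global_markup(markup: str) -> Optional[Tuple[List[Tag], str]]:
    """Return (wrapping_tags, text) if tags wrap the entire string, else None."""
    # Wrap in a dummy root so the fragment parses as one XML document.
    node = ET.fromstring("<root>" + markup + "</root>")
    tags: List[Tag] = []
    while len(node) > 0:
        wraps_everything = (
            len(node) == 1              # exactly one child element
            and not (node.text or "")   # no text before the child
            and not (node[0].tail or "")  # no text after the child
        )
        if not wraps_everything:
            return None
        node = node[0]
        tags.append((node.tag, dict(node.attrib)))
    return tags, node.text or ""


print(split_global_markup("<b>Home Page Preferences</b>"))
print(split_global_markup('<span size="medium"><b>No file</b></span>'))
print(split_global_markup("<b>this is bold</b> this is plain <b>this is also bold</b>"))
```

The third call returns None because the whole string is inspected, so the "bold, plain, bold" case is not mistaken for a fully bold label.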
I think your patch is quite good. Markup is easier to parse than I thought. But I still think that the format in the XML file is too messy:

    <property name="attributes"><b><i> </i></b></property>

That is not very readable when compared to:

    <property name="font_weight">bold</property>
    <property name="font_style">italic</property>

Your "attributes" property isn't a real GTK+ property either - it isn't much better than the fake properties. I'd prefer the fake properties, but I don't really mind which way we decide. If we wait for GTK+ to add the properties it may never be fixed, so I think we should choose one. James - do you have any preference?
> But I still think that the format in the XML file is too
> messy:

Well, compare to

    <property name="label"><b><i> your text here </i></b></property>

emitted by current glade. I agree that it is a bit messy, but (occasionally ugly) serialization is just the price we have to pay for selecting a textual format instead of a binary one. An alternative would be

    <property name="attributes"><![CDATA[<b><i> </i></b>]]></property>

> Your "attributes" property isn't a real GTK+ property either - it
> isn't much better than the fake properties.

I have to disagree with this. There is an "attributes" property on the label, of type PangoAttrList. So this is as much a real property as, e.g., any integer or float property, for which you also have to deserialize from strings in libglade, since gtk+ won't do it. One could perhaps hide the deserialization a bit better, using GValue transformation...

I'd also like to mention that the "attributes" property is quite a bit more general and extensible than the pseudo-property approach, since it supports everything that can be expressed in Pango markup and will automatically support future extensions to it.
I've just spotted another problem - I'm not sure "attributes" can be used together with an underlined access key. Can you check that? I think we must support access keys. In gtk_label_recalculate() it seems that if an access key is used attributes aren't used. See also gtk_label_set_pattern_internal() where it overrides any existing attributes. So it doesn't look like we can use the "attributes" property, unless I'm mistaken.
Yes, you are right. It is either (markup and/or underline) or (attributes and/or mnemonic_keyval), not both. So what we would need here is a new attribute "global_attributes", which would be merged with the attributes from "attributes" or from markup.
Is this going to be fixed in glade2 or is it on the TODO for glade3? Just curious.
I doubt it will be fixed in glade2, unless someone else does it. I'm not sure the glade3 people know about the issue. But I've been doing a bit of work on glade3 and might even get around to this myself.
Thanks a lot Damon!
I had a go at fixing this in a different way by changing only intltool. It relies on the fact that most markup is of the form "<tag>text</tag>". When the "strip-markup" option is enabled, intltool adds an extra line to every translatable string describing the surrounding tags, if any, i.e.

    #: foo.h:60
    #FORMAT ''
    #FORMAT '<b>'
    #FORMAT '<span size="larger"><i>'
    msgid "A string"

The strings are then unmerged before being passed to msgfmt. This has the following advantages:
- It requires only changes to intltool.
- It won't break any existing translations (it has to be explicitly enabled).
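The merge step described above could look roughly like the following. This is a Python sketch, not the actual intltool change (intltool itself is written in Perl); the "#FORMAT" comment syntax is taken from the example above, and the names peel, merge and po_entry are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# One opening tag at the very start and its closing tag at the very end.
WRAPPED = re.compile(r"(<(\w+)(?:\s[^<>]*)?>)(.*)</\2>", re.DOTALL)


def peel(msgid: str) -> Tuple[str, str]:
    """Split '<tag>text</tag>' into (opening_tags, text), peeling nested wrappers."""
    prefix = ""
    while True:
        m = WRAPPED.fullmatch(msgid)
        # Stop if there is no wrapper, or if the tag is closed early inside
        # the body (e.g. "<b>a</b> plain <b>b</b>").
        if m is None or ("</" + m.group(2) + ">") in m.group(3):
            return prefix, msgid
        prefix += m.group(1)
        msgid = m.group(3)


def merge(msgids: Iterable[str]) -> Dict[str, List[str]]:
    """Map each stripped message to the wrappers it was seen with."""
    merged: Dict[str, List[str]] = {}
    for msgid in msgids:
        prefix, text = peel(msgid)
        formats = merged.setdefault(text, [])
        if prefix not in formats:
            formats.append(prefix)
    return merged


def po_entry(text: str, formats: List[str]) -> str:
    """Render one po entry with a #FORMAT line per wrapper."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    lines = ["#FORMAT '%s'" % f for f in formats]
    lines += ['msgid "%s"' % escaped, 'msgstr ""']
    return "\n".join(lines)


merged = merge(["A string", "<b>A string</b>",
                '<span size="larger"><i>A string</i></span>'])
for text, formats in merged.items():
    print(po_entry(text, formats))
```

The three source strings collapse into a single msgid, so the translator sees "A string" once and the wrappers are reapplied at unmerge time.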
The above looks like it will still allow translators to shoot themselves in the foot if they put special characters like '&', '<' and '>' in the translation.

If you are looking at modifying the i18n tools here, you might want to look at whether it is possible to integrate with gettext's format string vulnerability checker (invoked with msgfmt --check-format). Currently it is designed to make sure that translated format strings match the untranslated format string. It probably wouldn't be too difficult to make it check that a translated string contains valid pango markup. Something that would catch things like this:

    #, pango-markup
    msgid "<b>Hello</b>"
    msgstr "<bold>Hi&"
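A sketch of what such a check might do, in Python rather than inside msgfmt. It only verifies that the translation is well-formed XML-style markup and uses the same tags as the original; it does not know which tags Pango actually accepts. The function names are hypothetical, and the "pango-markup" flag above is a proposal, not an existing gettext flag as far as this thread shows.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def tag_names(markup: str) -> Optional[List[str]]:
    """Return the tag names in document order, or None if not well-formed."""
    try:
        root = ET.fromstring("<root>" + markup + "</root>")
    except ET.ParseError:
        return None
    return [el.tag for el in root.iter()][1:]  # drop the dummy root


def check_translation(msgid: str, msgstr: str) -> bool:
    """True if msgstr is well-formed and uses the same set of tags as msgid."""
    expected = tag_names(msgid)
    actual = tag_names(msgstr)
    if expected is None or actual is None:
        return False
    # Compare as sorted lists: translations may legitimately reorder tags.
    return sorted(expected) == sorted(actual)


print(check_translation("<b>Hello</b>", "<bold>Hi&"))  # the broken example above
print(check_translation("<b>Hello</b>", "<b>Hi</b>"))
```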
Dropping priority to normal. This probably won't be fixed in glade-2. It only affects a small number of strings, so I'm not sure it is worth the effort anyway.
moving this bug to the glade3 product, since I saw it mentioned on gtk-devel-list.
Wrote a patch for GtkBuilder support of pango attribute parsing. Marking this bug to depend on my patch on bug 527486
Ping! Maybe we can have a fix before GNOME 2.24? Translators will be really glad (pun intended :)
Well, the attributes parsing is in gtk+, and I think 2.24 will ship a new gtk+ ... on the glade side of things, we're probably not making 3.6 for gnome 2.24. We have some tidying up to do; maybe we'll be there minus builder support for menus. I have an editor with a treeview written for the pango attributes, not in svn yet, no load/save support yet... Note I made a little announcement about us probably not making 2.24: http://mail.gnome.org/archives/desktop-devel-list/2008-July/msg00095.html
Err, why didn't I close this? Anyway, we will have attribute parsing and editors for GNOME 2.26; it's in Glade/GTK+ trunk. Closing bug.
Hello, nice to hear that the bug is fixed, but I don't see the fix... When I have a string in glade like <b>Bold Text</b>, intltool-extract still writes something like

    char *s = N_("<b>Bold Text</b>");

into the .h file. In the .pot file from xgettext, a translator has to translate the string:

    #: ui.glade.h:12
    msgid "<b>Bold Text</b>"
    msgstr ""

I'd rather translate

    #: ui.glade.h:12
    msgid "Bold Text"
    msgstr ""

and let glade/gettext/gtkbuilder figure out the "boldness" themselves. What am I doing wrong?
Hi. The fix is that you no longer need to use markup in Glade to set text attributes: GtkLabel now parses <attributes> in the builder format, and Glade provides a dialog to set the attributes on the label.
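For illustration, a label defined this way keeps the markup out of the translatable string. The Python sketch below embeds a small GtkBuilder fragment (the id "label1" and the helper translatable_strings are made up for the example) and lists the strings a translator would see. It is not how intltool extracts strings, only a way to show the effect.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical GtkBuilder fragment: boldness is an attribute, not markup.
UI = """
<interface>
  <object class="GtkLabel" id="label1">
    <property name="label" translatable="yes">Bold Text</property>
    <attributes>
      <attribute name="weight" value="PANGO_WEIGHT_BOLD"/>
    </attributes>
  </object>
</interface>
"""


def translatable_strings(ui_xml: str) -> List[str]:
    """Return the text of every property marked translatable="yes"."""
    root = ET.fromstring(ui_xml.strip())
    return [
        prop.text or ""
        for prop in root.iter("property")
        if prop.get("translatable") == "yes"
    ]


print(translatable_strings(UI))
```

The only string offered for translation is the plain "Bold Text", which is the msgid asked for in the previous comment. Existing labels that still carry <b>...</b> in the label text have to be converted by hand in Glade.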