After an evaluation, GNOME has moved from Bugzilla to GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 97061 - Glade shouldn't put unnecessary markup in messages for translation
Status: RESOLVED FIXED
Product: glade
Classification: Applications
Component: general
Version: git master
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Glade 3 Maintainers
QA Contact: Glade 3 Maintainers
Duplicates: 73339 100012
Depends on: 527486
Blocks: 99659 99759 99864 99867 100035 100036 100038 100250 100252 101425 101428 101596 101760 101795 102134 102382 102384 102486 104190 112930 112962 112974 114865 118696 118963 118964 119010 119026 122805 122814 142179 150115 162806 167126 308091 498402 559885 570878
Reported: 2002-10-28 20:45 UTC by Christian Rose
Modified: 2009-10-02 16:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
attributes property for libglade (1.55 KB, patch)
2004-01-16 21:37 UTC, Matthias Clasen
needs-work

Description Christian Rose 2002-10-28 20:45:17 UTC
Glade shouldn't put unnecessary markup in the messages that are marked for
translation.

This is one case of "don't mark things for translation that shouldn't be
translated". The cases where markup occurs in messages can be divided
into two types. The first type is messages similar to this example:

   msgid "This text is <b>bold</b>."
   

In this case, the markup contains important positional information. Only
the text "bold" should be bold, and a translation needs to take that into
account, so the markup actually carries important information to the
translator here. This use of markup is important and a necessity.

The other case where markup is used is where the entire message or entire
paragraphs or headings are surrounded with markup, like in these examples:

   msgid "<b>Home Page Preferences</b>"
   

   msgid "<span size=\"medium\"><b>No file</b></span>"
   

   msgid ""
   "<span weight=\"bold\" size=\"larger\">What do you want to do with this "
   "file?\n"
   "</span>\n"
   "It's not possible to view this file type directly in the browser:"
   

In this type of messages, the markup contains no relevant information to
the translator, since all the translatable content is embedded in the
markup (in the last example above, the message could just as well have been
split into two separate messages, so this still applies). Instead, these
messages are just a nuisance and create lots of extra and totally
unnecessary work for the translators. Whenever the markup for a message
changes even slightly, all translations have to be updated, even
though no translatable content changed. Whenever a new message is added
that has surrounding markup, even if the exact same message without this
exact markup was translated before, the message will have to be
"translated" again. Usually all this adds up, and it's not uncommon to have
a situation like with all these examples occurring in the same po file:

   msgid "<b>Home Page Preferences</b>"
   

   msgid "<i>Home Page Preferences</i>"
   

   msgid "<span size=\"larger\"><b>Home Page Preferences</b></span>"
   

In short, every possible combination of the same actual message but with
different and irrelevant markup surrounding it will have to be "translated"
separately, instead of just one "Home Page Preferences" message. Nor is it
just a nuisance and a lot of unnecessary work that slows down the
translation process. Sometimes the surrounding markup adds much more text
than the actual message, which confuses gettext's fuzzy-matching: either
it considers the message entirely new and doesn't fuzzy-mark it
with a previous similar translation, or it fuzzy-matches on the markup
instead of on the actual message. This can cause consistency problems in
the translation, where the same terminology won't be used since
fuzzy-matching didn't work properly. Fuzzy-matching is an important
time-saver and important for consistent use of terminology across
translations, and when it doesn't work properly, it affects consistency in
a negative way.

The solution to these problems is to try to separate markup from gettext
calls, so that the markup isn't passed through _(). This doesn't apply to
the first type of messages mentioned above, but it certainly applies to the second
type of messages. In the case of the examples above, they should be
rewritten so that they appear in the po file like this:

   msgid "Home Page Preferences"
   

   msgid "No file"
   

   msgid "What do you want to do with this file?"
   

   msgid "It's not possible to view this file type directly in the browser:"

Perhaps Glade could solve this by allowing the author to use message
attributes rather than including the markup in the actual message.
Comment 1 Christian Rose 2002-10-28 20:46:50 UTC
*** Bug 73339 has been marked as a duplicate of this bug. ***
Comment 2 Damon Chaplin 2002-12-01 15:51:20 UTC
Is there a similar bug for libglade/xml-i18n-tools?
For apps using libglade, this bug needs to be handled by those 2,
and they both need to agree on what to do, so it is a bit tricky.

See also bug 100012, which may be a better way to solve this issue.
Comment 3 Christian Rose 2002-12-01 23:15:10 UTC
Kenneth, how's the intltool situation on this?
Comment 4 Havoc Pennington 2002-12-01 23:16:03 UTC
*** Bug 100012 has been marked as a duplicate of this bug. ***
Comment 5 Damon Chaplin 2003-02-08 13:18:22 UTC
I thought a bit about supporting the attributes property, but I don't
think that will work very well. Setting a PangoAttrList is mutually
exclusive with using markup or underlines in the string, so I don't
think we can use it. It would also make the properties page more
complicated.
Comment 6 Joaquin Cuenca Abela 2003-06-25 15:31:53 UTC
Besides the problems that Damon mentioned, how is a label supposed 
to "fit" the attributes for the translated string?

Let's say that you have in English: "How are <b>you</b> doing?"
And in Spanish: "Como esta <b>usted</b>?"

If we use the attributes instead of embedding the markup, somebody 
has to figure out that the attribute b has changed its indexes, and 
somebody has to say what the new values are.

That seems harder to solve from an i18n point of view than just 
exposing the marked-up string in the po file.

Comment 7 Havoc Pennington 2003-06-25 16:54:22 UTC
The idea is to use markup only when a substring is marked up, and 
use attributes when the entire label is bold/italic

Rather than generic PangoAttrList support what I'd suggest in 
glade would be just "bold" and "italic" checkboxes, perhaps also 
a "size" option menu that lets you choose an xx-small -> xx-large scale
factor.
Comment 8 Christian Rose 2003-06-26 14:55:19 UTC
This bug is one of the most serious issues w.r.t. GNOME translation
right now, as discussed on GUADEC and in other forums, so it should
probably have higher priority.
Comment 9 David Bolter 2003-06-28 14:14:15 UTC
Is this bug even more difficult to fix in libglade? Should a
separate bug be opened there?
Comment 10 Christian Rose 2003-07-17 21:57:41 UTC
Ok, so richard@imendio.com pointed out that vicious-extensions has
this in glade-helper.[ch]:

/* Make label surrounded by tag (if tag = "tag" add <tag>text</tag>) */
void            glade_helper_tagify_label       (GladeXML *xml,
                                                 const char *name,
                                                 const char *tag);

Perhaps usable as a temporary workaround for this glade/libglade problem.
Comment 11 Damon Chaplin 2003-10-26 13:26:03 UTC
 
I think there are 2 approaches here:
 
1) Use special properties for bold/italic/size, both in the XML file
   and probably in the Glade property editor.
 
  Advantages:
 
   It doesn't require that libglade/intltool are updated
   in sync. intltool wouldn't need to be changed at all.
 
   It doesn't break current translated strings, but they will
   need to be converted to use the new properties, and then
   the translations will need to be updated.
 
  Disadvantages:
 
   It uses non-standard properties. We have been trying to move
   to GObject properties as much as possible, so this is a step
   backwards.
 
   It only supports a limited number of attributes - bold,
   italic and size. It won't help when others are used.
 
 
2) Leave the XML file as it is and get Glade/libglade/intltool to
   strip tags from the start and end of strings before passing to
   gettext.
 
  Advantages:
 
   No non-standard properties.
  
   Supports all attributes.
 
  Disadvantages:
 
   libglade & intltool must be updated to pass the stripped strings
   to gettext at the same time.
 
   It loses a lot of translations, as the strings will change.
 
 
I'm not sure which I prefer. I'm leaning towards (1) now but I
don't really like the use of non-standard properties.
 
Actually (2) may have a problem with backwards compatibility.
If we update libglade to strip tags, old apps won't find their
translated strings (since they will have used the old intltool
that didn't strip tags).
Comment 12 James Henstridge 2003-10-27 04:19:42 UTC
If it is worth having bold, italic, etc properties, then I really
think it is worth adding them as real properties in the underlying
libraries (surely they would be useful outside of glade constructed
dialogs, right?).

Whatever the solution is, it probably should directly set the
attribute list rather than setting the string to use markup.  I've
seen many bits of code like:
    markup = g_strconcat("<b>", _("foo"), "</b>", NULL);
or:
    markup = g_strdup_printf("<b>%s</b>", something);

which work well most of the time, but fail if the substituted string
isn't valid markup.  Adding more ways to make this sort of error
doesn't sound like a good thing.
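The failure James describes comes from substituting unescaped text into markup. GLib already provides `g_markup_escape_text()` and `g_markup_printf_escaped()` for this; the dependency-free sketch below only illustrates the same idea in plain C. `wrap_in_tag` is a hypothetical helper, not part of GLib or Glade: it keeps the markup outside the translatable string and escapes whatever text is put inside it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the entity for a character that is special in markup, or NULL. */
static const char *markup_entity(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    default:   return NULL;
    }
}

/* Hypothetical helper: return "<tag>" + escaped(text) + "</tag>" in a
 * newly allocated string (caller frees), or NULL if allocation fails.
 * TAG is trusted (it comes from the program, not from a translation). */
char *wrap_in_tag(const char *tag, const char *text)
{
    size_t tag_len = strlen(tag);
    size_t text_len = 0;

    /* First pass: measure the escaped text. */
    for (const char *p = text; *p; p++) {
        const char *ent = markup_entity(*p);
        text_len += ent ? strlen(ent) : 1;
    }

    /* "<" tag ">" + text + "</" tag ">" + NUL */
    char *out = malloc(2 * tag_len + 5 + text_len + 1);
    if (out == NULL)
        return NULL;

    char *q = out + sprintf(out, "<%s>", tag);
    for (const char *p = text; *p; p++) {   /* second pass: copy and escape */
        const char *ent = markup_entity(*p);
        if (ent) {
            size_t n = strlen(ent);
            memcpy(q, ent, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    sprintf(q, "</%s>", tag);
    return out;
}
```

In application code the call would be along the lines of `wrap_in_tag("b", _("Home Page Preferences"))`, so only the plain message reaches the po file and a translation containing `&` or `<` cannot break the label.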
Comment 13 Matthias Clasen 2004-01-16 09:33:31 UTC
I have started to look into this and have a libglade patch to support
the "attributes" property on GtkLabel. Attributes are deserialized 
from a Pango markup string, with the special feature that the 
attributes are taken to be global (i.e. apply to range 0-MAXINT) if the 
text in the markup string is just a single space, e.g.

<property name="attributes">&lt;b&gt;&lt;i&gt; &lt;/i&gt;&lt;/b&gt;</property>
<property name="label" translatable="yes">I'm bold and italic</property>

My plan to support this in glade would be to leave the UI unchanged, and 
when saving the "label" property, parse the markup, determine whether 
it is only global, and if yes, save the markup separately. When 
loading, the "attributes" and "label" properties will have to be 
merged again.
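The save step needs a test for "the markup is only global". Matthias's plan was to run `pango_parse_markup()` and inspect the resulting `PangoAttrList`; the sketch below is a simpler, purely textual stand-in, assuming the markup is a run of opening tags, plain text, and the matching closing tags. `split_global_markup` is a hypothetical name, and unescaping entities in the label text is left out.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 8

/* Hypothetical helper: if MARKUP has the form <t1 ...><t2 ...>text</t2></t1>
 * with no tags inside "text", write "<t1 ...><t2 ...> </t2></t1>" to ATTRS
 * and "text" to LABEL and return 1.  Otherwise return 0: the markup is not
 * global and must stay in the translatable label. */
int split_global_markup(const char *markup,
                        char *attrs, size_t attrs_size,
                        char *label, size_t label_size)
{
    const char *name[MAX_DEPTH];
    size_t name_len[MAX_DEPTH];
    int depth = 0;
    const char *p = markup;

    /* 1. Consume leading opening tags, remembering each tag name. */
    while (p[0] == '<' && p[1] != '/') {
        const char *end = strchr(p, '>');
        if (end == NULL || depth == MAX_DEPTH)
            return 0;
        name[depth] = p + 1;
        name_len[depth] = strcspn(p + 1, " >");
        depth++;
        p = end + 1;
    }
    if (depth == 0)
        return 0;                     /* no surrounding markup at all */

    /* 2. The text runs up to the next '<'; it contains no tags. */
    const char *text = p;
    const char *text_end = strchr(text, '<');
    if (text_end == NULL)
        return 0;

    /* 3. The rest must be exactly the matching closing tags, innermost
     *    first, followed by the end of the string. */
    p = text_end;
    for (int i = depth - 1; i >= 0; i--) {
        if (p[0] != '<' || p[1] != '/' ||
            strncmp(p + 2, name[i], name_len[i]) != 0 ||
            p[2 + name_len[i]] != '>')
            return 0;
        p += 3 + name_len[i];
    }
    if (*p != '\0')
        return 0;                     /* more text after the closing tags */

    /* 4. Emit "<open tags> <close tags>" and the bare text. */
    size_t open_len = (size_t)(text - markup);
    size_t close_len = strlen(text_end);
    size_t text_len = (size_t)(text_end - text);
    if (open_len + 1 + close_len + 1 > attrs_size || text_len + 1 > label_size)
        return 0;
    memcpy(attrs, markup, open_len);
    attrs[open_len] = ' ';
    memcpy(attrs + open_len + 1, text_end, close_len + 1); /* includes NUL */
    memcpy(label, text, text_len);
    label[text_len] = '\0';
    return 1;
}
```

A mixed string such as `<b>this is bold</b> this is plain <b>this is also bold</b>` is rejected, because text follows the first closing tag, so it is left in the translatable label unchanged.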

Open questions

- This approach only deals with labels. Is this enough, or are there 
  other common offenders which would need to be fixed? (Message dialogs
  come to mind.)
- Do you think the outlined approach of handling this in glade is 
  reasonable?
- We probably need a project flag to enable this attribute splitting;
  otherwise opening and saving a file in glade could inadvertently cause
  massive string changes.
Comment 14 Damon Chaplin 2004-01-16 12:13:21 UTC
I don't really like the format of your attributes property. It looks
like a bit too much of a hack.

I'd rather have fake properties like this:
  <property name="font_weight">bold</property>
  <property name="font_style">italic</property>
  <property name="font_size">large</property>


Also beware of markup like:
  "<b>this is bold</b> this is plain <b>this is also bold</b>"
If you don't parse the entire markup you may wrongly assume that the
entire string is bold.

The advantage of the fake properties above is that we don't need
to parse the markup and we don't need the flag to switch over to
the new format. Users will have to manually switch over the
strings, but I don't think there are too many.

Regarding updating the Glade code, I think you only need to worry
about the GtkLabel and GtkAccelLabel widgets for now.
Comment 15 Matthias Clasen 2004-01-16 21:37:22 UTC
Created attachment 23456 [details] [review]
attributes property for libglade
Comment 16 Matthias Clasen 2004-01-16 21:47:12 UTC
The plan for glade would have been to use pango_parse_markup, and
analyse the resulting PangoAttrList, so that it would have handled
your example correctly.

Regarding "bold", etc pseudoproperties, I believe there is a little
problem here: 

- the glade maintainer prefers pseudo-property "bold"
- the libglade maintainer says: if there is a "bold" property, it should be a 
  real GTK property
- GTK already has a general "attributes" property, so "bold" would 
  be a hard sell 

Would it be acceptable to use "weight"/"style"/"variant" etc
pseudoproperties in the glade ui, but serialize/deserialize them into
an "attributes" string when loading/saving a glade file ?
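Matthias's compromise keeps "weight"/"style"/"variant" pseudo-properties in the UI and writes them out as one "attributes" string. A minimal sketch of the save direction, assuming the single-space convention from his libglade patch and Pango's `<span>` attribute names (`weight`, `style`, `variant`, `size`); `serialize_attributes` is hypothetical and not part of Glade.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical serializer for the pseudo-properties shown in the property
 * editor.  NULL means "not set".  Writes e.g.
 *   <span weight="bold" style="italic"> </span>
 * to OUT and returns its length; writes "" and returns 0 if nothing is
 * set; returns -1 if OUT is too small.  Values are copied verbatim, so
 * they must already be valid Pango <span> values ("bold", "italic",
 * "small-caps", "large", ...). */
int serialize_attributes(char *out, size_t out_size,
                         const char *weight, const char *style,
                         const char *variant, const char *size)
{
    const char *keys[] = { "weight", "style", "variant", "size" };
    const char *vals[] = { weight, style, variant, size };
    size_t used;
    int n, any = 0;

    n = snprintf(out, out_size, "<span");
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    used = (size_t)n;

    for (int i = 0; i < 4; i++) {
        if (vals[i] == NULL)
            continue;
        n = snprintf(out + used, out_size - used, " %s=\"%s\"", keys[i], vals[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
        any = 1;
    }

    if (!any) {              /* nothing set: store no attributes at all */
        out[0] = '\0';
        return 0;
    }

    /* A single space as the text marks the attributes as global. */
    n = snprintf(out + used, out_size - used, "> </span>");
    if (n < 0 || (size_t)n >= out_size - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

Loading would do the reverse, mapping each `<span>` attribute back to its pseudo-property, so the file carries one general property while the UI shows simple controls.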
Comment 17 Damon Chaplin 2004-01-18 20:05:51 UTC
I think your patch is quite good. Markup is easier to parse than I
thought. But I still think that the format in the XML file is too
messy:

  <property name="attributes">&lt;b&gt;&lt;i&gt; &lt;/i&gt;&lt;/b&gt;</property>

That is not very readable when compared to:

  <property name="font_weight">bold</property>
  <property name="font_style">italic</property>


Your "attributes" property isn't a real GTK+ property either - it
isn't much better than the fake properties.

I'd prefer the fake properties but I don't really mind which way
we decide. If we wait for GTK+ to add the properties it may never
be fixed, so I think we should choose one.

James - do you have any preference?
Comment 18 Matthias Clasen 2004-01-19 09:41:30 UTC
> But I still think that the format in the XML file is too
> messy:

Well, compare to

<property name="label">&lt;b&gt;&lt;i&gt;
                       your text here
                       &lt;/i&gt;&lt;/b&gt;</property>

emitted by current glade. I agree that it is a bit messy, but 
(occasionally ugly) serialization is just the price we have to pay for 
selecting a textual format instead of a binary one. An alternative 
would be

<property name="attributes"><![CDATA[<b><i> </i></b>]]></property>

> Your "attributes" property isn't a real GTK+ property either - it
> isn't much better than the fake properties.

I have to disagree with this. There is an "attributes" property on 
GtkLabel of type PangoAttrList. So this is as much a real property as, 
e.g., any integer or float property, for which you also have to 
deserialize from strings in libglade, since gtk+ won't do it. One 
could perhaps hide the deserialization a bit better, using GValue 
transformation...

I'd also like to mention that the "attributes" property is quite a bit 
more general and extensible than the pseudo-property approach, since 
it supports everything that can be expressed in Pango markup and will 
automatically support future extensions to it.
Comment 19 Damon Chaplin 2004-01-19 12:05:07 UTC
I've just spotted another problem - I'm not sure "attributes" can be
used together with an underlined access key. Can you check that?
I think we must support access keys.

In gtk_label_recalculate() it seems that if an access key is used
attributes aren't used.
See also gtk_label_set_pattern_internal() where it overrides any
existing attributes.

So it doesn't look like we can use the "attributes" property, unless
I'm mistaken.
Comment 20 Matthias Clasen 2004-01-19 12:25:42 UTC
Yes, you are right. It is 

either use (markup and/or underline) 
or use (attributes and/or mnemonic_keyval)

So what we would need here would be a new attribute 
"global_attributes" which would be merged with the attributes
from "attributes" or markup.
Comment 21 Kjartan Maraas 2004-05-15 12:01:04 UTC
Is this going to be fixed in glade2 or is it on the TODO for glade3? Just curious.
Comment 22 Damon Chaplin 2004-05-16 16:12:56 UTC
I doubt it will be fixed in glade2, unless someone else does it.
I'm not sure the glade3 people know about the issue. But I've been doing a bit
of work on glade3 and might even get around to this myself.
Comment 23 Christian Rose 2004-05-16 22:05:28 UTC
Thanks a lot Damon!
Comment 24 Stephen Kennedy 2005-05-05 20:11:36 UTC
I had a go at fixing this in a different way by changing only intltool. It relies
on the fact that most markup is of the form "<tag>text</tag>".

When the "strip-markup" option is enabled, intltool adds an extra line to every
translatable string describing the surrounding tags, if any, e.g.

#: foo.h:60
#FORMAT ''
#FORMAT '<b>'
#FORMAT '<span size="larger"><i>'
msgid "A string"

The strings are then unmerged before being passed to msgfmt.

This has the advantage that:
 It requires only changes to intltool.
 It won't break any existing translations (it has to be explicitly enabled)
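Only the opening tags need to be recorded in the `#FORMAT` line, because the closing tags can be derived from them. A minimal sketch of the re-merge step, assuming the recorded prefix is a plain run of opening tags; `rewrap_translation` is a hypothetical helper, not intltool code. It copies `msgstr` verbatim, so a stray `&` or `<` in a translation still yields invalid markup.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TAGS 8

/* Hypothetical re-merge step: given the opening tags recorded for a
 * message (e.g. <span size="larger"><i>) and a translated msgstr, return
 * prefix + msgstr + matching closing tags (e.g. </i></span>) in a newly
 * allocated string.  Returns NULL on a malformed prefix or out of memory. */
char *rewrap_translation(const char *prefix, const char *msgstr)
{
    const char *name[MAX_TAGS];
    size_t len[MAX_TAGS];
    int depth = 0;
    size_t close_total = 0;

    /* Collect the tag names from the prefix. */
    for (const char *p = prefix; *p; ) {
        if (*p != '<' || p[1] == '/' || depth == MAX_TAGS)
            return NULL;
        const char *end = strchr(p, '>');
        if (end == NULL)
            return NULL;
        name[depth] = p + 1;
        len[depth] = strcspn(p + 1, " >");
        close_total += len[depth] + 3;        /* "</" name ">" */
        depth++;
        p = end + 1;
    }

    size_t plen = strlen(prefix), mlen = strlen(msgstr);
    char *out = malloc(plen + mlen + close_total + 1);
    if (out == NULL)
        return NULL;

    char *q = out;
    memcpy(q, prefix, plen);
    q += plen;
    memcpy(q, msgstr, mlen);
    q += mlen;
    for (int i = depth - 1; i >= 0; i--) {    /* innermost tag closes first */
        *q++ = '<';
        *q++ = '/';
        memcpy(q, name[i], len[i]);
        q += len[i];
        *q++ = '>';
    }
    *q = '\0';
    return out;
}
```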
Comment 25 James Henstridge 2005-05-06 06:14:39 UTC
The above looks like it will still allow translators to shoot themselves in the
foot if they put special characters like '&', '<' and '>' in the translation.

Although if you are looking at modifying the i18n tools here, you might want to
look at whether it is possible to integrate with gettext's format string
vulnerability checker (invoked with msgfmt --check-format).

Currently it is designed to make sure that translated format strings match the
untranslated format string.  It probably wouldn't be too difficult to make it
check that a translated string contains valid pango markup.  Something that
would catch things like this:
  #, pango-markup
  msgid "<b>Hello</b>"
  msgstr "<bold>Hi&"
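The check James suggests could work roughly as below. The `#, pango-markup` flag is his proposal, and `markup_is_well_formed` is a hypothetical helper: it only verifies that every `&` starts a `;`-terminated entity and that tags are properly nested, and does not check tag or attribute names against what Pango accepts.

```c
#include <string.h>

#define MAX_NEST 16

static const char entity_chars[] =
    "#abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

/* Hypothetical check for a translated string: returns 1 if every '&'
 * starts an entity terminated by ';' and every tag is closed in the
 * right order, 0 otherwise. */
int markup_is_well_formed(const char *s)
{
    const char *tag_start[MAX_NEST];
    size_t tag_len[MAX_NEST];
    int depth = 0;

    while (*s) {
        if (*s == '&') {
            size_t n = strspn(s + 1, entity_chars);
            if (n == 0 || s[1 + n] != ';')
                return 0;               /* bare '&', as in "Hi&" */
            s += n + 2;
        } else if (*s == '<') {
            const char *end = strchr(s, '>');
            if (end == NULL)
                return 0;               /* unterminated tag */
            if (s[1] == '/') {
                /* A closing tag must match the innermost open tag. */
                size_t n = (size_t)(end - (s + 2));
                if (depth == 0 || n != tag_len[depth - 1] ||
                    strncmp(s + 2, tag_start[depth - 1], n) != 0)
                    return 0;
                depth--;
            } else {
                if (depth == MAX_NEST)
                    return 0;
                tag_start[depth] = s + 1;
                tag_len[depth] = strcspn(s + 1, " >");
                if (tag_len[depth] == 0)
                    return 0;           /* "< " or "<>" */
                depth++;
            }
            s = end + 1;
        } else {
            s++;
        }
    }
    return depth == 0;                  /* every open tag was closed */
}
```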
Comment 26 Damon Chaplin 2005-09-12 13:36:09 UTC
Dropping priority to normal. This probably won't be fixed in glade-2.

It only affects a small number of strings, so I'm not sure it is worth the
effort anyway.
Comment 27 Tristan Van Berkom 2007-03-13 01:02:23 UTC
moving this bug to the glade3 product, since I saw it mentioned
on gtk-devel-list.
Comment 28 Tristan Van Berkom 2008-04-11 06:44:11 UTC
Wrote a patch for GtkBuilder support of pango attribute parsing.

Marking this bug to depend on my patch on bug 527486
Comment 29 Gil Forcada 2008-08-04 19:47:11 UTC
Ping!

Maybe we can have a fix before GNOME 2.24?

Translators will be really glad (pun intended :)
Comment 30 Tristan Van Berkom 2008-08-05 03:30:49 UTC
Well, the attributes parsing is in gtk+, and I think 2.24 will ship a new
gtk+ ... On the glade side of things, we're probably not making 3.6
for GNOME 2.24 - we have some tidying up to do - maybe we'll be there
minus builder support for menus.

I have an editor with a treeview written for the pango attributes, 
not in svn yet, no load/save support yet...

note I made a little announcement about us probably not making 2.24:
http://mail.gnome.org/archives/desktop-devel-list/2008-July/msg00095.html
Comment 31 Tristan Van Berkom 2009-02-07 15:36:22 UTC
Err, why didn't I close this? Anyway, we will have attribute parsing
and editors for GNOME 2.26; it's in Glade/GTK+ trunk.

Closing bug.

Comment 32 Jendrik Seipp 2009-10-02 15:47:01 UTC
Hello,
nice to hear that the bug is fixed, but I don't see the fix ...
When I have a string in glade like 

<b>Bold Text</b>

intltool-extract still writes something like

char *s = N_("<b>Bold Text</b>");

into the .h file. In the .pot file from xgettext a translator has to translate the string:

#: ui.glade.h:12
msgid "<b>Bold Text</b>"
msgstr ""

I'd rather translate 

#: ui.glade.h:12
msgid "Bold Text"
msgstr ""

and let glade/gettext/gtkbuilder figure out the "boldness" themselves. 

What am I doing wrong?
Comment 33 Tristan Van Berkom 2009-10-02 16:39:58 UTC
Hi.

The fix is that you no longer need to use markup in
Glade to set text attributes: GtkLabel now
parses <attributes> in the builder format
(and Glade provides a dialog to set the attributes
on the label).
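For the case in Comment 32, the label keeps only the plain text in its translatable `label` property and the weight moves into a child `<attributes>` element, so only `Bold Text` should end up in the .pot file. The sketch below just writes such a fragment from C to show its shape; the `<attribute name="weight" value="bold"/>` form is the one Glade writes for GtkLabel, while `write_bold_label` and the `id` value are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write a GtkBuilder fragment for a label whose
 * whole text is bold.  Only TEXT is marked translatable; the boldness
 * lives in <attributes>, outside the translatable string.  TEXT and ID
 * must already be XML-escaped.  Returns what snprintf returns. */
int write_bold_label(char *out, size_t out_size,
                     const char *id, const char *text)
{
    return snprintf(out, out_size,
                    "<object class=\"GtkLabel\" id=\"%s\">\n"
                    "  <property name=\"label\" translatable=\"yes\">%s</property>\n"
                    "  <attributes>\n"
                    "    <attribute name=\"weight\" value=\"bold\"/>\n"
                    "  </attributes>\n"
                    "</object>\n",
                    id, text);
}
```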