Bug 555627 - intltool-merge broken by Tomboy lv translation which has "\x01" in msgstr
Status: RESOLVED NOTABUG
Product: intltool
Classification: Deprecated
Component: general
Version: 0.40.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: intltool maintainers
Reported: 2008-10-09 00:05 UTC by Changwoo Ryu
Modified: 2009-04-11 09:20 UTC



Description Changwoo Ryu 2008-10-09 00:05:08 UTC
Please describe the problem:
During a normal Tomboy build, intltool-merge generates incomplete files.

I found that Tomboy's po/lv.po file has a "\x01" character in its msgstr. It breaks intltool-merge's cache (-c) feature.

I do not know whether the current Tomboy lv translation is correct or not. In general, though, there is nothing wrong with having control characters in a msgstr, so intltool should not break on them.


Steps to reproduce:
1. In the Tomboy source tree,
2. Run "intltool-merge -d -u -c po/.intltool-merge-cache po data/tomboy.desktop.in data/tomboy.desktop" twice.


Actual results:
It warns, "Odd number of elements in hash assignment at /usr/bin/intltool-merge line 425."  And the resulting data/tomboy.desktop file is incomplete.


Expected results:
It should merge all translations to the desktop file.


Does this happen every time?
Always.

Other information:
Comment 1 Rodney Dawes 2008-10-09 01:04:29 UTC
I just pulled tomboy TRUNK out of svn, and lv.po does not contain the string "\x01" anywhere in the file, and the last modification to the file was on 2007-04-01.

Are you sure there isn't some other problem perhaps? It also strikes me as odd that your "steps to reproduce" says that you have to run intltool-merge twice, not once.

Adding the \x01 does seem to cause other valid translations to be skipped though. The fi translation seems to be the only one up to date for the .desktop file, and it isn't included for the _Comment tag when I place the \x01 in the lv.po. I don't know why that is though. The lv.po would be skipped anyway. It's probably the fact that intltool ALLOWS such characters to exist, rather than disallowing them, that is the problem here. What would "START OF HEADING" mean for a string in a translation, or a desktop file, anyway?
Comment 2 Changwoo Ryu 2008-10-09 07:37:27 UTC
Of course I mean the control character code ^A, not "\x01" literally. Here are the bad lv.po entries:

#: ../libtomboy/gedit-print.c:267
#, c-format
msgid "Rendering page %d of %d..."
msgstr "RenderDju %d no %d lapDm..."

#: ../libtomboy/gedit-print.c:269
#, c-format
msgid "Printing page %d of %d..."
msgstr "DrukDju %d no %d lapDm..."


It has to be run twice to reproduce because the bug breaks the cache feature. intltool-merge uses ^A (\x01) as the entry separator when saving and loading the cache. A ^A inside a msgstr breaks this, so intltool-merge loads a corrupted cache on the second run. Below is the buggy piece of intltool-merge code:


sub create_cache
{
    print "Generating and caching the translation database\n" unless $QUIET_ARG;

    &create_translation_database;

    open CACHE, ">$cache_file" || die;
    print CACHE join "\x01", %translations;
    close CACHE;
}

sub load_cache 
{
    print "Found cached translation database\n" unless $QUIET_ARG;

    my $contents;
    open CACHE, "<$cache_file" || die;
    {
        local $/;
        $contents = <CACHE>;
    }
    close CACHE;
    %translations = split "\x01", $contents;
}
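
To make the failure mode concrete, here is a rough Python model of the two Perl subs above. It is only a sketch: the keys ("lv|Printing", "fi|Printing") and the msgstr values are hypothetical and not the real layout of %translations, and padding with None stands in for the undef that Perl supplies when it warns about an odd number of elements.

```python
SEP = "\x01"  # same separator as the join/split calls in the Perl code


def save_cache(translations):
    # Models: print CACHE join "\x01", %translations;
    fields = []
    for key, value in translations.items():
        fields.extend([key, value])
    return SEP.join(fields)


def load_cache(contents):
    # Models: %translations = split "\x01", $contents;
    fields = contents.split(SEP)
    if len(fields) % 2:
        # Perl warns "Odd number of elements in hash assignment" here
        # and fills the missing value with undef.
        fields.append(None)
    return dict(zip(fields[0::2], fields[1::2]))


# Hypothetical entries; the lv msgstr carries a stray ^A.
clean = {"lv|Printing": "Drukaju", "fi|Printing": "Tulostetaan"}
dirty = {"lv|Printing": "Druk\x01ju", "fi|Printing": "Tulostetaan"}

print(load_cache(save_cache(clean)) == clean)
print(load_cache(save_cache(dirty)))
```

With the stray ^A, every key/value pair after the bad msgstr shifts by one position, so in this model the fi entry is no longer reachable under its own key even though its translation is fine. In Perl the order in which %translations is flattened is not fixed, so which entries end up damaged can vary.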


The current tomboy Debian package has an incomplete .server file. (The .desktop file is built in the first run; the .server file is built from the wrong cache.) I thought it was a Debian-specific problem until I found the lv.po entries and this intltool-merge code. I'm still not sure why Fedora (or even Ubuntu) has no such problem, though.
Comment 3 Changwoo Ryu 2008-10-09 09:27:43 UTC
OK. I read Fedora's tomboy build log and found why. They use the -j4 make option for the build. Three intltool-merge processes run in parallel, so they accidentally avoid using the wrong cache.

And the Ubuntu package's .server file is also incomplete. But they seem to implement their own runtime translation mechanism for their translation-only package.

<oaf_server ubuntu-gettext-domain="tomboy" iid="OAFIID:TomboyApplet_Factory" ...
Comment 4 Claude Paroz 2008-10-09 09:46:53 UTC
IMHO this is a bug in the lv.po. Such control chars shouldn't be in the msgstr in the first place. Maybe check with the Last translator...
Comment 5 Rodney Dawes 2008-10-09 14:41:30 UTC
I agree with Claude here. The gettext documentation says to avoid control characters like this. See http://www.gnu.org/software/automake/manual/gettext/Preparing-Strings.html for reference.
Comment 6 Changwoo Ryu 2008-10-09 15:06:53 UTC
Then how about exiting with an error when there is such a control character, instead of silently generating wrong files?

Comment 7 André Klapper 2008-10-16 14:53:53 UTC
CC'ing Raivis, who was the last Latvian translator - please take a look at this!
Comment 8 Raivis Dejus 2008-10-17 06:09:01 UTC
I removed control characters from Latvian lv.po. It should be ok now.
Comment 9 Danilo Segan 2009-04-11 01:41:04 UTC
This is definitely not a bug. GNU gettext uses such characters internally, yet silently produces bad PO files if you try to compile them (i.e. try with \0 or \2, which is, if I am not misremembering, used for msgctxt separation in MO files: you'll get broken MO files as well).
Comment 10 Changwoo Ryu 2009-04-11 09:20:19 UTC
An MO file with \0 or \2 does not break other translated messages. But in this case, a single \1 character silently broke the intltool-merge based translations for all languages.

It's not intltool's fault, but a simple check by intltool can prevent it.
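
As an illustration of what such a check could look like, here is a small Python sketch. It is not intltool code: the function name find_control_chars, the key format, and the choice to allow tab, newline and carriage return are all assumptions made for the example.

```python
import re

# C0 control characters except tab (\x09), newline (\x0a) and
# carriage return (\x0d), which can legitimately appear in a msgstr.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_control_chars(translations):
    """Return (key, offset, code point) for each msgstr with a control char."""
    problems = []
    for key, msgstr in translations.items():
        match = CONTROL_CHARS.search(msgstr)
        if match:
            problems.append((key, match.start(), ord(match.group())))
    return problems


# Hypothetical entries; the second msgstr contains a stray ^A.
entries = {"fi|Printing": "Tulostetaan\n", "lv|Printing": "Druk\x01ju"}
for key, offset, code in find_control_chars(entries):
    print(f"error: {key}: control character 0x{code:02x} at offset {offset}")
```

A tool running a check like this before writing the cache could stop with a non-zero exit status when the list is non-empty, rather than carrying on and producing incomplete output.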