GNOME Bugzilla – Bug 46650
add feature which outputs multiple localized files instead of one merged file
Last modified: 2004-12-22 21:47:04 UTC
Add support so that one starts with one XML data file, and then produces one XML data file per locale. Example XML file from scrollkeeper:

<?xml version="1.0"?>
<ScrollKeeperContentsList>
  <sect>
    <title>Applications</title>
    <sect>
      <title>Amusement</title>
    </sect>
    <sect>
      <title>Games</title>
    </sect>
  </sect>
</ScrollKeeperContentsList>

Then we produce similar files in each locale.

------- Additional Comments From darin@bentspoon.com 2001-02-15 09:28:47 ----

No idea whether this is required or where it's useful. Dan, you need to put a bit more information about motivation in the bug reports, rather than just "do this please".

------- Additional Comments From dan@eazel.com 2001-02-15 17:21:15 ----

We can treat this as a wish item with no particular deadline. Maciej and Kenneth said it sounded like a reasonable thing to do which wouldn't be hard, and that they would probably do it in the not-too-distant future. However, it isn't a priority for Nautilus 1.0 or even GNOME 1.0. It just makes the lives of translators of the document categories somewhat easier. I'm changing the target milestone to 1.2 so it doesn't look like a 1.0 priority.

------- Bug moved to this database by unknown@bugzilla.gnome.org 2001-09-09 21:00 -------
This seems to be very important for Sun Microsystems, and not only for XML files but for all our supported file formats:

<quote>
I'm asking because Sun's packaging policy for translated content dictates that language content be split out into packages on a per language basis and separate from the applications that the content is for. So I need to create packages containing, for example, French UI Messages, Chinese UI messages. Being able to do what I am asking about above would make things a hell of a lot easier for me.
</quote>
This requires a lot of thinking. Is this what we want to do? And if so, how should we do it so that we avoid much of the pain of moving to a different GNOME i18n setup?
Does this requirement also include .desktop files and any other files that we ship with all translations inside?
> This requirement also includes .desktop files and any other file that
> we use with all translations inside?

It does, yes: .desktop, .xml, .server, .keys, and .directory files all contain multiple-language content, which makes packaging language content separately pretty tricky. Are there any more I don't know about?

Are you currently in a position to bundle community language content in separate RPMs on a per language basis for the community Gnome releases? Sun have the policy of separating non-English content from the base applications for a number of reasons. One was that the language requirements of the Solaris OS in particular were different depending on the OS version. Another was that it allows greater flexibility in what the user chooses to install if the linguistic content is separated. How does this currently work in the community builds?

Anyway, the files with shared content are a bit perplexing, as I cannot recall ever having encountered i18n message files with multiple-language content, and building Gnome 11 times with LINGUAS set to each language and then packaging isn't really an option!

Regards,
Damien
Doesn't seem so important anymore. Set to Low Priority.
This is actually very important to GOK. There are lots of other use cases where it's important; could you please upgrade the priority? In particular, the current approach makes parser logic much more complex from the point of view of clients, it has bad performance implications since we're potentially talking about >100 languages/localizations, and it reduces human-readable or user-editable XML files to unmanageability.

GOK also has a number of XML files which are inherently locale-specific, so per-locale/lang directories are already required. It makes sense to output split XML to them as well. This is a high priority/blocker for GOK, which is now part of the gnome-2.4 desktop. In order to localize GOK properly, this bug needs attention.
I have to agree with Bill: it is still important. Admittedly, though, getting complete separation of language content from the application is a big job given the number of different file types involved. I'd settle for this happening for XML as a good starting point, and we are only talking about .xml files being affected for GOK.

I discussed the whole multilanguage content issue with a few people at GUADEC the other week, and Christian Rose and Kjarten, to name just two, could see the logic in wanting the separation. At that point we were exploring the possibility of separating out the linguistic content building into specific Makefile targets, so that one would be able to build and merge all linguistic content without having to build the whole module like you do at the moment (at least as far as I am aware). (I know you can do it with the intltools standalone, but you need to know what is actually in the module first.) This would be useful for testing, and it would even be useful for bundling single languages. However, because the single-language .desktop files, .xml files etc. would overwrite each other in separate rpm or SUNW package installs, it wouldn't _really_ get around the problem of being able to give users multiple language packages and letting them install MORE than one.

Another option that was discussed for the XML (and I'm sure Dan will give a million reasons why this is not a good idea) was gettext fallback for XML localised content. Well, it would make for more efficient parsing of the XML!

Anyhow, I am all for allowing the creation of separate XML files, though without much idea of how to go about it! It should definitely be optional.
Well, for GOK what we'd want is *.xml.in => <locale>/*.xml, i.e. per-locale data directories for these XML files. Dunno how many projects this would be useful for, but certainly for us it makes sense, as we'd point to the locale-appropriate directory for our data files. The question remains whether we'd implement the fallback behavior on a per-file basis in the client (i.e. if <locale>/foo is not found, use C/foo); probably the answer would be "yes". The assumption is that if a string is marked up for translation, the client would pull strings from the <locale>/foo.xml files.
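For illustration only, the per-file fallback described above could look roughly like the Python sketch below. This is not GOK code: the function name find_data_file, the data_dir layout, and the assumption that untranslated files live in a "C" subdirectory are all hypothetical.

```python
import tempfile
from pathlib import Path


def find_data_file(data_dir: Path, locale: str, name: str) -> Path:
    """Return <data_dir>/<locale>/<name> if present, else <data_dir>/C/<name>."""
    localized = data_dir / locale / name
    if localized.is_file():
        return localized
    # No file for this locale: fall back to the untranslated copy,
    # assumed here (hypothetically) to live in a "C" directory.
    return data_dir / "C" / name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for loc in ("C", "fr"):
            (root / loc).mkdir()
            (root / loc / "foo.xml").write_text("<!-- " + loc + " -->\n")
        # "fr" has its own file; "de" does not and falls back to C.
        print(find_data_file(root, "fr", "foo.xml").parent.name)
        print(find_data_file(root, "de", "foo.xml").parent.name)
```

The fallback is decided per file, so a locale with only some files translated would still pick up the remaining ones from the C directory.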
Perhaps the best option would be to put them in /usr/share/locale/ like the .mo files, or perhaps to install just the English version without any translations and get the translations directly from the .mo file (as with .glade strings); we already have those translations inside the .po/.mo files...
Marking AP2 for now to reflect accessibility team's assessment, don't think it's a GOK release stopper just yet.
Apologies for spam... marking as GNOMEVER2.3 so it appears on the official GNOME bug list :)
Raising priority, since gok seems to depend on this feature very badly.
Any progress here? It's blocking GOK localization, and GOK is part of the GNOME 2.4 desktop. Too late for 2.4.0 I'm afraid, but we could get GOK localized for 2.4.1 if we get a fix for this bug in the next couple of weeks.
I don't think there will be much progress on this front unless someone (maybe Sun *hint**hint*) starts contributing patches. intltool is in my experience very much only emergency maintained right now, if at all maintained, so requests for new exciting development, however necessary those areas may really be, seem extremely unrealistic at this point unless they are also followed by patches. Just my observation.
Yes, I have been and still am very busy, which is very unfortunate. I should be able to look at this over the weekend; is that too late? Kenneth
Noted. I had a look yesterday, and passed it to Brian Cameron, who is more Perl-savvy than I. He's looking at this now.
Correction: Brian is looking at the other intltool bug (regarding attribute localization), bug 116526.

Kenneth, nice to see you're on the air. Michael Twomey at Sun suggested that the scrollkeeper post-processing stuff for splitting files could be adapted to do this during install, instead of hacking intltool-merge to do it. Can you evaluate the two basic approaches this weekend to see which seems most feasible and expedient? IMO the window for fixing this for 2.4.1 is 2-3 weeks or so, provided release-team accepts adding this capability in 2.4.1.

Thanks for looking at this, Kenneth. Also, maybe you can review what Brian comes up with for 116526 or give a hint; I personally am not a Perl regex expert, and I'm not sure how familiar Brian is with Perl regex either.

- Bill
Yes I will evaluate these two basic approaches this weekend. I can review the patch as well. Kenneth
I added a feature to output multiple files (--multiple-output). I don't know if this works exactly like you want it to, but it should be pretty easy to modify it to serve your purposes. Please test it and add a test to intltool. Just modify the last test. Cheers, Kenneth
Note that I wrote a patch to fix bug #116526 that affected this logic. My new logic in that patch retains the --multiple-output functionality, though I had to rewrite it to work with an XML parser.
Thanks for doing the work!
Brian, Kenneth: I have some issues/questions regarding the way this is currently implemented.

#1: The names seem to be <filename>.<extension>-<locale>, which means that the file extension is locale-dependent. That seems broken; the expected output (I thought) was one of the following:

  <filename>-<locale>.<extension>
OR
  <locale>/<filename>.<extension> [i.e. the file is placed in a locale-dependent subdirectory]

I prefer the latter solution, i.e. creating subdirectories for the existing locales and creating files there. It may be a little harder to create a Makefile.am entry for that rule, however. The easiest implementation would be a third possibility:

  <filename>.<locale>.<extension>

which would at least keep the same extension for purposes of identifying file types, _and_ also keep the same filename for purposes of identifying the file itself.

Example: main.kbd.in gets translated to main.kbd-fr
Desired output: either fr/main.kbd, main-fr.kbd, or main.fr.kbd

#2: The output files appear to contain translations for locales other than the target (not just the C locale stuff). That seems really broken, i.e. main.kbd-fr should not contain any elements in locale pt_BR etc. (only C locale elements, in the event that they have no translations available). Brian, could you look into this? I believe that the output files should contain either:

* only locale-specific elements, whose content falls back to C locale if no translation is available; OR
* locale-specific elements _and_ C locale elements for untranslated content [better]. This would mean that if a parent element had no translation in the target locale, the parent C locale element would be used (unless there was data matching the 'lang' but not the minor variant, for instance if there was pt but not pt_BR and the target was pt_BR). At most, then, there would be three of each element (C, LANG, and target locale).

In any case I don't see why totally unrelated locale elements appear in a locale-specific file, i.e. why Amharic (am) strings are appearing in zh_TW (traditional Chinese) XML files.
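To make point #2 concrete, here is a rough Python sketch of the "at most three of each element" rule (C, language, and target locale). It is an illustration under assumptions, not the intltool-merge logic: the names fallback_chain and select_for_locale are made up, locales are assumed to have the plain lang_COUNTRY form, and modifiers such as @euro are ignored.

```python
from typing import Dict, List


def fallback_chain(target: str) -> List[str]:
    """Locales relevant to `target`, most specific first (pt_BR, pt, C)."""
    chain = [target]
    lang = target.split("_", 1)[0]
    if lang != target:
        chain.append(lang)
    if "C" not in chain:
        chain.append("C")
    return chain


def select_for_locale(variants: Dict[str, str], target: str) -> Dict[str, str]:
    """Keep only the variants of one element that belong in target's file."""
    wanted = fallback_chain(target)
    return {loc: variants[loc] for loc in wanted if loc in variants}


if __name__ == "__main__":
    # Placeholder strings; "am" is unrelated to zh_TW and must be dropped.
    element = {"C": "Games", "am": "<am text>", "pt": "<pt text>", "zh_TW": "<zh_TW text>"}
    print(sorted(select_for_locale(element, "zh_TW")))
    print(sorted(select_for_locale(element, "pt_BR")))
```

With this rule a zh_TW file keeps only the zh_TW and C variants, and a pt_BR file with no pt_BR translation keeps the pt and C variants.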
In regards to #1, the way the filename is built is very trivial to change. I think, though, that an intltool maintainer needs to comment about what the ideal name should be. With my patch, I left the naming convention for output files the same as before. In regards to #2, the fix for this bug is in the attached patch. Can I commit?
Created attachment 21341: Fixes problem with multiple-file output
#1: fr/main.kbd is the preferred dir + filename, but the others are also fine. #2: Feel free to commit the patch. Kenneth
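For reference, the naming schemes discussed in this thread can be compared side by side with a small Python sketch. The function candidate_names and its dictionary keys are hypothetical labels used only for this illustration; intltool itself has no such function.

```python
from pathlib import PurePosixPath
from typing import Dict


def candidate_names(merged_name: str, locale: str) -> Dict[str, str]:
    """Map each naming scheme from the discussion to the path it yields."""
    path = PurePosixPath(merged_name)
    stem, ext = path.stem, path.suffix  # e.g. "main" and ".kbd"
    return {
        "current": path.name + "-" + locale,           # main.kbd-fr
        "subdirectory": locale + "/" + path.name,      # fr/main.kbd
        "name-suffix": stem + "-" + locale + ext,      # main-fr.kbd
        "infix": stem + "." + locale + ext,            # main.fr.kbd
    }


if __name__ == "__main__":
    for scheme, name in candidate_names("main.kbd", "fr").items():
        print(scheme, name)
```

Only the "current" scheme changes the file extension; the other three keep .kbd intact, which is the objection raised in issue #1.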
I like fr/main.kbd too. Brian, can you have a go at what an appropriate Makefile.am entry would look like? I don't know offhand how to patch gok/Makefile.am so that it creates gok/<locales>/main.kbd
The patch that fixes issue #2 above has been committed. Regarding issue #1, wouldn't it make more sense for the intltool-merge script to create the various subdirectories and place the output files in the appropriate subdirectory? This seems easier than having Makefile.am create the subdirectories before calling intltool-merge. The only thing that would change in Makefile.am would be the various install rules. They would obviously need to be updated to install the various <lang>/file.<extension> files to their appropriate location.
Hi Brian: I agree, and attach (FWIW) a patch that creates the directories and writes the files to it, i.e. creates 'fr' and writes 'fr/main.kbd' etc. It also writes ./main.kbd for 'C' locale. However that last part isn't implemented correctly since the patch causes a number of 'uninitialized variable' warnings from perl. Since I know virtually nothing about perl, perhaps you can fix that last part. I think that even if --multiple-output is used, we should write a ./<foo> file so that code which searches for <locale>/foo can fall back to C locale. The other option would be to write C/foo instead, i.e. include 'C' in the list of locales. thanks!
Created attachment 21343: patch to write files in locale-specific directories, plus C locale in '.'
Brian, if you can fix my patch so it doesn't throw those errors I'd be grateful. Otherwise it seems to do what we want... seems we're nearly ready to declare this bug fixed? :-) Yes, the make install rules for the *.kbd files would have to change; I haven't figured out how yet.
I've fixed the gok Makefile.am rules, so we're ready to go when the issue with my above patch is fixed and committed. Ought to be a very simple fix but you (brian) will have a better idea I think than I.
Created attachment 21346: improved patch against HEAD which outputs files to <locale> directories without error messages
Bill, it sounds like you fixed the problem with the Perl. Do you still need me to do anything, or is this now fixed?
Brian: if you think my patch is OK I will commit (based on Kenneth's consent which I think he's already given in principle).
Your patch looks good. A couple of nit-picks. I think you are needlessly setting the $lang variable in the "else" case. Also, it looks like you needlessly changed the indentation of certain lines. Also, you might want to check the return code from mkdir and display a nice error message upon failure. This way we can make sure that the directory was successfully created before trying to create the output file.
I thought I might need to set $lang in order to suppress the error messages. Probably sufficient to do the "| 0" thing instead. Thanks.

Formatting: well, there seemed to be weird tabs in the source. I think this is an editor weirdness, since AFAIK there shouldn't be tabs in the code, only spaces.
OK, it's my editor that was acting up. I've added a check for the existence of the $lang subdir and an error message if it can't be created. Also removed the needless assignment. Thanks! Committing to CVS.
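The behavior agreed on in the preceding comments (create the <locale> subdirectory if needed, report an error if that fails, and write the C locale file to the base directory) can be sketched in Python as follows. This is an illustrative re-implementation under assumptions, not the committed Perl patch; the name write_localized and its parameters are hypothetical.

```python
import os
import sys
import tempfile


def write_localized(base_dir: str, locale: str, filename: str, content: str) -> str:
    """Write content to <base_dir>/<locale>/<filename>, or <base_dir>/<filename> for C."""
    target_dir = base_dir if locale == "C" else os.path.join(base_dir, locale)
    try:
        # Succeeds silently when the directory already exists.
        os.makedirs(target_dir, exist_ok=True)
    except OSError as err:
        # Stop with a readable message before attempting to open the output file.
        sys.exit("Cannot create directory %s: %s" % (target_dir, err))
    out_path = os.path.join(target_dir, filename)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(content)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(os.path.relpath(write_localized(tmp, "fr", "main.kbd", "<kbd/>\n"), tmp))
        print(os.path.relpath(write_localized(tmp, "C", "main.kbd", "<kbd/>\n"), tmp))
```

Checking the directory creation first means a permissions problem shows up as a clear error rather than as a failed open on the output file.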
There were slight incompatibilities introduced with changes to the XML merging code in intltool-merge (I broke it, so I know :). I'll fix them right away, but I want to make a slight improvement (IMHO) at the same time, and I want your opinion on that.

So, now intltool (CVS HEAD) produces .kbd files with something like:

  <tag>Original string</tag>
  <tag xml:lang="lang">Translated string</tag>

where it previously produced:

  <tag>Translated string</tag>
  <tag xml:lang="lang">Translated string</tag>

I have a simple patch that makes it output only:

  <tag>Translated string</tag>

Please test current intltool CVS along with the following patch (will be attached). I'm reopening the bug until this is resolved again.

(As a side note, the produced .kbd files should be much easier to edit this way, and you seem to have aimed for readability, so you win more this way.)
Created attachment 28737: Output only translations when doing --multiple-output

If GOK guys agree with this change, I guess it's fine to commit this even though Kenneth will be unreachable for a couple of days.
Danilo: your patch seems to be against intltool-merge, not intltool-merge.in.in. Did you intend that?
Bill, sorry about that, it was easier for me to do it that way (I just worked on a copy created in gok/intltool-update after ./autogen.sh). I hoped it wouldn't cause a problem, but it obviously does. You may apply it against intltool-merge.in.in, since it differs from intltool-merge only in a couple of @MACROS@ which are replaced by sed (and not touched by the patch, of course). Again, sorry about that.
Danilo: I applied your patch and confirmed that the code changes were in my intltool-merge. I installed intltool-merge, and reconfigured/built/installed GOK; however your patch was NOT working as you described it (I am still seeing an untagged, untranslated string, plus a tagged, translated string, in my intltool-merge output).
Danilo: I guess for some reason my intltool-merge in gok wasn't getting replaced when I re-gen'ed. I applied your patch directly (as you suggest) and it did work. However I have a bit of a concern. If we get the strings from tagged XML, then we can be sure of the locale that the strings are actually in; this is useful if, for instance, we need to speak the label when activated or focussed (since the locale/lang will determine which text-to-speech voice we can use). Perhaps it would be better to include the tags (i.e. put tags in the translated elements, when available). So each element would only occur once, either with tags (indicating that it was merged from the po files), or without tags (meaning that the untranslated value is being used).
On the problem you experienced: you need to remove intltool-* from the directory before doing ./autogen.sh.

I'm not sure I understand you correctly. Are you thinking of "xml:lang" when you're talking about "tags"? That's an option as well. If that's what you want, just add the following line to the already patched intltool-merge:

  if ($MULTIPLE_OUTPUT && $translation) {
+     print $fh " xml:lang=\"", $language, "\"";
      print $fh ">", $translation, "</$nodename>";

If I got you wrong, please elaborate on what you're thinking of.
Hi Danilo: Yes, adding the xml:lang attribute to the translated element is what we had in mind. I think that the one-line change you suggest above would be better (i.e. preserve lang info in the strings), but I have not been able to confirm that it works as I expect yet. Thanks!
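The output rule being agreed on here (each element appears once: tagged with xml:lang when a translation was merged, untagged when the original string is used) can be sketched in Python as below. This only illustrates the rule; it is not the intltool-merge code, and emit_element with its parameters is a hypothetical name.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def emit_element(tag: str, original: str, translation: Optional[str], lang: str) -> str:
    """Emit one element: xml:lang-tagged if translated, untagged otherwise."""
    if translation:
        # Translated: keep the language so clients know which locale the text is in.
        return "<%s xml:lang=%s>%s</%s>" % (tag, quoteattr(lang), escape(translation), tag)
    # Untranslated: emit the original (C locale) string with no xml:lang attribute.
    return "<%s>%s</%s>" % (tag, escape(original), tag)


if __name__ == "__main__":
    print(emit_element("label", "Back", "Retour", "fr"))
    print(emit_element("label", "Back", None, "fr"))
```

A client such as GOK could then treat the presence of xml:lang as the signal that the string really is in the target language, for example when choosing a text-to-speech voice.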
Danilo: the patch to intltool-merge, with the one-line addition you propose in comment #44, works well for GOK. I'd recommend changing your patch to modify intltool-merge.in.in (of course), but otherwise the above code changes seem like an improvement. Thanks!
Ok, I'll commit this patch (since GOK seems to be the only current user of "-m" feature of intltool, according to grep of my Gnome CVS checkout), and adjust intltool testcases (bug 138512 -- why I searched for this bug in the first place). I think Kenneth won't mind my committing this right away. So, I'm closing this bug again.
OK Danilo: just to confirm, you committed the version with the line: + print $fh " xml:lang=\"", $language, "\""; and also patched intltool-merge.in.in instead of intltool-merge, right?
Confirmed, Bill, I committed that version. There's also no "intltool-merge" in the repository, so I obviously had to patch intltool-merge.in.in. No need to worry, I've been doing a lot of intltool hacking lately ;)
Thanks Danilo; I was mostly making sure _I_ understood what was happening here :-)