Bug 125593 - Counting words on the GTP translation status pages
Status: RESOLVED FIXED
Product: damned-lies
Classification: Infrastructure
Component: general
Version: unspecified
Hardware: Other All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Gil Forcada
QA Contact: damned-lies Maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2003-10-27 12:59 UTC by Christian Rose
Modified: 2011-08-20 07:45 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Migration script to add the word fields on pofile table (17.60 KB, patch)
2011-08-19 16:29 UTC, Gil Forcada
accepted-commit_now
Call pocount to get word statistics on po files (1.45 KB, patch)
2011-08-19 16:33 UTC, Gil Forcada
accepted-commit_now
Register words related fields and ensure they are us (3.98 KB, patch)
2011-08-19 16:37 UTC, Gil Forcada
accepted-commit_now

Description Christian Rose 2003-10-27 12:59:09 UTC
Most professional translators don't count "messages" but instead measure
the amount of work in words.

Since some teams have the ability to get some translation work funded and
use professional translators, having a feature on the status pages that
displayed the number of words in msgids for each module and the total
number of words would probably be useful.

This should probably not be the primary method of displaying stats, but
having the number of words available when checking could be useful.
Comment 1 Carlos Perelló Marín 2003-10-27 13:09:22 UTC
That's a hard feature.

Will we count %s, %d, etc.. as words?

Should we count .pot words or also translation words?

etc...

I think that we will not have this feature until Bruno finishes my
requests about gettext, so that we can stop parsing the .po files directly.
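
To make the first question concrete, here is a minimal sketch that counts the words in a msgid after dropping printf-style placeholders such as %s and %d. The two regular expressions, the function name, and the choice to ignore placeholders by default are assumptions for illustration; this is not how gettext or any GTP tool counts.

```python
import re

# Assumption: printf-style placeholders (%s, %d, %(name)s, %5.2f, ...) are not words.
PLACEHOLDER = re.compile(r"%(?:\([^)]*\))?[-+#0]*\d*(?:\.\d+)?[a-zA-Z]")
# A word is a run of letters, optionally joined by inner apostrophes or hyphens.
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def count_msgid_words(msgid, count_placeholders=False):
    """Count the words in one msgid string."""
    text = msgid if count_placeholders else PLACEHOLDER.sub(" ", msgid)
    return len(WORD.findall(text))
```

With count_placeholders=True the letter of a placeholder such as %s is counted as a one-letter word, which shows why the question matters for the totals.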
Comment 2 Christian Rose 2003-10-27 14:00:52 UTC
Yes, perhaps this should wait until msgfmt can report the number of
translated / untranslated words. Will you add a request for that?
Comment 3 Carlos Perelló Marín 2003-10-27 14:02:09 UTC
Hmm, I think we should wait until the urgent requests are finished and
then start asking for more features :-P
Comment 4 Danilo Segan 2006-07-31 20:28:12 UTC
Nice idea, though not in the immediate future. Will blend nicely with the idea for proper "supportedness" measure and better PO file checking.
Comment 5 Claude Paroz 2007-09-08 10:28:13 UTC
Here is some python code that could be a start for implementing this.

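import re
import string

# Split on any run of whitespace or punctuation.
SEPARATORS = re.compile("[" + re.escape(string.whitespace + string.punctuation) + "]+")

def countWords(line):
    wordcount = 0  # start from zero for every line
    for word in SEPARATORS.split(line):
        word = word.lower()
        # check to make sure the string is considered a word (ASCII letters only)
        if re.match("^[" + string.ascii_lowercase + "]+$", word):
            wordcount += 1
    return wordcount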
Comment 6 Claude Paroz 2008-03-06 21:11:51 UTC
For future reference, added link to pocount (from translate-toolkit): http://translate.sourceforge.net/wiki/toolkit/pocount
Comment 7 Gil Forcada 2009-01-10 11:52:33 UTC
Now that everything is in Python, it should be easier to use pocount, right?

Like:

import os

def po_words(po_file):
    # call_to_pocount is a placeholder for whatever ends up wrapping pocount
    if os.access(po_file, os.R_OK):
        return call_to_pocount(po_file)
    else:
        return 0

I've done something similar for my job (translations.openbravo.com) and I just used the commandline option:

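from shlex import quote
# commands.getstatusoutput lives in subprocess on Python 3
from subprocess import getstatusoutput

po_file = 'path/to/file.po'  # placeholder path
command = 'pocount --csv %(file)s | tail -n1' % {'file': quote(po_file)}
(status, output) = getstatusoutput(command)

stats = output.split(',')[1:] # discard the name

# now in stats[0] to stats[8] we have the {strings,words}_{translated,fuzzy,untranslated,total}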
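
The last step, turning the comma-separated line into named values, could be sketched as follows. The column names and their order are assumptions supplied by the caller, so check them against the header line that your pocount version prints; the sample line and the function name are made up.

```python
import csv
import io


def parse_pocount_row(csv_line, columns):
    """Map one `pocount --csv` data line to {column name: int}.

    `columns` names the fields that follow the file name. Its content and
    order are the caller's assumption, not something pocount guarantees.
    """
    row = next(csv.reader(io.StringIO(csv_line), skipinitialspace=True))
    values = row[1:]  # discard the file name, as in the snippet above
    return {name: int(value) for name, value in zip(columns, values)}


# Hypothetical column order and invented numbers, for illustration only.
columns = ["translated_strings", "translated_words", "fuzzy_strings", "fuzzy_words"]
stats = parse_pocount_row("gedit.po, 10, 42, 2, 7", columns)
```

Using names instead of positions keeps the calling code readable if the column order ever changes between pocount versions.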


Hope it helps
Comment 8 Gil Forcada 2011-08-18 20:50:49 UTC
Ok, I'm going for it.

As I see it, the statistics are kept in three tables:

- pofile
- statistics
- statistics_archived

And they are generated (at least) in the po_file_stats method of stats/utils.py.

Am I correct?

As the Translate Toolkit is already used in stats/utils.py, I will use its pocount method (AFAIK a temp file would need to be created, since pocount expects a file, not a string).

By the way, docs/DataModel.odg seems a bit outdated; I will file a bug and a "patch" to update it.
Comment 9 Claude Paroz 2011-08-19 07:30:33 UTC
Hi Gil, great to hear from you.

The StatisticsArchived and InformationArchived are not used at all currently. Just ignore them. The statistics fields of the Statistics table are obsolete (as indicated in the code), so don't touch them. Only pofile is of interest to you.

My main worry currently is to find a good way to show these word counts on an already cluttered interface. I'm sure you will have good ideas :-)

I don't think you will have to create any temp file. All files are already available somewhere on the file system.

Good luck!
Comment 10 Gil Forcada 2011-08-19 12:13:44 UTC
Ok, you are talking about stats/models.py, right? (I still have to figure out how everything works in a Django-based app.)

As for how to display them, I'm not really concerned for now. First I want to have the statistics generated; later we will figure out how to show them. Some random options:

- a toggle (or user preference) to display either words or strings
- use a hover on strings statistics to show the words
- remove strings and only use words?
- ask on gnome-i18n ML?

I'm playing with the zenity module, and I hope that by the end of this week I will have some more questions and doubts :)
Comment 11 Gil Forcada 2011-08-19 16:29:35 UTC
Created attachment 194239
Migration script to add the word fields on pofile table

Here I go, first patch:

Adds translated_words, fuzzy_words and untranslated_words fields on pofile table.
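
As a rough illustration of what such a schema change amounts to, here is a minimal sketch using the standard-library sqlite3 module. It is not the attached migration script: the helper name, the stand-in table definition, and the INTEGER NOT NULL DEFAULT 0 column type are all assumptions.

```python
import sqlite3

WORD_FIELDS = ("translated_words", "fuzzy_words", "untranslated_words")


def add_word_fields(conn, table="pofile"):
    """Add the three word-count columns to `table` if they are missing."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for field in WORD_FIELDS:
        if field not in existing:
            conn.execute(
                f"ALTER TABLE {table} ADD COLUMN {field} INTEGER NOT NULL DEFAULT 0"
            )
    conn.commit()


# Stand-in schema: the real pofile table has more columns than this.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pofile (id INTEGER PRIMARY KEY, "
    "translated INTEGER, fuzzy INTEGER, untranslated INTEGER)"
)
add_word_fields(conn)
```

Checking for existing columns first makes the helper safe to run twice, which is convenient while testing a migration by hand.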
Comment 12 Gil Forcada 2011-08-19 16:33:22 UTC
Created attachment 194240
Call pocount to get word statistics on po files

Second patch:

Adds the call to pocount to get the word statistics and returns them with the other statistics.
Comment 13 Gil Forcada 2011-08-19 16:37:29 UTC
Created attachment 194241
Register words related fields and ensure they are us

This patch does two things:

- registers the fields on PoFile class
- ensures that they are used while saving statistics
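
The two steps can be pictured with a plain-Python stand-in. The real PoFile is a Django model, so the dataclass below, the update_stats method name, and the keys of the stats dictionary are hypothetical and only show the shape of the change.

```python
from dataclasses import dataclass


@dataclass
class PoFile:
    """Plain-Python stand-in for the PoFile model (hypothetical, not the patch)."""

    translated: int = 0
    fuzzy: int = 0
    untranslated: int = 0
    # The three word-count fields being registered.
    translated_words: int = 0
    fuzzy_words: int = 0
    untranslated_words: int = 0

    def update_stats(self, stats):
        """Copy string and word counts from a stats dict onto the record."""
        for name in ("translated", "fuzzy", "untranslated"):
            setattr(self, name, stats.get(name, 0))
            setattr(self, name + "_words", stats.get(name + "_words", 0))
```

A caller would build the stats dictionary from the pocount results and pass it in before saving the record.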
Comment 14 Claude Paroz 2011-08-19 20:27:12 UTC
Comment on attachment 194240
Call pocount to get word statistics on po files

Might import those on the same line:
from translate.tools import pogrep, pocount
Comment 15 Claude Paroz 2011-08-19 20:28:13 UTC
Comment on attachment 194241
Register words related fields and ensure they are us

Thanks for your work Gil
Comment 16 Gil Forcada 2011-08-19 22:01:24 UTC
(In reply to comment #14)
> (From update of attachment 194240)
> Might import those on the same line:
> from translate.tools import pogrep, pocount

Done!

Thanks for reviewing, I already sent the commits.

Now to think about how and when to show them :)
Comment 17 Daniel Mustieles 2011-08-20 07:45:23 UTC
1 vote for the hover option proposed by Gil. I think it is more useful for a translator to know how many strings are fuzzy/untranslated, rather than how many words.

I think that stats based on words, instead of strings, may be more confusing for translators; note that a string with five words in English can result in (for example) a string with 7 words in Spanish, or a string with 1 word in German, so the stats may not tell the truth at all (damned lies!!)
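
A small worked example of how far the two measures can diverge for the same module, counting source (msgid) words as the original request describes. The message list and the helper name are invented for illustration.

```python
def percent(part, total):
    """Integer percentage, with 0 for an empty module."""
    return 0 if total == 0 else round(100 * part / total)


# Invented module: (number of source words in the msgid, is it translated?)
messages = [(1, True), (2, True), (1, True), (40, False)]

strings_done = percent(
    sum(1 for _, done in messages if done), len(messages)
)
words_done = percent(
    sum(words for words, done in messages if done),
    sum(words for words, _ in messages),
)
```

Here three of four strings are translated, yet those three cover only 4 of the 44 source words, so the per-string and per-word figures describe the remaining work very differently.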