Bug 670953 - Tesseract 3.02 issues
Status: RESOLVED FIXED
Product: ocrfeeder
Classification: Other
Component: general
Version: git master
Hardware: Other Linux
Importance: Normal major
Target Milestone: ---
Assigned To: ocrfeeder-maint
ocrfeeder-maint
master
Reported: 2012-02-28 10:08 UTC by gregg128
Modified: 2012-04-11 16:42 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
screenshot (123.31 KB, image/png)
2012-02-28 10:08 UTC, gregg128

Description gregg128 2012-02-28 10:08:14 UTC
Created attachment 208559
screenshot

Every page recognized with Tesseract 3.02 (the latest version) starts with the message "Tesseract Open Source OCR Engine v3.02 with Leptonica".
See screenshot attached.
Comment 1 gregg128 2012-02-28 10:09:34 UTC
Here is what I have in "engine arguments" field:
$IMAGE $FILE -l rus; cat $FILE.txt; rm $FILE $FILE.txt
Comment 2 Joaquim Rocha 2012-03-05 21:48:20 UTC
Hi,

Thanks for reporting this.

The problem was that Tesseract was changed to print that version message to the standard output, which then ended up in OCRFeeder's text fields when performing the recognition.

This is fixed in the upstream version and will be available on the next release.
To fix it yourself, if you haven't done it yet, just change the engine's arguments to:
$IMAGE $FILE -l rus > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt

After that, the version message should no longer appear.
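
The effect of those two redirections can be sketched in Python: both output streams of the engine are discarded and only the result file is read back. This is an illustrative sketch, not OCRFeeder's code. `run_engine_quietly` is a hypothetical helper, and a small Python child process stands in for Tesseract so the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_engine_quietly(command, result_path):
    """Run an OCR command with stdout and stderr discarded, then return
    the contents of the result file (hypothetical helper)."""
    subprocess.run(
        command,
        stdout=subprocess.DEVNULL,  # same effect as "> /dev/null"
        stderr=subprocess.DEVNULL,  # same effect as "2> /dev/null"
        check=True,
    )
    with open(result_path, encoding="utf-8") as result:
        return result.read()


# Stand-in for Tesseract: prints a banner on both streams and writes
# the "recognized" text to the path given as its first argument.
FAKE_ENGINE = (
    "import sys\n"
    "print('Fake OCR Engine banner')\n"
    "print('Fake OCR Engine banner', file=sys.stderr)\n"
    "with open(sys.argv[1], 'w', encoding='utf-8') as out:\n"
    "    out.write('recognized text')\n"
)


def demo():
    """Run the stand-in engine and return only what it wrote to the file."""
    with tempfile.TemporaryDirectory() as tmp:
        result_path = os.path.join(tmp, "page.txt")
        command = [sys.executable, "-c", FAKE_ENGINE, result_path]
        return run_engine_quietly(command, result_path)
```

Calling `demo()` returns only the text written to the result file; the banner printed by the child process never reaches the caller.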

If you wait for the update, after having the new version, be sure to delete the Tesseract configuration and perform "detect" to get the fixed arguments (don't forget to add the Russian language like you did).

Best regards,
Comment 3 Alberto Garcia 2012-03-06 09:15:27 UTC
(In reply to comment #2)
> If you wait for the update, after having the new version, be sure to
> delete the Tesseract configuration and perform "detect" to get the
> fixed arguments

Hey, but if any user upgrades OCRFeeder on their computer, are they
supposed to delete the Tesseract configuration themselves? How are
they supposed to know about that?
Comment 4 Joaquim Rocha 2012-03-06 09:29:44 UTC
Hi,

I don't know what the best way to approach this is. I don't feel that modifying the engine's arguments automatically is a good idea, because they may have been modified by the user, as in this case.

One solution that comes to mind, though I hate it, is to show a list of important messages on startup, but this seems noisy and so 1998.

Always overwriting the engines' configuration upon upgrading seems like a bad idea for obvious reasons.

Using some sort of regex to match the parts that users might have added themselves, and applying them to our default arguments, seems like a possible solution, but I don't know how robust it would be.


Do you have a better suggestion?
Comment 5 Alberto Garcia 2012-03-06 10:21:03 UTC
(In reply to comment #4)
> Always overwriting the engines' configuration upon upgrading seems
> like a bad idea for obvious reasons.

In most cases users will be using the default Tesseract configuration.

You can detect that and upgrade it automatically, possibly showing a
message indicating what has just happened, although I don't even think
it's necessary in this case.

If you detect that users changed the Tesseract settings you might just
warn them.
Comment 6 gregg128 2012-03-08 23:35:24 UTC
Thank you very much for fixing the issue.

I second A.Garcia:

Detect whether the user's Tesseract configuration differs from the standard one:
- if yes, show a warning
- if no, silently change the user's configuration

This should work just fine, whether it looks 1998-ish or not :)
Comment 7 Joaquim Rocha 2012-03-10 19:10:31 UTC
Hi guys,

The issue with what Alberto proposed was that I had no way to check whether the configuration had changed. Another issue was how to check if the currently saved configuration is an old one because there might be more than one old configuration.

I've finally done it right.

Now, when the configuration has changed and the user had a default one stored, OCRFeeder alerts the user that the engine needs to be updated and that this can be done automatically. If the stored configuration is an old one, it alerts the user that it must be changed manually and offers the option of opening the OCR engines manager dialog.
These alerts are shown only once and for all the engines that need to be updated.
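
The decision described here could be sketched roughly as follows. This is not the code from the commit: the function name, the return labels, and the two argument strings used as "defaults" are assumptions made for illustration (the strings are the ones quoted earlier in this report with the `-l rus` part removed).

```python
# Assumed values for illustration only; the real defaults live in OCRFeeder.
CURRENT_DEFAULT = (
    "$IMAGE $FILE > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt"
)
KNOWN_OLD_DEFAULTS = {
    "$IMAGE $FILE; cat $FILE.txt; rm $FILE $FILE.txt",
}


def classify_engine_arguments(stored):
    """Decide what to do with an engine's stored arguments.

    Returns one of three hypothetical labels:
    "up-to-date", "auto-update" or "manual-update".
    """
    # Collapse runs of whitespace so cosmetic differences do not count
    # as user modifications.
    normalized = " ".join(stored.split())
    if normalized == CURRENT_DEFAULT:
        return "up-to-date"
    if normalized in KNOWN_OLD_DEFAULTS:
        # An untouched older default can be replaced safely.
        return "auto-update"
    # Anything else was edited by the user and is left alone.
    return "manual-update"
```

Under these assumptions, the reporter's arguments (`$IMAGE $FILE -l rus; cat $FILE.txt; rm $FILE $FILE.txt`) would fall into the "manual-update" case, because the added `-l rus` makes them differ from every known default.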

http://git.gnome.org/browse/ocrfeeder/commit/

Hope you like it.
Comment 8 Joaquim Rocha 2012-03-10 19:11:37 UTC
The URL should be this one: http://git.gnome.org/browse/ocrfeeder/commit/?id=dda43c5f75885b7f78617e3e4241e6f922a97538
Comment 9 Teppo Turtiainen 2012-03-18 19:16:30 UTC
This dialog

http://www.flickr.com/photos/joaquimrocha/6847512456/

is really bad. There is no way for *anyone* to make an informed decision about that. A non-technical user won't know what you are talking about, and a technical user isn't being presented with enough information. Please just do that automatically without showing a dialog.
Comment 10 Joaquim Rocha 2012-03-18 19:31:24 UTC
I'll give it a thought.
Comment 11 Alberto Garcia 2012-03-21 14:35:06 UTC
There's one additional problem with this fix: if a user runs OCRFeeder
for the first time (that is, if the configuration is empty), it acts
as if the arguments had been manually changed by the user:

$ rm -r ~/.ocrfeeder
$ ocrfeeder

               -----------------------------------
  The following engines' arguments might need to be updated but it
  appears you have changed their default configuration so they need to
  be updated manually:

    Tesseract
               -----------------------------------
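
One way to avoid this false positive is to check only engines that actually have stored arguments, so that an empty configuration yields nothing to report. The sketch below is an assumption about how such a check might look, not OCRFeeder's code; `engines_needing_manual_update` and both of its parameters are hypothetical.

```python
def engines_needing_manual_update(stored_engines, known_defaults):
    """Return the names of engines whose stored arguments match no known default.

    stored_engines: engine name -> stored arguments (empty on a first run).
    known_defaults: engine name -> set of current and old default arguments.
    """
    return sorted(
        name
        for name, arguments in stored_engines.items()
        if arguments not in known_defaults.get(name, set())
    )
```

With an empty `stored_engines` mapping, as after `rm -r ~/.ocrfeeder`, the function returns an empty list, so no warning would be shown.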
Comment 12 Bernhard Reiter 2012-03-30 10:47:08 UTC
Joaquim, is there any chance that you could fix the issues stated in comments #9 and #11 soonish? I'd love to get a version of OCRFeeder into Ubuntu 12.04 (Precise) that works nicely with Tesseract 3.0, and as I'd like to sync it from Debian, I need to wait until Alberto is happy with it ;-) (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661499)
This will of course be getting harder and harder with Precise's release date approaching...
Comment 13 Joaquim Rocha 2012-03-30 11:23:28 UTC
Hi Bernhard,

I will *probably* tackle those (the bugs, not the reporters :) ) this weekend.


Cheers,
Comment 14 Alberto Garcia 2012-03-30 13:32:52 UTC
I have everything ready to upload the package, I'm just waiting for the patch :)
Comment 15 Bernhard Reiter 2012-04-11 00:43:12 UTC
*ping*
Comment 16 Joaquim Rocha 2012-04-11 07:30:52 UTC
I'll try to have it released today with this fix included.
Comment 17 Joaquim Rocha 2012-04-11 16:42:37 UTC
I've just released OCRFeeder 0.7.9 with this fix.

Hope it's better this time.