GNOME Bugzilla – Bug 670953
Tesseract 3.02 issues
Last modified: 2012-04-11 16:42:37 UTC
Created attachment 208559: screenshot

Every page recognized with Tesseract 3.02 (the latest version) starts with the message "Tesseract Open Source OCR Engine v3.02 with Leptonica". See the attached screenshot.
Here is what I have in the "engine arguments" field:

$IMAGE $FILE -l rus; cat $FILE.txt; rm $FILE $FILE.txt
Hi,

Thanks for reporting this. The problem is that Tesseract was changed to print that version message to standard output, which then ended up in OCRFeeder's text fields when performing the recognition. This is fixed upstream and will be available in the next release.

To fix it yourself, if you haven't done so yet, just change the engine's arguments to:

$IMAGE $FILE -l rus > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt

After that, the version message will no longer appear.

If you wait for the update, after having the new version, be sure to delete the Tesseract configuration and perform "detect" to get the fixed arguments (don't forget to add the Russian language like you did).

Best regards,
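To illustrate why the redirect helps, here is a minimal Python sketch. It assumes the recognized text is whatever the whole command line writes to standard output, which is what the `cat $FILE.txt` step in the arguments above relies on. `FAKE_ENGINE` and `run_engine` are hypothetical stand-ins for illustration, not Tesseract's or OCRFeeder's real interfaces.

```python
import subprocess
import sys

# Hypothetical stand-in for an OCR engine that prints a version banner
# on stdout, as Tesseract 3.02 is described as doing in this report.
FAKE_ENGINE = [sys.executable, "-c", "print('Fake OCR Engine v0.0')"]


def run_engine(discard_engine_output):
    """Return everything the simulated command line wrote to stdout."""
    captured = ""
    if discard_engine_output:
        # Equivalent of '> /dev/null 2> /dev/null' on the engine step.
        subprocess.run(FAKE_ENGINE, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
    else:
        # Without the redirect, the banner is captured with the text.
        result = subprocess.run(FAKE_ENGINE, stdout=subprocess.PIPE,
                                text=True, check=True)
        captured += result.stdout
    # Stands in for 'cat $FILE.txt', which emits the recognized text.
    captured += "recognized text\n"
    return captured
```

With the engine's own output discarded, only the text produced by the `cat` step remains in the captured result.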
(In reply to comment #2)
> If you wait for the update, after having the new version, be sure to
> delete the Tesseract configuration and perform "detect" to get the
> fixed arguments

Hey, but if users upgrade OCRFeeder on their computer, are they supposed to delete the Tesseract configuration themselves? How are they supposed to know about that?
Hi,

I don't know what the best way to approach this is. I don't think modifying the engine's arguments automatically is a good idea, because they may have been customized, as this user's were.

One solution that comes to mind, though I hate it, is to show a list of important messages on startup, but this seems noisy and so 1998.

Always overwriting the engines' configuration upon upgrading seems like a bad idea for obvious reasons.

Using some sort of regex to match the parts that users might have added themselves and apply them to our default arguments seems like a possible solution, but I don't know how robust it can be.

Do you have a better suggestion?
(In reply to comment #4)
> Always overwriting the engines' configuration upon upgrading seems
> like a bad idea for obvious reasons.

In most cases users will be using the default Tesseract configuration. You can detect that and upgrade it automatically, possibly showing a message indicating what has just happened, although I don't even think it's necessary in this case. If you detect that users changed the Tesseract settings, you might just warn them.
Thank you very much for fixing the issue.

I second A. Garcia: detect whether the user's Tesseract configuration differs from the standard one.
  -> if yes, show a warning
  -> if no, silently change the user's configuration

This should work just fine, whether it looks 1998-ish or not :)
Hi guys,

The issue with what Alberto proposed was that I had no way to check whether the configuration had changed. Another issue was how to check whether the currently saved configuration is an old one, because there might be more than one old configuration.

I've finally done it right. Now, when the configuration is changed and the user had a default one stored, it alerts the user that the engine needs to be updated and that this can be done automatically. If the stored configuration is an old one, it alerts the user, says it must be changed manually, and gives the option of opening the OCR engines manager dialog. These alerts are shown only once, and for all the engines that need to be updated.

http://git.gnome.org/browse/ocrfeeder/commit/

Hope you like it.
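The check discussed in this thread could be sketched roughly as follows in Python. This is not the code from the commit; it only illustrates comparing the stored arguments against a set of known defaults, so that more than one old default can be recognized. The argument strings, `CURRENT_DEFAULT`, `OLD_DEFAULTS`, and `classify_arguments` are all assumptions made for the example.

```python
# Hypothetical argument strings; the real defaults live in OCRFeeder itself.
CURRENT_DEFAULT = ("$IMAGE $FILE > /dev/null 2> /dev/null; "
                   "cat $FILE.txt; rm $FILE $FILE.txt")
# Every default shipped in the past (there may be more than one).
OLD_DEFAULTS = frozenset({
    "$IMAGE $FILE; cat $FILE.txt; rm $FILE $FILE.txt",
})


def _normalize(arguments):
    """Collapse runs of whitespace so spacing differences do not matter."""
    return " ".join(arguments.split())


def classify_arguments(stored):
    """Classify the stored engine arguments.

    Returns "up_to_date", "auto_update" (an untouched old default that
    can be replaced safely) or "manual_update" (the user customized it).
    """
    stored = _normalize(stored)
    if stored == _normalize(CURRENT_DEFAULT):
        return "up_to_date"
    if stored in {_normalize(old) for old in OLD_DEFAULTS}:
        return "auto_update"
    return "manual_update"
```

Under these assumptions, the reporter's arguments (with `-l rus` added) would be classified as `"manual_update"`, while an untouched old default would be `"auto_update"`.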
The URL should be this one: http://git.gnome.org/browse/ocrfeeder/commit/?id=dda43c5f75885b7f78617e3e4241e6f922a97538
This dialog http://www.flickr.com/photos/joaquimrocha/6847512456/ is really bad. There is no way for *anyone* to make an informed decision about that. A non-technical user won't know what you are talking about, and a technical user isn't being presented with enough information. Please just do that automatically without showing a dialog.
I'll give it a thought.
There's one additional problem with this fix: if a user runs OCRFeeder for the first time (that is, if the configuration is empty), it acts as if the arguments had been manually changed by the user:

$ rm -r ~/.ocrfeeder
$ ocrfeeder
-----------------------------------
The following engines' arguments might need to be updated but it appears you have changed their default configuration so they need to be updated manually: Tesseract
-----------------------------------
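One possible way to avoid the false warning, sketched in Python: treat a missing or empty stored configuration as a first run before comparing it against the known defaults. `should_warn_manual_update` and its parameters are hypothetical names for this example, not OCRFeeder's actual code.

```python
def should_warn_manual_update(stored_arguments, known_defaults):
    """Return True only if a stored configuration exists and is customized.

    stored_arguments: the saved argument string, or None if nothing has
        been saved yet (for example, on the very first run).
    known_defaults: every default argument string ever shipped.
    """
    # First run or empty configuration: nothing was changed by the user,
    # so the current defaults can simply be written without any warning.
    if stored_arguments is None or not stored_arguments.strip():
        return False
    # Anything that is not a known default is treated as a user change.
    return stored_arguments not in known_defaults
```

The empty check comes first on purpose: an empty string is never a known default, so without it a fresh profile would look like a customized one.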
Joaquim, is there any chance that you fix the issues stated in comments #9 and #11 soonishly? I'd love to get a version of OCRFeeder into Ubuntu 12.04 (Precise) that works with Tesseract 3.0 nicely, and as I'd like to sync it from Debian, I need to wait until Alberto is happy with it ;-) ( see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661499 ) This will of course be getting harder and harder with Precise's release date approaching...
Hi Bernhard,

I will *probably* tackle those (the bugs, not the reporters :) ) this weekend.

Cheers,
I have everything ready to upload the package, I'm just waiting for the patch :)
*ping*
I'll try to have it released today with this fix included.
I've just released OCRFeeder 0.7.9 with this fix. Hope it's better this time.