GNOME Bugzilla – Bug 524077
ASCII characters > ord(127) are not extracted correctly from JPEG EXIF ImageDescription and UserComment
Last modified: 2009-02-05 19:34:43 UTC
beagle-extract-content does not correctly extract characters above ord(127) stored in "ASCII" fields of JPEG files. For example, Danish letters (æøåÆØÅ) are removed from the output. According to the EXIF 2.1 specification (http://www.exif.org/Exif2-1.PDF), IFD0.ImageDescription is always ASCII, while the charset of EXIF.UserComment can be ASCII, JIS, Unicode or Undefined, and EXIF.UserComment itself contains information about which charset is used. I assume the ASCII codepage should be determined from the environment (ENV) settings.
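As a rough sketch of what the spec implies (not Beagle's actual C#/Mono code), the Python snippet below reads the 8-byte character code that EXIF places in front of a UserComment payload and picks a decoder from it. The `decode_user_comment` name, the Latin-1 fallback for "ASCII"/undefined comments, the Unicode byte order following the TIFF header, and the JIS-as-ISO-2022-JP mapping are all assumptions for illustration.

```python
# Hypothetical helper: decode an EXIF UserComment using its 8-byte prefix.
# Assumption: "ASCII" and undefined comments fall back to a caller-chosen
# legacy codepage, because real files (like the test image in this bug)
# often store Latin-1 bytes under the "ASCII" label.

ASCII_CODE = b"ASCII\x00\x00\x00"
JIS_CODE = b"JIS\x00\x00\x00\x00\x00"
UNICODE_CODE = b"UNICODE\x00"
UNDEFINED_CODE = b"\x00" * 8


def decode_user_comment(raw: bytes, fallback: str = "latin-1",
                        big_endian: bool = True) -> str:
    """Return the text of a UserComment value (8-byte prefix + payload)."""
    prefix, payload = raw[:8], raw[8:]
    if prefix == UNICODE_CODE:
        # UCS-2 payload; byte order assumed to follow the TIFF header (MM = big endian).
        encoding = "utf-16-be" if big_endian else "utf-16-le"
    elif prefix == JIS_CODE:
        # Assumption: approximate the spec's JIS with Python's ISO-2022-JP codec.
        encoding = "iso2022_jp"
    else:
        # ASCII, undefined, or unknown prefix: use the fallback codepage.
        encoding = fallback
    # Drop trailing NUL/space padding that many cameras add.
    return payload.decode(encoding, errors="replace").rstrip("\x00 ")


print(decode_user_comment(ASCII_CODE + "æøåÆØÅ".encode("latin-1")))  # æøåÆØÅ
```

Strictly speaking, ASCII is 7-bit, so decoding an "ASCII" comment as Latin-1 is a guess about legacy data, which is exactly the ambiguity discussed in this bug.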
Can you attach a sample image, along with the expected and actual output, that I can test against? Thanks.
Created attachment 108010: Test image with Danish letters (æøåÆØÅ) in description and user comment
Created attachment 108011: Description of actual (and wanted) output from beagle-extract-content for test.jpg
Created attachment 108070: Use correct encoding for ImageDescription and UserComment and don't break other tags
Can you test whether this patch works correctly?
This patch did not work on my installation. I can see the code uses System.Text.Encoding.Default, but where does Mono get this setting from? To quote myself:
> I assume ASCII codepage should be determined based on ENV settings
Maybe it is not that easy. My environment says LANG=en_DK.UTF-8. But does Fedora have a setting for the ASCII codepage, now that ASCII is no longer used as the default?
No. In fact, any encoding can be used for any filename, its contents, or any allowed metadata. There are only a few specific ways to know which encoding was used:
- the system default encoding (en_DK.UTF-8 in your case)
- where the encoding is specified in the metadata spec or in the metadata itself
But there will always be errors. For example, if you receive a file from someone with data in a different encoding, it is not possible to find out which encoding it is. Some metadata fields may also be in a different encoding than the others, so there is really no way to handle them all unless everything is in UTF-8 or in your system encoding. You can try
$ LANG=iso-88591 beagle-extract-content /...
to see if that makes any difference.
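A small Python illustration of this point (not Beagle's code): the same Latin-1 bytes either decode correctly or lose their non-ASCII letters, depending on which encoding the reader assumes. Decoding Latin-1 data as UTF-8 while silently ignoring errors is one plausible way letters like æøå can vanish from output, matching the symptom in this report. The `decode_metadata` helper and its strict-UTF-8-then-legacy order are a hypothetical heuristic, not a reliable detector.

```python
# Bytes as they might be written by a Latin-1 (ISO-8859-1) application.
raw = "Blåbærgrød".encode("latin-1")  # b'Bl\xe5b\xe6rgr\xf8d'


def decode_metadata(data: bytes, legacy: str = "iso-8859-1") -> str:
    """Hypothetical heuristic: try strict UTF-8 first, then a legacy codepage.

    ISO-8859-1 maps every byte to a character, so the fallback never fails,
    but it can still produce the wrong text for data in another codepage.
    """
    try:
        return data.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        return data.decode(legacy)


# Lossy: invalid UTF-8 bytes are dropped, so the Danish letters disappear.
print(raw.decode("utf-8", errors="ignore"))  # Blbrgrd
# The heuristic recovers the text whether it was stored as Latin-1 or UTF-8.
print(decode_metadata(raw))  # Blåbærgrød
print(decode_metadata("Blåbærgrød".encode("utf-8")))  # Blåbærgrød
```

As the comment says, this can only guess: a Windows-1252 or ISO-8859-15 file would also "succeed" under the Latin-1 fallback, sometimes with wrong characters.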
The following makes the patch work for me:
export LANG=en_DK.ISO-88591
PS: Windows XP has a system setting, "how should I handle non-unicode programs", where it is possible to assign an ASCII codepage. This works in 99.9% of cases if your organisation is domestic. It allows a smooth transition from ASCII to Unicode, with no need to convert (tamper with) old data files. A similar setting would be nice. (The EXIF.UserComment in the test JPEG file is marked as ASCII, but it is not possible to assign a codepage.)
> PS: Windows XP has a system setting "how should I handle non-unicode programs"
> where it is possible to assign an ASCII codepage. This works in 99.9% of the
> cases if your organisation is domestic. This allows a smooth transition from
> ASCII to Unicode, with no need for converting (tampering) old data files to
> Unicode.
>
> A similar setting would be nice.

I don't know why Linux does not have such a setting; maybe it was historically not needed. We could add an environment variable, BEAGLE_LANG_ASCII_CODEPAGE, to specify the default codepage for ASCII fields (ANSI if unspecified). But such a setting will always break something else. For example, other applications will still not be able to show the right information, even though we extract it correctly. And there will always be files in a different encoding that will be completely misread if the default encoding is used. I want to post this question on the mailing list and see what suggestions other people have. Hope you don't mind.
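The proposed override could look roughly like the Python sketch below. BEAGLE_LANG_ASCII_CODEPAGE is only the name floated in this comment; treat it as hypothetical, as is the `ascii_codepage` helper. The sketch validates the requested codec name and otherwise falls back to the locale's preferred encoding, which is typically derived from LANG/LC_* (the same source that made `export LANG=en_DK.ISO-88591` work above).

```python
import codecs
import locale
import os
from typing import Mapping, Optional


def ascii_codepage(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the codepage for EXIF fields labeled "ASCII".

    BEAGLE_LANG_ASCII_CODEPAGE is the variable proposed in this bug report;
    it is hypothetical here, not a confirmed Beagle setting.
    """
    env = os.environ if env is None else env
    name = env.get("BEAGLE_LANG_ASCII_CODEPAGE")
    if name:
        try:
            # Validate and normalize the name, e.g. "latin-1" -> "iso8859-1".
            return codecs.lookup(name).name
        except LookupError:
            pass  # Unknown codec name: ignore the override and fall back.
    # Otherwise use the locale's encoding (typically derived from LANG/LC_*).
    return locale.getpreferredencoding(False)


override = {"BEAGLE_LANG_ASCII_CODEPAGE": "cp1252"}
print(b"Bl\xe5b\xe6rgr\xf8d".decode(ascii_codepage(override)))  # Blåbærgrød
```

Like the Windows XP setting, this is one global guess. A file written in a different codepage would still be misread, which is the trade-off described above.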
I forgot to update this bug. The last several releases (since 0.3.6, I believe) include updated F-Spot image importers, which handle non-UTF-8 encodings better than the previous approach. Fields with an unspecified encoding are still decoded using the system default encoding, but UserComment and ImageDescription values in a different encoding are now handled correctly. That should fix your original problem. Can you check and report back?
Closing this bug report as no further information has been provided. Please feel free to reopen this bug if you can provide the information asked for. Thanks!