GNOME Bugzilla – Bug 103622
Not able to open non UTF-8 encoded text
Last modified: 2004-12-22 21:47:04 UTC
Nautilus cannot open any text file whose contents are in the locale's encoding; instead it brings up an error dialog saying "No viewers are available for \"%s\".". For example, in the en_US.ISO8859-1 locale it cannot open a text file containing Latin-1 characters, but opening a file with UTF-8 encoded contents works. Is this intended behavior?
Unless I misremember completely, it's supposed to try UTF-8 first, then the locale's default encoding.
It does not use the text viewer, because the MIME type is set to "application/octet-stream" for files whose contents are in the current locale's encoding. But my version of Nautilus is a bit old. Is this supposed to work with the latest HEAD?
This looks like a gnome-vfs problem: if I rename the same file to foo.txt, it is treated as "text/plain".
gnome-vfs accepts only UTF-8 encoded files as text/plain, but Nautilus's text viewer can handle UTF-8, the current locale's encoding, and "iso-8859-1" contents. I prepared a patch that allows the same encodings as the text viewer. Please review, and approve committing it to HEAD.

* libgnomevfs/gnome-vfs-mime-magic.c (gnome_vfs_sniff_buffer_looks_like_text): allow the locale's default encoding and "ISO-8859-1" as text/plain (#103622)
Created attachment 13632: allow the same three encodings as the Nautilus text viewer does for text/plain
This is not really acceptable. Every file is valid ISO-8859-1 (any byte sequence decodes in it), so every file would be considered a text file...
Yeah... then should we add the check only if the locale's encoding is not ISO-8859-*? Or should we change Nautilus to try a text viewer when the MIME type is application/octet-stream and no associated viewer is installed?
I can't think of how to fix this. Any ideas? Would it be better to address this on the Nautilus side anyway?
This happens on Red Hat Linux 8.0 with the latest updates.
Maybe we need to make _gnome_vfs_sniff_buffer_looks_like_text() fall back to isprint() if the UTF-8 check fails. That should work pretty well.
Thanks, yes, it seems to work pretty well. I'll attach a patch that, in addition to isprint(), checks for \n and \0: \n is allowed and \0 is not. Please review and commit if it is okay.
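To make the check described above concrete, here is a minimal sketch of a byte-wise isprint() fallback, assuming the caller has already seen the UTF-8 check fail. The function name looks_like_text_isprint is hypothetical and this is not the code from the attached patch. It treats every byte as one character, so it is only meaningful in single-byte locales.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not the gnome-vfs code): byte-wise text check.
 * Assumes a single-byte locale, since each byte is classified on its own. */
static bool
looks_like_text_isprint (const char *buffer, size_t buffer_len)
{
    for (size_t i = 0; i < buffer_len; i++) {
        unsigned char c = (unsigned char) buffer[i];

        if (c == '\0')
            return false;      /* NUL bytes are not allowed */
        if (c == '\n')
            continue;          /* newlines are allowed */
        if (!isprint (c))
            return false;      /* any other non-printable byte: not text */
    }
    return true;
}
```

Tab and carriage-return bytes are rejected here because isprint() is false for them; a practical version would probably also accept them, for example via isspace().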
Created attachment 13931: do an isprint() test if the UTF-8 check fails.
I see one issue with the patch, but I'm not sure of the right way to solve it. The isprint() functions are locale-based, and the locale may be multibyte, so the p++ part isn't always right, is it? Won't it break for, e.g., Japanese two-byte characters?
Yeah... stupid of me. Then how about using mblen()? The test code would be something like:

    #include <stdlib.h>
    ...
    const char *p, *end = buffer + buffer_len;
    int mb_len;

    mblen (NULL, 0);                    /* reset the shift state */
    for (p = buffer; p < end; p += mb_len) {
        mb_len = mblen (p, end - p);    /* never look past the buffer end */
        if (mb_len == 0 || mb_len == -1)    /* NUL byte or invalid sequence */
            return FALSE;
    }
    return TRUE;
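Another way would be to use mbstowcs():

    /* With a NULL destination the length argument is ignored,
       so buffer must be NUL-terminated for this to be safe. */
    if (mbstowcs (NULL, buffer, buffer_len) == (size_t) -1)
        return FALSE;

We could also use iswprint() if the above is not enough.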
Created attachment 13976: 3rd patch - use an mbstowcs() != -1 check for text in the locale's encoding
Created attachment 13980: 4th patch - test with iswprint() and iswspace()
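The iswprint()/iswspace() idea from the 4th patch can be sketched as follows, assuming a NUL-terminated input and that the program has already called setlocale (LC_ALL, "") so LC_CTYPE reflects the user's locale. The helper name looks_like_locale_text is hypothetical, and the actual patch may be structured differently. Decoding to wide characters first avoids the p++ problem raised earlier, because classification happens per character rather than per byte.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: decode str with the current LC_CTYPE encoding and
 * accept it only if every character is printable or whitespace. */
static bool
looks_like_locale_text (const char *str)
{
    /* With a NULL destination, mbstowcs() only counts characters;
       it returns (size_t) -1 on an invalid multibyte sequence. */
    size_t n = mbstowcs (NULL, str, 0);
    if (n == (size_t) -1)
        return false;

    wchar_t *wbuf = malloc ((n + 1) * sizeof *wbuf);
    if (wbuf == NULL)
        return false;
    mbstowcs (wbuf, str, n + 1);   /* converts n characters plus L'\0' */

    bool ok = true;
    for (size_t i = 0; i < n; i++) {
        if (!iswprint ((wint_t) wbuf[i]) && !iswspace ((wint_t) wbuf[i])) {
            ok = false;            /* control character: not text */
            break;
        }
    }
    free (wbuf);
    return ok;
}
```

Unlike the byte-wise isprint() test, this accepts tabs and other whitespace through iswspace().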
Created attachment 13998: 5th patch - add a strlen() check to the 4th patch before mbstowcs()
Created attachment 14003: 6th patch - NUL-terminate the buffer before the first mbstowcs() call
Created attachment 14006: 7th and (hopefully) final patch - don't modify sniff_buffer->buffer
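Patches 5 to 7 deal with a practical detail: mbstowcs() works only on NUL-terminated strings, while the sniff buffer is a byte range that should not be modified. A minimal sketch of one way to handle that, assuming the buffer and its length are passed separately, is to run the checks on a terminated copy. The helper name buffer_decodes_in_locale is hypothetical, and reading the strlen() check as a way to reject embedded NUL bytes is an assumption about the 5th patch's intent.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: check that buffer[0..buffer_len) decodes in the
 * current locale without writing into buffer itself. */
static bool
buffer_decodes_in_locale (const char *buffer, size_t buffer_len)
{
    char *copy = malloc (buffer_len + 1);
    if (copy == NULL)
        return false;
    memcpy (copy, buffer, buffer_len);
    copy[buffer_len] = '\0';           /* mbstowcs() needs a terminated string */

    bool ok = true;
    if (strlen (copy) != buffer_len)   /* embedded NUL byte: not text */
        ok = false;
    else if (mbstowcs (NULL, copy, 0) == (size_t) -1)  /* invalid sequence */
        ok = false;

    free (copy);                       /* the caller's buffer is untouched */
    return ok;
}
```

A real sniffer may also need to tolerate a multibyte character cut off at the end of the sniffed prefix, which this sketch would reject.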
Sorry for all the spam; I hope this is the final one. Please review the 7th patch.
Created attachment 14405: Different patch
Can you try this patch instead?
Yes, it seems to work well.
Committing it then.