Bug 103622 - Not able to open non UTF-8 encoded text
Status: RESOLVED FIXED
Product: gnome-vfs
Classification: Deprecated
Component: MIME and file/program mapping
Version: 2.0.x
Hardware: Other, OS: All
Importance: Normal normal
Target Milestone: ---
Assigned To: gnome-vfs maintainers
QA Contact: Nautilus Maintainers
Reported: 2003-01-16 01:17 UTC by Hidetoshi Tajima
Modified: 2004-12-22 21:47 UTC
GNOME target: ---
GNOME version: ---


Attachments
allow the same three encodings as nautilus text viewer does for text/plain (2.08 KB, patch)
2003-01-17 00:26 UTC, Hidetoshi Tajima
do isprint() test if utf8 check fails. (1.15 KB, patch)
2003-01-29 21:38 UTC, Hidetoshi Tajima
3rd patch - use mbstowcs () != -1 check locale's text (897 bytes, patch)
2003-01-31 01:16 UTC, Hidetoshi Tajima
4th patch - test with iswprint and iswspace (1.35 KB, patch)
2003-01-31 03:24 UTC, Hidetoshi Tajima
5th patch - add strlen() check to the 4th patch before mbstowcs() (1.41 KB, patch)
2003-01-31 17:57 UTC, Hidetoshi Tajima
6th patch - need null terminate before the 1st mbstowcs() call (1.54 KB, patch)
2003-01-31 21:03 UTC, Hidetoshi Tajima
7th and final(with hope) patch - don't change sniff_buffer->buffer (1.58 KB, patch)
2003-01-31 22:46 UTC, Hidetoshi Tajima
Different patch (1.93 KB, patch)
2003-02-18 09:28 UTC, Alexander Larsson

Description Hidetoshi Tajima 2003-01-16 01:17:01 UTC
Nautilus cannot open any text file whose contents are in the locale's
encoding; instead it brings up an error dialog saying "No viewers are
available for "%s".".

For example, in the en_US.ISO8859-1 locale, it cannot open a text file
containing Latin-1 characters.

Opening a file with UTF-8 encoded contents, however, works.

Is this intended behavior?
Comment 1 Alexander Larsson 2003-01-16 09:23:55 UTC
Unless I misremember completely, it's supposed to try UTF-8 first, then
the locale's default encoding.
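The first step of that fallback can be sketched as a plain UTF-8 well-formedness check. `looks_like_utf8()` below is a hypothetical helper, not the gnome-vfs or GLib function; it shows why a Latin-1 file fails the check: the byte 0xE9 (é in ISO-8859-1) is a UTF-8 lead byte that must be followed by two continuation bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if s[0..len) is well-formed UTF-8
   (rejects stray continuation bytes, truncated or overlong sequences,
   surrogates, and code points above U+10FFFF). */
static bool looks_like_utf8 (const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */

        if (c < 0x80) {                     /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;   /* continuation byte or invalid lead byte */
        }

        if (len - i - 1 < n)
            return false;   /* sequence cut off by end of buffer */

        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
            || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;   /* overlong, surrogate, or out of range */

        i += n + 1;
    }
    return true;
}
```

A MIME sniffer only looks at a prefix of the file, so this sketch would also reject a buffer whose last multi-byte character is cut in half; tolerating an incomplete final sequence is a refinement left out here.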
Comment 2 Hidetoshi Tajima 2003-01-16 21:09:51 UTC
It does not use the text viewer, since the MIME type is
set to "application/octet-stream" for files whose
contents are in the current locale's encoding.

But my version of Nautilus is a bit old. Is it supposed to
work on the latest HEAD?
Comment 3 Hidetoshi Tajima 2003-01-16 22:19:02 UTC
Looks like this is a gnome-vfs issue. If the same file is renamed to
foo.txt, it is treated as "text/plain".
Comment 4 Hidetoshi Tajima 2003-01-17 00:24:44 UTC
gnome-vfs classifies only UTF-8 encoded text files as text/plain,
but Nautilus's text viewer can handle UTF-8, the current locale's
encoding, and ISO-8859-1 contents.

I prepared a patch that allows the same encodings as the text viewer.

Please review and approve it for commit to HEAD.

	* libgnomevfs/gnome-vfs-mime-magic.c
(gnome_vfs_sniff_buffer_looks_like_text): 
	allow the locale's default encoding and "ISO-8859-1" as text/plain
	(#103622)
Comment 5 Hidetoshi Tajima 2003-01-17 00:26:14 UTC
Created attachment 13632 [details] [review]
allow the same three encodings as nautilus text viewer does for text/plain
Comment 6 Alexander Larsson 2003-01-21 13:50:24 UTC
This is not really acceptable. Every file is valid ISO-8859-1, which
means all files would be considered text files...
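The point can be made concrete: in ISO-8859-1 every byte value from 0x00 to 0xFF stands for the character with the same code point, so a decoder for it has no error path at all. The hypothetical `latin1_to_utf8()` below (not a gnome-vfs function) converts any input, including arbitrary binary data, which is why "decodes as ISO-8859-1" cannot distinguish text from non-text.

```c
#include <stddef.h>

/* Hypothetical helper: converts ISO-8859-1 bytes to UTF-8.
   out must have room for 2 * len bytes. Returns the number of bytes
   written. There is no failure case: every byte value 0x00-0xFF is a
   valid ISO-8859-1 character (byte b decodes to code point b). */
static size_t latin1_to_utf8 (const unsigned char *in, size_t len,
                              unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];

        if (b < 0x80) {
            out[o++] = b;                    /* ASCII maps to itself */
        } else {
            out[o++] = 0xC0 | (b >> 6);      /* 2-byte UTF-8 lead byte */
            out[o++] = 0x80 | (b & 0x3F);    /* continuation byte */
        }
    }
    return o;
}
```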
Comment 7 Hidetoshi Tajima 2003-01-21 21:14:33 UTC
Yeah... then should we add this check only if the locale's
encoding is not ISO-8859-*?

Or should Nautilus be changed to try a text viewer when the MIME
type is application/octet-stream and no associated viewer
is installed?
Comment 8 Hidetoshi Tajima 2003-01-23 22:44:52 UTC
I can't come up with a way to fix this. Any ideas? Would it be better
to address this on the Nautilus side anyway?
Comment 9 Hidetoshi Tajima 2003-01-24 20:30:59 UTC
This happens on Red Hat Linux 8.0 with the latest updates.
Comment 10 Alexander Larsson 2003-01-29 10:05:43 UTC
Maybe we need to make _gnome_vfs_sniff_buffer_looks_like_text() use
isprint() if the utf8 check fails. That should work pretty well.
Comment 11 Hidetoshi Tajima 2003-01-29 21:36:43 UTC
Thanks, yes, it seems to work pretty well. I'll attach
a patch that tests isprint() and additionally checks for
\n and \0: \n is allowed and \0 is not.

Please review and commit if it is okay.
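The rule described above can be sketched as a byte-by-byte scan: accept a byte if isprint() considers it printable in the current locale or if it is '\n', and reject the buffer on any NUL byte. This is a hypothetical reconstruction from the description, not the attached patch; as the next comment points out, stepping one byte at a time is only correct for single-byte locales.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reconstruction of the isprint() fallback: every byte must
   be printable in the current locale or a newline; NUL means binary.
   Only valid for single-byte locales (one byte == one character). */
static bool looks_like_text_bytewise (const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* cast: passing a negative char to isprint() is undefined */
        unsigned char c = (unsigned char) buf[i];

        if (c == '\0')
            return false;
        if (c != '\n' && !isprint (c))
            return false;
    }
    return true;
}
```

Note that this rule, taken literally, also rejects tabs and carriage returns.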
Comment 12 Hidetoshi Tajima 2003-01-29 21:38:45 UTC
Created attachment 13931 [details] [review]
do isprint() test if utf8 check fails.
Comment 13 Alexander Larsson 2003-01-30 08:09:15 UTC
I see one issue with the patch, but I'm not sure of the right way to
solve it. The isprint() functions are locale-based, and the locale may
be multi-byte, so the p++ part isn't always right, is it? Won't it
break for e.g. Japanese two-byte chars?
Comment 14 Hidetoshi Tajima 2003-01-30 19:09:57 UTC
Yeah... stupid of me. Then how about using mblen()?
The test code would look like:

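  #include <stdlib.h>
  ...
  int mb_len = 0;
  const char *p;
  const char *end = buffer + buffer_len;

  mblen (NULL, 0);  /* reset shift state (stateful encodings) */
  for (p = buffer; p < end; p += mb_len)
  {
      /* never let mblen() read past the end of the sniff buffer */
      mb_len = mblen (p, end - p);
      /* 0: NUL byte (binary); -1: invalid or truncated sequence */
      if (mb_len == 0 || mb_len == -1)
         return FALSE;
  }
  return TRUE;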
Comment 15 Hidetoshi Tajima 2003-01-31 00:01:36 UTC
Another way will be to use mbstowcs():

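  /* With a NULL destination, mbstowcs() ignores the length argument
     and scans up to the first NUL, so buffer must be NUL-terminated. */
  if (mbstowcs (NULL, buffer, buffer_len) == (size_t) -1)
    return FALSE;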

We could also use iswprint() if the above is not enough.
Comment 16 Hidetoshi Tajima 2003-01-31 01:16:12 UTC
Created attachment 13976 [details] [review]
3rd patch - use mbstowcs () != -1 check locale's text
Comment 17 Hidetoshi Tajima 2003-01-31 03:24:06 UTC
Created attachment 13980 [details] [review]
4th patch - test with iswprint and iswspace
Comment 18 Hidetoshi Tajima 2003-01-31 17:57:56 UTC
Created attachment 13998 [details] [review]
5th patch - add strlen() check to the 4th patch before mbstowcs()
Comment 19 Hidetoshi Tajima 2003-01-31 21:03:54 UTC
Created attachment 14003 [details] [review]
6th patch - need null terminate before the 1st mbstowcs() call
Comment 20 Hidetoshi Tajima 2003-01-31 22:46:52 UTC
Created attachment 14006 [details] [review]
7th and final(with hope) patch - don't change sniff_buffer->buffer
Comment 21 Hidetoshi Tajima 2003-01-31 22:47:50 UTC
Sorry for all the spam. I hope this is the final one.
Please review the 7th patch.
Comment 22 Alexander Larsson 2003-02-18 09:28:38 UTC
Created attachment 14405 [details] [review]
Different patch
Comment 23 Alexander Larsson 2003-02-18 09:29:08 UTC
Can you try this patch instead?
Comment 24 Hidetoshi Tajima 2003-02-18 22:16:47 UTC
Yes, it seems to work well.
Comment 25 Alexander Larsson 2003-02-19 09:27:28 UTC
Committing it then.