After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 314453 - Nautilus crashes in Solaris when browsing the attached file.
Nautilus crashes in Solaris when browsing the attached file.
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other Solaris
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2005-08-25 07:43 UTC by dinoop.thomas
Modified: 2008-06-11 19:39 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Attaching the tar file (2.50 KB, application/x-compressed-tar)
2005-08-25 07:48 UTC, dinoop.thomas
  Details
Attached patch fixes the issue (1.23 KB, patch)
2005-08-29 05:21 UTC, dinoop.thomas
rejected Details | Review
Patch to fix the issue (1.42 KB, patch)
2005-09-06 13:11 UTC, dinoop.thomas
rejected Details | Review

Description dinoop.thomas 2005-08-25 07:43:30 UTC
Locale: All UTF-8 locales
OS: Solaris

1. Unpack the attached tar file (crashName.tar)
2. Launch Nautilus and browse this directory 

Result: Nautilus crashes.
Comment 1 dinoop.thomas 2005-08-25 07:48:28 UTC
Created attachment 51320 [details]
Attaching the tar file
Comment 2 dinoop.thomas 2005-08-29 05:19:23 UTC
Attached filenames in the directory are gb18030 encoded, created on
zh_CN.GB18030 locale. 
When nautilus loads a directory, the files in the directory need to be sorted.
This is done by creating a key from the display name
nautilus_file_get_display_name_collation_key() which uses g_utf8_collate_key()
to generate the key. Here the file name contains 'invalid unicode' in UTF-8
locales. The g_utf8_collate_key() function returns 'NULL' in this case which is
then passed to strcmp(). This results in the crash in Solaris. We include an
additional check to ensure that the keys are not 'NULL' before passing them to
strcmp().
 
Comment 3 dinoop.thomas 2005-08-29 05:21:21 UTC
Created attachment 51493 [details] [review]
Attached patch fixes the issue
Comment 4 Alexander Larsson 2005-08-29 10:04:49 UTC
This seems wrong. nautilus_file_get_display_name_nocopy() should *always* return
a utf8 string even if the on-disk filename is not utf8, otherwise all sorts of
things can break. We need to figure out why its not returning utf8 instead.
Comment 5 Alexander Larsson 2005-08-29 12:00:46 UTC
I can't reproduce this on linux, so i can't debug further. You need to figure
out why nautilus_file_get_display_name_nocopy() doesn't return utf8.
Comment 6 dinoop.thomas 2005-09-06 13:08:52 UTC
In nautilus_file_get_display_name_nocopy() we finally call eel_make_valid_utf8()
to get a valid utf-8 name. This function converts the invalid bytes in the file
name to '?' and appends 'Invalid Unicode' to the file name.

The unichar U+708B7 falls in the range of valid unichar and hence
eel_make_valid_utf8() includes this byte in the file name. 

Changing the range of the valid unichar so that U+708B7 doesn't fall into the
valid category, will solve the issue.
Comment 7 dinoop.thomas 2005-09-06 13:11:37 UTC
Created attachment 51866 [details] [review]
Patch to fix the issue
Comment 8 Owen Taylor 2005-09-08 11:21:24 UTC
Comment on attachment 51866 [details] [review]
Patch to fix the issue

This patch is wrong. The current macro defines valid 
Unicode code points according to the Unicode standard.
Not all points in this range have assigned values but that isn't a
factor in UTF-8 manipulation.
You need to figure out where you are
getting the NULL from - as far as I can tell
g_utf8_collate_key() will never return NULL for any input other than NULL.
Comment 9 dinoop.thomas 2005-09-08 14:41:25 UTC
As i understand, the the root cause is below code in function g_utf8_collate_key()
{
 ....
 xfrm_len = strxfrm (NULL, str_norm, 0);
 result = g_malloc (xfrm_len + 1);
 strxfrm (result, str_norm, xfrm_len + 1);
 ...
}
strxfrm(NULL, str_norm, 0) will return -1 for some special chinese characters if
it is gb18030 encoded on utf-8 locale, and thus result will be NULL. strxfrm
seems to depend on OS too, hence the crash doesn't occur on suse Linux for the
same file.  
Comment 10 Takao Fujiwara 2005-09-08 15:36:26 UTC
> This patch is wrong. The current macro defines valid 
> Unicode code points according to the Unicode standard.
> Not all points in this range have assigned values but that isn't a
> factor in UTF-8 manipulation.

Unicode 4.0 is defined below.
http://www.unicode.org/Public/UNIDATA/NamesList.txt

U+30000 - U+E0000 is out of the range, then strxfrm returns -1.
Comment 11 Owen Taylor 2005-09-08 20:34:05 UTC
This range doesn't have assigned characters but neither do quite
a few code points in the BMP. Does strxfrm return -1 for those as well?

I don't have the POSIX spec at hand here so I don't know what it says
for strxfrm returning -1, but it doesn't seem to me that this is something
that GLib can predict - what if GLib is built with the Unicode-4.1 character
database, but the underlying OS only has 4.0. We already have handling
of the case where the string isn't representable in the current locale.
The same handling could be used for strxform returning -1.
Comment 12 Takao Fujiwara 2005-09-09 11:15:05 UTC
I understand your mention. It is difficult to conver all assignments exactly.
I'm not sure which version of Unicode are supported in glib however the above
range has not assigned chars since Unicode 4.0:
http://www.unicode.org/Public/4.0-Update/NamesList-4.0.0.txt

We would like to avoid crashes at least.
Comment 13 Matthias Clasen 2005-09-26 15:07:25 UTC
May be related to bug 314255, which is also about strxfrm returning unexpected
values.
Comment 14 Takao Fujiwara 2005-09-28 11:50:24 UTC
Can we integrate the patch?
Yes, strxfrm() depends on LC_COLLATE.
Solaris supports Unicode 4.0.

# 7/18/2003:    Added additional LC_MONETARY keywords and definitions defined
#               in the SUSv3/AGR/Unix200x spec.
#
# 2/19/2004:    Added Unicode 4.0 characters (total 1,226 new characters).
#               Details can be found from the following:
#
#                       http://www.unicode.org/versions/Unicode4.0.0/
#
Comment 15 Owen Taylor 2005-09-30 03:00:33 UTC
To reiterate, the patch is wrong. I can't see any justification for
adding some particular ranges of unassigned code points to UNICODE_VALID
because the Solaris strxfrm() happens to choke on them.

(Does it work on all other unassigned codepoinst?)
Comment 16 Takao Fujiwara 2005-10-06 11:25:52 UTC
Ah, I didn't understand your question.

I eliminated the critical codepoints but yes, it still keeps unassgined
codepoints in the macro.

OK, I would like to compile this bug.
I think there are three problems:
#1. nautilus crashes with strcmp (0, 0).
#2. UNICODE_VALID() is implemented in glib but strxfrm() is in libc.
#3. UNICODE_VALID() does not reflect unicode map perfectly.


#1. nautilus crashes with strcmp (0, 0).
To begin with, I expected the attachement 51493 is a good way. Now we need to
evaluate if it is expected that g_utf8_collate_key() returns NULL.

  if (g_get_charset (&charset))
    {
      xfrm_len = strxfrm (NULL, str_norm, 0);
      result = g_malloc (xfrm_len + 1);
      strxfrm (result, str_norm, xfrm_len + 1);
    }

This calculates the string length, i.e. xfrm_len, as the current locale but not
UTF-8. I think it's an expected way. Could you confirm it?


That means strxfrm() is possible to return -1 with UTF-8 string on the current
locale.
Then g_malloc(0) returns NULL and g_utf8_collate_key() also returns NULL.
What do you think?

#2. UNICODE_VALID() is implemented in glib but strxfrm() is in libc.
    I think the unicode implementation is in one of them. What concepts do you
have in glib?

#3. UNICODE_VALID() does not reflect unicode map perfectly.
    Personally I would like to fix this method, too. I checked Solaris source
codes. Solaris have the map of the two dimension array of unicode group and plan
for each code points. strxfrm uses the array internally. I cannot transfer the
code but it would be a hit.

If it is expected that g_utf8_collate_key() returns NULL, I think we can fix the
root cause.
Comment 17 Matthias Clasen 2008-06-11 19:39:24 UTC
        * glib/gunicollate.c (g_utf8_collate_key): Handle strfxrm returning
        -1 a little better. Problem pointed out by Takao Fujiwara

Please reopen if that doesn't fix your problem.