Bug 96531 - Function to guess display form of a filename
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
Version: 2.0.x
Hardware: Other opensolaris
Importance: Normal enhancement
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
Reported: 2002-10-22 18:54 UTC by Hidetoshi Tajima
Modified: 2011-02-18 16:13 UTC
See Also:
GNOME target: ---
GNOME version: Unversioned Enhancement



Description Hidetoshi Tajima 2002-10-22 18:54:20 UTC
I can't help thinking that G_BROKEN_FILENAMES is broken,
for the reasons below.

- If you start gedit once with G_BROKEN_FILENAMES=1
  and then start a new gedit without it, in a
  non-UTF-8 locale such as ja_JP.eucJP, you get
  GTK warnings such as:

    Gtk-Message: The filename "No_NAME 1" couldn't be converted to UTF-8
    (try setting the environment variable G_BROKEN_FILENAMES): (followed by
    a message in Japanese)

- This is not limited to gedit; Nautilus has a similar problem.

- glib, gedit, eel and nautilus each have their own copy
  of has_broken_filenames().

- I'm uneasy about the GNOME project calling a component of
  another project "broken".

I'd like to propose an alternative way to handle filename encoding
variants in glib's API, so that other components can use it.

Also, for locale variants of file contents, why doesn't
gedit provide some APIs? Then gedit and gnome-terminal could both
offer loading with a chosen encoding without duplicating code.
Comment 1 Owen Taylor 2002-11-19 22:21:18 UTC
Not really sure what this bug is asking for:

 - G_BROKEN_FILENAMES is a runtime configuration option,
   not an API.
   
   The API in Glib is:

    - There is an on-disk encoding of filenames
    - There are functions to convert between this 
      encoding and UTF-8.

 - If you _don't_ have a consistent encoding for the
   filenames on disk, you are in a broken situation,
   even more broken than the situation that
   G_BROKEN_FILENAMES refers to. :-)

   Making guesses, for display purposes, about the
   encoding of a filename that fails g_filename_to_utf8()
   is not unreasonable. Nautilus does this.

   But it is mostly a question of structuring your
   app correctly to keep the filename in filename
   form as long as possible, then converting to
   UTF-8 only for display. I suppose we could
   add a 

    g_filename_to_display_utf8() 

   call that first tried g_filename_to_utf8()
   and then if that failed, g_locale_to_utf8(), but I'm
   not sure that that is that useful.

 - File content encoding handling is basically a totally
   separate issue. It certainly would be nice to have 
   encoding guessing functionality for files and maybe
   IO channels. But that is a reasonably hard problem.

   (I thought there was an open bug on this, I don't  
   see it, however)
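
A rough illustration of the two-step fallback described above (filename encoding first, then locale encoding). This is a Python sketch of the logic only, not GLib code: the name to_display_utf8 and its parameters are hypothetical, and Python's strict bytes.decode() merely stands in for g_filename_to_utf8() and g_locale_to_utf8().

```python
from typing import Optional


def to_display_utf8(raw: bytes, filename_encoding: str,
                    locale_encoding: str) -> Optional[str]:
    """Try the filename encoding first, then the locale encoding.

    Hypothetical helper: returns None when both strict conversions fail.
    """
    for encoding in (filename_encoding, locale_encoding):
        try:
            # Strict decoding raises on any byte sequence that is invalid
            # in this encoding, so a wrong guess is usually detected.
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None
```

A permissive encoding such as ISO-8859-1 accepts every byte sequence, so with it in first position the second step would never run. That is one reason the usefulness of such a call is open to doubt.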
Comment 2 Hidetoshi Tajima 2002-11-21 04:38:41 UTC
Calling G_BROKEN_FILENAMES an API may be wrong.
Sure, I'm okay with calling it a run-time configuration
option. But this is not really a point I want
to argue :-)

Basically, I'm asking for two things.

1) Let's stop relying on G_BROKEN_FILENAMES.
2) Let's start thinking of a better way to handle filename encodings
for gedit, nautilus and other GNOME apps. It may not be glib's issue
but that of some other module. gnome-vfs could carry the
encoding information of a file, for example.

I understand the purpose of G_BROKEN_FILENAMES, but it is
insufficient to rescue the really broken
situation where filenames in two or more different
encodings are found in the same directory.

[Agreed that file contents are a separate issue.
I did not know whether it is really glib's, so I did not
file a separate bug. Do you have any idea
which module would be the right one? What I'm asking for
is to have common code in one
module, so that gedit and gnome-terminal can
avoid duplicated code as well as duplicated
bugs.]




Comment 3 Hidetoshi Tajima 2003-06-09 20:40:54 UTC
Reopening...

My claim came basically from the fact that I've had to find
and fix many similar bugs. They are due to missing calls to
g_filename_from/to_utf8() to convert between the UTF-8 strings
used by the gtk_file_selection widget and the filename strings.
There were indeed lots, and a few are still left, like
bug 114296 in libgnomeui.

To some extent, this has something to do with the dependency on
G_BROKEN_FILENAMES=TRUE, so I'm wondering if there can be any better
way to handle filenames in GTK+ and GNOME land than G_BROKEN_FILENAMES
and g_filename_to/from_utf8().

So I'd like to discuss here what a resolution could be, if any.
Comment 4 Owen Taylor 2003-06-10 02:33:21 UTC
The plan is that the GTK+-2.4 file selector will separate
out "display name" from "file encoding", so you'll be
able to select files in the GUI with broken encodings.

I don't think there are any other places in GTK+ that
make interpretation of filenames. Certainly none in
GLib/ATK/Pango.

I suppose a "make a best effort to create a display form
of this filename" function could be useful, though
guessing the encoding of a string as short as a filename
is almost impossible.
Comment 5 Owen Taylor 2003-11-05 17:43:10 UTC
Some related comments in bug 114068
Comment 6 Hidetoshi Tajima 2003-11-05 19:00:53 UTC
Thanks for the heads-up. I read the discussion
in bug 114068, but I prefer to go with the above-mentioned plan
in which the file selection separates out the "display name".

Has this already been implemented in recent GTK+? If so, I'm fine with
closing this bug as fixed (and I'd also
second not making the change in bug 114068).
Comment 7 Matthias Clasen 2003-11-05 23:12:37 UTC
Glib interprets the variable G_FILENAME_ENCODING now, which can be set
to a list of encodings (with @locale being recognized as the charset
of the current locale). This list of encodings could be used for
iterative guessing in a g_filename_get_display_name() function.
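
The iterative guessing mentioned above could look roughly like the following. This is a Python sketch under stated assumptions, not GLib's implementation: parse_encoding_list and guess_decode are hypothetical names, the list is assumed to be comma-separated, and the locale charset is passed in explicitly rather than queried from the environment.

```python
from typing import List, Optional, Tuple


def parse_encoding_list(value: str, locale_charset: str) -> List[str]:
    """Split a comma-separated list, replacing '@locale' with the locale charset."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    return [locale_charset if name == "@locale" else name for name in names]


def guess_decode(raw: bytes, encodings: List[str]) -> Optional[Tuple[str, str]]:
    """Return (text, encoding) for the first encoding that decodes raw strictly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Invalid bytes for this encoding, or a codec name Python
            # does not know: move on to the next candidate.
            continue
    return None
```

For example, with the list "UTF-8,@locale" in a ja_JP.eucJP locale, a UTF-8 name is accepted by the first entry, while an EUC-JP name fails strict UTF-8 decoding and is picked up by the second.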
Comment 8 Federico Mena Quintero 2004-07-08 23:49:50 UTC
About comment #6, the new GtkFileChooser is in place and it does the right thing
with respect to filenames / display names:  it may not be able to display a
filename, but it does let you select it and otherwise work with it.

Indeed, as Matthias says, we still don't handle a list of encodings in
G_FILENAME_ENCODING; only the first element in that list is used.

I don't know if we still need a function that guesses display versions of
filenames based on such a list.  See the recommendations here:
http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings
and here:
http://primates.ximian.com/~federico/news-2004-06.html#15
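
To make the filename / display name split above concrete, here is a minimal Python sketch. It is not GtkFileChooser code: FileEntry and make_entries are hypothetical names, and UTF-8 with U+FFFD replacement is only an assumed choice for the display side.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileEntry:
    raw: bytes    # on-disk name, used for every file operation
    display: str  # best-effort label, used only in the UI


def make_entries(raw_names: List[bytes], encoding: str = "utf-8") -> List[FileEntry]:
    """Pair each raw name with a display label that may be lossy."""
    # errors="replace" substitutes U+FFFD for undecodable bytes, so the
    # label can always be shown but cannot be converted back to the name.
    return [
        FileEntry(raw=name, display=name.decode(encoding, errors="replace"))
        for name in raw_names
    ]
```

A selection made in the UI then hands back entry.raw, never entry.display, so a file whose name cannot be decoded can still be opened. In Python, os.listdir(b".") returns such raw byte names.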
Comment 9 Matthias Clasen 2004-11-03 07:26:21 UTC
We now have a function to guess display versions of filenames based on a list of
encodings.