Bug 306403 - filenames in non-UTF-8 encodings are not handled correctly
Status: RESOLVED OBSOLETE
Product: file-roller
Classification: Applications
Component: general
Version: 2.14.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: file-roller-maint
QA Contact: file-roller-maint
Duplicates: 152236 320467 333225 346018 547312
Depends on:
Blocks:
Reported: 2005-06-03 18:56 UTC by Simos Xenitellis
Modified: 2020-11-11 19:15 UTC
See Also:
GNOME target: ---
GNOME version: 2.13/2.14


Attachments
Patch for src/fr-command.c (597 bytes, patch)
2008-08-26 12:15 UTC, Takao Fujiwara

Description Simos Xenitellis 2005-06-03 18:56:33 UTC
Please describe the problem:
The ZIP format (http://www.info-zip.org/pub/infozip/doc/) does not specify the
encoding of the filenames of the compressed files. 

Therefore, ZIP files created on old systems may contain filenames in a
non-Latin 8-bit encoding (for example, Cyrillic, Greek, etc.).

File Roller has trouble dealing with these files as it cannot "autoconvert" the
filename from the source encoding to UTF-8.

An informal survey has been carried out on this:
a. WinZIP (Windows) manages to autodetect/convert

Steps to reproduce:
1. Download 
http://www.thranio.gr/sxolikes-giortes/telikes/omilies/apoxairetisthrio-logos-mathith.zip
(it contains a single .doc file; the name is in Greek, in CP737 encoding (iconv
--list)).
2. Open with file-roller.
3. Observe the filename - try to extract file.


Actual results:
An incorrect filename appears. The file can be neither extracted nor renamed
from within the ZIP archive.

Expected results:
File-roller should attempt to detect the encoding of the filename and do the
appropriate conversion (iconv-style) to UTF-8. If it cannot do a conversion, it
should nevertheless make the file accessible. For example, unconverted
characters could be changed to U+FFFD (Unicode Replacement Character,
http://www.fileformat.info/info/unicode/char/fffd/index.htm).
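
As a rough illustration of the fallback requested here (a sketch in Python, not file-roller code; the function name `to_display_name` is hypothetical), a strict conversion can be tried first, and any bytes that cannot be converted can be replaced with U+FFFD so that the entry stays usable:

```python
def to_display_name(raw: bytes, source_encoding: str) -> str:
    """Convert a raw archive filename to text without ever failing outright."""
    try:
        # Strict conversion from the presumed source encoding.
        return raw.decode(source_encoding)
    except (UnicodeDecodeError, LookupError):
        # Assumption: a lossy UTF-8 decode is an acceptable last resort.
        # Every undecodable byte becomes U+FFFD instead of raising an error.
        return raw.decode("utf-8", errors="replace")


greek = "αβγ.doc".encode("cp737")          # Greek name in the CP737 code page
print(to_display_name(greek, "cp737"))     # αβγ.doc
print(to_display_name(greek, "utf-8"))     # three U+FFFD characters, then ".doc"
```

The second call shows the degraded but still accessible name that results when the source encoding is guessed wrongly.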

Does this happen every time?
Yes

Other information:
Have a look at the thread at http://mail.nl.linux.org/linux-utf8/2005-06/#00000
and specifically walk through the SUMMARY mail.
Comment 1 Simos Xenitellis 2005-06-03 18:57:04 UTC
*** Bug 152236 has been marked as a duplicate of this bug. ***
Comment 2 Simos Xenitellis 2005-06-03 19:35:22 UTC
Summary URL: http://mail.nl.linux.org/linux-utf8/2005-06/#00010
Comment 3 Helge Hielscher 2005-06-03 20:16:10 UTC
The encoding detection of modern browsers works quite well for me, e.g.
http://www.mozilla.org/projects/intl/chardet.html

Filename encodings can be changed with convmv:
http://j3e.de/linux/convmv/
Comment 4 Teppo Turtiainen 2006-03-18 11:56:16 UTC
*** Bug 320467 has been marked as a duplicate of this bug. ***
Comment 5 Teppo Turtiainen 2006-03-18 11:56:23 UTC
*** Bug 333225 has been marked as a duplicate of this bug. ***
Comment 6 Teppo Turtiainen 2006-03-18 12:04:47 UTC
Confirmed with File Roller 2.14.0 on Ubuntu Dapper.
Comment 7 Mait 2006-04-12 11:56:32 UTC
I have the same problem too, with files in cp949 encoding. Although it is an annoying way to do it, the zip command helps me correct this.

$ zip -FF cp949.zip (-F or -FF)
......

$ unzip cp949.zip
...... <- filename is broken.

$ convmv -f cp949 -t utf8 cp949.zip
..... <- If the converted name is shown correctly, do the real rename.

$ convmv -f cp949 -t utf8 cp949.zip --notest
done

Without the 'zip -FF' step, convmv complains about bad encoding.

Regards,
Mait

My system profile,

ubuntu@ubuntu:~$ locale
LANG=ko_KR.UTF-8
LC_CTYPE="ko_KR.UTF-8"
LC_NUMERIC="ko_KR.UTF-8"
LC_TIME="ko_KR.UTF-8"
LC_COLLATE="ko_KR.UTF-8"
LC_MONETARY="ko_KR.UTF-8"
LC_MESSAGES="ko_KR.UTF-8"
LC_PAPER="ko_KR.UTF-8"
LC_NAME="ko_KR.UTF-8"
LC_ADDRESS="ko_KR.UTF-8"
LC_TELEPHONE="ko_KR.UTF-8"
LC_MEASUREMENT="ko_KR.UTF-8"
LC_IDENTIFICATION="ko_KR.UTF-8"
LC_ALL=ko_KR.UTF-8

ubuntu@ubuntu:~$ zip -help | grep '\-F'
  -F   fix zipfile (-FF try harder) -D   do not add directory entries

ubuntu@ubuntu:~$ unzip -v
UnZip 5.52 of 28 February 2005, by Ubuntu. Original by Info-ZIP.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 4.0.3 (Ubuntu 4.0.3-1ubuntu3) for Unix (Linux ELF) on Mar 23 2006.

UnZip special compilation options:
        ACORN_FTYPE_NFS
        COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method not supported)
        SET_DIR_ATTRIB
        TIMESTAMP
        USE_EF_UT_TIME
        USE_UNSHRINK (PKZIP/Zip 1.x unshrinking method supported)
        USE_DEFLATE64 (PKZIP 4.x Deflate64(tm) supported)
        VMS_TEXT_CONV
        WILD_STOP_AT_DIR
        [decryption, version 2.9 of 05 May 2000]

UnZip and ZipInfo environment options:
           UNZIP:  [none]
        UNZIPOPT:  [none]
         ZIPINFO:  [none]
      ZIPINFOOPT:  [none]
ubuntu@ubuntu:~$ zip -v
Copyright (C) 1990-2005 Info-ZIP
Type 'zip "-L"' for software license.
This is Zip 2.31 (March 8th 2005), by Info-ZIP.
Currently maintained by Onno van der Linden. Please send bug reports to
the authors using http://www.info-zip.org/zip-bug.html; see README for details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip, as of
above date; see http://www.info-zip.org for other sites.

Compiled with gcc 4.0.1 20050522 (prerelease) (Debian 4.0.0-7ubuntu7) for Unix (Linux ELF) on May 26 2005.

Zip special compilation options:
        ASM_CRC
        ASMV
        USE_EF_UT_TIME
        [encryption, version 2.9 of 05 May 2000]
Encryption notice:
        The encryption code of this program is not copyrighted and is
        put in the public domain.  It was originally written in Europe
        and, to the best of our knowledge, can be freely distributed
        in both source and object forms from any country, including
        the USA under License Exception TSU of the U.S. Export
        Administration Regulations (section 740.13(e)) of 6 June 2002.

Zip environment options:
             ZIP:  [none]
          ZIPOPT:  [none]

Comment 8 Simos Xenitellis 2006-07-03 11:46:23 UTC
Now, this appears to be a generic problem that manifests itself in several places.
I feel it is desirable to have some sort of generic library that takes into account the system locale settings and automagically determines the most appropriate source encoding before it converts to UTF-8.

Apart from the ZIP file format, which does not specify the encoding of the filenames, ID3v1/ID3v2 tags in MP3 files do not specify the encoding either. Therefore, even services such as www.mugshot.org (which shares the details of the currently playing song) stumble on the issue.

The "algorithm" would be something like:

1. Try to check if the string is UTF-8 encoded. If yes, use as is, else Step 2.
2. Case of system locale,
   el_GR: try  iconv -f iso-8859-7 -t utf-8. If succeeds, accept.
          try  iconv -f cp737 -t utf-8. If success, accept
   fr_FR: ...
   de_DE: ...

I do not know if the association between locale value and legacy encodings is available in a comprehensive list. If so, it would be trivial to complete the algorithm without individual input from each locale's users.
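
The two steps above can be sketched in Python as follows. This is only an illustration: the table `LEGACY_ENCODINGS` and the function `guess_decode` are hypothetical names, only the el_GR entry comes from this comment, and the "printable" check is an added assumption, needed because ISO-8859-7 accepts almost any byte sequence without error.

```python
# Hypothetical locale -> candidate legacy encodings table.
# Only el_GR is taken from the comment; other locales would need real input.
LEGACY_ENCODINGS = {
    "el_GR": ["iso8859_7", "cp737"],
}


def guess_decode(raw: bytes, locale_name: str):
    """Return (text, encoding_used) for a raw filename."""
    # Step 1: accept the bytes as-is if they are valid UTF-8.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Step 2: try the legacy encodings associated with the system locale.
    for encoding in LEGACY_ENCODINGS.get(locale_name, []):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        # Assumption: a result containing control characters is a wrong guess.
        if text.isprintable():
            return text, encoding
    # Nothing matched: keep the name usable with replacement characters.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


print(guess_decode("αβγ.doc".encode("cp737"), "el_GR"))
print(guess_decode("αβγ.doc".encode("iso8859_7"), "el_GR"))
```

Without the extra plausibility check, the CP737 bytes would be "successfully" decoded as ISO-8859-7 control characters, so conversion success alone is a weak signal.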
Comment 9 Simos Xenitellis 2006-07-03 11:55:26 UTC
Another bug report that deals with the same issue,
http://bugzilla.mugshot.org/show_bug.cgi?id=724
Comment 10 André Klapper 2007-03-13 21:23:32 UTC
*** Bug 346018 has been marked as a duplicate of this bug. ***
Comment 11 Simos Xenitellis 2007-06-26 15:17:37 UTC
Adding links on this
a. Dmitry Butskoy volunteered to write a patch for "unzip".
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=225576
b. Ubuntu blueprint that captures the current info of this issue,
https://blueprints.launchpad.net/unzip/+spec/unzip-detect-filename-encoding
Comment 12 masoris 2007-12-30 16:17:31 UTC
There is another report, from Ubuntu 7.10:
https://bugs.launchpad.net/fileroller/+bug/177929
Comment 13 Alkis Georgopoulos 2008-07-30 15:55:46 UTC
Encoding autodetection, as proposed by Simos, should be implemented and it'll cover many cases. But it's not enough, there'll always be cases where it will fail, because there is much overlapping in the #128 - #255 area in many code pages.

So, a "manually select filename encoding" option is necessary, and as such, maybe it should be the first one to be implemented, even as just a command line option with no GUI equivalent.
E.g. from unzip --help:
  -O CHARSET  specify a character encoding for DOS, Windows and OS/2 archives
  -I CHARSET  specify a character encoding for UNIX and other archives

As a sidenote,
* As of September 2007, the zip file format supports utf-8 encoding:
  http://www.pkware.com/documents/casestudies/APPNOTE.TXT
* After that, many zip products support utf-8, like winzip, 7zip etc.
Please support that too, it would solve much of our encoding problems, at least for newly created .zips.
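
For reference, the UTF-8 support mentioned above is signalled per entry by general-purpose bit 11 (0x800) of the ZIP header. A minimal sketch using Python's standard `zipfile` module (an illustration only, unrelated to file-roller's code) shows how a reader can tell flagged entries from legacy ones:

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: filename is stored as UTF-8

# Build a small archive in memory; zipfile sets the flag by itself
# whenever a name cannot be stored as plain ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"")
    zf.writestr("λόγος.txt", b"")

# Read it back and report which entries carry the UTF-8 flag.
with zipfile.ZipFile(buf) as zf:
    flags = {info.filename: bool(info.flag_bits & UTF8_FLAG)
             for info in zf.infolist()}

print(flags)   # {'plain.txt': False, 'λόγος.txt': True}
```

Entries without the flag are the ones that still need detection or a manually selected encoding.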
Comment 14 Alkis Georgopoulos 2008-07-30 16:33:07 UTC
Simos, thanks for the blueprint, at least we now have a workaround.

https://blueprints.launchpad.net/unzip/+spec/unzip-detect-filename-encoding
export UNZIP="-O cp737"
export ZIPINFO="-O cp737"
(to be used in system configuration files).
Comment 15 Paolo Bacchilega 2008-08-17 13:27:52 UTC
*** Bug 547312 has been marked as a duplicate of this bug. ***
Comment 16 Takao Fujiwara 2008-08-26 12:15:06 UTC
Created attachment 117397
Patch for src/fr-command.c

Regarding bug 547312, I think the previous file-roller doesn't have this bug.
The previous file-roller can show the different filenames, with garbled chars.

The attached patch can fix bug 547312 at the moment.
Comment 17 Paolo Bacchilega 2008-08-26 19:03:52 UTC
(In reply to comment #16)
> Created an attachment (id=117397) [edit]
> Patch for src/fr-command.c
> 
> Regarding to bug 547312, I think previous file-roller doesn't have this bug.
> Previous file-roller can show the different filenames with the garbaged chars.
> 
> The attached patch can fix bug 547312 at the moment.
> 

actually, bug 547312 is supposed to be already fixed in version 2.23.6 
Comment 18 Takao Fujiwara 2008-08-28 07:29:51 UTC
> actually, bug 547312 is supposed to be already fixed in version 2.23.6 

Thanks for your reply.
I confirmed "LC_ALL=C" is changed to "LC_MESSAGES=C" in 2.23.6.
Comment 19 Luke Hutchison 2008-09-04 17:21:28 UTC
There may be another complicating issue -- the convention is to assume the filenames are in CP850 format and convert them automatically to ISO-8859-1 format on unzip, i.e. it is not just that there is a lack of conversion:

http://www.linuxfromscratch.org/blfs/view/stable/introduction/locale-issues.html#locale-wrong-filename-encoding
Comment 20 Artemy Tregubenko 2010-12-23 09:12:15 UTC
AFAIK in Ubuntu file-roller handles .rar files too. These files have the same issue with encoding. If I right-click the file in Nautilus and choose "Extract here" from the context menu, files are extracted with wrong names. However, if I invoke 'unrar x file.rar' from the command line, files are extracted with correct names.

I understand that description of that bug in tracker applies to .zip files only. However the title is generic and doesn't mention zip. If my rar problem should be reported as a separate bug, please tell me, I'll file it as separate.
Comment 21 Στέργιος Προσινικλής 2011-01-09 21:31:01 UTC
The problem with .rar files does not appear when only unrar is installed on your system. See comment #58 in https://bugs.launchpad.net/ubuntu/+source/file-roller/+bug/177929

Furthermore, Ark is not influenced by the presence or not of rar package.
Looking at the file-roller and ark source,
the problem with unrar command implementation,
lies in the /file-roller-2.30.1.1/src/fr-command-rar.c file

Not a developer, but in the aforementioned file,
changing the order of these two lines as follows,
and compiling solves the problem in file-roller...

if (have_rar ())
        fr_process_begin_command (comm->process, "unrar");
else
        fr_process_begin_command (comm->process, "rar");

HopeThatHelps
Comment 22 Josselin Mouette 2011-03-15 10:13:21 UTC
This is one of the major problems when manipulating ZIP files from the Windows world. By default, WinZip and much other software still encode filenames in legacy encodings, leading to incorrect filenames when they are extracted in Linux environments.

We definitely need to address this one way or another. And fixing winzip is unfortunately not an option.
Comment 23 Shimi Chen 2011-04-01 10:55:31 UTC
7-Zip for windows has a way of detecting the encoding correctly. It's open-source so could there be a way to see how they do it and implement it?
Comment 24 Duncan Lithgow 2011-06-02 18:37:29 UTC
Could someone please provide a link to a new ZIP file to test this. Thanks.
Comment 26 Dario 2012-05-31 16:41:49 UTC
A zip archive that file roller fails to unzip is freely downloadable here [0] (sorry, it's quite big) and the file with accent is:

"/Gibilterra Land/04 Vincenzo Costantino Cinaski - Niente è grande come le piccole cose.mp3"

The name is read by file roller as "04 Vincenzo Costantino Cinaski - Niente e?? grande come le piccole cose.mp3" thus both renaming and extracting fail with the following msg:

"caution: filename not matched:  Gibilterra Land/04 Vincenzo Costantino Cinaski \- Niente e\?\? grande come le piccole cose.mp3"

If needed I can try to create a smaller test-case.

Further info:
* File Roller v. 3.4.1 on Ubuntu 12.04

[0] http://multimedia.kataweb.it/xl/XL-VIDEODROME/mp3/GibilterraLand.zip
Comment 27 Paolo Bacchilega 2012-05-31 17:20:21 UTC
(In reply to comment #26)
> A zip archive that file roller fails to unzip is freely downloadable here [0]
> (sorry, it's quite big) and the file with accent is:
> 
> "/Gibilterra Land/04 Vincenzo Costantino Cinaski - Niente è grande come le
> piccole cose.mp3"
> 
> The name is read by file roller as "04 Vincenzo Costantino Cinaski - Niente e??
> grande come le piccole cose.mp3" thus both renaming and extracting fail with
> the following msg:
> 
> "caution: filename not matched:  Gibilterra Land/04 Vincenzo Costantino Cinaski
> \- Niente e\?\? grande come le piccole cose.mp3"
> 

I cannot reproduce the problem, try to execute the following command to see if the output is correct:

7z l -slt -bd -y -- /home/paolo/Scrivania/GibilterraLand.zip 

if 7z is not installed on your system, try this one instead:

unzip -ZTs -- GibilterraLand.zip 

these are the two commands used by file-roller to list the content of a zip archive, the priority is given to 7z, if it is not available unzip is used.

It can be useful to know the command versions as well, for me:

"7z --help"  prints  "7-Zip [64] 9.20"
"unzip"      prints  "UnZip 6.00 of 20 April 2009"
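
The selection order described in this comment (7z first, unzip as the fallback) can be sketched as follows. This is an illustration in Python, not file-roller's C code; `listing_command` and its `which` parameter are hypothetical names, while the command-line flags are the ones quoted above.

```python
import shutil


def listing_command(archive: str, which=shutil.which) -> list:
    """Pick the command used to list a zip archive: 7z if present, else unzip."""
    if which("7z"):
        return ["7z", "l", "-slt", "-bd", "-y", "--", archive]
    if which("unzip"):
        return ["unzip", "-ZTs", "--", archive]
    raise FileNotFoundError("neither 7z nor unzip was found on PATH")


# Pretend only unzip is installed, to see which command would be run.
only_unzip = lambda name: "/usr/bin/unzip" if name == "unzip" else None
print(listing_command("GibilterraLand.zip", which=only_unzip))
```

Because the two tools decode filenames differently, which one is installed changes what file-roller displays for the same archive.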
Comment 28 Dario 2012-05-31 18:42:34 UTC
I haven't 7z, so I use unzip.

The relevant part is:
-rw-r--r--  2.1 unx  2786539 bX defN 20120522.144145 Gibilterra Land/04 Vincenzo Costantino Cinaski - Niente e?? grande come le piccole cose.mp3
-

..so wrong.

That's strange since my unzip version matches yours:

UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.


After installing 7z the problem is different: I can rename and extract the file, but it appears to be named "04 Vincenzo Costantino Cinaski - Niente eÌ grande come le piccole cose.mp3"

Don't know if it matters but my locale is IT_it
Comment 29 Paolo Bacchilega 2012-05-31 18:52:09 UTC
(In reply to comment #28)
> I haven't 7z, so I use unzip.
> 
> The relevant part is:
> -rw-r--r--  2.1 unx  2786539 bX defN 20120522.144145 Gibilterra Land/04
> Vincenzo Costantino Cinaski - Niente e?? grande come le piccole cose.mp3
> -
> 
> ..so wrong.
> 
> That's strange since my unzip version matches yours:
> 
> UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.
> 
> 
> After installing 7z the problem is different: I can rename and extract the
> file, but it appears to ben named "04 Vincenzo Costantino Cinaski - Niente eÌ
> grande come le piccole cose.mp3"
> 
> Don't know if matters but my locale is IT_it

maybe this is the problem, mine is it_IT.utf8
Comment 30 Dario 2012-05-31 18:56:51 UTC
Sorry: I wrote without checking (I thought it could be a problem with the Italian locale, I didn't notice you were Italian too!), mine is
"LANG=it_IT.UTF-8
LC_CTYPE=it_IT.UTF-8" too.

I don't know what else I could check for...
Comment 31 Dario 2012-05-31 19:43:15 UTC
I forgot to include 7z's output and version.

Output for the file is:
Path = Gibilterra Land/04 Vincenzo Costantino Cinaski - Niente eÌ grande come
le piccole cose.mp3
Folder = -
Size = 2786539
Packed Size = 2738046
Modified = 2012-05-22 14:41:46
Created = 
Accessed = 
Attributes = .....
Encrypted = -
Comment = 
CRC = 3B41EBEE
Method = Deflate
Host OS = Unix
Version = 20


While version is:
7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=it_IT.UTF-8,Utf16=on,HugeFiles=on,8 CPUs)

I attach also the "zipinfo -v" output as reference:
Central directory entry #11:
---------------------------

  There are an extra 16 bytes preceding this file.

  Gibilterra Land/04 Vincenzo Costantino Cinaski - Niente e?? grande come le
piccole cose.mp3

  offset of local header from start of archive:   25514261
                                                  (0000000001855115h) bytes
  file system or operating system of origin:      Unix
  version of encoding software:                   2.1
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   2.0
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           not encrypted
  extended local header:                          yes
  file last modified on (DOS date/time):          2012 May 22 14:41:46
  file last modified on (UT extra field modtime): 2012 May 22 14:41:45 local
  file last modified on (UT extra field modtime): 2012 May 22 12:41:45 UTC
  32-bit CRC value (hex):                         3b41ebee
  compressed size:                                2738046 bytes
  uncompressed size:                              2786539 bytes
  length of filename:                             91 characters
  length of extra field:                          12 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  Unix file attributes (100644 octal):            -rw-r--r--
  MS-DOS file attributes (00 hex):                none

  The central-directory extra field contains:
  - A subfield with ID 0x5855 (old Info-ZIP Unix/OS2/NT) and 8 data bytes:
    11 f1 c5 4f 89 89 bb 4f.

  There is no file comment.
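
One reading consistent with both outputs above (an inference, not something confirmed in this thread) is that the name is stored as UTF-8 in decomposed form, that is, a plain "e" followed by U+0300 COMBINING GRAVE ACCENT. That would explain the "e??" from unzip, the "eÌ" from 7z, and the reported filename length of 91. A short Python sketch of the byte-level effect:

```python
import unicodedata

# 'e' followed by U+0300 (combining grave accent), i.e. decomposed "è".
name = "Niente e\u0300 grande"
raw = name.encode("utf-8")

print(raw[7:10])                            # b'e\xcc\x80'
# Read byte-by-byte as Latin-1, 0xCC shows up as 'Ì' (0x80 is invisible).
print(raw.decode("latin-1")[7:9])           # eÌ
# Normalising to NFC gives the single precomposed character.
print(unicodedata.normalize("NFC", name))   # Niente è grande
```

If this reading is right, the archive needs UTF-8 decoding (and possibly normalisation) rather than a legacy code page conversion.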
Comment 32 André Klapper 2012-06-10 10:00:14 UTC
Duplicate of bug 581496?
Comment 33 Ma Hsiao-chun 2012-11-12 18:37:20 UTC
There are Unicode-enabled and non-Unicode-enabled ZIP archives.
There are also three kinds of tools: Info-ZIP, p7zip, and The Unarchiver.

UnZip in Info-ZIP always lists non-ASCII characters in file names as '?'. It can correctly extract Unicode-enabled archives, though.
https://mail.gnome.org/archives/desktop-devel-list/2012-November/msg00047.html
UnZip is generally included in a default installation, and File Roller supports it.

p7zip can correctly list non-ASCII characters in file names for Unicode-enabled archives. It has no luck with non-Unicode-enabled ones.
File Roller supports 7z and prefers 7z to unzip when 7z is installed. That's why p7zip can be a workaround for some people.

The Unarchiver [1] (lsar/unar) supports automatic encoding detection and manual encoding selection natively. Please check its man pages for inspiration.
http://manpages.ubuntu.com/manpages/precise/en/man1/lsar.1.html
http://manpages.ubuntu.com/manpages/precise/en/man1/unar.1.html
File Roller currently has limited support for it and does not use unar for ZIP archives.

1. http://code.google.com/p/theunarchiver/
Comment 34 gcox_for_bugzilla 2013-02-17 16:38:51 UTC
(In reply to comment #33)

> The Unarchiver [1] or lsar/unar supports auto encoding detect and manual
> encoding selection natively. Please check its man page for inspiration.
> http://manpages.ubuntu.com/manpages/precise/en/man1/lsar.1.html
> http://manpages.ubuntu.com/manpages/precise/en/man1/unar.1.html
> File Roller has limited support for it currently. File Roller don't use unar
> for ZIP archives currently.
> 
> 1. http://code.google.com/p/theunarchiver/

This comment doesn't seem to have received much attention, but it was the first time I'd heard about The Unarchiver, so I tried using it. I found that "unar" on the command line correctly handled a zip file compressed on Windows with Shift-JIS filenames, which File Roller (using unzip) had problems with. In other words, at least in some cases, using unar instead of unzip fixes this bug!

To give some more details about the support for unar in File Roller: The source for File Roller contains the add-in file /src/fr-command-unarchiver.c (and the .h header file). However, unar is given a low priority, because it is at the bottom of all of the register_archive calls in /src/fr-init.c (on line 371), and also because it cannot write zip files, only read them, as it says in the nearby comment (line 342):

	/* The order here is important. Commands registered earlier have higher
	 * priority.  However commands that can read and write a file format
	 * have higher priority over commands that can only read the same
	 * format, regardless of the registration order. */

This suggests the following possible workarounds and solutions for this bug:
(Please note: these are only suggestions, and are not necessarily all desirable or feasible solutions.)

1) Use unar on the command line instead of File Roller. (This is just a workaround.)

2) Comment out the following line (line 367) in /src/fr-init.c and recompile. This disables the use of zip/unzip, so that unar is used instead (I think. I haven't actually checked.) The disadvantage is that zip cannot be used to create archives, so this is also not a very good solution.

	register_archive (FR_TYPE_COMMAND_ZIP);

3) Alter /src/fr-init.c to change the priorities of archive commands so that unar is used instead of unzip. This is a more permanent solution but requires a substantial change to the program.

4) Alter File Roller so that the priority of archive commands is set using the GUI or a configuration file, instead of in the source code. Again, this is a substantial change.

5) Alter the source of unzip so that it uses the same method as unar for detecting encodings. This is a substantial change to zip/unzip, which is upstream of File Roller. Also, unar is written in Objective C and unzip is in C.

It also raises the following questions:

Q1) Does unar solve this problem for all non-UTF-8 filenames? To put it another way, are there any cases where unar fails to handle filenames?

Q2) Are there any reasons to prefer unzip to unar (for example, does unar have bugs or additional dependencies, or does unzip have extra capabilities)?

Q3) Are there any other problems with any of the above solutions?

I'm new to commenting on bugs so please tell me if I made any breaches of etiquette or if anything else is wrong with my comment.
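
The priority rule quoted from /src/fr-init.c in this comment can be sketched as follows. This is an illustrative Python model, not the actual C implementation; `Command` and `pick_command` are hypothetical names, and the registration order shown is made up to demonstrate the rule.

```python
from dataclasses import dataclass


@dataclass
class Command:
    name: str
    can_write: bool   # True if the command can both read and write the format


def pick_command(registered: list) -> Command:
    """Earlier registration wins, except that read-write beats read-only."""
    # sorted() is stable, so registration order breaks ties within each group.
    return sorted(registered, key=lambda c: not c.can_write)[0]


# Hypothetical order for ZIP: unar registered first, zip/unzip second.
order = [Command("unar", False), Command("zip/unzip", True)]
print(pick_command(order).name)   # zip/unzip
```

Under this rule, moving unar earlier in the registration list would not by itself make it handle ZIP archives, since it is read-only for that format.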
Comment 35 Eternal Sorrow 2014-02-10 12:57:50 UTC
So, this bug is almost 9 years old and still no changes? Unzip with patches using libnatspec detects filename encoding correctly, but file-roller doesn't. Just tried with file-roller-3.10. Shall anyone fix this?
Comment 36 André Klapper 2014-02-10 13:19:52 UTC
Bugs get fixed quicker if somebody provides a patch. Age is not a criterion.
Comment 37 Ma Hsiao-chun 2014-02-10 13:24:46 UTC
Sounds like someone missed some orientation before using GNOME:
http://www.jwz.org/doc/cadt.html
Comment 38 Yan Pashkovsky 2015-05-01 21:58:47 UTC
I know that Xarchiver deals correctly with encoding, maybe port code from this project? https://sourceforge.net/projects/xarchiver/
Comment 39 Yan Pashkovsky 2015-05-01 22:11:12 UTC
AAA, got it! Please, provide option to use unzip instead of 7zip. Found problem root on mate github https://github.com/mate-desktop/engrampa/issues/5
Comment 40 Pilot6 2015-05-06 15:06:48 UTC
I made a patch to get File-roller to always use unzip instead of p7zip for zip files.
Here is the bug link.

https://bugs.launchpad.net/bugs/1382106

This can be done as an option in a config file. But this way it is good for me too.

Cheers,

Dmitry
Comment 41 Tommy He 2016-03-04 07:18:50 UTC
(In reply to Pilot6 from comment #40)
> I made a patch to ger File-roller always use unzip instead of p7zip for zip
> files.
> Here is the bug link.
> 
> https://bugs.launchpad.net/bugs/1382106
> 
> This can be done as an option in a config file. But this way it is good for
> me too.
> 
> Cheers,
> 
> Dmitry

Hmm, preferring unzip doesn't seem to work for me on Fedora 23 with unzip 6.0. I read some comments elsewhere saying that proper handling of non-ASCII filenames was only added after the 6.0 release, possibly in the current beta of unzip 6.10.

Given that there's no tangible release date for unzip 6.10, would it be possible to let file-roller use unar first, if it is present on the system, when dealing with zip files?
Comment 42 Stefan 2016-04-16 18:16:06 UTC
Hi,

I was downloading some music files I had bought, including files with French and German characters, which were all added into one zip file before it was sent down the line. file-roller did not display the French and German characters correctly.

file-roller is only a graphical front-end for a command line application, in my case unzip.
Using the unzip -l command on the zip file revealed exactly the same output as in file-roller (funny symbols where French or German characters should be).
The comment made here (https://bugs.launchpad.net/debian/+source/unzip/+bug/10979/comments/25) explains what needs to be done for unzip to display characters correctly.  In my case using the option "-O UTF-8" would bring up the correct characters:

unzip -l -O UTF-8 somezipfilewithUTF-8characters.zip

According to: http://manpages.ubuntu.com/manpages/xenial/en/man1/unzip.1.html#contenttoc6 environment variables can be set in order to have unzip always use a certain character set.

I added the following to /etc/environment
UNZIP="-O UTF-8"

then I needed to log out and log in again (no reboot) for the setting to take effect.
Using the command unzip -v revealed that my settings were in effect:

UnZip and ZipInfo environment options:
           UNZIP:  -O UTF-8
        UNZIPOPT:  [none]

Now executing unzip -l somezipfilewithUTF-8characters.zip was enough to display the file names correctly in the terminal.

But file-roller wouldn't do that.

The comment by the developer (https://bugzilla.gnome.org/show_bug.cgi?id=306403#c27) helps to understand what file-roller is doing to list the content of a zip file: unzip -ZTs

The "Z" stands for ZipInfo mode (source: unzip --help), which I assumed calls the command zipinfo.  This command offers again the same -O option to define a particular character set (source: zipinfo --help).

The unzip -v command indicated that the following environment options are active:

UnZip and ZipInfo environment options:
           UNZIP:  -O UTF-8
        UNZIPOPT:  [none]
         ZIPINFO:  [none]
      ZIPINFOOPT:  [none]

The ZIPINFO variable is empty and I concluded that I would need to add the same line as before in the /etc/environment file for UNZIP now again for ZIPINFO.

I added the following line to /etc/environment 
ZIPINFO="-O UTF-8"

logged out and in again and then ran the command unzip -v to show:

UnZip and ZipInfo environment options:
           UNZIP:  -O UTF-8
        UNZIPOPT:  [none]
         ZIPINFO:  -O UTF-8
      ZIPINFOOPT:  [none]

Now running file-roller on the zip file showed all characters correctly.  :D

It works also for the file mentioned in https://bugzilla.gnome.org/show_bug.cgi?id=306403#c26  (http://multimedia.kataweb.it/xl/XL-VIDEODROME/mp3/GibilterraLand.zip)

For the Hebrew file (https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/580961/+attachment/1803463/+files/%D7%90%D7%A7%D7%95%D7%9C%D7%95%D7%92%D7%99%D7%94%20%D7%9C%D7%9E%D7%94%D7%A0%D7%93%D7%A1%D7%99%D7%9D.zip) in comment 25 (https://bugzilla.gnome.org/show_bug.cgi?id=306403#c25) I needed to exchange UTF-8 with 862 in both lines in /etc/environment. Afterwards (logging out and in again) it showed Hebrew characters in file-roller.  I got the character set number from: https://bugs.launchpad.net/debian/+source/unzip/+bug/10979/comments/17 
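
The same manual choice of code page can be sketched in Python for readers who want to check an archive without changing /etc/environment. The standard `zipfile` module decodes names that lack the UTF-8 flag as CP437, so such a name can be turned back into bytes and decoded again with the encoding the user selects. This is an illustration only; `reinterpret_name` is a hypothetical helper.

```python
UTF8_FLAG = 0x800  # ZIP general-purpose bit 11: name is stored as UTF-8


def reinterpret_name(name: str, flag_bits: int, legacy_encoding: str) -> str:
    """Undo zipfile's CP437 decoding for entries without the UTF-8 flag."""
    if flag_bits & UTF8_FLAG:
        return name                      # already decoded as UTF-8
    # CP437 maps every byte to a character, so this round trip is lossless.
    return name.encode("cp437").decode(legacy_encoding)


# Simulate what zipfile would report for a Hebrew name stored as CP862.
as_read = "שלום.txt".encode("cp862").decode("cp437")
print(reinterpret_name(as_read, 0, "cp862"))   # שלום.txt
```

With a real archive, the inputs would come from `zipfile.ZipFile(path).infolist()`, using each entry's `filename` and `flag_bits`.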

My system details:
unzip -v
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.

file-roller version: 3.6.3

Kernel: 3.19.0-32-generic x86_64 (64 bit) Desktop: Cinnamon 2.8.8  Distro: Linux Mint 17.3 Rosa

I hope this long write-up will be helpful.

It would be nice if file-roller could, if not auto-detect the character set, at least make it selectable.  From this point of view it's not a bug but maybe a badly documented feature.
Comment 43 unxed 2020-06-24 11:22:16 UTC
I recently wrote patches to p7zip and unzip for OEM charset detection based on system locale. It's exactly what the Windows internal zip encoder does.

https://sourceforge.net/p/infozip/patches/29/

https://sourceforge.net/p/p7zip/bugs/187/

To get correct file names in file-roller you just need to install patched p7zip and set your system locale correctly. Or do something like

alias 7z='LC_ALL=el_GR.UTF-8 7z'

if you prefer opening archives using a locale different from the system one.
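
The idea of deriving the OEM code page from the locale can be sketched as a simple lookup. This is a hedged illustration in Python, not the logic of the linked patches: `OEM_CODEPAGE` is a small hand-picked subset of the usual Windows OEM code pages, and `oem_codepage` with its CP437 default is a hypothetical helper.

```python
import os

# Small illustrative subset; the linked patches carry their own table.
OEM_CODEPAGE = {
    "el_GR": "cp737", "ru_RU": "cp866", "he_IL": "cp862",
    "it_IT": "cp850", "ja_JP": "cp932", "ko_KR": "cp949",
}


def oem_codepage(env=os.environ, default="cp437"):
    """Map LC_ALL, LC_CTYPE or LANG (checked in that order) to a code page."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # "el_GR.UTF-8" -> "el_GR"; also drop any "@modifier" suffix.
            lang = value.split(".")[0].split("@")[0]
            return OEM_CODEPAGE.get(lang, default)
    return default


print(oem_codepage({"LC_ALL": "el_GR.UTF-8"}))   # cp737
```

This mirrors the effect of the alias above: overriding LC_ALL for a single command changes which legacy code page is assumed for that archive.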
Comment 44 unxed 2020-06-24 11:53:02 UTC
Alkis Georgopoulos is planning to package patched p7zip to .deb's and upload to  ppa: https://github.com/mate-desktop/engrampa/issues/5#issuecomment-648410042
Comment 45 André Klapper 2020-11-11 19:15:09 UTC
bugzilla.gnome.org is being replaced by gitlab.gnome.org. We are closing all old bug reports and feature requests in GNOME Bugzilla which have not seen updates for a long time.

If you still use file-roller and if you still see this bug / want this feature in a currently supported version of GNOME (currently that would be 3.38), then please feel free to report it at https://gitlab.gnome.org/GNOME/file-roller/-/issues/

Thank you for creating this report and we are sorry it could not be implemented (volunteer workforce and time is limited).