Bug 611257 - When decompression the non-english zip file, the file name in zip file will become messy code.
Status: RESOLVED NOTGNOME
Product: file-roller
Classification: Applications
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal critical
Target Milestone: ---
Assigned To: Paolo Bacchilega
QA Contact: file-roller-maint
Depends on:
Blocks:
 
 
Reported: 2010-02-27 03:47 UTC by broken.zhou
Modified: 2011-01-08 20:41 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Adds gconf option to use zip/unzip instead of p7zip if available (2.77 KB, patch)
2010-04-05 21:39 UTC, Alexander Saprykin

Description broken.zhou 2010-02-27 03:47:22 UTC
Many zip files are made by WinRAR/ZIP. They don't use UTF-8 but another encoding, such as gb18030 (Chinese environment). When I use "unzip", I can specify the character encoding:
unzip -O gb18030 XXXX (filename). So I can add UNZIP="-O GB18030" to /etc/environment to keep file-roller from producing garbled file names.

But if I install p7zip, file-roller decompresses zip files with p7zip instead of unzip, and I couldn't find an option in p7zip to specify the character encoding. In that case the file names in the zip file become garbled. I really need p7zip to decompress 7z archives.

Maybe file-roller could offer an option to choose whether p7zip or unzip is used to decompress the file?
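
A minimal sketch of the underlying problem, assuming Python's standard zipfile module. When an entry does not carry the UTF-8 flag, zipfile decodes its name as CP437, so the original bytes can be recovered and decoded again with the encoding the archive was really written in. The helper names recover_name and list_names are hypothetical, and the legacy encoding still has to be supplied by the user, just as with "unzip -O".

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recover_name(name: str, flag_bits: int, legacy_encoding: str) -> str:
    """Re-decode a name that zipfile decoded as CP437 (hypothetical helper)."""
    if flag_bits & UTF8_FLAG:
        return name  # already decoded correctly as UTF-8
    raw = name.encode("cp437")          # get the original bytes back
    return raw.decode(legacy_encoding)  # decode with the real encoding


def list_names(archive, legacy_encoding: str = "gb18030") -> list:
    """Return the entry names of a zip archive with legacy names repaired."""
    with zipfile.ZipFile(archive) as zf:
        return [
            recover_name(info.filename, info.flag_bits, legacy_encoding)
            for info in zf.infolist()
        ]
```

For example, list_names("XXXX.zip", "gb18030") would list the names of an archive created in a Chinese Windows environment; a wrong encoding either raises UnicodeDecodeError or gives different garbage.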
Comment 1 broken.zhou 2010-03-11 09:28:12 UTC
Almost two weeks. Why no reply? Does nobody see the problem?
Comment 2 Paolo Bacchilega 2010-03-11 12:18:32 UTC
this could be fixed by giving priority to unzip over p7zip.
Comment 3 broken.zhou 2010-03-11 12:24:45 UTC
then...HOW?
Comment 4 broken.zhou 2010-03-11 12:26:29 UTC
And what if I want to use p7zip to compress the zip file?
Comment 5 broken.zhou 2010-03-16 05:28:20 UTC
......I don't understand why it has taken so long to respond.
Please give an exact answer: is it not a bug, or how can it be fixed?
Comment 6 Paolo Bacchilega 2010-03-16 10:33:11 UTC
there is no solution at the moment; as a workaround, just uninstall p7zip in order to use unzip.
Comment 7 Alexander Saprykin 2010-04-05 21:39:55 UTC
Created attachment 158003
Adds gconf option to use zip/unzip instead of p7zip if available

Hi!
I (and not only I) have a lot of problems with .zip archives created by Windows programs, because the file names use a custom encoding, CP1251 for example. After extracting such archives the file names are garbled. This patch adds a gconf option to use zip/unzip instead of p7zip. It is disabled by default. If you have any thoughts on improving this patch, just let me know. I hope this helps a lot of non-English-speaking people.
Comment 8 Paolo Bacchilega 2010-04-05 23:22:15 UTC
I don't see the reason for a gconf option; what's the problem with always giving priority to zip/unzip over p7zip?
Comment 9 Alexander Saprykin 2010-04-06 05:49:03 UTC
I think it's not a good idea to give zip/unzip higher priority, because these programs have limits on the resulting file size, the number of files per directory, and the size of the files being archived. They also don't support multivolume archives. p7zip is free of all these limits and supports multivolume archives. Also, it seems zip/unzip don't have Zip64 extension support, and p7zip does. So I think p7zip is the better choice, until you have to deal with Windows-produced archives in a custom encoding. The zip/unzip limits are described here: http://www.info-zip.org/FAQ.html#limits.
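
To make the size concern concrete, here is a rough sketch of a check for whether a set of files would exceed what the classic (non-Zip64) zip record fields can hold: 32-bit sizes and a 16-bit entry count. The function name needs_zip64 is hypothetical, and the sketch does not check whether any particular zip or unzip build actually supports Zip64.

```python
MAX_32BIT_SIZE = 0xFFFFFFFF  # largest value a classic 32-bit size field can hold
MAX_16BIT_COUNT = 0xFFFF     # largest value the classic 16-bit entry count can hold


def needs_zip64(entry_sizes) -> bool:
    """Return True if the entries exceed the classic zip field limits.

    entry_sizes is an iterable of uncompressed file sizes in bytes.
    This is an illustrative check only, not how file-roller decides anything.
    """
    sizes = list(entry_sizes)
    too_many_entries = len(sizes) > MAX_16BIT_COUNT
    too_large_entry = any(size > MAX_32BIT_SIZE for size in sizes)
    return too_many_entries or too_large_entry
```

A front-end could use a check like this to prefer a Zip64-capable tool only when it is really needed, and otherwise stay with the tool that handles file name encodings better.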
Comment 10 broken.zhou 2010-04-06 06:06:20 UTC
Someone says that p7zip can specify the encoding by option "-scc". Is it true?
I didn't find it in the man page.

The p7zip's algorithm is better than info zip. Can we use p7zip to create the archives and unzip to decompress?
Comment 11 Alexander Saprykin 2010-04-06 06:44:39 UTC
(In reply to comment #10)
> Someone says that p7zip can specify the encoding by option "-scc". Is it true?
> I didn't find it in the man page.
> 
> The p7zip's algorithm is better than info zip. Can we use p7zip to create the
> archives and unzip to decompress?

The -scc option is used for the input/output encoding and has only 3 possible values: WIN, DOS, UTF-8. Not what we need.
Comment 12 Paolo Bacchilega 2010-04-06 09:30:12 UTC
(In reply to comment #10)
> The p7zip's algorithm is better than info zip. Can we use p7zip to create the
> archives and unzip to decompress?

sure, this looks like the best solution to me.
Comment 13 Alexander Saprykin 2010-04-07 18:32:37 UTC
Well, 7zip's zip compression is about 2-5% better, I think. But what about compatibility? Imagine a situation where you have compressed a big (over 4 GB) file with p7zip and can't decompress it with unzip on the same computer. Not good..
Anyway, I was thinking about implementing a solution with mixed p7zip/unzip actions. Do you have any ideas on the best way to do it? I think that modifying fr-command-zip isn't a good way. A mixed interface implementation such as fr-command-7z-unzip? I have some time and can implement a solution, but we need to find the best way to do it.
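
The mixed approach can be sketched roughly as follows: create with p7zip when it is available, extract with unzip when it is available, and fall back to the other tool otherwise. This is only an illustration in Python, not file-roller code; pick_zip_command is a hypothetical name, and the sketch assumes the p7zip binary is installed as "7z".

```python
import shutil


def pick_zip_command(action, archive, paths=(), dest=".", which=shutil.which):
    """Build a command line for a .zip archive (illustrative sketch).

    action is "create" or "extract". The `which` parameter only exists so
    the tool lookup can be replaced in tests.
    """
    if action == "create":
        if which("7z"):
            return ["7z", "a", "-tzip", archive, *paths]
        if which("zip"):
            return ["zip", "-r", archive, *paths]
    elif action == "extract":
        if which("unzip"):
            return ["unzip", archive, "-d", dest]
        if which("7z"):
            return ["7z", "x", f"-o{dest}", archive]
    else:
        raise ValueError(f"unknown action: {action}")
    raise FileNotFoundError("no suitable zip tool found")
```

The compatibility concern above still applies: an archive created by one tool has to stay within what the other tool can read.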
Comment 14 Paolo Bacchilega 2010-04-07 20:18:05 UTC
another solution is to p7zip authors to add a way to specify the character encoding :)
Comment 15 Paolo Bacchilega 2010-04-07 20:20:01 UTC
I meant:

another solution is to ask the p7zip authors to add a way to specify the character
encoding :)
Comment 16 Alexander Saprykin 2010-04-08 04:56:19 UTC
Yeah, it's the best solution here :) Maybe the only thing we can do is give zip higher priority (if zip/unzip are available), I think. It's fine to use zip/unzip if the user has them.
Comment 17 broken.zhou 2010-05-13 13:28:47 UTC
The newest Info-ZIP seems to have abandoned the -O option, so using Info-ZIP isn't a good way to solve this problem. I find that we can simply use "env LANG=ENCODING file-roller XXX.zip" to open the zip file, where ENCODING is the proper encoding for XXX.zip. For me, it is zh_CN. Also make sure that you have p7zip installed.
If you use Nautilus, you can set the file to be opened with "env LANG=ENCODING file-roller".
Of course this isn't the best solution to this problem. Maybe file-roller could auto-detect the file name encoding, just like gedit?
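
The same workaround can be scripted. The sketch below builds a copy of the environment with LANG replaced and starts file-roller with it; env_with_lang and open_with_lang are hypothetical helper names. Note that if LC_ALL is set in the environment it takes precedence over LANG, which this sketch does not handle.

```python
import os
import subprocess


def env_with_lang(lang, base=None):
    """Return a copy of the environment with LANG set to `lang`."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = lang
    return env


def open_with_lang(archive, lang):
    """Start file-roller on `archive` with LANG overridden (not run in tests)."""
    return subprocess.Popen(["file-roller", archive], env=env_with_lang(lang))
```

For example, open_with_lang("XXX.zip", "zh_CN") corresponds to the "env LANG=ENCODING file-roller XXX.zip" command above.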
Comment 18 Mahendra Tallur 2010-09-22 14:07:22 UTC
Hi ! 

Well, those zip files that contain files with "invalid characters" cannot be extracted / renamed with File-roller BUT they can be extracted using the terminal and unzip, even with no particular option / environment variable.

Sure, the resulting files still have invalid characters but at least they are extracted.

Would it be possible to make file-roller capable of extracting files even with invalid characters ? They can be renamed afterwards anyway.

Indeed : I don't mind using the terminal to work around this, but as many Windows users still send some zip files with invalid encodings, new Linux users will definitely be disconcerted.

Cheers, thanks for reading this & thanks again to Gnome devs & al.
Comment 19 broken.zhou 2010-09-22 14:16:09 UTC
I don't understand what you mean. File-roller can extract the file with invalid characters. You just need to rename it manually.
Comment 20 Mahendra Tallur 2010-09-22 14:20:41 UTC
Ah then, I stumbled upon a new issue. I definitely have several archives with some filenames containing invalid characters (shown as "?") that file-roller fails to rename.

May I share a sample archive with you ?
Comment 21 broken.zhou 2010-09-22 14:24:48 UTC
Do you mean file-roller crashes when trying to extract these files?
Comment 22 Mahendra Tallur 2010-09-22 14:27:18 UTC
Nope : 

- when trying to extract it : "caution : filename not matched"
- when trying to rename it : "an error occurred while adding files to the archive" (well, that's a translation from French so the exact message may differ)
Comment 23 Mahendra Tallur 2010-09-22 14:29:22 UTC
(but, as I said, I can just "unzip" the archive from the terminal and rename the filenames from Nautilus afterwards)
Comment 24 Ilya Chernykh 2010-09-23 14:48:00 UTC
The same issue here on OpenSUSE 11.3. While Ark from KDE3 opens a .zip file well:

https://bugzillafiles.novell.org/attachment.cgi?id=391043

File Roller only displays garbage:

https://bugzillafiles.novell.org/attachment.cgi?id=391042
Comment 25 Ilya Chernykh 2010-09-23 14:50:33 UTC
Please tell me when all these problems with encoding in Linux are over. Such issues sharply limit Linux usage in enterprise/government/education environments in non-English speaking countries.
Comment 26 Paolo Bacchilega 2010-09-23 17:20:25 UTC
(In reply to comment #20)
> May I share a sample archive with you ?

yes please
Comment 27 Ilya Chernykh 2010-09-23 17:32:15 UTC
Example of problematic archive:

https://bugzilla.novell.com/attachment.cgi?id=391041
Comment 28 Paolo Bacchilega 2010-09-24 07:19:46 UTC
I can see the bug with winzip (on windows) and with ark 2.14 as well.
Comment 29 Ilya Chernykh 2010-09-24 13:18:54 UTC
This means your distribution did not patch unzip (upstream unzip refused the patch for the Russian language). Here on OpenSUSE unzip is patched, so everything is OK in Ark (2.6.4). In File Roller everything is also OK if you uninstall the p7zip package.
Comment 30 Mahendra Tallur 2010-09-24 13:59:00 UTC
I'm not sure I get it.
If you consider the sample archive above ; on my setup (French Ubuntu 10.04/10.10):

With File-roller :
- extracting fails "filename not matched"
- renaming fails "an error occurred"

However, I can extract those files by using "unzip" on the command line (yes, it produces an invalid filename, but at least it's extracted and can be renamed via Nautilus)

My question is : even though there is an encoding issue, is it possible to enable file-roller to extract anyway, as unzip is capable of it ?

Note : I have no "p7zip" package installed.

Cheers, thanks for reading :-)
Comment 31 Paolo Bacchilega 2010-09-24 15:04:04 UTC
(In reply to comment #30)

you can't extract the single file but you can extract all the files with the "extract here" command in the Nautilus menu.
Comment 32 Alexander Saprykin 2010-09-24 20:08:59 UTC
(In reply to comment #25)
> Please tell me when all these problems with encoding in Linux are over. Such
> issues sharply limit Linux usage in enterprise/government/education
> environments in non-English speaking countries.

I see no problem with the example you've pointed out - all files were successfully extracted using file-roller. Yeah, the filenames are messy, but you can rename them manually. It's a problem of Windows(?) buggy software used for packing files. It's not a file-roller problem.
Comment 33 Mahendra Tallur 2010-09-24 20:21:53 UTC
Alexander : did you actually manage to extract single files by drag&drop (or rename them via eog) ? "Extract all files" does work, but not the former, on my setup.
Comment 34 Ilya Chernykh 2010-09-24 21:25:53 UTC
> It's a problem of Windows(?) buggy software used for packing files. It's not a file-roller problem.

It is not buggy software of Windows. The archive is packaged in conventional Windows encoding, like any other archive under Windows. There is no problem to extract files from this archive using unzip or its front-ends such as Ark or File Roller when p7zip is not installed. But once you install p7zip you get the garbage in file names in File Roller.
Comment 35 Alexander Saprykin 2010-09-24 21:44:12 UTC
(In reply to comment #33)
> Alexander : did you actually manage to extract single files by drag&drop (or
> rename them via eog) ? "Extract all files" does work, but not the former, on my
> setup.

I extracted them using "Extract here" menu in Nautilus, and also using d'n'd - all is ok.
Comment 36 Alexander Saprykin 2010-09-24 21:51:07 UTC
(In reply to comment #34)
> > It's a problem of Windows(?) buggy software used for packing files. It's not a file-roller problem.
> 
> It is not buggy software of Windows. The archive is packaged in conventional
> Windows encoding, like any other archive under Windows. There is no problem to
> extract files from this archive using unzip or its front-ends such as Ark or
> File Roller when p7zip is not installed. But once you install p7zip you get the
> garbage in file names in File Roller.

Well, utf8 is de-facto standard today. So, it's more a Windows problem that it can't deal properly with Unicode. It's hard to find an all-in-one solution. It is possible to set higher priority to unzip for .zip files, but there are other problems with unzip (see my comments above). So, I think 7zip is the better choice. Anyway, you can file a bug in the 7zip bugzilla, too.
Comment 37 Ilya Chernykh 2010-09-24 23:01:07 UTC
> Well, utf8 is de-facto standard today. 

No, the de facto standard in Russia is the Win-1251 encoding.

> So, it's more a Windows problem that it can't deal properly with Unicode.

Windows users have no problems with .zip archives, only Linux users have.

> It is possible to set higher priority to unzip for .zip files, but there are other problems with unzip (see my comments above).

In this case having some feature limitations (associated with using zip instead of 7z) is more acceptable than an outright bug which annoys virtually any non-English speaking Linux user. Anyway, it may be possible to let the user choose the default archiver in File Roller.
Comment 38 broken.zhou 2010-09-25 02:16:33 UTC
I think the encoding auto detection isn't very hard to achieve. Just like many text editors. It's a perfect solution.

You can't make all the non-English-speaking users switch to UTF-8. We need to communicate with other people who still use Windows and WinRAR, and extract the archives they send.

Alexander Saprykin provided a patch; maybe you can try it?
Comment 39 Ilya Chernykh 2010-09-25 02:47:39 UTC
> I think the encoding auto detection isn't very hard to achieve. Just like many
> text editors. It's a perfect solution.

There is already a patch in OpenSUSE's unzip that automatically detects the encoding. The problem is that File Roller prefers p7zip to unzip, so the auto-detection only works when the p7zip package is uninstalled.
Comment 40 broken.zhou 2010-09-25 03:12:13 UTC
Really????
It can auto-detect the encoding you used? Unzip can extract almost every archive normally (UTF-8 and others)?
Please attach that patch.

Alexander Saprykin provided a patch that may solve your problem. You can try it.
Comment 41 Ilya Chernykh 2010-09-25 04:46:37 UTC
It uses a special library that auto-detects the encoding. This has been tested with the Russian and Czech languages. You can find the relevant links here: https://bugzilla.novell.com/show_bug.cgi?id=540598
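
For readers wondering what such detection might look like, here is a deliberately simple sketch in Python. It is not the library used by the OpenSUSE patch: it only tries strict UTF-8 first and then a caller-supplied list of legacy encodings, returning the first one that decodes without error. The function name guess_name and the default candidate list are assumptions for illustration.

```python
def guess_name(raw: bytes, candidates=("gb18030", "cp1251")):
    """Guess the encoding of a raw file name (illustrative heuristic only).

    Returns (encoding, decoded_name), or None if nothing decodes cleanly.
    UTF-8 is tried first because non-UTF-8 data rarely passes strict
    UTF-8 decoding by accident.
    """
    for encoding in ("utf-8", *candidates):
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None
```

The order of the candidates matters: many byte sequences are valid in several legacy encodings at once, so a clean decode does not prove the guess is right. Real detectors also look at how plausible the decoded text is for a given language.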
Comment 42 Alexander Saprykin 2010-09-25 07:08:05 UTC
(In reply to comment #37)
> > Well, utf8 is de-facto standard today. 
> 
> No, the de facto standard in Russia is the Win-1251 encoding.

I wish it wasn't :)

> > So, it's more a Windows problem that it can't deal properly with Unicode.
> 
> Windows users have no problems with .zip archives, only Linux users have.

Well, I think there is no ideal solution for the case in question. And not all distributions include that patch for unzip, do they?

> > It is possible to set higher priority to unzip for .zip files, but there are other problems with unzip (see my comments above).
> 
> In this case having some feature limitations (associated with using zip instead
> of 7z) is more acceptable than an outright bug which annoys virtually any
> non-English speaking Linux user. Anyway, it may be possible to let the user
> choose the default archiver in File Roller.

It's not so straightforward. Maybe some users need >4 GB archive support? Maybe they don't have large file support in the kernel? I made the patch for that reason, so that advanced users can choose the zip archiver in GConf. Anyway, the final word belongs to Paolo.
Comment 43 Paolo Bacchilega 2010-09-25 07:22:09 UTC
This is not a file-roller bug, the bug is in unzip and p7zip.  

Adding encoding detection to file-roller would fix the wrong filenames, but you still couldn't extract a single file.
Comment 44 broken.zhou 2010-10-14 05:39:41 UTC
No. p7zip just uses the current environment encoding as the file name encoding. If you specify the environment encoding (LANG) for file-roller, you will be able to extract a single file.
Comment 45 Ilya Chernykh 2010-10-14 06:45:39 UTC
> It's not so straightforward. Maybe some users need >4 GB archive support?
> Maybe they don't have large file support in the kernel? I made the patch for that
> reason, so that advanced users can choose the zip archiver in GConf.

I think most users would prefer correct encoding rather than 4 GB support. I think unzip should be the default and 7z optional.
Comment 46 Ilya Chernykh 2010-10-14 06:49:20 UTC
> I wish it wasn't :)

If you think that by causing trouble for document exchange with Windows you will force people to Linux, you are mistaken. Just the opposite: people will think Linux is buggy.