After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 476761 - Profile zipfiles have duplicated files
Profile zipfiles have duplicated files
Status: RESOLVED FIXED
Product: sabayon
Classification: Deprecated
Component: general
2.19.x
Other All
: Normal major
: ---
Assigned To: Federico Mena Quintero
Maintainers of sabayon
Depends on:
Blocks:
 
 
Reported: 2007-09-14 01:19 UTC by Federico Mena Quintero
Modified: 2008-02-21 15:45 UTC
See Also:
GNOME target: ---
GNOME version: 2.19/2.20


Attachments
sabayon-duplicated-files.txt (2.45 KB, text/plain)
2007-09-14 01:19 UTC, Federico Mena Quintero
  Details
Patch to make zip_directory() behave (1.68 KB, patch)
2008-01-22 18:25 UTC, Sayamindu Dasgupta
needs-work Details | Review
A less resource intensive version of the previous patch. (1.44 KB, patch)
2008-02-18 16:19 UTC, Sayamindu Dasgupta
none Details | Review
sabayon-bgo476761-make-should-ignore-generic.diff (5.78 KB, patch)
2008-02-19 01:47 UTC, Federico Mena Quintero
none Details | Review
Latest version of the patch, making use of util.should_ignore_* (2.71 KB, patch)
2008-02-21 15:00 UTC, Sayamindu Dasgupta
none Details | Review

Description Federico Mena Quintero 2007-09-14 01:19:04 UTC
Igor Morgado reported this.  The profile zipfiles have duplicated files in them; this makes them bigger than needed and incorrect.

See the attached part of the debug log for dir-monitor;files-source;storage.

When the storage module is asked to add a directory, it recurses within the directory, adding everything it finds to the zipfile.  However, it is the *file* module that knows what should be added (since dir-monitor already told it the exact files to use).

The storage module should not recurse when adding directories.
Comment 1 Federico Mena Quintero 2007-09-14 01:19:35 UTC
Created attachment 95559 [details]
sabayon-duplicated-files.txt
Comment 2 Sayamindu Dasgupta 2008-01-22 18:24:57 UTC
I looked into this. It looks like it is the zip_directory() method in storage.py which seems to be recursing in a blind way.

When a non empty directory is added to a profile, a entry for that goes in the directory section of the metadata, and yet another entry goes into the file section of the metadata, the second entry being for the file inside the directory.

When zip_foreach() is called, the file inside the directory is added to the zip multiple times - once while zip_directory (called by zip_foreach) recurses inside the parent directory, and another time when zip_foreach() directly adds the file (due to the entry for the file being in the file section of the metadata).

I'm attaching a patch which should make zip_directory() behave.
Comment 3 Sayamindu Dasgupta 2008-01-22 18:25:55 UTC
Created attachment 103469 [details] [review]
Patch to make zip_directory() behave
Comment 4 Federico Mena Quintero 2008-01-31 18:11:42 UTC
Thanks for tracking this down, Sayamindu :)

This looks good; it needs a bit of work:

- Doing "filename in DIRECTORIES_TO_IGNORE" and the same for directories is not quite enough.  I think it's time to pull out should_ignore_*() from dirmonitor.py and put it somewhere else, maybe in util.py.  Then, the rest of the code should use those functions.

- The patch calls save_zip.namelist() on every file.  This is a bit of overkill, and will give us O(n^2) performance in the end... can we call namelist() only once at the beginning of the save() process, and then update our "own" list with the newly-added files?

I'm wondering how to have an automated test for the storage module that lets us catch all of these errors in a single spot...  that's a good weekend project, hint hint :)
Comment 5 Sayamindu Dasgupta 2008-02-18 16:17:50 UTC
(In reply to comment #4)
> Thanks for tracking this down, Sayamindu :)
> 
> This looks good; it needs a bit of work:
> 
> - Doing "filename in DIRECTORIES_TO_IGNORE" and the same for directories is not
> quite enough.  I think it's time to pull out should_ignore_*() from
> dirmonitor.py and put it somewhere else, maybe in util.py.  Then, the rest of
> the code should use those functions.
> 
> - The patch calls save_zip.namelist() on every file.  This is a bit of
> overkill, and will give us O(n^2) performance in the end... can we call
> namelist() only once at the beginning of the save() process, and then update
> our "own" list with the newly-added files?
> 
> I'm wondering how to have an automated test for the storage module that lets us
> catch all of these errors in a single spot...  that's a good weekend project,
> hint hint :)
> 

I'm not sure whether it would be sane to move around should_ignore_* at this stage of the release cycle. However, I think it will be a good idea to fix at least the duplicate file issue - am attaching a patch for that (I have modified it as per your comment, and we have our own list to keep track of files now).

Comment 6 Sayamindu Dasgupta 2008-02-18 16:19:55 UTC
Created attachment 105513 [details] [review]
A less resource intensive version of the previous patch.
Comment 7 Federico Mena Quintero 2008-02-19 01:47:18 UTC
Created attachment 105555 [details] [review]
sabayon-bgo476761-make-should-ignore-generic.diff

Sayamindu, does this help implement what you want?  The patch pulls out should_ignore_* from dirmonitor.py, and puts them in util.py for anything to use.
Comment 8 Sayamindu Dasgupta 2008-02-19 02:51:08 UTC
(In reply to comment #7)
> Created an attachment (id=105555) [edit]
> sabayon-bgo476761-make-should-ignore-generic.diff
> 
> Sayamindu, does this help implement what you want?  The patch pulls out
> should_ignore_* from dirmonitor.py, and puts them in util.py for anything to
> use.
> 

Thanks Federico for this :-). Only that util.py should import fnmatch and the should_ignore_dir() method should call "return should_ignore_dir (base_dir, ignore_dir_list, parent)" instead of "return should_ignore_dir (parent)" during recursion.
Comment 9 Federico Mena Quintero 2008-02-19 03:39:45 UTC
Testing before committing is for sissies :)

Thanks for catching that; it's in svn now.
Comment 10 Sayamindu Dasgupta 2008-02-21 15:00:35 UTC
Created attachment 105702 [details] [review]
Latest version of the patch, making use of util.should_ignore_*

This patch makes use of util.should_ignore_*. It also has it's own list of files in the zipfile and updates the list as and when files are added.
Ok to commit ?
Comment 11 Sayamindu Dasgupta 2008-02-21 15:45:32 UTC
Based on the discussion with Johnny on IRC, I updated the patch to use util.get_home_dir() instead of os.environ["HOME"], and committed the patch.
Closing the bug.