Bug 420850 - The format of the .tags file should be changed to reduce the size of translated files
Status: RESOLVED OBSOLETE
Product: gedit-plugins
Classification: Other
Component: General
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: Gedit maintainers
QA Contact: Gedit maintainers
Depends on:
Blocks:
 
 
Reported: 2007-03-21 03:58 UTC by Matthias Clasen
Modified: 2019-03-23 20:34 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH] Drastic diet for the data files of the Tag List plugin (191.50 KB, patch)
2007-03-21 11:34 UTC, Steve Frécinaux
committed
[PATCH] Compress the tag files to reduce their size (2.68 KB, patch)
2007-03-21 12:14 UTC, Steve Frécinaux
none
[PATCH] Compress the tag files to reduce their size (2.93 KB, patch)
2007-03-21 13:27 UTC, Steve Frécinaux
committed

Description Matthias Clasen 2007-03-21 03:58:41 UTC
gedit ships a 6.5MB HTML.tags. That's just absurd.

1) no need to repeat the gedit: namespace prefix a gazillion times. Just set a
   default namespace.

2) the file does a reverse mapping from unicode characters to iso character
   entities, written down in verbose XML. There _has_ to be a better way to
   do this.
Comment 1 Steve Frécinaux 2007-03-21 11:03:20 UTC
$ sed -i 's^<gedit:^<^g' HTML.tags.2
$ sed -i 's^</gedit:^</^g' HTML.tags.2
$ ll HTML.tags*

-rw-r--r-- 1 sf 6.2M 2007-03-21 11:56 HTML.tags
-rw-r--r-- 1 sf 4.9M 2007-03-21 11:57 HTML.tags.2

Indeed, there is room for improvement there...
Ah, those old crappy plugins...
Comment 2 Steve Frécinaux 2007-03-21 11:34:50 UTC
Created attachment 85034
[PATCH] Drastic diet for the data files of the Tag List plugin


Set gedit XML namespace in the XML tag files as the default namespace,
and rework indentation. This doesn't change anything for the
(libxml-based) Tag list plugin, but lowers the total size of the
translated XML files from about 9.2M to about 6.7M (i.e. ~27%).
---
 ChangeLog                         |   12 +
 plugins/taglist/HTML.tags.xml.in  | 5289 ++++++++++++++++++-------------------
 plugins/taglist/Latex.tags.xml.in |  689 +++---
 plugins/taglist/XSLT.tags.xml.in  |  669 +++---
 plugins/taglist/XUL.tags.xml.in   | 1075 ++++----
 5 files changed, 3865 insertions(+), 3869 deletions(-)
Comment 3 Steve Frécinaux 2007-03-21 11:45:32 UTC
Here is what ls says about the files:

       old   new
HTML   6.2M  4.5M
Latex  844K  617K
XSLT   860K  608K
XUL    1.3M  975K
total  9.2M  6.7M

About 2), I don't think it has much influence, since those chars are not translated (so it's negligible compared to the large amount of data the translated entries represent). But for the translated entries, the whole entry is duplicated for each language. That may be a bit much, since only the name attribute changes. Otherwise, maybe it should use gettext...

Also, what about storing gzipped tag files? In my test, it brings the HTML.tags file down to 466K, thanks to its high redundancy.
Comment 4 Steve Frécinaux 2007-03-21 12:14:39 UTC
Created attachment 85036
[PATCH] Compress the tag files to reduce their size


Tag files of the Tag List plugin are now gzipped. This reduces
the total size of the tag files from 6.7M to 560K (~92%), due to their high
redundancy.
---
 ChangeLog                                     |    9 +++++++++
 plugins/taglist/Makefile.am                   |   11 ++++++++---
 plugins/taglist/gedit-taglist-plugin-parser.c |    3 ++-
 3 files changed, 19 insertions(+), 4 deletions(-)
Comment 5 Steve Frécinaux 2007-03-21 13:27:40 UTC
Created attachment 85041
[PATCH] Compress the tag files to reduce their size


Tag files of the Tag List plugin are now gzipped. This reduces
the total size of the tag files from 6.7M to 560K (~92%), due to their high
redundancy.
---
 ChangeLog                                     |   11 +++++++++++
 configure.ac                                  |    1 +
 plugins/taglist/Makefile.am                   |    9 ++++++---
 plugins/taglist/gedit-taglist-plugin-parser.c |    3 ++-
 4 files changed, 20 insertions(+), 4 deletions(-)
Comment 6 Steve Frécinaux 2007-03-21 14:04:33 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.
Comment 7 Paolo Maggi 2007-03-22 11:00:37 UTC
I was already aware of this problem.

The applied patch is only a "patch"; the real solution is to change the format of the .tags file (and then compress it too).

As mclasen suggested, we can also remove the gedit namespace.

Note also that the applied patch is broken:


> -		if (strncmp (e->d_name + strlen (e->d_name) - 5, ".tags", 5) == 0)
> +		if (strncmp (e->d_name + strlen (e->d_name) - 5, ".tags", 5) == 0 ||
> +		    strncmp (e->d_name + strlen (e->d_name) - 8, ".tags.gz", 5) == 0)
                                                                             ^

It should be:

strncmp (e->d_name + strlen (e->d_name) - 8, ".tags.gz", 8) == 0
                                                               ^                                                    
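The effect of the wrong length argument can be reproduced in isolation. The sketch below is illustrative only: `has_suffix` and `broken_gz_check` are hypothetical helpers, not functions from the gedit sources. It also guards against names shorter than the suffix, since subtracting 5 or 8 from the length of a short name such as "." or ".." would point before the start of the string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when `name` ends with `suffix`.
 * Names shorter than the suffix are rejected before any pointer math. */
bool
has_suffix (const char *name, const char *suffix)
{
	size_t name_len, suffix_len;

	assert (name != NULL && suffix != NULL);
	name_len = strlen (name);
	suffix_len = strlen (suffix);

	if (name_len < suffix_len)
		return false;

	return strcmp (name + name_len - suffix_len, suffix) == 0;
}

/* Mirrors the faulty comparison quoted above: the last 8 characters are
 * selected, but only the first 5 of them are compared with ".tags.gz".
 * A length guard is added here so the sketch itself stays well defined. */
bool
broken_gz_check (const char *name)
{
	size_t len;

	assert (name != NULL);
	len = strlen (name);

	if (len < 8)
		return false;

	return strncmp (name + len - 8, ".tags.gz", 5) == 0;
}
```

With these helpers, `broken_gz_check ("HTML.tagsXYZ")` is true although the name does not end in ".tags.gz", while `has_suffix ("HTML.tagsXYZ", ".tags.gz")` is false. Since the plugin already depends on GLib, `g_str_has_suffix ()` would presumably be a ready-made alternative to a hand-written check.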

Are we sure libxml is always able to read .gz files, or do we need to check that it is compiled with some specific option?
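Independently of what libxml supports, a reader can tell gzip data from plain XML by looking at the first two bytes, which are 0x1f 0x8b for any gzip stream. The sketch below only illustrates that check; `looks_like_gzip` is a hypothetical helper and nothing in this thread indicates that the plugin does this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of gedit or libxml: reports whether a
 * buffer starts with the two gzip magic bytes, 0x1f 0x8b. */
bool
looks_like_gzip (const unsigned char *buf, size_t len)
{
	assert (buf != NULL || len == 0);

	return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}
```

A caller could read the first bytes of a tags file, pass them to this function, and fall back to a clear error message if the data is compressed but the XML library was built without decompression support.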

While we are at it, maybe we can use "-9" to get better compression.

HTML.tags uncompressed            -> 6478973 bytes
HTML.tags.gz with default options -> 643734 bytes
HTML.tags.gz with -9 option       -> 629480 bytes

I'm wondering whether reading a smaller file now gives us some performance gain.

Reopening and changing the summary.


Comment 8 Paolo Maggi 2007-03-24 13:21:47 UTC
I have fixed the bug reported in comment #7.

I have also added "--best -f" arguments to gzip.
The first one is needed to obtain better compression. The second one is needed to overwrite the existing .tags.gz file when the corresponding .tags.xml.in file is modified.

Without --best:

paolo@elilix:/gnome/gnome-218/svn/gedit/plugins/taglist$ du -h -c *.gz
468K    HTML.tags.gz
64K     Latex.tags.gz
16K     XSLT.tags.gz
12K     XUL.tags.gz
560K    total

paolo@elilix:/gnome/gnome-218/svn/gedit/plugins/taglist$ gzip -l *.gz
         compressed        uncompressed  ratio uncompressed_name
             472010             4697969  90.0% HTML.tags
              58365              631386  90.8% Latex.tags
              15917              621801  97.4% XSLT.tags
              11835              998019  98.8% XUL.tags
             558127             6949175  92.0% (totals)

With --best:

paolo@elilix:/gnome/gnome-218/svn/gedit/plugins/taglist$ du -h -c *.gz
452K    HTML.tags.gz
60K     Latex.tags.gz
16K     XSLT.tags.gz
12K     XUL.tags.gz
540K    total

paolo@elilix:/gnome/gnome-218/svn/gedit/plugins/taglist$ gzip -l *.gz
         compressed        uncompressed  ratio uncompressed_name
             458544             4697969  90.2% HTML.tags
              55283              631386  91.2% Latex.tags
              15372              621801  97.5% XSLT.tags
               9796              998019  99.0% XUL.tags
             538995             6949175  92.2% (totals)
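
The ratio column can be cross-checked from the byte counts. The sketch below uses the plain "fraction of bytes saved" formula, which reproduces the printed totals to one decimal place; gzip's own computation may differ in detail, and `saving_percent` is a hypothetical name chosen for illustration.

```c
#include <assert.h>

/* Percentage of bytes saved by compression:
 * 100 * (1 - compressed / uncompressed). */
double
saving_percent (unsigned long compressed, unsigned long uncompressed)
{
	assert (uncompressed > 0);

	return 100.0 * (1.0 - (double) compressed / (double) uncompressed);
}
```

For the totals above, `saving_percent (558127, 6949175)` is about 92.0 and `saving_percent (538995, 6949175)` is about 92.2. In absolute terms, --best saves 558127 - 538995 = 19132 bytes, roughly 3.4% of the default-compressed total.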


-----------------

I have a question: how can we handle the case in which gzip is not installed on the machine of the user compiling gedit?

-----------------

Committed patch:

Index: plugins/taglist/gedit-taglist-plugin-parser.c
===================================================================
--- plugins/taglist/gedit-taglist-plugin-parser.c	(revision 5582)
+++ plugins/taglist/gedit-taglist-plugin-parser.c	(working copy)
@@ -579,7 +579,7 @@ parse_taglist_dir (const gchar *dir)
 	while ((e = readdir (d)) != NULL)
 	{
 		if (strncmp (e->d_name + strlen (e->d_name) - 5, ".tags", 5) == 0 ||
-		    strncmp (e->d_name + strlen (e->d_name) - 8, ".tags.gz", 5) == 0)
+		    strncmp (e->d_name + strlen (e->d_name) - 8, ".tags.gz", 8) == 0)
 		{
 			gchar *tags_file = g_strconcat (dir, e->d_name, NULL);
 			parse_taglist_file (tags_file);
Index: plugins/taglist/Makefile.am
===================================================================
--- plugins/taglist/Makefile.am	(revision 5582)
+++ plugins/taglist/Makefile.am	(working copy)
@@ -41,7 +41,7 @@ plugin_in_files = taglist.gedit-plugin.d
 
 %.tags.gz: %.tags.xml.in $(INTLTOOL_MERGE) $(wildcard $(top_srcdir)/po/*.po)
 	LC_ALL=C $(INTLTOOL_MERGE) $(top_srcdir)/po $< $(@:.gz=) -x -u -c $(top_builddir)/po/.intltool-merge-cache
-	$(GZIP) $(@:.gz=)
+	$(GZIP) --best -f $(@:.gz=)
 
 plugin_DATA = $(plugin_in_files:.gedit-plugin.desktop.in=.gedit-plugin)
Comment 9 Ignacio Casal Quinteiro (nacho) 2009-08-27 20:17:11 UTC
It seems the only thing left open here is the question about gzip not being installed? If so, I don't think we have had any problems with that, so what about closing it?
Comment 10 André Klapper 2012-07-30 17:20:14 UTC
$:andre\> pwd
/opt/git-gnome/gedit-plugins/plugins/taglist
$:andre\> ls -l
total 152
-rw-rw-r--. 1 andre andre 51539 Jul 30 19:17 HTML.tags.xml.in
-rw-rw-r--. 1 andre andre  6576 Jul 30 19:17 Latex.tags.xml.in
-rw-rw-r--. 1 andre andre   302 Jul 30 19:17 taglist.plugin.desktop.in.in
-rw-rw-r--. 1 andre andre  7603 Jul 30 19:17 XSLT.tags.xml.in
-rw-rw-r--. 1 andre andre 12417 Jul 30 19:17 XUL.tags.xml.in

Looks acceptable to me, can we close this as FIXED / OBSOLETE?
Comment 11 Tobias Mueller 2012-11-19 21:07:21 UTC
Yeah, let's close it. Please reopen if there is an issue left regarding the tags.