After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 359653 - GStreamer Binary Registry patch
GStreamer Binary Registry patch
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gstreamer (core)
git master
Other Linux
: Normal enhancement
: 0.10.20
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2006-10-04 17:59 UTC by Mathieu Garcia
Modified: 2008-06-14 11:13 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
GStreamer Binary Registry patch (3.36 KB, patch)
2006-10-04 18:01 UTC, Mathieu Garcia
none Details | Review
GStreamer Binary Registry C file (14.20 KB, text/x-csrc)
2006-10-04 18:02 UTC, Mathieu Garcia
  Details
GStreamer Binary Registry header file (6.40 KB, text/x-chdr)
2006-10-04 18:02 UTC, Mathieu Garcia
  Details
GStreamer Binary Registry C file (14.51 KB, text/x-csrc)
2006-10-08 12:31 UTC, Mathieu Garcia
  Details
GStreamer Binary Registry header file (7.14 KB, text/x-chdr)
2006-10-08 12:32 UTC, Mathieu Garcia
  Details
GStreamer Binary Registry C file (ensonic) (16.03 KB, patch)
2006-10-09 03:23 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
none Details | Review
GStreamer Binary Registry header file (ensonic) (5.71 KB, patch)
2006-10-09 03:24 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
none Details | Review
GStreamer Binary Registry C file (ensonic) (19.53 KB, patch)
2006-10-20 10:45 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
none Details | Review
GStreamer Binary Registry H file (ensonic) (4.51 KB, patch)
2006-10-20 10:45 UTC, Stefan Sauer (gstreamer, gtkdoc dev)
none Details | Review
GStreamer Binary Registry C file (with align feature) (20.38 KB, text/x-csrc)
2006-12-13 10:44 UTC, Mathieu Garcia
  Details
GStreamer Binary Registry header file (with align feature) (4.58 KB, text/x-chdr)
2006-12-13 10:45 UTC, Mathieu Garcia
  Details
Add --enable-binary-registry feature and mix the source code as necessary (5.50 KB, patch)
2006-12-28 17:29 UTC, Josep Torra Valles
committed Details | Review
GStreamer Binary Registry C file (20.61 KB, text/x-csrc)
2006-12-28 17:41 UTC, Josep Torra Valles
  Details
GStreamer Binary Registry H file (4.61 KB, text/x-chdr)
2006-12-28 17:43 UTC, Josep Torra Valles
  Details

Description Mathieu Garcia 2006-10-04 17:59:56 UTC
Hello,

Here's a basic implementation of the binary registry. There might be some bugs.
First, copy gstregistrybinary.c and gstregistrybinary.h into the gst/ directory, then apply the patch. Recompile and install.
The libxml2 dependancy was not removed. By default w/ this patch, gstreamer will use the binary registry.
Comment 1 Mathieu Garcia 2006-10-04 18:01:08 UTC
Created attachment 74015 [details] [review]
GStreamer Binary Registry patch
Comment 2 Mathieu Garcia 2006-10-04 18:02:08 UTC
Created attachment 74016 [details]
GStreamer Binary Registry C file
Comment 3 Mathieu Garcia 2006-10-04 18:02:40 UTC
Created attachment 74017 [details]
GStreamer Binary Registry header file
Comment 4 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-06 15:54:36 UTC
seems to work great. sysprof gives good speedup figures
xml-registry: 26.70
bin-registry: 3.36

I wonder why the binary registry is around 5 times bigger.
2567204 2006-10-06 18:42 registry.i686.bin
 529225 2006-10-06 17:47 registry.i686.xml

Mathieu, could you perhaps documents all of the struct members like (GstBinaryPluginElement.m32p). I'll now add some tracing to tailor the GST_BINARY_REGISTRY_XXX_LEN defines.
Comment 5 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-06 20:21:07 UTC
some more comments:

#1: please add:
#define GST_CAT_DEFAULT GST_CAT_REGISTRY
to gstregistrybinary.c

#2: wouldn't it make more sense to use pascal style strings (16bit length, followed by data). Thats the current stringspace usage.
name         17 /  256
description 289 / 1024
filename     61 /  256
version       9 /   64
license       7 /  256
source       21 /  256
package      38 / 1024
origin       50 / 1024

#3: use gtk-doc style comments (but start local docs with /* instead of /**

#4: use malloc not calloc if memory gets initialized anyway

#5: in gst_registry_binary_write_cache() you walk registry->plugins backwards and use g_list_append in gst_registry_binary_save_plugin(). Can't we reverse that?

#6: gst_registry_binary_get_binary_plugin() looks leaky / broken. At first you once do g_free(plugin); and once gst_object_unref(plugin);. Next you leak all strings on all those return -1
Comment 6 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-06 20:26:17 UTC
erm another one. Could you use a #define to conditionally use one or the other registry implementation?. Alternatively, what about making GstRegistryXML and GstrRegistryBinary subclasses of GstRegistry and then turn 

gst_registry_read_cache
gst_registry_write_cache
gst_registry_file_extension

virtual methods.
Comment 7 David Schleef 2006-10-06 22:28:33 UTC
I purposely got rid of the registry subclasses, because it's a *cache*, not a registry.  Thus, if the cache is not compatible with the current library (say the new library uses a binary registry), the new library will simply regenerate it.  There's no need to have 2 registry implementations at runtime.

Having two registry implementations in the source is no problem, but I expect that we'll switch over to binary registries and never look back once the kinks are worked out.
Comment 8 Mathieu Garcia 2006-10-08 12:31:40 UTC
Created attachment 74278 [details]
GStreamer Binary Registry C file
Comment 9 Mathieu Garcia 2006-10-08 12:32:28 UTC
Created attachment 74279 [details]
GStreamer Binary Registry header file
Comment 10 Mathieu Garcia 2006-10-08 12:41:55 UTC
Thank you for your comments. I've made some modifications according to your suggestions.

#1 : "#define GST_CAT_DEFAULT GST_CAT_REGISTRY" was added in the C file
#2 : Using pascal strings is a good idea but gcc doesn't recommend it, please see http://developer.apple.com/documentation/DeveloperTools/gcc-4.0.1/gcc/Pascal-Strings.html for further comments. So, as a workaround, I added a define "GST_BINARY_REGISTRY_SIZE_FACTOR" (set to 4 by default), to divide the lengths (GST_BINARY_REGISTRY_NAME_LEN, etc.). Some sanity check should be added in case we reach 0 byte length. So you can change this define to fit your needs.
#3 : I'll check about gtk-doc
#4 : Changed all calloc's to malloc's, as memory is always initialized. Should speed-up the process a little bit.
#5 : Actually if we use a prepend operator, we'll corrupt the header, etc. I didn't check much how to solve it but I'll do it soon.
#6 : Fixed leaks + changed g_free's to gst_object_unref's

Regarding the implementation of subclasses, I prefer to wait until we find a solution that would fit everyone's needs. Any more opens ?
Comment 11 René Stadler 2006-10-08 14:55:15 UTC
Note that Stefan meant pascal *style* strings.  Real pascal strings that are backed by the compiler don't work anyways since they have a maximum length of 255.

You should most probably change GstBinaryPluginFeature to remove the string fields and add an array for the string lengths instead.  Directly after writing out the storage representation of a GstBinaryPluginFeature you write all the strings of the feature in a defined order, preferably the same order the length array has.  This will finally give the file a sane size and allow for reading it back even faster since no scanning for NULL bytes is needed.

The fact that the binary registry is currently much larger than the XML version should ring all alarm bells; XML is already verbose (read: wasteful, bloated) because it aims to be human readable.
Comment 12 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-08 20:11:21 UTC
Mathieu, we agreed that I hack on it during Boston summit, so please don't touch the code! I'll attach new versions when the summit is over.
Comment 13 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-08 21:12:10 UTC
I also have #5 working now (by also prepending the magic and so on.
Will upload new version tomorrow.
Comment 14 Mathieu Garcia 2006-10-09 02:31:24 UTC
Stefan : OK. I won't change the code anymore until you post the new version.
Comment 15 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-09 03:19:49 UTC
Just noticed that is not yet handles pad-templates, interfaces and uritypes :(
I'll upload the new code now, so that you can check both versions and eventually merge it into one.

Comment 16 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-09 03:23:30 UTC
Created attachment 74321 [details] [review]
GStreamer Binary Registry C file (ensonic)

TODO:
* move error handling more out of main code
* variable string length
* handle pad-templates, interfaces and uritypes
Comment 17 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-09 03:24:24 UTC
Created attachment 74322 [details] [review]
GStreamer Binary Registry header file (ensonic)
Comment 18 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-09 15:29:31 UTC
step 2 would be  to keep the mmaped file alive and reference all strings directly from there. This also requires addition of a flag to objects like pluginfeature to tell them not to free the strings at destruction.
Comment 19 Mathieu Garcia 2006-10-09 18:27:29 UTC
OK. 

Found a bug within gst_registry_binary_write_cache. Current gstreamer CVS fix the problem (with gst_registry_xml_write_cache). The bug occurs when the .gstreamer-0.10 directory doesn't exists.

We should have :

 /* the previous g_mkstemp call overwrote the XXXXXX placeholder ... */
    g_free (tmp_location);
    tmp_location = g_strconcat (location, ".tmpXXXXXX", NULL);
    registry->cache_file = g_mkstemp (tmp_location);

    if (registry->cache_file == -1) {
      GST_DEBUG ("g_mkstemp() failed: %s", g_strerror (errno));
      g_free (tmp_location);
      return FALSE;

See http://webcvs.freedesktop.org/gstreamer/gstreamer/gst/gstregistryxml.c?r1=1.26&r2=1.27

Comment 20 Tim-Philipp Müller 2006-10-09 18:49:07 UTC
Something else to keep in mind: you can't assume the registry is always mmap()-able, the home directory might be on a mounted remote share of some sort for example, so you also need a read() fallback if I'm not mistaken.
Comment 21 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-20 10:45:02 UTC
Created attachment 75070 [details] [review]
GStreamer Binary Registry C file (ensonic)

adds padtemplate handling and string compresion
Comment 22 Stefan Sauer (gstreamer, gtkdoc dev) 2006-10-20 10:45:51 UTC
Created attachment 75071 [details] [review]
GStreamer Binary Registry H file (ensonic)
Comment 23 Mathieu Garcia 2006-12-13 10:44:11 UTC
Created attachment 78275 [details]
GStreamer Binary Registry C file (with align feature)

Set ALIGN32 to 1 if your target doesn't unaligned read/write.
Fixed by Josep Torra.
Comment 24 Mathieu Garcia 2006-12-13 10:45:12 UTC
Created attachment 78276 [details]
GStreamer Binary Registry header file (with align feature)
Comment 25 David Schleef 2006-12-13 21:13:00 UTC
Please choose either aligned or unaligned; don't make it a #define.  Unaligned seems slightly better, since one of the purposes of a binary registry is to save space, and we have GST_READ_*() macros.

Macros that evaulate to empty statements are not portable.  Please change it to do {} while(0).

Please change SAVE_STRING() into a function.
Comment 26 Wim Taymans 2006-12-16 16:43:57 UTC
I also thought the purpose was to not strcpy at all, just point the strings to the mmaped memory (this requires a flag in the various core struct so that the mem does not get freed). In the case of non-mmapped data, one could consider strcpy or malloc+read the whole file and reuse the mmap code, depending on which one is more memory/speed friendly.
Comment 27 Josep Torra Valles 2006-12-28 13:43:57 UTC
Refering to comment #25, the aligned feature is necessary almost for some ARM platforms.
Comment 28 Josep Torra Valles 2006-12-28 17:29:17 UTC
Created attachment 78995 [details] [review]
Add --enable-binary-registry feature and mix the source code as necessary

This patch is a replacement for the previous one
Comment 29 Josep Torra Valles 2006-12-28 17:41:17 UTC
Created attachment 78996 [details]
GStreamer Binary Registry C file

- SAVE_STRING macro converted to gst_registry_binary_save_string() function
- Removed #define ALIGN32, GST_HAVE_UNALIGNED_ACCESS is used instead
- Fixed align32() macro for portability as David suggested
- Changed the word chunck for chunk :)
Comment 30 Josep Torra Valles 2006-12-28 17:43:45 UTC
Created attachment 78997 [details]
GStreamer Binary Registry H file

Added gstconfig.h and some minor issues
Comment 31 Josep Torra Valles 2006-12-28 17:48:52 UTC
A typo in comment #28, the configure flag is --enable-binary-registry
Comment 32 Stefan Sauer (gstreamer, gtkdoc dev) 2007-01-10 14:46:39 UTC
Joseph, the changes look good. I'll test these over the next few days and commit them then.
Comment 33 Stefan Sauer (gstreamer, gtkdoc dev) 2007-01-11 13:48:19 UTC
the files are now commited. please continue work in cvs.
Comment 34 Stefan Sauer (gstreamer, gtkdoc dev) 2007-01-15 12:19:25 UTC
2007-01-15  Stefan Kost  <ensonic@users.sf.net>

	* gst/gstregistrybinary.c: (gst_registry_binary_write),
	(gst_registry_binary_initialize_magic),
	(gst_registry_binary_save_string), (gst_registry_binary_make_data),
	(gst_registry_binary_save_pad_template),
	(gst_registry_binary_save_feature),
	(gst_registry_binary_save_plugin),
	(gst_registry_binary_write_cache),
	(gst_registry_binary_check_magic),
	(gst_registry_binary_load_pad_template),
	(gst_registry_binary_load_feature),
	(gst_registry_binary_load_plugin),
	(gst_registry_binary_read_cache):
	* gst/gstregistrybinary.h:
	  use glib types, cleanup comments, impement interfaces and uri-types
Comment 35 Stefan Sauer (gstreamer, gtkdoc dev) 2007-04-27 08:08:24 UTC
2007-04-26  Stefan Kost  <ensonic@users.sf.net>

	* gst/gstregistrybinary.c: (gst_registry_binary_write_cache),
	(gst_registry_binary_load_pad_template),
	(gst_registry_binary_load_plugin),
	(gst_registry_binary_read_cache):
	* gst/gstregistrybinary.h:
	  Implement no-mmap alternative for registry reading. Do code cleanups.
	  Add more comments about avoiding strdups for all text data. Comments
	  welcome.

Regarding Comment #26, I added some details to the todo comments. Basically the questions is wheter we can turn the private 'gboolean loaded' in GstPluginFeature into a flag var and add a const flag. If that sounds good I make those changes and supply a patch.
Comment 36 Stefan Sauer (gstreamer, gtkdoc dev) 2007-08-09 07:56:28 UTC
Another idea is to use a dictionary for the strings in the file and instead of storing the string save a token-id taht references the string. Heres some data that show the potential benefit:

-rw------- 1 ensonic users 521799  8. Aug 18:51 registry.i686.bin
-rw-r--r-- 1 ensonic users  38261  9. Aug 10:52 registry.i686.bin.bz2
-rw-r--r-- 1 ensonic users  50672  9. Aug 10:44 registry.i686.bin.gz
-rw-r--r-- 1 ensonic users 483166  9. Aug 10:44 registry.i686.txt
-rw-r--r-- 1 ensonic users 188759  9. Aug 10:45 registry.i686.txt.sort.uniq
-rw------- 1 ensonic users 983413 16. Jul 20:38 registry.i686.xml

one sees that the binary registry is nearly half of the xml one. But applying gzip to it cuts it down to 9,7% (bzip2 to 7%)! Piping it through strings shows that most of the stuff is text and applying sort | uniq shows that one could easily save more that a half by using a dictionary.
That combined with the keeping the whole registry image in memory and using static references in the features will save also runtime memory.
Comment 37 David Schleef 2007-08-13 22:19:35 UTC
Don't bother using a dictionary, it's equivalent to gzip.

Better would be to get rid of all the silly text information that 1) isn't translated, and 2) not useful for any of the types of lookups we currently do.  That might need an ABI change, though.
Comment 38 Stefan Sauer (gstreamer, gtkdoc dev) 2007-08-14 05:35:19 UTC
David, yes soe test could use translations, but the author email, plugin/element website, element class and caps does not. And thats the big part.
Comment 39 Tim-Philipp Müller 2008-01-21 19:43:54 UTC
Random note: when we change the default from XML to binary registry, we need to make sure the functions gst_registry_xml_read_cache() and gst_registry_xml_write_cache() stay defined and return FALSE (rather than just not be available at all any more, which would break ABI).
Comment 40 Stefan Sauer (gstreamer, gtkdoc dev) 2008-06-13 20:37:13 UTC
Its the default now.