GNOME Bugzilla – Bug 359653
GStreamer Binary Registry patch
Last modified: 2008-06-14 11:13:29 UTC
Hello, Here's a basic implementation of the binary registry. There might be some bugs. First, copy gstregistrybinary.c and gstregistrybinary.h into the gst/ directory, then apply the patch. Recompile and install. The libxml2 dependancy was not removed. By default w/ this patch, gstreamer will use the binary registry.
Created attachment 74015 [details] [review] GStreamer Binary Registry patch
Created attachment 74016 [details] GStreamer Binary Registry C file
Created attachment 74017 [details] GStreamer Binary Registry header file
seems to work great. sysprof gives good speedup figures xml-registry: 26.70 bin-registry: 3.36 I wonder why the binary registry is around 5 times bigger. 2567204 2006-10-06 18:42 registry.i686.bin 529225 2006-10-06 17:47 registry.i686.xml Mathieu, could you perhaps documents all of the struct members like (GstBinaryPluginElement.m32p). I'll now add some tracing to tailor the GST_BINARY_REGISTRY_XXX_LEN defines.
some more comments: #1: please add: #define GST_CAT_DEFAULT GST_CAT_REGISTRY to gstregistrybinary.c #2: wouldn't it make more sense to use pascal style strings (16bit length, followed by data). Thats the current stringspace usage. name 17 / 256 description 289 / 1024 filename 61 / 256 version 9 / 64 license 7 / 256 source 21 / 256 package 38 / 1024 origin 50 / 1024 #3: use gtk-doc style comments (but start local docs with /* instead of /** #4: use malloc not calloc if memory gets initialized anyway #5: in gst_registry_binary_write_cache() you walk registry->plugins backwards and use g_list_append in gst_registry_binary_save_plugin(). Can't we reverse that? #6: gst_registry_binary_get_binary_plugin() looks leaky / broken. At first you once do g_free(plugin); and once gst_object_unref(plugin);. Next you leak all strings on all those return -1
erm another one. Could you use a #define to conditionally use one or the other registry implementation?. Alternatively, what about making GstRegistryXML and GstrRegistryBinary subclasses of GstRegistry and then turn gst_registry_read_cache gst_registry_write_cache gst_registry_file_extension virtual methods.
I purposely got rid of the registry subclasses, because it's a *cache*, not a registry. Thus, if the cache is not compatible with the current library (say the new library uses a binary registry), the new library will simply regenerate it. There's no need to have 2 registry implementations at runtime. Having two registry implementations in the source is no problem, but I expect that we'll switch over to binary registries and never look back once the kinks are worked out.
Created attachment 74278 [details] GStreamer Binary Registry C file
Created attachment 74279 [details] GStreamer Binary Registry header file
Thank you for your comments. I've made some modifications according to your suggestions. #1 : "#define GST_CAT_DEFAULT GST_CAT_REGISTRY" was added in the C file #2 : Using pascal strings is a good idea but gcc doesn't recommend it, please see http://developer.apple.com/documentation/DeveloperTools/gcc-4.0.1/gcc/Pascal-Strings.html for further comments. So, as a workaround, I added a define "GST_BINARY_REGISTRY_SIZE_FACTOR" (set to 4 by default), to divide the lengths (GST_BINARY_REGISTRY_NAME_LEN, etc.). Some sanity check should be added in case we reach 0 byte length. So you can change this define to fit your needs. #3 : I'll check about gtk-doc #4 : Changed all calloc's to malloc's, as memory is always initialized. Should speed-up the process a little bit. #5 : Actually if we use a prepend operator, we'll corrupt the header, etc. I didn't check much how to solve it but I'll do it soon. #6 : Fixed leaks + changed g_free's to gst_object_unref's Regarding the implementation of subclasses, I prefer to wait until we find a solution that would fit everyone's needs. Any more opens ?
Note that Stefan meant pascal *style* strings. Real pascal strings that are backed by the compiler don't work anyways since they have a maximum length of 255. You should most probably change GstBinaryPluginFeature to remove the string fields and add an array for the string lengths instead. Directly after writing out the storage representation of a GstBinaryPluginFeature you write all the strings of the feature in a defined order, preferably the same order the length array has. This will finally give the file a sane size and allow for reading it back even faster since no scanning for NULL bytes is needed. The fact that the binary registry is currently much larger than the XML version should ring all alarm bells; XML is already verbose (read: wasteful, bloated) because it aims to be human readable.
Mathieu, we agreed that I hack on it during Boston summit, so please don't touch the code! I'll attach new versions when the summit is over.
I also have #5 working now (by also prepending the magic and so on. Will upload new version tomorrow.
Stefan : OK. I won't change the code anymore until you post the new version.
Just noticed that is not yet handles pad-templates, interfaces and uritypes :( I'll upload the new code now, so that you can check both versions and eventually merge it into one.
Created attachment 74321 [details] [review] GStreamer Binary Registry C file (ensonic) TODO: * move error handling more out of main code * variable string length * handle pad-templates, interfaces and uritypes
Created attachment 74322 [details] [review] GStreamer Binary Registry header file (ensonic)
step 2 would be to keep the mmaped file alive and reference all strings directly from there. This also requires addition of a flag to objects like pluginfeature to tell them not to free the strings at destruction.
OK. Found a bug within gst_registry_binary_write_cache. Current gstreamer CVS fix the problem (with gst_registry_xml_write_cache). The bug occurs when the .gstreamer-0.10 directory doesn't exists. We should have : /* the previous g_mkstemp call overwrote the XXXXXX placeholder ... */ g_free (tmp_location); tmp_location = g_strconcat (location, ".tmpXXXXXX", NULL); registry->cache_file = g_mkstemp (tmp_location); if (registry->cache_file == -1) { GST_DEBUG ("g_mkstemp() failed: %s", g_strerror (errno)); g_free (tmp_location); return FALSE; See http://webcvs.freedesktop.org/gstreamer/gstreamer/gst/gstregistryxml.c?r1=1.26&r2=1.27
Something else to keep in mind: you can't assume the registry is always mmap()-able, the home directory might be on a mounted remote share of some sort for example, so you also need a read() fallback if I'm not mistaken.
Created attachment 75070 [details] [review] GStreamer Binary Registry C file (ensonic) adds padtemplate handling and string compresion
Created attachment 75071 [details] [review] GStreamer Binary Registry H file (ensonic)
Created attachment 78275 [details] GStreamer Binary Registry C file (with align feature) Set ALIGN32 to 1 if your target doesn't unaligned read/write. Fixed by Josep Torra.
Created attachment 78276 [details] GStreamer Binary Registry header file (with align feature)
Please choose either aligned or unaligned; don't make it a #define. Unaligned seems slightly better, since one of the purposes of a binary registry is to save space, and we have GST_READ_*() macros. Macros that evaulate to empty statements are not portable. Please change it to do {} while(0). Please change SAVE_STRING() into a function.
I also thought the purpose was to not strcpy at all, just point the strings to the mmaped memory (this requires a flag in the various core struct so that the mem does not get freed). In the case of non-mmapped data, one could consider strcpy or malloc+read the whole file and reuse the mmap code, depending on which one is more memory/speed friendly.
Refering to comment #25, the aligned feature is necessary almost for some ARM platforms.
Created attachment 78995 [details] [review] Add --enable-binary-registry feature and mix the source code as necessary This patch is a replacement for the previous one
Created attachment 78996 [details] GStreamer Binary Registry C file - SAVE_STRING macro converted to gst_registry_binary_save_string() function - Removed #define ALIGN32, GST_HAVE_UNALIGNED_ACCESS is used instead - Fixed align32() macro for portability as David suggested - Changed the word chunck for chunk :)
Created attachment 78997 [details] GStreamer Binary Registry H file Added gstconfig.h and some minor issues
A typo in comment #28, the configure flag is --enable-binary-registry
Joseph, the changes look good. I'll test these over the next few days and commit them then.
the files are now commited. please continue work in cvs.
2007-01-15 Stefan Kost <ensonic@users.sf.net> * gst/gstregistrybinary.c: (gst_registry_binary_write), (gst_registry_binary_initialize_magic), (gst_registry_binary_save_string), (gst_registry_binary_make_data), (gst_registry_binary_save_pad_template), (gst_registry_binary_save_feature), (gst_registry_binary_save_plugin), (gst_registry_binary_write_cache), (gst_registry_binary_check_magic), (gst_registry_binary_load_pad_template), (gst_registry_binary_load_feature), (gst_registry_binary_load_plugin), (gst_registry_binary_read_cache): * gst/gstregistrybinary.h: use glib types, cleanup comments, impement interfaces and uri-types
2007-04-26 Stefan Kost <ensonic@users.sf.net> * gst/gstregistrybinary.c: (gst_registry_binary_write_cache), (gst_registry_binary_load_pad_template), (gst_registry_binary_load_plugin), (gst_registry_binary_read_cache): * gst/gstregistrybinary.h: Implement no-mmap alternative for registry reading. Do code cleanups. Add more comments about avoiding strdups for all text data. Comments welcome. Regarding Comment #26, I added some details to the todo comments. Basically the questions is wheter we can turn the private 'gboolean loaded' in GstPluginFeature into a flag var and add a const flag. If that sounds good I make those changes and supply a patch.
Another idea is to use a dictionary for the strings in the file and instead of storing the string save a token-id taht references the string. Heres some data that show the potential benefit: -rw------- 1 ensonic users 521799 8. Aug 18:51 registry.i686.bin -rw-r--r-- 1 ensonic users 38261 9. Aug 10:52 registry.i686.bin.bz2 -rw-r--r-- 1 ensonic users 50672 9. Aug 10:44 registry.i686.bin.gz -rw-r--r-- 1 ensonic users 483166 9. Aug 10:44 registry.i686.txt -rw-r--r-- 1 ensonic users 188759 9. Aug 10:45 registry.i686.txt.sort.uniq -rw------- 1 ensonic users 983413 16. Jul 20:38 registry.i686.xml one sees that the binary registry is nearly half of the xml one. But applying gzip to it cuts it down to 9,7% (bzip2 to 7%)! Piping it through strings shows that most of the stuff is text and applying sort | uniq shows that one could easily save more that a half by using a dictionary. That combined with the keeping the whole registry image in memory and using static references in the features will save also runtime memory.
Don't bother using a dictionary, it's equivalent to gzip. Better would be to get rid of all the silly text information that 1) isn't translated, and 2) not useful for any of the types of lookups we currently do. That might need an ABI change, though.
David, yes soe test could use translations, but the author email, plugin/element website, element class and caps does not. And thats the big part.
Random note: when we change the default from XML to binary registry, we need to make sure the functions gst_registry_xml_read_cache() and gst_registry_xml_write_cache() stay defined and return FALSE (rather than just not be available at all any more, which would break ABI).
Its the default now.