GNOME Bugzilla – Bug 760267
tracer: how to pass and structure tracing data (GstStructure, GVariant, etc.)
Last modified: 2016-01-20 08:42:13 UTC
+++ This bug was initially created as a clone of Bug #733187 +++ Discussion bug about gst_tracer_log_trace() and/or in what way/format to best pass trace data from the tracer. Currently a GstStructure is passed. It might be nice to have a variant that takes a vararg function so tracers don't have to do the gst_structure_new(..) dance. The other reason the vararg approach might be nice is that it gives us more options later to stuff the contents into something else or process/serialise them more efficiently. Additionally, I wonder if it'd make sense to specify explicitly what types are accepted (we seem to use GST_TYPE_STRUCTURE a lot for nesting, which seems unfortunate in this context to make it generic). GVariant would be perfect for this, only that unlike GstStructure it doesn't make fieldnames mandatory, although we could just require GVariant with key/value tuples like we do in GstStructure. Of course we could always add some other format in future if we want to. GstStructure's killer feature is that it's very free-form and easy to read/write due to the name/type/value triplets. The downside is that it's very verbose and comparatively expensive to serialize. IRC discussion from today: 11:03:23 ensonic: regarding tracer and why gst_tracer_log_trace() is using a structure right now: this allows us to deserialize the strcuture back (as done in gststats.c), I'll take another look at gvariant to see if I could use that instead 11:10:10 tim: yeah, I was mostly thinking out loud there :) 11:11:14 tim: the downside with GVariant is that it doesn't have fieldnames by default, so it's less convenient 11:11:53 tim: the upside is that it's restricted to serialisable types 11:16:05 tim: where in gststats.c is anything deserialized? 11:22:43 tim: I guess my worry is the overhead, if all the GValue/GValueTable/GType/GstValue scales well 11:30:49 ensonic: gst_structure_from_string() at line793 11:33:06 ensonic: I have a little ipc library in buzztrax that uses dbus like specs: ipc_send(channel, "iis", 5, 7, "test"); where ipc_send is a var-args func. 11:34:05 tim: ah, gst-stats.c :) 11:34:14 ensonic: there I can also add fields (at the end) 11:34:31 tim: yeah, that's basically how GVariant works as well 11:34:50 ensonic: I'll try to prototype something using gvariant 11:35:05 tim: the advantage of GstStructure is the triplet-structure 11:35:21 tim: which allows tracers to log data almost free-form, just giving names to fields 11:35:28 ensonic: jep 11:35:47 tim: with the dbus/gvariant style you either need to know what each field means or you require field names to be added everywhere 11:35:48 ensonic: and the apps can read the tracer spec (also structure) to understand it 11:36:33 ensonic: for gvariant we can also log a spec first and the apps can parse the spec and then know what the gvariant logs mean 11:36:40 tim: right 11:36:47 ensonic: so we only need to log e.g. the field names once 11:36:53 tim: the amount of data you have to serialiase and send is much less then 11:37:03 tim: -a 11:37:12 ensonic: yeps 11:38:08 ensonic: let me try a few things, but lunch first :)
After the above discussion with Stefan, I'm now leaning towards something more GVariant like for the trace data collection, if that's feasible. It could be based on GVariant internally, but wouldn't have to. Tracers can then label the fields separately in a header that only needs to be sent once (although this may not be entirely trivial if there are conditional fields and such). The advantage of this is much lower overhead both inside the tracers and in terms of serialisation data going over the wire, thus it'd scale much better and have less impact on the normal operation of the pipeline. Thoughts?
I've tried to use GVariant. The logging could be something like: void gst_tracer_log_trace (const gchar *format, ...) { va_list var_args; GVariant *v; gchar *s; va_start (var_args, format); v = g_variant_new_va (format, NULLL, var_args); s = g_variant_print (v, FALSE); GST_TRACE ("%s", s); g_free (s); g_variant_unref (v); va_end (var_args); } The issue is that GVariant is only providing very few data types. Right now with the GstStructure we can serialize GEnums ("pad-direction=(GstPadDirection)GST_PAD_SINK"), GstFractions and many more. Also avoiding the intermediate GVariants would be nice. Another option could be: void gst_tracer_log_trace (GType *types, guint n_types, ...) { /* iterate over types and serialize them using GStValue. */ } But if that hits a structure as a child (e.g. a message) it would serialize this as we know it (with fields). Anyway, this would only log the data. And we could log the types with the initial spec.
Question is how essential things like these are. One could always serialise enums as ints and declare the enum type elsewhere alongside the field name. Fractions could also just be sent as two ints, none of the existing tracers uses fractions as far as I can tell, and I suspect fractions would mostly be sent as part of caps, which would be serialised in form of the caps string anyway. We could also just make up our own thing that works similar to GVariant (i.e. with a format string) but a simplified version of it that uses GTypes, but only those we explicitly want to support. If we make our own version we could also require fieldnames again, but those seem like an awful lot of overhead to send over the wire in the log everytime (maybe we should send a cumulatively built quark table over the wire too ;))
Design doc: https://docs.google.com/document/d/1EVBiIkMpJyknHW4ZrYhJD6DK0sgwT8FcQ_bSlpzEGRo/edit?usp=sharing
Look at commit bb40a184137d7f9d918a2929074ec52210cab401 Author: Stefan Sauer <ensonic@users.sf.net> Date: Fri Jan 8 23:06:55 2016 +0100 benchmark: add a benchmark for bgo/760267 Big suprise - GstStructure is faster than GVariant. ./tracerserialize 0:00:00.353588265: GstStructure 0:00:00.464734311: GVariant GST_DEBUG="default:7" GST_DEBUG_FILE=trace.log ./tracerserialize 0:00:01.452568709: GstStructure 0:00:01.615803751: GVariant grep "log_gst_structure" trace.log >tracerserialize.gststructure.log grep "log_g_variant" trace.log >tracerserialize.gvariant.log ls -l tracerserialize.*.log -rw-r----- 1 ensonic eng 23400000 Jan 8 23:10 tracerserialize.gststructure.log -rw-r----- 1 ensonic eng 16600000 Jan 8 23:11 tracerserialize.gvariant.log
That's hilarious, admittedly not what I expected (there must be something very silly going on in GVariant), but ultimately not important :) The point I was trying to make however is not just GVariant vs. GstStructure, but that we should create an API where we can make the implementation fast if we want to or need to, by doing something custom there. Whether we internally use GVariant/GstStructure for starters doesn't matter so much to me. If we expose GVariant/GstStructure in the API we don't have that option any more. I think both the GstStructure-style triplet vararg API or the GVariant-style format + args vararg API would be fine, as long as we restrict the types that can be passed to it. The other consideration is output size. The GVariant output (ASCII or binary) is about a third of the serialized GstStructure string. In order to make this scale and work well with non-trivial pipelines over the network or on embedded devices, we will have to make sure we output as little as possible. This can still be changed later though, and doesn't indicate which API to go for either.
Did you look at my doc? We can make the GstStructure part even faster. Also your proposal to take vararg to not hardcode the structure, is not going to fly (sorry): gst_tracer_log("name", "field1", type1, val1, "field2", type2, val2, ...) still implies structure. Anyway what I proposed in the doc could actually get us there. Here is the essence: We add a GstTracerRecord (object/miniobject). In the tracer plugin init we create new GstTracerRecord instances by passing the large nested GstStructure that described the log format. This will be extended to also list the types. Now gst_tracer_record_new(GstStructure *spec) will generate a printf style format string from the spec. That is we do a variant of gst_structure_to_string that instead of writing the values write the format placeholder (e.g. %u for an G_TYPE_UINT and % + GST_PTR_FMT for Caps/Structure/…). Based on that we can provide gst_tracer_record_log(GstTracerRecord *self, …); that will only that the values. Also for the size math. The gvariant output is not a third, it is 2/3 of the structure output.
> Did you look at my doc? I did, but when I first looked it was just a list of pros and cons IIRC - better to do this in bugzilla :) > We can make the GstStructure part even faster. Not sure what you mean here by "GstStructure part" exactly (if you're talking about serialization, see bug #760027 too which adds GString API, but current GLib implementation makes that useless). > Also your proposal to take vararg to not hardcode the structure, is not > going to fly (sorry): > gst_tracer_log("name", "field1", type1, val1, "field2", type2, val2, ...) > still implies structure. Not sure I understand. What do you mean by "this implies structure"? (Is structure = GstStructure?) If you only allowed types like G_TYPE_INT/STRING/FRACTION etc. we can read the values as we like. For the spec/class part I don't care about whether it's GstStructure-based or not. > Anyway what I proposed in the doc could actually > get us there. > > Here is the essence: > We add a GstTracerRecord (object/miniobject). In the tracer plugin init we > create new GstTracerRecord instances by passing the large nested > GstStructure that described the log format. This will be extended to also > list the types. Now gst_tracer_record_new(GstStructure *spec) will generate > a printf style format string from the spec. That is we do a variant of > gst_structure_to_string that instead of writing the values write the format > placeholder (e.g. %u for an G_TYPE_UINT and % + GST_PTR_FMT for > Caps/Structure/…). Based on that we can provide > gst_tracer_record_log(GstTracerRecord *self, …); that will only that the > values. Not sure I understand fully, or why we need a 'printf format string' here, but it sounds to me like this is going in the right direction. I think you may be a bit too focused on how/where to serialise the data to. For me, this is completely secondary, and I don't see us outputting to the debug log longer-term anyway. > Also for the size math. The gvariant output is not a third, it is 2/3 of the > structure output. gst_structure_to_string: 91: name, ... g_variant_print (v, FALSE): 27: ('name', 0, 10, 'hallo', 1) g_variant_print (v, TRUE): 48: ('name', uint64 0, uint32 10, 'hallo', uint32 1) (not entirely fair of course, because the field names are missing, just an indication that we need to do better than gst_structure_to_string, which we surely can.)
Created attachment 318898 [details] [review] first stab at adding a new tracer record class This will change the records from 0:00:00.352959142 27402 0xafaed0 TRACE GST_TRACER gsttracer.c:214:gst_tracer_log_trace: latency, src=(string)audiotestsrc0_src, sink=(string)autoaudiosink0-actual-sink-pulse_sink, time=(guint64)68860; to 0:00:00.352940480 27402 0xafaed0 TRACE GST_TRACER :0:: latency, src=(string)audiotestsrc0_src, sink=(string)autoaudiosink0-actual-sink-pulse_sink, time=(guint64)68860; So, we're essentially dropping the filename+func+line (that wan't useful so far). Please note, that the patch is not yet complete. If it looks good, I will remove the previous gst_tracer_log_trace function and also finish porting the existing tracer. The tracerserialize benchmark update shows: 0:00:00.301906352: GstStructure 0:00:00.000385220: GstStructure template 0:00:00.456386346: GVariant
This approach looks great to me. It lets us change things under the hood later (e.g. we could make it create a GType array and read the varargs differently later if we wanted to, and serialize the same data differently later if we wanted to, without any tracer code having to be changed). It also looks quite a bit faster :) Regarding the GstTracerRecord (I know it's just a first prototype): I wonder if we really need to make it a full GObject. I'd just make it an opaque structure the details of which are internal to GStreamer. I don't foresee anyone needing to use it or poke at it, do you?
Thanks for the encouraging comments. I thought about making t a mini object too. But I don't expect many instances of those and in addition, we might want to make them part of the plugin feature, so that e.g. gst-inspect can list what data a tracer logs or apps can query this from the registry. This is not very concrete - I know. Another benefit about it being a GObject though is that it is binding friendly and we can use the property. I could try to see if I can change it into a mini object.
Ok, if you see a possible future use for it, sure. Not really important, I was mostly wondering about it with regard to allowing subclassing and such (since the structs are in the public header).
Created attachment 318989 [details] [review] updated patch, now all tracers are ported
Created attachment 319001 [details] [review] updated patch, now all tracers are ported A few things to discuss: - booleans and enums With the GstStructure we serialize enums as "strings" and also booleans as stings. I will test if we can deserialize them also from numbers, otherwise we probably need some helpers to serialize the in the expected form. - optional values The log entries need to be static if we use a format strings like this. We could make this more fancier behind the scenes by having a piecewise format string (snippets for each value) and mark optional fields in the spec (a flag instead of an extra field). When when logging we assemble the format string depending on the presence of optional fields (we can take a shortcut for messages that don't have this). And here is a benchmark run with actually logging enabled: 0:00:00.946081989: GstStructure 0:00:00.568562562: GstStructure template 0:00:01.073995872: GVariant Seems that half the time is in writing the log.
Cool, I think we should get this in and iterate from there. I wonder if it'd make sense to hard-code the list of GTypes that we support for GST_PTR_FORMAT printf format, instead of just making that the else branch for all unsupported types. Do you foresee the need for anyone to subclass GstTraceRecord? If not, maybe we should move the structure definition into the .c file?
Created attachment 319058 [details] [review] add trace record class I've added docs and test. I'll also moved the structs and dropped the priv struct. I'll take another look at the GTypes list. I will add the once that I covered in the tests to the list and warn in the else case. That is safer.
I'll figure something to make the optional field handling nice in another patch. Please note, that the log overhead in terms of size is smaller now, since I log enums/booleans as ints. commit 10b78d872b5a0050c8a4572267879b0adbb578f8 Author: Stefan Sauer <ensonic@users.sf.net> Date: Tue Jan 12 14:59:04 2016 +0100 tracerrecord: add a log record class We use this class to register tracer log entry metadata and build a log template. With the log template we can serialize log data very efficiently. This also simplifies the logging code, since that is now a simple varargs function that is not exposing the implementation details. Add docs for the new class and basic tests. Remove the previous log handler. Fixes #760267
Meh, absolute show-stopper :/ When serializing a structure, the fields are serialized using gst_value_serialize() and this escaped substrings using gst_string_wrap(). It I build the format string and rely on GST_PTR_FORMAT for e.g. nested structures, this is not the case :/ We log the structures contained in messages/queries. My first thought was to fixup the string from the tracer (call gst_structure_to_string() there), but that won't work - we would need to make the escaping code public and the gtype would then be string and not structure :/ Maybe I can iterate the var_vars in the log handler and handle this there. Any better ideas? That comes from optimizing code that works :/
Basically what we need is to tell gst_info_printf_pointer_extension_func() to run the resulting string through gst_string_wrap(). We could register additional formats (like lower case 'a' instead of 'A' for GST_WRAPPED_PTR_FORMAT). Any better idea?
A (private) custom printf format should work I guess.
Created attachment 319303 [details] [review] apply string wrapping to some data types in tracer serialisation