GNOME Bugzilla – Bug 772936
tracer: Allow outputting logs (in particular tracer logs) in Common Trace Format (ctf)
Last modified: 2018-11-03 12:37:03 UTC
Currently we are going through the standard GStreamer logging subsystem to output the tracer logs, but it has several issues: * it has not been designed to output structured data * it is a plain text format which is not optimal at all in term of speed to write as well as size of the output * it is not a standardized format and we need to write our own tools to analyze the traces * we currently use a lock around each call to the debug function which is suboptimal Starting from that I think many of us agree (from conversations we had during the hackfest) that a nice way to solve those issues would be to allow outputting our traces as Common Trace Format[0]. This format is used by the lttng[1] project and is the main tracing format outputted by the Linux kernel. Many tools are available from the LTTNG project and we can use the babeltrace[2] library to simplify CTF trace generation (the important part of the library is licenced under MIT[3] so we should not have any problem using it in GStreamer core ifaiu). Also during the hackfest we talked about the possibility to use that same format to ouput our debug logs, but that work will be part of another bug. This feature can currently be activated through the GST_DEBUG_FORMAT=ctf,directory=/some/dir environment variable, and this same variable should be used for any format (still supporting the GST_DEBUG_FILE var too). I am attaching here the first iteration of the implementation. [0] http://diamon.org/ctf/ [1] http://lttng.org/ [2] http://diamon.org/babeltrace/ [3] https://github.com/efficios/babeltrace/blob/master/LICENSE
Created attachment 337718 [details] [review] tracerrecord: Make subclassable Making the Class public and adding a Private structure to the GstTracerRecord objects.
Created attachment 337719 [details] [review] tracer: Add support for outputing traces in Common Trace Format Implement a GstTracerCtfRecord class, subclassing the GstTracerRecord class. Activate it when the GST_TRACE_FORMAT is 'ctf'. This patches adds the following virtual methods to the GstTracerRecord class: * gboolean (*build_format) (GstTracerRecord * self, GstStructure *structure); * void (*log) (GstTracerRecord * self, va_list var_args);
Created attachment 337720 [details] [review] tracerrecord: Add "sources" for all records In the CTF format we should have a 'Stream' per source, basically each GstTracer will have its own 'stream' in the resulting traces. This breaks the API so this commits also fixes the usage all around the place.
Created attachment 337721 [details] [review] tracerctfrecord: Flush the streams every 1000 events This is the usual way to use the babeltrace API, it allows to do the flushing atomically but avoid using too much memory at runtime (also CTF events should be *very* lightweight).
Created attachment 337722 [details] [review] tracerrecord: Rationalize environment variables Use one single environment variable to define the output format: GST_DEBUG_FORMAT, you can pass extra arguments in the variable, and for example you will need to define where to output the CTF logs in the form: `ctf,dir=/dir/where/to/output/ctf/traces/`. Also added support to set the equivalent of GST_DEBUG_FILE doing `GST_DEBUG_FORMAT=gst,file=debug.log`
Created attachment 337925 [details] [review] tracerrecord: Make subclassable Making the Class public and adding a Private structure to the GstTracerRecord objects.
Created attachment 337926 [details] [review] tracer: Add support for outputing traces in Common Trace Format Implement a GstTracerCtfRecord class, subclassing the GstTracerRecord class. Activate it when the GST_TRACE_FORMAT is 'ctf'. This patches adds the following virtual methods to the GstTracerRecord class: * gboolean (*build_format) (GstTracerRecord * self, GstStructure *structure); * void (*log) (GstTracerRecord * self, va_list var_args);
Created attachment 337927 [details] [review] tracerrecord: Add "sources" for all records In the CTF format we should have a 'Stream' per source, basically each GstTracer will have its own 'stream' in the resulting traces. This breaks the API so this commits also fixes the usage all around the place.
Created attachment 337928 [details] [review] tracerctfrecord: Flush the streams every 1000 events This is the usual way to use the babeltrace API, it allows to do the flushing atomically but avoid using too much memory at runtime (also CTF events should be *very* lightweight).
Created attachment 337929 [details] [review] tracerrecord: Rationalize environment variables Use one single environment variable to define the output format: GST_DEBUG_FORMAT, you can pass extra arguments in the variable, and for example you will need to define where to output the CTF logs in the form: `ctf,dir=/dir/where/to/output/ctf/traces/`. Also added support to set the equivalent of GST_DEBUG_FILE doing `GST_DEBUG_FORMAT=gst,file=debug.log`
My goals of outputting our traces and logs in CTF where (somehow ordered by importance): * Output traces with minimal overhead * Enable reusing existing tools to mine the traces * Have smaller trace files size * Allow mining to be fast I tested and benchmarked 2 implementations using the following tools: * babel[0], it was designed mostly to do trace format conversion, and thus is not MT safe, meaning I needed a dedicated thread to work with it. Its API fits pretty well in our current design. * lttng-ust[1] which is more linked to the linux kernel and is not very portable at this time (works on mac afaict, no windows support). Also the API does not fit well at all with our current design. The benchmarking consists on outputting the GStreamer debug logs using those format and compare the performances with our current implementation, and when no debug is enabled. An example result for the following pipeline: The `stats.` tests are running the `stats` tracer and checking the logs for it. Results with audiotestsrc num-buffers=500000 ! fakesink: ╒═════════════════════════════════════════════════════╤══════════╤════════════╕ │ Test name │ size │ duration │ ╞═════════════════════════════════════════════════════╪══════════╪════════════╡ │ stats.audiotestsrc.1_runs.500000_buffers.gstdebug │ 200.0MiB │ 28.335s │ ├─────────────────────────────────────────────────────┼──────────┼────────────┤ │ stats.audiotestsrc.1_runs.500000_buffers.babeltrace │ 39.9MiB │ 24.661s │ ├─────────────────────────────────────────────────────┼──────────┼────────────┤ │ stats.audiotestsrc.1_runs.500000_buffers.no_logs │ 0.0B │ 23.396s │ ├─────────────────────────────────────────────────────┼──────────┼────────────┤ │ audiotestsrc.1_runs.500000_buffers.babeltrace │ 1.7GiB │ 156.701s │ ├─────────────────────────────────────────────────────┼──────────┼────────────┤ │ audiotestsrc.1_runs.500000_buffers.gstdebug │ 2.1GiB │ 62.773s │ ├─────────────────────────────────────────────────────┼──────────┼────────────┤ │ audiotestsrc.1_runs.500000_buffers.no_logs │ 0.0B │ 23.183s │ ├─────────────────────────────────────────────────────┼──────────┼────────────┤ │ audiotestsrc.1_runs.500000_buffers.ust │ 1.7GiB │ 45.460s │ ╘═════════════════════════════════════════════════════╧══════════╧════════════╛ We can see that for tracer logs (properly formatted) babeltrace is being more efficient, in term of speed and disk usage (overhead is actually pretty low). For debug logs, it is being much slower for still unknown reasons. I think using babeltrace makes quite some sense for tracer logs, but we will need to check why it is so inefficient for standard debug logs. I have done some more benchmarking in other conditions and results look quite coherent with that small example. [0] diamon.org/babel/ [1] http://lttng.org/man/3/lttng-ust/v2.7/
I would really like to avoid adding this as another core dependency - would it be hard to write code for this ourselves? Is it complicated?
Well, it is not *that* complex, but it is an optional dependency and needed only for developers as tracing is really a developer thing so I am not sure it is a problem at all (we talked about it with Sebastien and Edward and they seem to agree with that). Still, I could implement it myself if people really think it is the right thing to do.
It means we can make sure it's both portable and thread-safe at least :)
Yeah, that is pretty true, I will see what I do, I think current implementation is not bad either. I will stop updating the patches and instead link to my git branch: https://github.com/thiblahute/gstreamer/tree/ctf
(In reply to Tim-Philipp Müller from comment #14) > It means we can make sure it's both portable and thread-safe at least :) Well, thinking about it, the way I would make it thread safe in our own writer implementation is pretty similar to what I am doing using babeltrace, ie. using a lock-less queue and a dedicated thread to dequeue and write the logs, which means we are not needing a lock for each write, but only while flushing. I am not convinced it is worth re-implementing the writer tbh.
What do you think we should do with that? I was thinking that if you really think we should implement CTF in GStreamer it would make sense to do so in rust, it is probably a good case for starting pushing rust in our code base and this is all for developers only and optional.
I don't know really. I very strongly dislike adding this as a new dependency to libgstreamer-1.0.so. Perhaps we can make the output/writer part pluggable as well and then it will be a trace plugin that has this dependency? I think the Rust angle is a bit premature and I don't know if it's technically really warranted here, I think the case is stronger for processing untrusted input than for outputting trusted data :) Either way am not convinced GStreamer core is where we should start introducing Rust.
> I very strongly dislike adding this as a new dependency to libgstreamer-1.0.so. My main question is why? I do not really understand what is the problem of having it as an optional dependency? This is really for developers and it should never be set as a dep for packages, also I started writing a meson wrap for babeltrace so that we could have it built as part of gst-build. I will think about how we could have it as a plugin.
Has no one have mention it, GstShark made a minimal CTF writer, it's not great, but I believe did the job for them, avoiding external deps. https://github.com/RidgeRun/gst-shark/blob/master/plugins/tracers/gstctf.c My overall interest in CTF is to have a dense format for streaming tracers over the network or to another process. Generally speaking, not much possibilities is being offered here since the CTF code writes to back log buffer which get flushed to disk. I need something different really, something more flexible.
Review of attachment 337925 [details] [review]: ::: gst/gsttracerrecord.c @@ +216,2 @@ self = g_object_newv (GST_TYPE_TRACER_RECORD, 0, NULL); + self->priv->spec = structure; The spec is just never used outside of build-format. I would make a pre-patch that drops it's storage, or make use/expose it.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/194.