Bug 772936 – tracer: Allow outputting logs (in particular tracer logs) in Common Trace Format (ctf)

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 772936 - tracer: Allow outputting logs (in particular tracer logs) in Common Trace Format (ctf)


Summary:	tracer: Allow outputting logs (in particular tracer logs) in Common Trace For...


Status:	RESOLVED OBSOLETE

Product:	GStreamer
Classification:	Platform
Component:	gstreamer (core)
Version:	git master
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	git master
Assigned To:	GStreamer Maintainers
QA Contact:	GStreamer Maintainers

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2016-10-14 14:19 UTC by Thibault Saunier
Modified:	2018-11-03 12:37 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
tracerrecord: Make subclassable (4.14 KB, patch) 2016-10-14 14:20 UTC, Thibault Saunier	none	Details \| Review
tracer: Add support for outputing traces in Common Trace Format (28.23 KB, patch) 2016-10-14 14:20 UTC, Thibault Saunier	none	Details \| Review
tracerrecord: Add "sources" for all records (18.08 KB, patch) 2016-10-14 14:20 UTC, Thibault Saunier	none	Details \| Review
tracerctfrecord: Flush the streams every 1000 events (4.81 KB, patch) 2016-10-14 14:20 UTC, Thibault Saunier	none	Details \| Review
tracerrecord: Rationalize environment variables (14.33 KB, patch) 2016-10-14 14:20 UTC, Thibault Saunier	none	Details \| Review
tracerrecord: Make subclassable (4.14 KB, patch) 2016-10-18 10:21 UTC, Thibault Saunier	none	Details \| Review
tracer: Add support for outputing traces in Common Trace Format (28.47 KB, patch) 2016-10-18 10:21 UTC, Thibault Saunier	none	Details \| Review
tracerrecord: Add "sources" for all records (18.08 KB, patch) 2016-10-18 10:21 UTC, Thibault Saunier	none	Details \| Review
tracerctfrecord: Flush the streams every 1000 events (4.81 KB, patch) 2016-10-18 10:21 UTC, Thibault Saunier	none	Details \| Review
tracerrecord: Rationalize environment variables (16.77 KB, patch) 2016-10-18 10:21 UTC, Thibault Saunier	none	Details \| Review

Description Thibault Saunier 2016-10-14 14:19:29 UTC

Currently we are going through the standard GStreamer logging subsystem to output the tracer logs, but it has several issues:
  
  * it has not been designed to output structured data
  * it is a plain text format which is not optimal at all in term of speed to write as well as size of the output
  * it is not a standardized format and we need to write our own tools to analyze the traces
  * we currently use a lock around each call to the debug function which is suboptimal


Starting from that I think many of us agree (from conversations we had during the hackfest) that a nice way to solve those issues would be to allow outputting our traces as Common Trace Format[0]. This format is used by the lttng[1] project and is the main tracing format outputted by the Linux kernel. Many tools are available from the LTTNG project and we can use the babeltrace[2] library to simplify CTF trace generation (the important part of the library is licenced under MIT[3] so we should not have any problem using it in GStreamer core ifaiu).

Also during the hackfest we talked about the possibility to use that same format to ouput our debug logs, but that work will be part of another bug.

This feature can currently be activated through the GST_DEBUG_FORMAT=ctf,directory=/some/dir environment variable, and this same variable should be used for any format (still supporting the GST_DEBUG_FILE var too).


I am attaching here the first iteration of the implementation.


[0] http://diamon.org/ctf/
[1] http://lttng.org/
[2] http://diamon.org/babeltrace/
[3] https://github.com/efficios/babeltrace/blob/master/LICENSE

Comment 1 Thibault Saunier 2016-10-14 14:20:24 UTC

Created attachment 337718 [details] [review]
tracerrecord: Make subclassable

Making the Class public and adding a Private structure
to the GstTracerRecord objects.

Comment 2 Thibault Saunier 2016-10-14 14:20:31 UTC

Created attachment 337719 [details] [review]
tracer: Add support for outputing traces in Common Trace Format

Implement a GstTracerCtfRecord class, subclassing the GstTracerRecord
class. Activate it when the GST_TRACE_FORMAT is 'ctf'.

This patches adds the following virtual methods to the GstTracerRecord
class:

  *  gboolean (*build_format) (GstTracerRecord * self, GstStructure *structure);
  *  void (*log) (GstTracerRecord * self, va_list var_args);

Comment 3 Thibault Saunier 2016-10-14 14:20:37 UTC

Created attachment 337720 [details] [review]
tracerrecord: Add "sources" for all records

In the CTF format we should have a 'Stream' per source,
basically each GstTracer will have its own 'stream' in the
resulting traces.

This breaks the API so this commits also fixes the usage all around
the place.

Comment 4 Thibault Saunier 2016-10-14 14:20:43 UTC

Created attachment 337721 [details] [review]
tracerctfrecord: Flush the streams every 1000 events

This is the usual way to use the babeltrace API, it allows to do
the flushing atomically but avoid using too much memory at runtime
(also CTF events should be *very* lightweight).

Comment 5 Thibault Saunier 2016-10-14 14:20:49 UTC

Created attachment 337722 [details] [review]
tracerrecord: Rationalize environment variables

Use one single environment variable to define the output format:
GST_DEBUG_FORMAT, you can pass extra arguments in the variable,
and for example you will need to define where to output the CTF
logs in the form: `ctf,dir=/dir/where/to/output/ctf/traces/`.

Also added support to set the equivalent of GST_DEBUG_FILE
doing `GST_DEBUG_FORMAT=gst,file=debug.log`

Comment 6 Thibault Saunier 2016-10-18 10:21:26 UTC

Created attachment 337925 [details] [review]
tracerrecord: Make subclassable

Making the Class public and adding a Private structure
to the GstTracerRecord objects.

Comment 7 Thibault Saunier 2016-10-18 10:21:32 UTC

Created attachment 337926 [details] [review]
tracer: Add support for outputing traces in Common Trace Format

Implement a GstTracerCtfRecord class, subclassing the GstTracerRecord
class. Activate it when the GST_TRACE_FORMAT is 'ctf'.

This patches adds the following virtual methods to the GstTracerRecord
class:

  *  gboolean (*build_format) (GstTracerRecord * self, GstStructure *structure);
  *  void (*log) (GstTracerRecord * self, va_list var_args);

Comment 8 Thibault Saunier 2016-10-18 10:21:38 UTC

Created attachment 337927 [details] [review]
tracerrecord: Add "sources" for all records

In the CTF format we should have a 'Stream' per source,
basically each GstTracer will have its own 'stream' in the
resulting traces.

This breaks the API so this commits also fixes the usage all around
the place.

Comment 9 Thibault Saunier 2016-10-18 10:21:44 UTC

Created attachment 337928 [details] [review]
tracerctfrecord: Flush the streams every 1000 events

This is the usual way to use the babeltrace API, it allows to do
the flushing atomically but avoid using too much memory at runtime
(also CTF events should be *very* lightweight).

Comment 10 Thibault Saunier 2016-10-18 10:21:51 UTC

Created attachment 337929 [details] [review]
tracerrecord: Rationalize environment variables

Use one single environment variable to define the output format:
GST_DEBUG_FORMAT, you can pass extra arguments in the variable,
and for example you will need to define where to output the CTF
logs in the form: `ctf,dir=/dir/where/to/output/ctf/traces/`.

Also added support to set the equivalent of GST_DEBUG_FILE
doing `GST_DEBUG_FORMAT=gst,file=debug.log`

Comment 11 Thibault Saunier 2016-10-28 18:43:42 UTC

My goals of outputting our traces and logs in CTF where (somehow ordered by importance):

  * Output traces with minimal overhead
  * Enable reusing existing tools to mine the traces
  * Have smaller trace files size
  * Allow mining to be fast

I tested and benchmarked 2 implementations using the following tools:

  * babel[0], it was designed mostly to do trace format conversion,
    and thus is not MT safe, meaning I needed a dedicated thread to
    work with it. Its API fits pretty well in our current design.
  * lttng-ust[1] which is more linked to the linux kernel and is not
    very portable at this time (works on mac afaict, no windows support).
    Also the API does not fit well at all with our current design.

The benchmarking consists on outputting the GStreamer debug logs using
those format and compare the performances with our current implementation,
and when no debug is enabled.

An example result for the following pipeline:

The `stats.` tests are running the `stats` tracer and checking the logs for it.

Results with audiotestsrc num-buffers=500000 ! fakesink:

╒═════════════════════════════════════════════════════╤══════════╤════════════╕
│ Test name                                           │ size     │ duration   │
╞═════════════════════════════════════════════════════╪══════════╪════════════╡
│ stats.audiotestsrc.1_runs.500000_buffers.gstdebug   │ 200.0MiB │ 28.335s    │
├─────────────────────────────────────────────────────┼──────────┼────────────┤
│ stats.audiotestsrc.1_runs.500000_buffers.babeltrace │ 39.9MiB  │ 24.661s    │
├─────────────────────────────────────────────────────┼──────────┼────────────┤
│ stats.audiotestsrc.1_runs.500000_buffers.no_logs    │ 0.0B     │ 23.396s    │
├─────────────────────────────────────────────────────┼──────────┼────────────┤
│ audiotestsrc.1_runs.500000_buffers.babeltrace       │ 1.7GiB   │ 156.701s   │
├─────────────────────────────────────────────────────┼──────────┼────────────┤
│ audiotestsrc.1_runs.500000_buffers.gstdebug         │ 2.1GiB   │ 62.773s    │
├─────────────────────────────────────────────────────┼──────────┼────────────┤
│ audiotestsrc.1_runs.500000_buffers.no_logs          │ 0.0B     │ 23.183s    │
├─────────────────────────────────────────────────────┼──────────┼────────────┤
│ audiotestsrc.1_runs.500000_buffers.ust              │ 1.7GiB   │ 45.460s    │
╘═════════════════════════════════════════════════════╧══════════╧════════════╛

We can see that for tracer logs (properly formatted) babeltrace is being
more efficient, in term of speed and disk usage (overhead is actually pretty
low).

For debug logs, it is being much slower for still unknown reasons.

I think using babeltrace makes quite some sense for tracer logs, but we will
need to check why it is so inefficient for standard debug logs.

I have done some more benchmarking in other conditions and results look quite coherent
with that small example.


[0] diamon.org/babel/
[1] http://lttng.org/man/3/lttng-ust/v2.7/

Comment 12 Tim-Philipp Müller 2016-10-28 18:53:47 UTC

I would really like to avoid adding this as another core dependency - would it be hard to write code for this ourselves? Is it complicated?

Comment 13 Thibault Saunier 2016-10-28 19:18:48 UTC

Well, it is not *that* complex, but it is an optional dependency and needed only for developers as tracing is really a developer thing so I am not sure it is a problem at all (we talked about it with Sebastien and Edward and they seem to agree with that).

Still, I could implement it myself if people really think it is the right thing to do.

Comment 14 Tim-Philipp Müller 2016-10-28 19:35:07 UTC

It means we can make sure it's both portable and thread-safe at least :)

Comment 15 Thibault Saunier 2016-10-28 20:16:34 UTC

Yeah, that is pretty true, I will see what I do, I think current implementation is not bad either.

I will stop updating the patches and instead link to my git branch:

https://github.com/thiblahute/gstreamer/tree/ctf

Comment 16 Thibault Saunier 2016-11-02 16:24:07 UTC

(In reply to Tim-Philipp Müller from comment #14)
> It means we can make sure it's both portable and thread-safe at least :)

Well, thinking about it, the way I would make it thread safe in our own writer implementation is pretty similar to what I am doing using babeltrace, ie. using a lock-less queue and a dedicated thread to dequeue and write the logs, which means we are not needing a lock for each write, but only while flushing. I am not convinced it is worth re-implementing the writer tbh.

Comment 17 Thibault Saunier 2016-11-08 13:01:55 UTC

What do you think we should do with that?

I was thinking that if you really think we should implement CTF in GStreamer it would make sense to do so in rust, it is probably a good case for starting pushing rust in our code base and this is all for developers only and optional.

Comment 18 Tim-Philipp Müller 2016-11-08 13:18:37 UTC

I don't know really.

I very strongly dislike adding this as a new dependency to libgstreamer-1.0.so.

Perhaps we can make the output/writer part pluggable as well and then it will be a trace plugin that has this dependency?

I think the Rust angle is a bit premature and I don't know if it's technically really warranted here, I think the case is stronger for processing untrusted input than for outputting trusted data :) Either way am not convinced GStreamer core is where we should start introducing Rust.

Comment 19 Thibault Saunier 2016-11-08 13:23:49 UTC

> I very strongly dislike adding this as a new dependency to libgstreamer-1.0.so.

My main question is why? I do not really understand what is the problem of having it as an optional dependency? This is really for developers and it should never be set as a dep for packages, also I started writing a meson wrap for babeltrace so that we could have it built as part of gst-build.

I will think about how we could have it as a plugin.

Comment 20 Nicolas Dufresne (ndufresne) 2018-07-07 16:12:28 UTC

Has no one have mention it, GstShark made a minimal CTF writer, it's not great, but I believe did the job for them, avoiding external deps.

https://github.com/RidgeRun/gst-shark/blob/master/plugins/tracers/gstctf.c

My overall interest in CTF is to have a dense format for streaming tracers over the network or to another process. Generally speaking, not much possibilities is being offered here since the CTF code writes to back log buffer which get flushed to disk. I need something different really, something more flexible.

Comment 21 Nicolas Dufresne (ndufresne) 2018-07-10 01:34:19 UTC

Review of attachment 337925 [details] [review]:

::: gst/gsttracerrecord.c
@@ +216,2 @@
   self = g_object_newv (GST_TYPE_TRACER_RECORD, 0, NULL);
+  self->priv->spec = structure;

The spec is just never used outside of build-format. I would make a pre-patch that drops it's storage, or make use/expose it.

Comment 22 GStreamer system administrator 2018-11-03 12:37:03 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/194.