Bug 782407 - missing stack traces when recording
Status: RESOLVED OBSOLETE
Product: sysprof
Classification: Other
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Sysprof maintainer(s)
Reported: 2017-05-09 20:12 UTC by Christian Hergert
Modified: 2021-07-01 18:12 UTC
Attachments
xzipped perf data (1.61 MB, application/octet-stream)
2017-05-11 10:15 UTC, Jean-Marc Lasgouttes

Description Christian Hergert 2017-05-09 20:12:47 UTC
In some situations, a large share of the symbols is missing from the callgraph.

I think that is limited to "single program recording", but we need to verify this.

Here are some examples:

 http://www-rocq.inria.fr/~lasgoutt/sysprof-lyx.png
 http://www-rocq.inria.fr/~lasgoutt/perf-record.png

What we need to figure out next is:

 - Were the sysprof and perf recordings both recording just the target
   program and not the whole system? It *appears* to me that I often get
   reduced information when recording just the single program as opposed to
   the whole system (and I'm not sure why).

 - Did the capture contain all the symbols, but we just failed to present
   them properly? (This helps us nail down whether the bug is in capture or
   in compute/display.)
Comment 1 Jean-Marc Lasgouttes 2017-05-11 10:15:18 UTC
Created attachment 351613
xzipped perf data

This perf data corresponds to the screen shot referred to in the bug description.

LyX was compiled with the options
  -g -O2 -std=c++14 -fno-omit-frame-pointer

Then it was launched and "perf record" was attached to its PID using
  sudo perf record -g -p PID
Comment 2 Christian Hergert 2017-05-18 02:35:10 UTC
When recording with sysprof, did you use sudo by chance? For example:

 sudo sysprof-cli -p PID

I ask because when not using sudo, we have to ask a system service to elevate our privileges (and pass the perf FD back). As you can imagine, that expands the surface area for me to explore.
Comment 3 Jean-Marc Lasgouttes 2017-05-18 08:16:12 UTC
I used the GUI for capturing the samples. I did have to provide my password at some point.

I can retry with sudo sysprof-cli, but this will be next week.
Comment 4 Christian Hergert 2017-05-18 08:54:14 UTC
Sure

That means we are asking sysprofd to do our __NR_perf_event_open syscall, and it hands us back an open fd.
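For readers unfamiliar with handing an open descriptor from one process to another, here is a minimal sketch of the underlying kernel mechanism (SCM_RIGHTS over a Unix domain socket). It is not sysprofd's code: the names send_fd, recv_fd and demo are hypothetical, both ends live in one process for brevity, and a pipe stands in for the perf fd.

```python
import array
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one open file descriptor over a Unix domain socket."""
    # SCM_RIGHTS asks the kernel to install a copy of the descriptor in the receiver.
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    # One byte of ordinary data is sent alongside the ancillary message.
    sock.sendmsg([b"F"], ancillary)


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo() -> bytes:
    """Hypothetical helper/client split; a pipe plays the role of the perf fd."""
    helper, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(helper, read_end)    # the "privileged helper" hands the fd over
        received = recv_fd(client)   # the "client" now holds its own descriptor
        os.write(write_end, b"sample")
        data = os.read(received, 6)  # read through the received descriptor
        os.close(received)
        return data
    finally:
        os.close(read_end)
        os.close(write_end)
        helper.close()
        client.close()
```

The point of the sketch is that the receiver ends up with a descriptor referring to the same open file, so any difference in how the helper opened it (attributes, target PID, privileges) carries over to the client.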

That should probably be enough info for me to replicate as soon as I manage to find some free time to hack on Sysprof.
Comment 5 Jean-Marc Lasgouttes 2017-06-16 16:21:26 UTC
Is there some additional testing I could attempt to help your diagnosis?
Comment 6 Jean-Marc Lasgouttes 2017-06-16 16:41:39 UTC
FWIW, I tried again with sysprof-cli -p (which asked me for a password), and the capture.syscap file that was produced led to the same issue when loaded in sysprof.
Comment 7 Christian Hergert 2017-06-16 20:40:42 UTC
I think the main thing preventing me from digging in right now is just lack of time.

I am curious to know if you get different information simply by doing whole-system recording (omit -p). You'll of course get other application information, but I want to know if the amount of information you get on your process also improves.
Comment 8 Jean-Marc Lasgouttes 2017-06-21 15:59:13 UTC
I just tried it, and the result is the same. Only fjes_hw_epbuf_tx_pkt_send appears under src/lyx.
Comment 9 Jean-Marc Lasgouttes 2017-06-21 16:13:38 UTC
The funny thing (or sad, I do not know anymore) is that _all_ the processes are missing a proper stack trace. This is not related at all to the program that I am trying to profile.

Could it be a bad (or unexpected) setting in Ubuntu?
Comment 10 Christian Hergert 2017-06-21 21:20:47 UTC
Most distributions have misguided compilation settings out of the box (in my opinion). They generally compile with frame pointers disabled because on 32-bit x86, which has very few registers, everything needed to be stashed on the stack, so freeing up the frame-pointer register was actually a non-trivial performance improvement. But on x86_64, that's just not the case. You might, if you're lucky, snag a 0.5% speed-up. But alas, they tend to do it anyway, making it difficult to get fast, reliable stack traces.
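To illustrate why frame pointers matter for unwinding, here is a small simulation of frame-pointer stack walking on x86_64. It is only a sketch: the function name walk_frame_pointers, the dictionary standing in for process memory, and all addresses are made up for illustration, and it does not reflect how the kernel or Sysprof implement unwinding.

```python
from typing import Dict, List

WORD = 8  # bytes per stack slot on x86_64


def walk_frame_pointers(memory: Dict[int, int], rbp: int, max_depth: int = 64) -> List[int]:
    """Return the return addresses found by following saved frame pointers.

    With frame pointers enabled, each frame stores the caller's rbp at [rbp]
    and the return address at [rbp + 8].
    """
    addresses: List[int] = []
    while rbp != 0 and len(addresses) < max_depth:
        if rbp not in memory or rbp + WORD not in memory:
            break  # chain is broken, e.g. rbp was used as a general-purpose register
        addresses.append(memory[rbp + WORD])
        next_rbp = memory[rbp]
        if next_rbp != 0 and next_rbp <= rbp:
            break  # the stack grows down, so a caller's frame sits at a higher address
        rbp = next_rbp
    return addresses


# Hypothetical stack for main -> parse -> tokenize, with made-up addresses.
stack = {
    0x7000: 0x7020, 0x7008: 0x401200,  # tokenize's frame: saved rbp, return into parse
    0x7020: 0x7040, 0x7028: 0x401100,  # parse's frame: saved rbp, return into main
    0x7040: 0x0,    0x7048: 0x401000,  # main's frame: end of the chain
}

trace = walk_frame_pointers(stack, 0x7000)
```

When code is built with frame pointers omitted, rbp holds arbitrary data, so the walk stops early (the garbage-rbp case above returns an empty list). That matches the symptom of truncated or missing stack traces.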

What I mostly use Sysprof for is profiling the GNOME stack while we are developing it. That means I've built most of the core components in JHBuild (just some fancy Python build wrangler scripts) and changed all those settings to something more reasonable. (I generally set -fno-omit-frame-pointer and -O0, but the latter is not as important.)

But what I found interesting is that you had different data when profiling with perf. That means we have an issue in one of a few areas.

 - We are calling the perf_event_open syscall differently than the perf
   command line tool. (This is likely to some degree, but how much?)

 - We are falling behind the mmap()'d ring buffer we communicate with perf
   over and therefore are missing samples. (Seems unlikely to me).

 - We are failing to resolve the instruction-pointers when generating the
   callgraph and therefore they get lost or combined into "In file *".

   This one has more moving parts, because there is a bunch of trickery going
   on to locate the proper ELF. We have some symbol directories to look through
   as well as cracking open the ELF and finding the build-id field w/ some CRC
   checks.

   Recently (in 3.24.0 I believe) I added support to locate symbols from inside
   of containers (when we see /newroot/ in the path) and this was a rather
   complicated set of heuristics. It's possible there is a regression there.

   If you're running on something older than 3.24 we can rule the /newroot/
   stuff out.
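
As a rough illustration of the "finding the build-id field" step in the third item above, the sketch below scans raw ELF note data for the GNU build-id and maps it to the conventional separate-debug path under /usr/lib/debug/.build-id/. It assumes little-endian note data (as on x86_64) that has already been read out of the file; find_build_id and debug_file_for are hypothetical names, and this is not Sysprof's actual lookup code.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type used for the GNU build-id


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (note fields are 4-byte aligned)."""
    return (n + 3) & ~3


def find_build_id(notes: bytes) -> Optional[str]:
    """Scan raw ELF note data and return the GNU build-id as hex, if present."""
    offset = 0
    while offset + 12 <= len(notes):
        # Each note starts with three 32-bit fields: namesz, descsz, type.
        namesz, descsz, note_type = struct.unpack_from("<III", notes, offset)
        name_start = offset + 12
        desc_start = name_start + _align4(namesz)
        desc_end = desc_start + descsz
        if desc_end > len(notes):
            break  # truncated note, stop scanning
        name = notes[name_start:name_start + namesz]
        if name == b"GNU\x00" and note_type == NT_GNU_BUILD_ID:
            return notes[desc_start:desc_end].hex()
        offset = desc_start + _align4(descsz)
    return None


def debug_file_for(build_id: str, debug_root: str = "/usr/lib/debug") -> str:
    """Map a hex build-id to the conventional separate debug file path."""
    return f"{debug_root}/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
```

If this lookup fails for a mapped file (wrong root directory, missing debug package, or a path rewritten by a container), the addresses in that file cannot be resolved to function names, which is one way samples could end up lumped together under "In file *".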
Comment 11 André Klapper 2021-07-01 18:12:45 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/sysprof/-/issues/

Thank you for your understanding and your help.