After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 710320 - g-ir-scanner: too much time is spent comparing path names
g-ir-scanner: too much time is spent comparing path names
Status: RESOLVED FIXED
Product: gobject-introspection
Classification: Platform
Component: general
unspecified
Other All
: Normal normal
: ---
Assigned To: gobject-introspection Maintainer(s)
gobject-introspection Maintainer(s)
Depends on:
Blocks:
 
 
Reported: 2013-10-16 19:33 UTC by Dieter Verfaillie
Modified: 2015-02-07 16:53 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
giscanner: speed up SourceScanner().parse_files() (26.94 KB, patch)
2013-10-16 19:38 UTC, Dieter Verfaillie
reviewed Details | Review
Revert "sourcescanner: Avoid scanning files when doing a macro scan" (1.86 KB, patch)
2013-10-17 11:20 UTC, Dieter Verfaillie
none Details | Review
Revert "sourcescanner: Do some fast path checks on the filename" (968 bytes, patch)
2013-10-17 11:23 UTC, Dieter Verfaillie
none Details | Review
giscanner: restore linemark filename unescaping (1.54 KB, patch)
2013-10-17 11:25 UTC, Dieter Verfaillie
none Details | Review
giscanner: speed up SourceScanner().parse_files() (27.01 KB, patch)
2013-10-17 11:26 UTC, Dieter Verfaillie
none Details | Review
giscanner: remove g_realpath (4.59 KB, patch)
2013-10-17 11:32 UTC, Dieter Verfaillie
none Details | Review
Revert "sourcescanner: Avoid scanning files when doing a macro scan" (1.91 KB, patch)
2013-10-17 11:44 UTC, Dieter Verfaillie
accepted-commit_now Details | Review
Revert "sourcescanner: Do some fast path checks on the filename" (1019 bytes, patch)
2013-10-17 11:44 UTC, Dieter Verfaillie
accepted-commit_now Details | Review
giscanner: restore linemark filename unescaping (1.59 KB, patch)
2013-10-17 11:44 UTC, Dieter Verfaillie
accepted-commit_now Details | Review
giscanner: speed up SourceScanner().parse_files() (27.01 KB, patch)
2013-10-17 11:44 UTC, Dieter Verfaillie
accepted-commit_now Details | Review
giscanner: remove g_realpath (4.64 KB, patch)
2013-10-17 11:45 UTC, Dieter Verfaillie
reviewed Details | Review
Revert "sourcescanner: Do some fast path checks on the filename" (1.12 KB, patch)
2013-10-17 18:25 UTC, Dieter Verfaillie
none Details | Review
Revert "sourcescanner: Do some fast path checks on the filename" (1.13 KB, patch)
2013-10-17 18:29 UTC, Dieter Verfaillie
committed Details | Review
Revert "sourcescanner: Avoid scanning files when doing a macro scan" (1.95 KB, patch)
2013-10-17 18:30 UTC, Dieter Verfaillie
committed Details | Review
giscanner: restore linemark filename unescaping (1.59 KB, patch)
2013-10-17 18:32 UTC, Dieter Verfaillie
committed Details | Review
giscanner: speed up SourceScanner().parse_files() (27.09 KB, patch)
2013-10-17 18:33 UTC, Dieter Verfaillie
committed Details | Review
giscanner: remove g_realpath (7.32 KB, patch)
2013-10-17 18:38 UTC, Dieter Verfaillie
committed Details | Review
giscanner: Make sure we use real paths in more places (3.54 KB, patch)
2013-11-13 11:34 UTC, Carlos Garcia Campos
none Details | Review

Description Dieter Verfaillie 2013-10-16 19:33:39 UTC
first stab as speeding g-ir-scanner up following shortly...
Comment 1 Dieter Verfaillie 2013-10-16 19:38:20 UTC
Created attachment 257445 [details] [review]
giscanner: speed up SourceScanner().parse_files()

Was looking around a bit and noticed about 2/3 of
g-ir-scanner time is spend in SourceScanner().parse_files().

Some profiling quickly most of the time spent in
SourceScanner().parse_files() is used by comparing
paths in gi_source_scanner_add_symbol(). Fix this
by maintaining a hash table of GFile's instead of a
list of file names.

This makes "g-ir-scanner <whole_bunch_of_options> --output Gtk-3.0.gir"
complete in about 10 seconds on my box instead of about
30 seconds (both best of 3 runs).
Comment 2 Colin Walters 2013-10-16 19:45:01 UTC
Review of attachment 257445 [details] [review]:

::: giscanner/giscannermodule.c
@@ +363,3 @@
 
+  file = g_file_new_for_path (filename);
+  g_hash_table_replace (self->scanner->files, file, file);

g_hash_table_add() for sets.

@@ -516,3 @@
     return NULL;
 
-  self->scanner->current_filename = g_realpath (filename);

We're losing realpath...we need to consider whether or not it's actually necessary.  If you don't believe it is, say why in the commit message.

See https://bugzilla.gnome.org/show_bug.cgi?id=704864

My inclination is to keep it out of perhaps an excess of conservativsm, but if you want to delete it, it needs a rationale.

@@ +525,3 @@
     }
+  file = g_file_new_for_path (filename);
+  g_hash_table_replace (self->scanner->files, file, file);

g_hash_table_add()

::: giscanner/scannerlexer.l
@@ +212,3 @@
 "L\""{stringtext}*"\""			{ return STRING; }
 
+.					{ if (yytext[0]) fprintf(stderr, "%s:%d: unexpected character `%c'\n", g_file_get_parse_name (scanner->current_file), lineno, yytext[0]); }

This leaks the path.

@@ +239,3 @@
      * starts with one '/' followed by exactly two '*' and not followed by a '/'
      */
+    if (!g_hash_table_lookup (scanner->files, scanner->current_file)) {

Prefer g_hash_table_contains() for sets.

@@ -324,3 @@
-
-	compress = g_strcompress (filename);
-        real = g_realpath (filename);

Losing realpath here.

::: giscanner/sourcescanner.c
@@ +246,3 @@
   g_slist_free (scanner->symbols);
 
+  g_hash_table_destroy (scanner->files);

I tend to use _unref() in new code.  While in practice it's the same here, sometimes it makes sense to hand out references to containers, and _unref supports that.

@@ +277,3 @@
+    if (g_hash_table_lookup (scanner->files, scanner->current_file)) {
+      scanner->symbols = g_slist_prepend (scanner->symbols,
+                                          gi_source_symbol_ref (symbol));

Er, why not just:

if (scanner->macro_scan || g_hash_table_contains (scanner->files, scanner->current_file))

?
Comment 3 Dieter Verfaillie 2013-10-17 11:20:55 UTC
Created attachment 257491 [details] [review]
Revert "sourcescanner: Avoid scanning files when doing a macro scan"

Will be not needed in about 3 patches...
Comment 4 Dieter Verfaillie 2013-10-17 11:23:17 UTC
Created attachment 257492 [details] [review]
Revert "sourcescanner: Do some fast path checks on the filename"

The "fast" path doesn't seem to work at all. Testing shows creating
Gtk-3.0.gir still takes about 30 seconds with this and even on
a Linux box...
Comment 5 Dieter Verfaillie 2013-10-17 11:25:28 UTC
Created attachment 257494 [details] [review]
giscanner: restore linemark filename unescaping

Not really relevant to speeding things up, just a small thing I noticed
while working on process_linemarks() for the patch that follows...
Comment 6 Dieter Verfaillie 2013-10-17 11:26:58 UTC
Created attachment 257495 [details] [review]
giscanner: speed up SourceScanner().parse_files()

Addresses everything from comment #2 except this. Thought appreciated :)

> ::: giscanner/sourcescanner.c
> @@ +246,3 @@
>    g_slist_free (scanner->symbols);
> 
> +  g_hash_table_destroy (scanner->files);
> 
> I tend to use _unref() in new code.  While in practice it's the same here,
> sometimes it makes sense to hand out references to containers, and _unref
> supports that.

Hmm, scanner->typedef_table and scanner->struct_or_union_or_enum_table
are g_hash_table_destroy()-ed only a couple of lines above, so I thought
it would be more consistent to also _destroy() ?
Comment 7 Dieter Verfaillie 2013-10-17 11:32:07 UTC
Created attachment 257496 [details] [review]
giscanner: remove g_realpath

As said in the patch description: needs testing to see
if GetFullPathName is really needed on win32, especially
in MinGW/MSYS situations. Don't know off-hand if MSYS
conversions between forward vs backward slash path separators,
/c/ vs c:\ drives, etc are going to break things or not...

For all I know GFile simply does the right thing and I'm
worrying about nothing here :)
Comment 8 Dieter Verfaillie 2013-10-17 11:44:02 UTC
Created attachment 257501 [details] [review]
Revert "sourcescanner: Avoid scanning files when doing a macro scan"
Comment 9 Dieter Verfaillie 2013-10-17 11:44:18 UTC
Created attachment 257504 [details] [review]
Revert "sourcescanner: Do some fast path checks on the filename"
Comment 10 Dieter Verfaillie 2013-10-17 11:44:34 UTC
Created attachment 257505 [details] [review]
giscanner: restore linemark filename unescaping
Comment 11 Dieter Verfaillie 2013-10-17 11:44:51 UTC
Created attachment 257506 [details] [review]
giscanner: speed up SourceScanner().parse_files()
Comment 12 Dieter Verfaillie 2013-10-17 11:45:07 UTC
Created attachment 257507 [details] [review]
giscanner: remove g_realpath
Comment 13 Colin Walters 2013-10-17 12:23:46 UTC
(In reply to comment #6)

> Hmm, scanner->typedef_table and scanner->struct_or_union_or_enum_table
> are g_hash_table_destroy()-ed only a couple of lines above, so I thought
> it would be more consistent to also _destroy() ?

g-i is pretty old code; I couldn't see that there was another call to _destroy() from the patch context.  I've run into this before with e.g. g_clear_object() too.  It's up to you then; I would lean a bit towards using _unref anyways personally.
Comment 14 Colin Walters 2013-10-17 12:24:26 UTC
Review of attachment 257501 [details] [review]:

Ok, but it'd be nice if the commit message had the comment you added.
Comment 15 Colin Walters 2013-10-17 12:24:59 UTC
Review of attachment 257501 [details] [review]:

Ok, but it'd be nice if the commit message had the comment you added.
Comment 16 Colin Walters 2013-10-17 12:26:22 UTC
Review of attachment 257504 [details] [review]:

Same here, would be nice if the rationale was in the git commit as well.
Comment 17 Colin Walters 2013-10-17 12:47:36 UTC
Review of attachment 257505 [details] [review]:

Looks good.
Comment 18 Colin Walters 2013-10-17 13:16:44 UTC
Review of attachment 257506 [details] [review]:

Just comment.

::: giscanner/giscannermodule.c
@@ +362,3 @@
     return NULL;
 
+  file = g_file_new_for_path ( g_realpath (filename));

The extra space here is not GNU style (which g-i uses).  Though now that I look, this code is fairly inconsistent =(
Comment 19 Colin Walters 2013-10-17 13:16:55 UTC
Review of attachment 257506 [details] [review]:

(Just one comment)
Comment 20 Colin Walters 2013-10-17 13:21:01 UTC
Review of attachment 257507 [details] [review]:

Mmm...realpath is different from the "canonicalization" that GFile does.  realpath(), in particular, resolves symbolic links.  GFile does not.

The original commit message is:

+2008-02-18  Rob Taylor  <rob.taylor@codethink.co.uk>
+
+       * tools/grealpath.h: Added:
+       * tools/scanner.c: (main):
+       * tools/scannerlexer.l:
+       * tools/Makefile.am:
+       Always use absolute paths with symbolic links resolved when
+       comparing filenames.

Sadly though, it doesn't say *why*.  This kind of thing is why modern git commit message usage is so much better than the old GNU changelog approach.

Let me see if I can ask Rob if he remembers why this was added.
Comment 21 Colin Walters 2013-10-17 14:28:09 UTC
<robtaylor> walters: I think i probably added that for users of symlinked /usr/local
<robtaylor> (i used to use stow when developing )
<walters> robtaylor: ok hm, but...we aren't actually scanning files in /usr/local/stow/libfoo-1.0/libfoo.h right, just the files in srcdir
<robtaylor> walters: ah, hmm, i guess you could be messed up if your source dir is in a symlinked directory
<robtaylor> walters: pass - there's really been too much churn from when i wrote it to see if it's still relevent to the way it's used now =)
<walters> robtaylor: yeah understood =) i'll just pass this info on to dieterv and let him decide what to do; if it breaks we can always readd it

So...up to you =)
Comment 22 Dieter Verfaillie 2013-10-17 18:25:31 UTC
Created attachment 257575 [details] [review]
Revert "sourcescanner: Do some fast path checks on the filename"

This reverts commit 8c0ca4717d834a6c578579656683c55ea22a06f4.
The "fast" path doesn't seem to work at all. Testing shows creating
Gtk-3.0.gir still takes about 30 seconds with this (even on
a Linux box)...
Comment 23 Dieter Verfaillie 2013-10-17 18:29:34 UTC
Created attachment 257576 [details] [review]
Revert "sourcescanner: Do some fast path checks on the filename"

Please ignore comment #22, git bz stopped doing whatever
it should have done again, so here goes with manual patch
attaching...

Add comment as mentioned in comment #15 and comment #16
Comment 24 Dieter Verfaillie 2013-10-17 18:30:41 UTC
Created attachment 257577 [details] [review]
Revert "sourcescanner: Avoid scanning files when doing a macro scan"

Add comment to commit message as mentioned in comment #14
Comment 25 Dieter Verfaillie 2013-10-17 18:32:14 UTC
Created attachment 257578 [details] [review]
giscanner: restore linemark filename unescaping
Comment 26 Dieter Verfaillie 2013-10-17 18:33:48 UTC
Created attachment 257579 [details] [review]
giscanner: speed up SourceScanner().parse_files()

(In reply to comment #18)
> Review of attachment 257506 [details] [review]:
> 
> Just comment.
> 
> ::: giscanner/giscannermodule.c
> @@ +362,3 @@
>      return NULL;
> 
> +  file = g_file_new_for_path ( g_realpath (filename));
> 
> The extra space here is not GNU style (which g-i uses).  Though now that I
> look, this code is fairly inconsistent =(

That space in there is a typo, now fixed, thanks for noticing :)
Comment 27 Dieter Verfaillie 2013-10-17 18:38:50 UTC
Created attachment 257582 [details] [review]
giscanner: remove g_realpath

(In reply to comment #21)
> <robtaylor> walters: I think i probably added that for users of symlinked
> /usr/local
> <robtaylor> (i used to use stow when developing )
> <walters> robtaylor: ok hm, but...we aren't actually scanning files in
> /usr/local/stow/libfoo-1.0/libfoo.h right, just the files in srcdir
> <robtaylor> walters: ah, hmm, i guess you could be messed up if your source dir
> is in a symlinked directory
> <robtaylor> walters: pass - there's really been too much churn from when i
> wrote it to see if it's still relevent to the way it's used now =)
> <walters> robtaylor: yeah understood =) i'll just pass this info on to dieterv
> and let him decide what to do; if it breaks we can always readd it
> 
> So...up to you =)

How about this: calculate real paths only where they enter the system
in SourceScanner().parse_files() and SourceScanner().parse_macros()
using os.path.realpath() and remove the absolute and real path
calculations elsewhere ?
Comment 28 Colin Walters 2013-10-17 19:04:34 UTC
Review of attachment 257582 [details] [review]:

Ah, so we just use the Python version on the boundary?  Makes sense.
Comment 29 Colin Walters 2013-10-17 19:13:44 UTC
Attachment 257576 [details] pushed as 298e34c - Revert "sourcescanner: Do some fast path checks on the filename"
Attachment 257577 [details] pushed as ce1e2a2 - Revert "sourcescanner: Avoid scanning files when doing a macro scan"
Attachment 257578 [details] pushed as 69d2fe3 - giscanner: restore linemark filename unescaping
Attachment 257579 [details] pushed as 65a0fa4 - giscanner: speed up SourceScanner().parse_files()
Attachment 257582 [details] pushed as ebb8050 - giscanner: remove g_realpath
Comment 30 Carlos Garcia Campos 2013-11-13 10:39:02 UTC
This broke gobject-instrospection when using symlinks. In my case my jhbuild prefix is /home/carlos/gnome which is a symlink to /home/carlos/gnome-<version>. Now my gir files for glib, gio and gobject contain only constants (because iin that case macro_scan is TRUE in gi_source_scanner_add_symbol()). One of the problems is that in SourceScanner.parse_files we are passing real paths to the scanner.append_filename but not to scanner.lex_filename. But that's not enough to fix it, because we are still getting non real paths from the preprocessor, and process_linemarks is setting the non real path as the current file of the scanner.
Comment 31 Carlos Garcia Campos 2013-11-13 11:34:37 UTC
Created attachment 259726 [details] [review]
giscanner: Make sure we use real paths in more places 

This patch seems to fix the problem, I hope this covers all places were we need to convert paths.
Comment 32 Carlos Garcia Campos 2013-11-13 11:35:19 UTC
Forgot to reopen it.
Comment 33 Carlos Garcia Campos 2013-11-13 11:36:43 UTC
hmm, I've just noticed this bug was about reducing the time comparing filenames. I'll use a new bug instead, sorry for the noise.
Comment 34 André Klapper 2015-02-07 16:53:16 UTC
[Mass-moving gobject-introspection tickets to its own Bugzilla product - see bug 708029. Mass-filter your bugmail for this message: introspection20150207 ]