Bug 107138 – g_ascii_strcasecmp performance

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 107138 - g_ascii_strcasecmp performance


Summary:	g_ascii_strcasecmp performance


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2003-02-26 17:34 UTC by Morten Welinder
Modified:	2011-02-18 15:57 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Also do case conversion when needed (917 bytes, patch) 2003-02-26 17:35 UTC, Morten Welinder	none	Details \| Review
Benchmrk (7.73 KB, patch) 2003-05-25 17:41 UTC, Owen Taylor	none	Details \| Review

Description Morten Welinder 2003-02-26 17:34:39 UTC

g_ascii_strcasecmp calls g_ascii_tolower entirely too often.  (To the tune
of 1% of gnumeric's running time in my tree.)  Also, one might consider
inlining g_ascii_tolower.

Patch coming up.

Comment 1 Morten Welinder 2003-02-26 17:35:18 UTC

Created attachment 14639 [details] [review]
Also do case conversion when needed

Comment 2 Owen Taylor 2003-02-26 17:46:14 UTC

I suspect that inlining g_ascii_tolower() would be the
necessary first step _before_ considering this.

Comment 3 Morten Welinder 2003-02-26 18:08:47 UTC

I considered that, but on closer inspection that idea is not right.

With my patch, we spend the time for one comparison of two characters.

That should be extremely cheap given that the memory needs to be
referenced anyway.  When it fails, that is extra work; when it succeeds,
it is much less work than two g_ascii_tolower calls (expanded or not).

When used in a hash equal function (common, I would expect), there are
barely more than one failure per call unless your hash function sucks.

Comment 4 Owen Taylor 2003-05-25 17:41:09 UTC

Created attachment 16831 [details] [review]
Benchmrk

Comment 5 Owen Taylor 2003-05-25 18:20:47 UTC

Attached a benchmark.

Corpus: words in /usr/dict/words, munged to be 
1/3 lowercase, 1/3 UPPERCASE, 1/3 Mixedcase

Test1: Insert all the words in a hash table, then look them
       all up.
Test2: Sort the words with qsort.

Results:

$ rm -f strcmp-bench && make OPTIMIZE="-O2" strcmp-bench && ./strcmp-bench
cc `pkg-config --cflags gtk+-2.0` -g -Wall -O2  `pkg-config --libs
gtk+-2.0`  strcmp-bench.c   -o strcmp-bench
45427 words, 10 iterations
Original: Hash, wall: 1.09s user: 1.10s
Original: Sort, wall: 1.78s user: 1.78s
MortenW : Hash, wall: 0.87s user: 0.90s
MortenW : Sort, wall: 1.41s user: 1.40s
Inlined : Hash, wall: 0.84s user: 0.83s
Inlined : Sort, wall: 0.78s user: 0.77s
Both    : Hash, wall: 0.81s user: 0.80s
Both    : Sort, wall: 0.74s user: 0.73s

Conclusions:

 - Both inlining and your optimization are significantly
   faster than the current code.
 - As you suggested, your optimization is a bigger win for 
   hash table usage than sorting.
 - Inlining is a bigger win than your optimization for
   both tests
 - Even with inlining, your optimization seems to help
   a little bit, but the effect is much smaller.

What if we decrease the branch penalty by enabling CMOV?

$ rm -f strcmp-bench && make OPTIMIZE="-O2 -march=pentiumpro
-mcpu=pentiumpro" strcmp-bench && ./strcmp-bench
cc `pkg-config --cflags gtk+-2.0` -g -Wall -O2 -march=pentiumpro
-mcpu=pentiumpro  `pkg-config --libs gtk+-2.0`  strcmp-bench.c   -o
strcmp-bench
45427 words, 10 iterations
Original: Hash, wall: 1.04s user: 1.02s
Original: Sort, wall: 1.70s user: 1.69s
MortenW : Hash, wall: 0.83s user: 0.82s
MortenW : Sort, wall: 1.38s user: 1.37s
Inlined : Hash, wall: 0.78s user: 0.76s
Inlined : Sort, wall: 0.69s user: 0.69s
Both    : Hash, wall: 0.76s user: 0.73s
Both    : Sort, wall: 0.71s user: 0.71s

With that, your optimization actually slows down the sort 
test a bit (the numbers above are repeatable, it's not
just noise, but the difference is certainly small.)

More tests:

On a PentiumIV, OPTIMIZE="-O2 -march=pentiumpro -mpcu=pentium4"

$ ./strcmp-bench
45427 words, 10 iterations
Original: Hash, wall: 0.97s user: 0.96s
Original: Sort, wall: 0.87s user: 0.88s
MortenW : Hash, wall: 0.91s user: 0.91s
MortenW : Sort, wall: 0.75s user: 0.73s
Inlined : Hash, wall: 0.87s user: 0.90s
Inlined : Sort, wall: 0.54s user: 0.52s
Both    : Hash, wall: 0.86s user: 0.85s
Both    : Sort, wall: 0.50s user: 0.50s

x86_64, OPTIMIZE="-O2" (disclaimer: this is in no
way a benchmark of the processor, the numbers below
should be compared only with themselves.)

$ ./strcmp-bench
45427 words, 10 iterations
Original: Hash, wall: 0.60s user: 0.60s
Original: Sort, wall: 0.64s user: 0.63s
MortenW : Hash, wall: 0.56s user: 0.56s
MortenW : Sort, wall: 0.57s user: 0.56s
Inlined : Hash, wall: 0.57s user: 0.60s
Inlined : Sort, wall: 0.54s user: 0.54s
Both    : Hash, wall: 0.53s user: 0.51s
Both    : Sort, wall: 0.45s user: 0.46s

PPC64, OPTIMIZE="-O2"

./strcmp-bench
45427 words, 10 iterations
Original: Hash, wall: 0.77s user: 0.76s
Original: Sort, wall: 1.18s user: 1.20s
MortenW : Hash, wall: 0.68s user: 0.69s
MortenW : Sort, wall: 1.04s user: 1.02s
Inlined : Hash, wall: 0.65s user: 0.65s
Inlined : Sort, wall: 0.79s user: 0.78s
Both    : Hash, wall: 0.65s user: 0.65s
Both    : Sort, wall: 0.79s user: 0.79s

OK, I'm getting bored now :-) My conclusion here is
that we might as well put in the "both" version ...
its usually at least as fast as inlined-only version,
and occasionally a fair bit faster.

Of course, all the above numbers are with gcc-3.2.x -
a different compiler might give considerably different
results.

Comment 6 Morten Welinder 2003-05-27 00:40:08 UTC

I bet I could come up with a benchmark to show whatever I wanted...
However, it looks like the trivial inline will catch most benefits
that there are, so why not just do that as a smallest-change
approach?

Comment 7 Owen Taylor 2003-05-27 03:05:18 UTC

The benchmarks here are meant to be representative of typical
uses of g_ascii_strcmp() ... clearly they don't cover all
possibilities, but I would content that they are *vastly*
better than random guessing.

(See your comment that inlining should be irrelevant compared
to checking for character equality first, for example...)

Comment 8 Owen Taylor 2003-05-30 23:31:08 UTC

Went with inlining only; in absence of clear benchmarks,
might as well keep things simple.

Fri May 30 19:23:47 2003  Owen Taylor  <otaylor@redhat.com>
 
        * glib/gstrfuncs.c (g_ascii_strncasecmp)
        * glib/gstrfuncs.c (g_ascii_strcasecmp): Use TOLOWER()
        macro instead of g_ascii_tolower() (#107138)