Bug 619437 – New inline functions for iteration over UTF-8



Summary:	New inline functions for iteration over UTF-8


Status:	RESOLVED OBSOLETE

Product:	glib
Classification:	Platform
Component:	i18n
Version:	2.25.x
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:	614856

Reported:	2010-05-23 13:18 UTC by Mikhail Zabaluev
Modified:	2018-05-24 12:19 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Added g_utf8_iterate() (2.55 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Added a functional test for g_utf8_iterate() (1.40 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Added performance test for g_utf8_iterate() (1.27 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Added g_utf8_iterate_back() (1.98 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Added a functional test for g_utf8_iterate_back() (1.09 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Added a performance test for g_utf8_iterate_back() (1.32 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Documented g_utf8_iterate() and g_utf8_iterate_back() (2.38 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Don't let g_utf8_iterate go past the end of the string in tests (1.63 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Make g_utf8_iterate() and g_utf8_iterate_back() inline functions (5.69 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
Make the UTF-8 decoding mask explicitly 32 bits wide (1.29 KB, patch) 2010-05-23 13:24 UTC, Mikhail Zabaluev
All-in-one patch (7.81 KB, patch) 2010-06-06 14:01 UTC, Mikhail Zabaluev

Description Mikhail Zabaluev 2010-05-23 13:18:49 UTC

Split out from bug #614856: patches adding two inline functions for faster iteration over UTF-8 characters, g_utf8_iterate() and g_utf8_iterate_back().
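
For readers without the attachments at hand, here is a minimal sketch of what a forward UTF-8 iterator of this kind might look like. It is not the code from the patches: the name sketch_utf8_iterate(), its signature, and the choice to stay on the terminator instead of stepping past it are all assumptions made for illustration, and the input is assumed to be valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the attached patch: decode the code point at
 * *p and advance *p past it. Assumes valid, NUL-terminated UTF-8.
 * On the terminating NUL it returns 0 and leaves *p unchanged. */
static inline uint32_t
sketch_utf8_iterate (const char **p)
{
  assert (p != NULL && *p != NULL);

  const unsigned char *s = (const unsigned char *) *p;
  uint32_t c = *s;
  int extra;                      /* number of continuation bytes */

  if (c == 0)
    return 0;                     /* stay on the terminator */

  if (c < 0x80)
    extra = 0;                    /* 0xxxxxxx: ASCII */
  else if ((c & 0xE0) == 0xC0)
    { c &= 0x1F; extra = 1; }     /* 110xxxxx */
  else if ((c & 0xF0) == 0xE0)
    { c &= 0x0F; extra = 2; }     /* 1110xxxx */
  else
    { c &= 0x07; extra = 3; }     /* 11110xxx */

  /* Each continuation byte contributes its low six bits. */
  for (int i = 1; i <= extra; i++)
    c = (c << 6) | (s[i] & 0x3F);

  *p = (const char *) (s + 1 + extra);
  return c;
}
```

A caller would typically loop with `while ((c = sketch_utf8_iterate (&p)) != 0)`, getting both the code point and the advanced cursor from a single call.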

Comment 1 Mikhail Zabaluev 2010-05-23 13:24:05 UTC

Created attachment 161789
Added g_utf8_iterate()

Comment 2 Mikhail Zabaluev 2010-05-23 13:24:09 UTC

Created attachment 161790
Added a functional test for g_utf8_iterate()

Comment 3 Mikhail Zabaluev 2010-05-23 13:24:13 UTC

Created attachment 161791
Added performance test for g_utf8_iterate()

Comment 4 Mikhail Zabaluev 2010-05-23 13:24:16 UTC

Created attachment 161792
Added g_utf8_iterate_back()
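
As a rough illustration of what backward iteration involves (again not the attached code), the sketch below steps back over continuation bytes until it reaches a lead byte, assembling the code point from the low bits upward. The name sketch_utf8_iterate_back() and its signature are hypothetical, and the caller is assumed to guarantee that a complete, valid UTF-8 character ends just before the cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the attached patch: decode the code point that
 * ends just before *p and move *p back to its first byte. Assumes valid
 * UTF-8 and that *p is not at the start of the buffer. */
static inline uint32_t
sketch_utf8_iterate_back (const char **p)
{
  assert (p != NULL && *p != NULL);

  const unsigned char *s = (const unsigned char *) *p;
  uint32_t c = 0;
  int shift = 0;

  /* Continuation bytes (10xxxxxx) carry six payload bits each. */
  --s;
  while ((*s & 0xC0) == 0x80)
    {
      c |= (uint32_t) (*s & 0x3F) << shift;
      shift += 6;
      --s;
    }

  /* Lead byte payload: 7 bits for ASCII, else 5, 4 or 3 bits. */
  uint32_t lead_mask = (shift == 0) ? 0x7Fu : (0x3Fu >> (shift / 6));
  c |= (uint32_t) (*s & lead_mask) << shift;

  *p = (const char *) s;
  return c;
}
```

Unlike the forward direction, there is no terminator to stop on, so the caller has to compare the cursor against the start of the string before each call.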

Comment 5 Mikhail Zabaluev 2010-05-23 13:24:20 UTC

Created attachment 161793
Added a functional test for g_utf8_iterate_back()

Comment 6 Mikhail Zabaluev 2010-05-23 13:24:23 UTC

Created attachment 161794
Added a performance test for g_utf8_iterate_back()

Comment 7 Mikhail Zabaluev 2010-05-23 13:24:27 UTC

Created attachment 161795
Documented g_utf8_iterate() and g_utf8_iterate_back()

Comment 8 Mikhail Zabaluev 2010-05-23 13:24:31 UTC

Created attachment 161796
Don't let g_utf8_iterate go past the end of the string in tests

This should make tests friendlier to memory checking tools.
I'm reasonably confident that the function returns 0 on a null byte;
a test could be added specifically for that.

Comment 9 Mikhail Zabaluev 2010-05-23 13:24:35 UTC

Created attachment 161797
Make g_utf8_iterate() and g_utf8_iterate_back() inline functions

Comment 10 Mikhail Zabaluev 2010-05-23 13:24:39 UTC

Created attachment 161798
Make the UTF-8 decoding mask explicitly 32 bits wide

The algorithm no longer relies on gunichar being defined as a 32-bit type,
so it should now always work properly on 64-bit processors.
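
The patch itself is not reproduced here, so the following only illustrates the general hazard being described: a bit mask computed in a type whose width varies by platform gives different results on 32-bit and 64-bit targets, while the same formula in an explicitly 32-bit type does not. Both helper names are hypothetical and are not taken from the attachment.

```c
#include <assert.h>
#include <stdint.h>

/* Low `bits` bits set, computed in exactly 32 bits on every platform. */
static inline uint32_t
sketch_mask32 (unsigned bits)
{
  assert (bits >= 1 && bits <= 32);
  return (uint32_t) (UINT32_MAX >> (32 - bits));
}

/* The same formula on unsigned long: where long is 64 bits wide, the
 * result keeps 32 extra high bits set, so the "mask" is wider than
 * intended. */
static inline unsigned long
sketch_mask_long (unsigned bits)
{
  assert (bits >= 1 && bits <= 32);
  return ~0UL >> (32 - bits);
}
```

For example, sketch_mask32 (21) is 0x1FFFFF everywhere, which covers the 21 payload bits of a four-byte sequence, whereas sketch_mask_long (21) yields that value only where unsigned long is 32 bits wide.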

Comment 11 Behdad Esfahbod 2010-05-27 17:40:12 UTC

Attach one patch please!

Comment 12 Behdad Esfahbod 2010-05-27 17:53:59 UTC

I don't like the inline functions. These are nontrivial functions, better left as function calls...

Comment 13 Mikhail Zabaluev 2010-05-28 13:56:31 UTC

There are already non-inline functions, and they are at least two times slower.
Also, these new functions were not created inline in the branch; a separate patch does specifically that, and I could easily drop it.
But the point of these inlines is, they are good to get optimized away in loops. If inlining is disabled, you might as well go with the plodding g_utf8_get_char().

Comment 14 Behdad Esfahbod 2010-05-28 18:50:53 UTC

(In reply to comment #13)

> But the point of these inlines is, they are good to get optimized away in
> loops.

What do you mean by "optimized away"?!

And have you done any real-world measurements showing that UTF-8 decoding loops are taking any measurable time?

Comment 15 Mikhail Zabaluev 2010-05-31 13:30:22 UTC

(In reply to comment #14)
> What do you mean by "optimized away"?!

This means they can be inlined into the calling code and work on local variables held in registers, rather than being emitted as plain function calls. This speeds up code in loops, where UTF-8 iteration is normally used.
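
A small sketch of the situation being described, under stated assumptions: the iterator takes the address of the caller's cursor, so an out-of-line call forces that cursor to live in memory, while an inlined copy lets the compiler keep it in a register for the whole loop. The names sketch_next_code_point() and sketch_count_above() are hypothetical, valid NUL-terminated UTF-8 is assumed, and whether the compiler actually inlines the call depends on its optimization settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator: decode the code point at *cursor and advance.
 * Must not be called on the terminating NUL. */
static inline uint32_t
sketch_next_code_point (const char **cursor)
{
  const unsigned char *s = (const unsigned char *) *cursor;
  uint32_t c = *s++;

  if (c >= 0x80)
    {
      int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;

      c &= 0x3Fu >> extra;        /* lead payload mask: 0x1F, 0x0F, 0x07 */
      while (extra-- > 0)
        c = (c << 6) | (*s++ & 0x3F);
    }

  *cursor = (const char *) s;
  return c;
}

/* Count the code points greater than `threshold`. Once the call above is
 * inlined, `p` never needs to have its address taken. */
static size_t
sketch_count_above (const char *str, uint32_t threshold)
{
  assert (str != NULL);

  size_t n = 0;
  const char *p = str;

  while (*p != '\0')
    if (sketch_next_code_point (&p) > threshold)
      n++;

  return n;
}
```

This shows the mechanism only; it says nothing about how much time such loops take in a real application, which is the open question in this thread.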

> And have you done any real-world measurements showing that UTF-8 decoding loops
> are taking any measurable time?

No. But I recall that the Tracker developers were interested in optimizing their UTF-8 handling. I'll need to look around...

Comment 16 Mikhail Zabaluev 2010-06-06 14:01:39 UTC

Created attachment 162868
All-in-one patch

Comment 17 Alexander V. Butenko 2010-06-30 12:50:02 UTC

ping?

Comment 18 Behdad Esfahbod 2012-08-21 17:11:34 UTC

I still don't think this is justified. Leaving it up to Matthias to decide.

Comment 19 Colin Walters 2012-08-28 01:54:35 UTC

If claiming performance improvements, describe your application and methodology.

Ideally there's some independent method to reproduce, but if it's a proprietary application or something, at least describe the high-level issues?

Comment 20 GNOME Infrastructure Team 2018-05-24 12:19:49 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/302.