Bug 547200 – g_utf8_find_next_char() issues




Summary:	g_utf8_find_next_char() issues


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	i18n
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal minor
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2008-08-10 20:30 UTC by Behdad Esfahbod
Modified:	2016-07-17 22:29 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
gutf8.c Docs patch (2.05 KB, patch) 2008-08-11 13:44 UTC, acfryx	none
gutf8.c Docs patch #1 (2.05 KB, patch) 2008-08-11 17:33 UTC, acfryx	committed
Fix a corner-case in g_utf8_find_next_char (2.00 KB, patch) 2015-10-04 19:29 UTC, Matthias Clasen	committed

Description Behdad Esfahbod 2008-08-10 20:30:54 UTC

The docs have a broken sentence:

/**
 * g_utf8_find_next_char:
 * @p: a pointer to a position within a UTF-8 encoded string
 * @end: a pointer to the end of the string, or %NULL to indicate
 *        that the string is nul-terminated, in which case
 *        the returned value will be 
                                    ?????

Also the code seems to be incorrect if end != NULL, end != p, and *p == '\0':

gchar *
g_utf8_find_next_char (const gchar *p,
                       const gchar *end)
{
  if (*p)
    {
      if (end)
        for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
          ;
      else
        for (++p; (*p & 0xc0) == 0x80; ++p)
          ;
    }
  return (p == end) ? NULL : (gchar *)p;
}

In that case, it should return p+1, but it returns p.  Right?
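
The corner case described above can be reproduced with a small standalone sketch. It transcribes the quoted function using plain char instead of gchar so that it compiles without GLib; the names find_next_char_old and old_offset_at_embedded_nul are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Transcription of the function quoted in the report, with char in place
 * of gchar. The name is hypothetical, not a GLib symbol. */
static char *
find_next_char_old (const char *p, const char *end)
{
  if (*p)
    {
      if (end)
        for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
          ;
      else
        for (++p; (*p & 0xc0) == 0x80; ++p)
          ;
    }
  return (p == end) ? NULL : (char *) p;
}

/* Calls the old code with p on an embedded nul byte and end != NULL.
 * Returns the result as an offset from the buffer start, or -1 for NULL. */
static long
old_offset_at_embedded_nul (void)
{
  static const char buf[] = { 'a', '\0', 'b' };
  const char *p = buf + 1;     /* points at the embedded nul byte */
  const char *end = buf + 3;   /* first byte after the buffer */
  const char *next = find_next_char_old (p, end);

  return next ? (long) (next - buf) : -1;
}
```

Because *p is '\0', the outer if is skipped and p is returned unchanged, so the helper yields offset 1 rather than the offset 2 (p+1) that the report expects.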

Comment 1 Behdad Esfahbod 2008-08-10 23:15:40 UTC

Also, the docs could be clearer about what @end is.  It's a pointer to the first byte after the end of the string, not a pointer to the end of the string.

Comment 2 acfryx 2008-08-11 00:13:46 UTC

# *        that the string is nul-terminated, in which case
# *        the returned value will be 

The part that says "in which case the returned value will be ???" should be removed, because the return value doesn't change; only the internal evaluation of it does.

@end: a pointer to the "character" following the end of the string

"character" is quoted because it can be an invalid memory address.

Best regards...

Comment 3 Behdad Esfahbod 2008-08-11 03:41:52 UTC

It's the byte after the end of string.  Not character.  Character means Unicode character.
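
The byte/character distinction can be illustrated with a short sketch. It assumes valid UTF-8 input and counts characters by counting non-continuation bytes; count_chars and sample are hypothetical names for this example, not GLib API.

```c
#include <stddef.h>
#include <string.h>

/* Counts Unicode characters in the byte range [s, end), assuming valid
 * UTF-8: every byte that is not a continuation byte (10xxxxxx) starts
 * a new character. Hypothetical helper, not part of GLib. */
static size_t
count_chars (const char *s, const char *end)
{
  size_t n = 0;

  for (; s < end; ++s)
    if ((*s & 0xc0) != 0x80)
      ++n;
  return n;
}

/* "h" followed by U+00E9, which UTF-8 encodes as the two bytes C3 A9. */
static const char sample[] = "h\xc3\xa9";
```

Here strlen (sample) is 3 bytes while count_chars (sample, sample + strlen (sample)) is 2 characters, and the end pointer passed is the first byte after the string, which is how @end is described in this thread.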

Comment 4 acfryx 2008-08-11 13:17:25 UTC

(In reply to comment #3)
> It's the byte after the end of string.  Not character.  Character means Unicode
> character.
> 

Agreed...

Due to this character/byte confusion, there should be a note explaining the distinction, and that "notation" should then be used throughout the documentation. I'll produce a documentation patch based on this.

Comment 5 acfryx 2008-08-11 13:43:41 UTC

The patch was quick to make because that distinction is already made; I only made small changes where there was some ambiguity.

A possible patch for this bug report is included too.

Comment 6 acfryx 2008-08-11 13:44:42 UTC

Created attachment 116342
gutf8.c Docs patch

Comment 7 Behdad Esfahbod 2008-08-11 17:13:25 UTC

> "the maximum length, in bytes, of @str to use."

It sounds better as:
"the maximum length of @str to use, in bytes."

Comment 8 acfryx 2008-08-11 17:32:17 UTC

(In reply to comment #7)
> > "the maximum length, in bytes, of @str to use."
> 
> It sounds better as:
> "the maximum length of @str to use, in bytes."
> 

I actually thought of that form, but at the time I preferred to put "in bytes" close to "length". But yes, it sounds better.

The patch with this change follows.

Comment 9 acfryx 2008-08-11 17:33:13 UTC

Created attachment 116369
gutf8.c Docs patch #1

Comment 10 Behdad Esfahbod 2008-08-11 19:03:05 UTC

2008-08-11  Behdad Esfahbod  <behdad@gnome.org>

        Bug 547200 – g_utf8_find_next_char() issues

        * glib/gutf8.c: Improve wording about @end arguments in str funcs.

Comment 11 Behdad Esfahbod 2008-08-11 19:03:46 UTC

Doc issues are fixed.  Leaving open to address the original issue I reported.

Comment 12 Matthias Clasen 2015-10-04 19:29:50 UTC

Created attachment 312649
Fix a corner-case in g_utf8_find_next_char

In the case that *p is '\0', we should return p + 1, not p.
This change also allows g_utf8_find_next_char to be simplified a bit.
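
One way the function might look once it always steps past the current byte is sketched below. This is an illustration under stated assumptions, not the text of the attached patch: it uses plain char instead of gchar so it builds without GLib, and find_next_char_fixed and fixed_offset_from are hypothetical names.

```c
#include <stddef.h>

/* Sketch: always advance at least one byte, then skip continuation
 * bytes (10xxxxxx). Hypothetical name, not the committed GLib code. */
static char *
find_next_char_fixed (const char *p, const char *end)
{
  if (end)
    {
      for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
        ;
      return (p >= end) ? NULL : (char *) p;
    }

  /* Nul-terminated case: in this sketch the caller must not pass a
   * pointer to the terminating nul, since ++p would step past it. */
  for (++p; (*p & 0xc0) == 0x80; ++p)
    ;
  return (char *) p;
}

/* Runs the sketch on the 3-byte buffer { 'a', '\0', 'b' } starting at
 * the given offset. Returns the resulting offset, or -1 for NULL. */
static long
fixed_offset_from (size_t start)
{
  static const char buf[] = { 'a', '\0', 'b' };
  const char *next = find_next_char_fixed (buf + start, buf + 3);

  return next ? (long) (next - buf) : -1;
}
```

With p on the embedded nul (offset 1) the sketch returns offset 2, that is p + 1, and with p on the last byte (offset 2) it returns NULL because the next position reaches end.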

Comment 13 Behdad Esfahbod 2015-10-05 20:46:12 UTC

sgtm.

Comment 14 Matthias Clasen 2016-07-17 02:14:09 UTC

Comment on attachment 312649
Fix a corner-case in g_utf8_find_next_char

Attachment 312649 pushed as e0e652e - Fix a corner-case in g_utf8_find_next_char