Bug 547200 - g_utf8_find_next_char() issues
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: i18n
Version: unspecified
Hardware: Other Linux
Importance: Normal minor
Target Milestone: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
Reported: 2008-08-10 20:30 UTC by Behdad Esfahbod
Modified: 2016-07-17 22:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gutf8.c Docs patch (2.05 KB, patch)
2008-08-11 13:44 UTC, acfryx
Status: none
gutf8.c Docs patch #1 (2.05 KB, patch)
2008-08-11 17:33 UTC, acfryx
Status: committed
Fix a corner-case in g_utf8_find_next_char (2.00 KB, patch)
2015-10-04 19:29 UTC, Matthias Clasen
Status: committed

Description Behdad Esfahbod 2008-08-10 20:30:54 UTC
The docs have a broken sentence:

/**
 * g_utf8_find_next_char:
 * @p: a pointer to a position within a UTF-8 encoded string
 * @end: a pointer to the end of the string, or %NULL to indicate
 *        that the string is nul-terminated, in which case
 *        the returned value will be 
                                    ?????

Also the code seems to be incorrect if end != NULL, end != p, and *p == '\0':

gchar *
g_utf8_find_next_char (const gchar *p,
                       const gchar *end)
{
  if (*p)
    {
      if (end)
        for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
          ;
      else
        for (++p; (*p & 0xc0) == 0x80; ++p)
          ;
    }
  return (p == end) ? NULL : (gchar *)p;
}

In that case, it should return p+1, but it returns p.  Right?
Comment 1 Behdad Esfahbod 2008-08-10 23:15:40 UTC
Also, the docs could be clearer about what @end is. It's a pointer to the first byte after the end of the string, not a pointer to the end of the string.
Comment 2 acfryx 2008-08-11 00:13:46 UTC
# *        that the string is nul-terminated, in which case
# *        the returned value will be 

The part that says "in which case the returned value will be ???" should be removed, because the return value doesn't change; only the way it is computed internally does.

@end: a pointer to the "character" following the end of the string

"character" is quoted because it can be an invalid memory address.

Best regards...
Comment 3 Behdad Esfahbod 2008-08-11 03:41:52 UTC
It's the byte after the end of string.  Not character.  Character means Unicode character.
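To make the byte/character distinction and the meaning of @end concrete, here is a minimal sketch in plain C with no GLib dependency. The helper name `count_utf8_chars` is hypothetical, not a GLib function. It assumes valid UTF-8 and counts characters by counting every byte that is not a UTF-8 continuation byte (`10xxxxxx`). `end` follows the convention described in Comment 1: it points to the byte after the last byte, so `end - p` is the length in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of GLib): counts UTF-8 characters in the
 * byte range [p, end). end points to the byte *after* the last byte of the
 * string, so end - p is the length in bytes. Assumes valid UTF-8. */
size_t
count_utf8_chars (const char *p, const char *end)
{
  size_t n = 0;

  assert (p != NULL && end != NULL && p <= end);

  for (; p < end; ++p)
    if ((*p & 0xc0) != 0x80)   /* ASCII or lead byte: starts a character */
      ++n;
  return n;
}
```

For `const char *s = "caf\xc3\xa9";` (the string "café"), the string is 5 bytes long, `s + 5` points at the terminating nul (the byte after the end of the string), and `count_utf8_chars (s, s + 5)` returns 4 characters.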
Comment 4 acfryx 2008-08-11 13:17:25 UTC
(In reply to comment #3)
> It's the byte after the end of string.  Not character.  Character means Unicode
> character.
> 

Agreed...

Due to this character/byte confusion, there should be a note explaining the distinction, and this "notation" should then be used consistently throughout the documentation. I'll produce a patch for the documentation based on this.
Comment 5 acfryx 2008-08-11 13:43:41 UTC
The patch was quick to make because that distinction is already made; I just made small changes where there was some ambiguity.

A possible fix for the code issue in this bug report is included in the patch too.
Comment 6 acfryx 2008-08-11 13:44:42 UTC
Created attachment 116342 [details] [review]
gutf8.c Docs patch
Comment 7 Behdad Esfahbod 2008-08-11 17:13:25 UTC
> "the maximum length, in bytes, of @str to use."

It sounds better as:
"the maximum length of @str to use, in bytes."
Comment 8 acfryx 2008-08-11 17:32:17 UTC
(In reply to comment #7)
> > "the maximum length, in bytes, of @str to use."
> 
> It sounds better as:
> "the maximum length of @str to use, in bytes."
> 

I actually thought of that form, but at the time I preferred to put "in bytes" close to "length". But yes, it sounds better.

Here is the patch with this change.
Comment 9 acfryx 2008-08-11 17:33:13 UTC
Created attachment 116369 [details] [review]
gutf8.c Docs patch #1
Comment 10 Behdad Esfahbod 2008-08-11 19:03:05 UTC
2008-08-11  Behdad Esfahbod  <behdad@gnome.org>

        Bug 547200 – g_utf8_find_next_char() issues

        * glib/gutf8.c: Improve wording about @end arguments in str funcs.

Comment 11 Behdad Esfahbod 2008-08-11 19:03:46 UTC
Doc issues are fixed.  Leaving open to address the original issue I reported.
Comment 12 Matthias Clasen 2015-10-04 19:29:50 UTC
Created attachment 312649 [details] [review]
Fix a corner-case in g_utf8_find_next_char

In the case that *p is '\0', we should return p + 1, not p.
This change also allows simplifying g_utf8_find_next_char a bit.
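The corner case described in this commit message can be sketched as follows. This is a hedged, standalone illustration, not the committed GLib code: the name `find_next_char_sketch` is hypothetical, it uses plain `char` instead of `gchar`, and it assumes `p < end` whenever `end` is non-NULL. When `end` is given, a '\0' byte is treated as ordinary data, so the function always steps past the byte at `p` (returning p + 1 for the case in the original report).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: a sketch of the corner-case fix discussed in this bug,
 * not the committed GLib implementation. Returns the start of the next
 * UTF-8 character after p, or NULL if end is non-NULL and is reached.
 * Assumes p < end when end is non-NULL. */
char *
find_next_char_sketch (const char *p, const char *end)
{
  assert (p != NULL);

  if (end != NULL)
    {
      /* Explicit end: a '\0' byte is just data. Always step past the byte
       * at p, then skip continuation bytes (10xxxxxx). */
      for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
        ;
      return (p >= end) ? NULL : (char *) p;
    }

  /* nul-terminated string: a '\0' at p is the terminator, so stay on it
   * rather than reading past the end of the string. */
  if (*p == '\0')
    return (char *) p;
  for (++p; (*p & 0xc0) == 0x80; ++p)
    ;
  return (char *) p;
}
```

With `const char buf[] = { 'a', '\0', 'b' };`, `find_next_char_sketch (buf + 1, buf + 3)` returns `buf + 2`, whereas the code quoted in the original report returns `buf + 1`. Keeping the terminator check in the nul-terminated branch is a choice made for this sketch so that it never reads past the terminating nul; this thread does not show whether the committed patch handles that branch the same way.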
Comment 13 Behdad Esfahbod 2015-10-05 20:46:12 UTC
sgtm.
Comment 14 Matthias Clasen 2016-07-17 02:14:09 UTC
Comment on attachment 312649 [details] [review]
Fix a corner-case in g_utf8_find_next_char

Attachment 312649 [details] pushed as e0e652e - Fix a corner-case in g_utf8_find_next_char