GNOME Bugzilla – Bug 547200
g_utf8_find_next_char() issues
Last modified: 2016-07-17 22:29:17 UTC
The docs have a broken sentence:

    /**
     * g_utf8_find_next_char:
     * @p: a pointer to a position within a UTF-8 encoded string
     * @end: a pointer to the end of the string, or %NULL to indicate
     *        that the string is nul-terminated, in which case
     *        the returned value will be ?????

Also, the code seems to be incorrect if end != NULL, end != p, and *p == '\0':

    gchar *
    g_utf8_find_next_char (const gchar *p,
                           const gchar *end)
    {
      if (*p)
        {
          if (end)
            for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
              ;
          else
            for (++p; (*p & 0xc0) == 0x80; ++p)
              ;
        }
      return (p == end) ? NULL : (gchar *)p;
    }

In that case, it should return p+1, but it returns p. Right?
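To make the corner case concrete, here is a small sketch that copies the quoted function body into a standalone C file, with `gchar` replaced by `char` so it builds without GLib. The names `old_find_next_char` and `old_step` are hypothetical. With `p` pointing at an embedded `'\0'` and `end` further along the buffer, the `if (*p)` guard skips the loop, so `p` is never advanced and the function returns `p` itself rather than `p + 1`.

```c
#include <stddef.h>

/* Standalone copy of the g_utf8_find_next_char() body quoted above,
 * with gchar replaced by char so it builds without GLib.
 * The name old_find_next_char is hypothetical. */
static char *
old_find_next_char (const char *p, const char *end)
{
  if (*p)
    {
      if (end)
        for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
          ;
      else
        for (++p; (*p & 0xc0) == 0x80; ++p)
          ;
    }
  /* If *p was '\0', p was never advanced, so the same pointer comes back. */
  return (p == end) ? NULL : (char *) p;
}

/* Hypothetical helper: how many bytes the result lies past p,
 * or -1 if the function returned NULL. */
static ptrdiff_t
old_step (const char *p, const char *end)
{
  const char *q = old_find_next_char (p, end);
  return q ? q - p : -1;
}
```

For example, with the three-byte buffer `{'a', '\0', 'b'}`, `old_step(buf + 1, buf + 3)` is 0: the "next" character is the same `'\0'` the caller started from, so a loop that keeps calling the function from there would never make progress.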
Also, the docs could be clearer about what @end is: it's a pointer to the first byte after the end of the string, not a pointer to the end of the string.
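This is the same one-past-the-end convention C uses for array bounds: for a buffer of `len` bytes, `end` is `buf + len`, and the byte at `end` is never read (it may not even be valid memory). The following sketch walks a buffer that has no terminating nul using that convention. The helpers `next_char_bounded` and `count_chars` are hypothetical and are not GLib's code; the input is assumed to be valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: advance from p to the start of the next UTF-8
 * character.  end is one past the last valid byte (buf + len), and the
 * loop stops before dereferencing it. */
static const char *
next_char_bounded (const char *p, const char *end)
{
  for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
    ;
  return p;
}

/* Count the characters in the first len bytes of buf.  The buffer does
 * not need a terminating '\0'; end = buf + len marks the first byte
 * after the string. */
static size_t
count_chars (const char *buf, size_t len)
{
  const char *end = buf + len;   /* one past the last byte */
  size_t n = 0;

  for (const char *p = buf; p < end; p = next_char_bounded (p, end))
    n++;
  return n;
}
```

With the four bytes `{'h', '\xc3', '\xa9', '!'}` ("hé!" without a terminator), `count_chars(data, 4)` yields 3, and no byte at or past `data + 4` is ever touched.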
> * that the string is nul-terminated, in which case
> * the returned value will be

The part that says "in which case the returned value will be ???" should be removed, because the return value doesn't change; only the way it is computed internally does. I suggest:

@end: a pointer to the "character" following the end of the string

"character" is in quotes because it can be an invalid memory address.

Best regards...
It's the byte after the end of the string, not a character. "Character" means a Unicode character.
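As an illustration of the distinction: in UTF-8 the word "café" occupies 5 bytes but contains 4 characters, because "é" (U+00E9) is encoded as the two bytes `0xC3 0xA9`. A minimal sketch, assuming valid UTF-8 input (the function name `utf8_char_count` is hypothetical), counts characters by skipping continuation bytes:

```c
#include <stddef.h>

/* Count Unicode characters (code points) in a nul-terminated, valid
 * UTF-8 string by counting every byte that is NOT a continuation byte
 * (continuation bytes have the bit pattern 10xxxxxx). */
static size_t
utf8_char_count (const char *s)
{
  size_t n = 0;

  for (; *s; s++)
    if ((*s & 0xc0) != 0x80)
      n++;
  return n;
}
```

Here `strlen("caf\xc3\xa9")` is 5 (bytes) while `utf8_char_count("caf\xc3\xa9")` is 4 (characters).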
(In reply to comment #3)
> It's the byte after the end of string. Not character. Character means Unicode
> character.

Agreed. Due to this character/byte confusion, there should be a note explaining the distinction, and that notation should then be used consistently throughout the documentation. I'll produce a documentation patch based on this.
The patch was quick to write because that distinction is already made; I just made small changes where there was some ambiguity. The patch also includes a possible fix for this bug report.
Created attachment 116342
gutf8.c Docs patch
> "the maximum length, in bytes, of @str to use."

It sounds better as: "the maximum length of @str to use, in bytes."
(In reply to comment #7)
> > "the maximum length, in bytes, of @str to use."
>
> It sounds better as:
> "the maximum length of @str to use, in bytes."

I actually considered that form, but at the time I preferred to put "in bytes" close to "length". Still, yes, it sounds better. Here is the updated patch with this change.
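Spelling out "in bytes" matters because a byte limit need not fall on a character boundary: "café" is 5 bytes, so a limit of 4 bytes ends in the middle of the two-byte "é". A small sketch of how a caller might back a byte limit up to the nearest character boundary follows; the helper `utf8_safe_prefix_len` is hypothetical, not a GLib function, and it assumes valid UTF-8 with `max_bytes <= strlen(s)`.

```c
#include <stddef.h>

/* Hypothetical helper: given a limit of max_bytes, return the largest
 * prefix length <= max_bytes that does not split a UTF-8 character.
 * Assumes s is valid, nul-terminated UTF-8 and max_bytes <= strlen(s),
 * so s[max_bytes] is at worst the terminating '\0'. */
static size_t
utf8_safe_prefix_len (const char *s, size_t max_bytes)
{
  size_t n = max_bytes;

  /* If the byte just past the prefix is a continuation byte, the
   * prefix would cut a character in half; back up to its lead byte. */
  while (n > 0 && (s[n] & 0xc0) == 0x80)
    n--;
  return n;
}
```

For "caf\xc3\xa9", a 4-byte limit is trimmed to 3 bytes ("caf"), while a 5-byte limit keeps the whole word.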
Created attachment 116369
gutf8.c Docs patch #1
2008-08-11 Behdad Esfahbod <behdad@gnome.org>

	Bug 547200 – g_utf8_find_next_char() issues

	* glib/gutf8.c: Improve wording about @end arguments in str funcs.
Doc issues are fixed. Leaving open to address the original issue I reported.
Created attachment 312649
Fix a corner-case in g_utf8_find_next_char

In the case that *p is '\0', we should return p + 1, not p. This change also makes it possible to simplify g_utf8_find_next_char a bit.
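A sketch of what the simplified function could look like after such a fix. It is written from the description above and is not copied from the committed patch; `gchar` is replaced by `char` and the name `new_find_next_char` is hypothetical. Dropping the `if (*p)` guard means `p` always advances by at least one byte, and the nul-terminated loop still stops at a terminator after `p` because `'\0'` is not a continuation byte.

```c
#include <stddef.h>

/* Sketch of g_utf8_find_next_char() with the corner case fixed:
 * p always advances by at least one byte, so when *p == '\0' and
 * end lies further on, the result is p + 1 rather than p. */
static char *
new_find_next_char (const char *p, const char *end)
{
  if (end)
    {
      /* Skip continuation bytes (10xxxxxx) without reaching end. */
      for (++p; p < end && (*p & 0xc0) == 0x80; ++p)
        ;
      /* Reaching end means there is no next character. */
      return (p >= end) ? NULL : (char *) p;
    }

  /* Nul-terminated case: a '\0' after p stops the scan because it is
   * not a continuation byte.  The loop reads the byte after p, so the
   * caller must not pass a pointer to the terminator itself here. */
  for (++p; (*p & 0xc0) == 0x80; ++p)
    ;
  return (char *) p;
}
```

With the three-byte buffer `{'a', '\0', 'b'}`, calling it at `buf + 1` with `end = buf + 3` now yields `buf + 2`, so a loop that repeatedly steps through `[buf, end)` always makes progress, even across an embedded nul.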
sgtm.
Comment on attachment 312649
Fix a corner-case in g_utf8_find_next_char

Attachment 312649 pushed as e0e652e - Fix a corner-case in g_utf8_find_next_char