Bug 555285 - g_utf8_validate() forbids embedded NUL
Status: RESOLVED WONTFIX
Product: glib
Classification: Platform
Component: general
Version: 2.16.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
 
 
Reported: 2008-10-06 20:14 UTC by coda
Modified: 2008-10-08 23:47 UTC
See Also:
GNOME target: ---
GNOME version: 2.21/2.22



Description coda 2008-10-06 20:14:25 UTC
Please describe the problem:
g_utf8_validate() returns FALSE if the string contains a NUL byte. The surrounding comments give no reason for this, and U+0000 is a valid character whose UTF-8 representation is the single byte 0x00 (it is part of ASCII). All other Unicode code points are allowed.

Steps to reproduce:


Actual results:


Expected results:
g_utf8_validate() should allow NULs in a string.

Does this happen every time?


Other information:
Comment 1 Behdad Esfahbod 2008-10-06 22:08:41 UTC
Right.  I was surprised when I noticed this too.  But it's too much to change.  It's all a mess...

In Pango I work around it by doing:

  start = layout->text;
  for (;;) {
    gboolean valid;

    valid = g_utf8_validate (start, -1, (const char **)&end);

    if (!*end)
      break;

    /* Replace invalid bytes with -1.  The -1 will be converted to
     * ((gunichar) -1) by glib, and that in turn yields a glyph value of
     * ((PangoGlyph) -1) by PANGO_GET_UNKNOWN_GLYPH(-1),
     * and that's PANGO_GLYPH_INVALID_INPUT.
     */
    if (!valid)
      *end++ = -1;

    start = end;
  }

So I simply ignore errors caused by NUL bytes.
Comment 2 coda 2008-10-07 00:22:11 UTC
Unfortunately, it can't always be worked around that simply. Replacing bytes in the original string either corrupts data or causes it to be no longer UTF-8. I think there are several possible solutions to fix the mess.

I started a discussion on the mailing list.
http://article.gmane.org/gmane.comp.gnome.gtk%2B.devel.general/16024
Comment 3 Matthias Clasen 2008-10-08 23:47:36 UTC
Whatever your opinion on this, it cannot be changed in glib at this point.

Write your own UTF-8 validation function if you need one that accepts NUL.
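
For anyone landing here later, a minimal sketch of what such a function might look like, in plain C with no GLib dependency. The name utf8_validate_allow_nul and its signature are hypothetical, loosely modelled on g_utf8_validate(); this is not GLib code and has not been reviewed by the GLib maintainers. It assumes the caller knows the buffer length, and it rejects overlong encodings (including the two-byte form 0xC0 0x80 sometimes used for NUL), surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GLib.
 * Validates `len` bytes starting at `data` as UTF-8, treating U+0000
 * (a single 0x00 byte) as an ordinary valid character.
 * If `end` is non-NULL, *end is set to the first invalid byte, or to
 * data + len when the whole buffer is valid. */
static bool
utf8_validate_allow_nul (const char *data, size_t len, const char **end)
{
  const unsigned char *p, *stop;

  assert (data != NULL);

  p = (const unsigned char *) data;
  stop = p + len;

  while (p < stop)
    {
      unsigned char c = *p;
      size_t extra;     /* continuation bytes expected after the lead byte */
      size_t i;
      unsigned long cp; /* decoded code point */
      unsigned long min; /* smallest code point legal for this length */

      if (c < 0x80)
        { extra = 0; cp = c; min = 0; }            /* ASCII, including NUL */
      else if ((c & 0xE0) == 0xC0)
        { extra = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { extra = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { extra = 3; cp = c & 0x07; min = 0x10000; }
      else
        break; /* stray continuation byte or invalid lead byte */

      /* The sequence needs extra + 1 bytes; stop if it is truncated. */
      if ((size_t) (stop - p) <= extra)
        break;

      for (i = 1; i <= extra; i++)
        {
          if ((p[i] & 0xC0) != 0x80)
            break; /* not a continuation byte */
          cp = (cp << 6) | (p[i] & 0x3F);
        }
      if (i <= extra)
        break;

      /* Reject overlong forms, surrogates, and out-of-range values. */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        break;

      p += extra + 1;
    }

  if (end != NULL)
    *end = (const char *) p;

  return p == stop;
}
```

Because NUL no longer marks the end of the string, the length has to be passed explicitly; there is no equivalent of calling g_utf8_validate() with max_len set to -1. The input buffer is never modified, so the data is neither corrupted nor turned into non-UTF-8.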