After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 790698 - convert: test failure on NetBSD
convert: test failure on NetBSD
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: i18n
2.54.x
Other NetBSD
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
: 317754 674540 (view as bug list)
Depends on:
Blocks:
 
 
Reported: 2017-11-22 08:44 UTC by Thomas Klausner
Modified: 2019-08-30 15:16 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
docs: Replace an XML entity with a UTF-8 character instead (1.09 KB, patch)
2018-01-22 12:57 UTC, Philip Withnall
committed Details | Review
docs: Minor wording improvements in GConvert documentation (3.32 KB, patch)
2018-01-22 12:57 UTC, Philip Withnall
committed Details | Review
tests: Add a missing const to a variable in the GConvert tests (1.58 KB, patch)
2018-01-22 12:57 UTC, Philip Withnall
committed Details | Review
gconvert: Fix error handling for g_iconv() with unrepresentable chars (4.70 KB, patch)
2018-01-22 12:57 UTC, Philip Withnall
committed Details | Review
tests: Add some documentation to the illegal sequence conversion test (924 bytes, patch)
2018-01-22 12:58 UTC, Philip Withnall
committed Details | Review

Description Thomas Klausner 2017-11-22 08:44:39 UTC
On NetBSD, the convert self test fails:

PASS: convert 2 /conversion/iconv-state
ERROR: convert - too few tests run (expected 7, got 2)
ERROR: convert - exited with status 134 (terminated by signal 6?)

Running glib/tests/convert directly, I see:

/conversion/no-conv: OK
/conversion/iconv-state: OK
/conversion/illegal-sequence: **
ERROR:convert.c:82:test_one_half: assertion failed (error == (g_convert_error, 1)): error is NULL
Abort


When I comment out the assertions in line 82, it fails again later in the same function:

# ./convert
/conversion/no-conv: OK
/conversion/iconv-state: OK
/conversion/illegal-sequence: **
ERROR:convert.c:98:test_one_half: assertion failed (out == "a"): ("?" == "a")
Abort


When I comment out these assertions as well, the test program succeeds.

AIU, the test wants to convert the ISO-8859-1 or -15 sequence \xc2\xbd to UTF-8 and expects it to fail. I'm not sure why that is so, but on NetBSD it fails differently than expected: The output buffer contains a single question mark "?".

NetBSD does not use libiconv but has its own implementation. Perhaps the return value handling is not identical and this leads to that result.

The NetBSD man page for iconv() is at 
http://netbsd.gw.com/cgi-bin/man-cgi?iconv+3+NetBSD-current
Comment 1 Philip Withnall 2017-12-04 11:03:39 UTC
Let’s focus on the first issue (failure at convert.c:82) first.

Would you be able to run the test under gdb and step through the calls to find out more information? Specifically, I’m interested in:
 - Number of iterations through the while loop in g_convert_with_iconv()
 - Values of inbytes_remainin, outp, outbytes_remaining, err, have_error, done, reset, p before and after each call to g_iconv()
 - Values of error, bytes_read, bytes_written, out after the g_convert() call

My first guess is that iconv() on NetBSD is returning a different errno compared to iconv() on other systems, but that’s a guess.

Thanks.
Comment 2 Thomas Klausner 2017-12-24 15:46:49 UTC
Ok, let's see.
I set a breakpoint on g_convert_with_iconv. The 'illegal-sequence' test is the interesting one, so I skipped the ones before. Here's the log:

The first log should answer your first two questions:


(gdb) c
Continuing.
/conversion/illegal-sequence:
Breakpoint 1, g_convert_with_iconv (str=0x4059f5 "\302\275", len=-1, converter=0x73beefb09080, bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0)
    at gconvert.c:386
386     {
(gdb) n
394       gboolean have_error = FALSE;
(gdb)
395       gboolean done = FALSE;
(gdb)
396       gboolean reset = FALSE;
(gdb)
398       g_return_val_if_fail (converter != (GIConv) -1, NULL);
(gdb)
400       if (len < 0)
(gdb)
401         len = strlen (str);
(gdb)
403       p = str;
(gdb)
404       inbytes_remaining = len;
(gdb)
405       outbuf_size = len + NUL_TERMINATOR_LENGTH;
(gdb)
407       outbytes_remaining = outbuf_size - NUL_TERMINATOR_LENGTH;
(gdb)
408       outp = dest = g_malloc (outbuf_size);
(gdb)

Breakpoint 2, g_convert_with_iconv (str=0x4059f5 "\302\275", len=2, converter=0x73beefb09080, bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0)
    at gconvert.c:410
410       while (!done && !have_error)
(gdb)
412           if (reset)
(gdb)
415             err = g_iconv (converter, (char **)&p, &inbytes_remaining, &outp, &outbytes_remaining);
(gdb) p inbytes_remaining
$1 = 2
(gdb) p outp
$2 = (gchar *) 0x73beef701078 ""
(gdb) p outbytes_remaining
$3 = 2
(gdb) p err
$4 = 4217256
(gdb) p have_error
$5 = 0
(gdb) p done
$6 = 0
(gdb) p reset
$7 = 0
(gdb) p p
$8 = (const gchar *) 0x4059f5 "\302\275"
(gdb) n
417           if (err == (gsize) -1)
(gdb) p inbytes_remaining
$9 = 0
(gdb) p outp
$10 = (gchar *) 0x73beef701079 ""
(gdb) p outbytes_remaining
$11 = 1
(gdb) p err
$12 = 0
(gdb) p have_error
$13 = 0
(gdb) p reset
$14 = 0
(gdb) p reset
$15 = 0
(gdb) p p
$16 = (const gchar *) 0x4059f7 ""
(gdb) p *p
$17 = 0 '\000'
(gdb) p *outp
$18 = 0 '\000'
(gdb) n
455               if (!reset)
(gdb)
458                   reset = TRUE;
(gdb)
459                   inbytes_remaining = 0;
(gdb)
410       while (!done && !have_error)
(gdb)
412           if (reset)
(gdb)
413             err = g_iconv (converter, NULL, &inbytes_remaining, &outp, &outbytes_remaining);
(gdb) p inbytes_remaining , outp
$19 = (gchar *) 0x73beef701079 ""
(gdb) p outp
$20 = (gchar *) 0x73beef701079 ""
(gdb) p inbytes_remaining
$21 = 0
(gdb) p outp
$22 = (gchar *) 0x73beef701079 ""
(gdb) p outbytes_remaining
$23 = 1
(gdb) p err
$24 = 0
(gdb) p have_error
$25 = 0
(gdb) p done
$26 = 0
(gdb) p reset
$27 = 1
(gdb) p p
$28 = (const gchar *) 0x4059f7 ""
(gdb) n
417           if (err == (gsize) -1)
(gdb) p inbytes_remaining
$29 = 0
(gdb) p outp
$30 = (gchar *) 0x73beef701079 ""
(gdb) p outbytes_remaining
$31 = 1
(gdb) p err
$32 = 0
(gdb) p done
$33 = 0
(gdb) p have_error
$34 = 0
(gdb) p reset
$35 = 1
(gdb) p p
$36 = (const gchar *) 0x4059f7 ""
(gdb) n
455               if (!reset)
(gdb)
462                 done = TRUE;
(gdb)
410       while (!done && !have_error)
(gdb)
466       memset (outp, 0, NUL_TERMINATOR_LENGTH);
(gdb)
468       if (bytes_read)
(gdb)
469         *bytes_read = p - str;
(gdb)
483       if (bytes_written)
(gdb)
484         *bytes_written = outp - dest;       /* Doesn't include '\0' */
(gdb)
486       if (have_error)
(gdb)
492         return dest;
(gdb)
493     }
(gdb)
g_convert (str=0x4059f5 "\302\275", len=-1, to_codeset=0x4059f8 "ISO-8859-1", from_codeset=0x4059a8 "UTF-8", bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998,
    error=0x7f7fffecb9a0) at gconvert.c:569
569       close_converter (cd);
(gdb)
571       return res;
(gdb)
572     }
(gdb)
test_one_half () at convert.c:71
71        g_assert_no_error (error);
(gdb)
72        g_assert_cmpint (bytes_read, ==, 2);
(gdb)
73        g_assert_cmpint (bytes_written, ==, 1);
(gdb)
74        g_assert_cmpstr (out, ==, "\xbd");
(gdb)
75        g_free (out);
(gdb)
77        out = g_convert (in, -1,
(gdb)

Breakpoint 1, g_convert_with_iconv (str=0x4059f5 "\302\275", len=-1, converter=0x73beef308080, bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0)
    at gconvert.c:386
386     {
(gdb)
394       gboolean have_error = FALSE;
(gdb)
395       gboolean done = FALSE;
(gdb)
396       gboolean reset = FALSE;
(gdb)
398       g_return_val_if_fail (converter != (GIConv) -1, NULL);
(gdb)
400       if (len < 0)
(gdb)
401         len = strlen (str);
(gdb)
403       p = str;
(gdb)
404       inbytes_remaining = len;
(gdb)
405       outbuf_size = len + NUL_TERMINATOR_LENGTH;
(gdb)
407       outbytes_remaining = outbuf_size - NUL_TERMINATOR_LENGTH;
(gdb)
408       outp = dest = g_malloc (outbuf_size);
(gdb)

Breakpoint 2, g_convert_with_iconv (str=0x4059f5 "\302\275", len=2, converter=0x73beef308080, bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0)
    at gconvert.c:410
410       while (!done && !have_error)
(gdb)
412           if (reset)
(gdb)
415             err = g_iconv (converter, (char **)&p, &inbytes_remaining, &outp, &outbytes_remaining);
(gdb) p inbytes_remaining
$37 = 2
(gdb) p outp
$38 = (gchar *) 0x73beeeb01078 ""
(gdb) p outbby
No symbol "outbby" in current context.
(gdb) p outbytes_remaining
$39 = 2
(gdb) p err
$40 = 4217256
(gdb) p have_error
$41 = 0
(gdb) p done
$42 = 0
(gdb) p reset
$43 = 0
(gdb) p p
$44 = (const gchar *) 0x4059f5 "\302\275"
(gdb) n
417           if (err == (gsize) -1)
(gdb) p inbytes_remaining
$45 = 0
(gdb) p outp
$46 = (gchar *) 0x73beeeb01079 ""
(gdb) p outbytes_remaining
$47 = 1
(gdb) p err
$48 = 1
(gdb) p have_error
$49 = 0
(gdb) p done
$50 = 0
(gdb) p reset
$51 = 0
(gdb) p p
$52 = (const gchar *) 0x4059f7 ""
(gdb) p out[-1]
No symbol "out" in current context.
(gdb) p outp[-1]
$53 = 63 '?'
(gdb) n
455               if (!reset)
(gdb)
458                   reset = TRUE;
(gdb)
459                   inbytes_remaining = 0;
(gdb)
410       while (!done && !have_error)
(gdb)
412           if (reset)
(gdb)
413             err = g_iconv (converter, NULL, &inbytes_remaining, &outp, &outbytes_remaining);
(gdb)
417           if (err == (gsize) -1)
(gdb) p inbytes_remaining
$54 = 0
(gdb) p outp
$55 = (gchar *) 0x73beeeb01079 ""
(gdb) p outbytes_remaining
$56 = 1
(gdb) p err
$57 = 0
(gdb) p have_error
$58 = 0
(gdb) p done
$59 = 0
(gdb) p reset
$60 = 1
(gdb) p p
$61 = (const gchar *) 0x4059f7 ""
(gdb) n
455               if (!reset)
(gdb)
462                 done = TRUE;
(gdb)
410       while (!done && !have_error)
(gdb)
466       memset (outp, 0, NUL_TERMINATOR_LENGTH);
(gdb)
468       if (bytes_read)
(gdb)
469         *bytes_read = p - str;
(gdb)
483       if (bytes_written)
(gdb)
484         *bytes_written = outp - dest;       /* Doesn't include '\0' */
(gdb)
486       if (have_error)
(gdb)
492         return dest;
(gdb)
493     }
(gdb)
g_convert (str=0x4059f5 "\302\275", len=-1, to_codeset=0x405a36 "ISO-8859-15", from_codeset=0x4059a8 "UTF-8", bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998,
    error=0x7f7fffecb9a0) at gconvert.c:569
569       close_converter (cd);
(gdb)
571       return res;
(gdb)
572     }
(gdb)
test_one_half () at convert.c:82
82        g_assert_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE);
(gdb)

Breakpoint 1, g_convert_with_iconv (str=0x73beeeb020f0 "**\nERROR:convert.c:82:test_one_half: assertion failed (error == (g_convert_error, 1)): error is NULL\n", len=101,
    converter=0x73beeeb06080, bytes_read=0x0, bytes_written=0x0, error=0x7f7fffecb650) at gconvert.c:386
386     {
(gdb)


And this hopefully answers the last one:

(gdb) 
test_one_half () at convert.c:71
71        g_assert_no_error (error);
(gdb) l -
66        out = g_convert (in, -1, 
67                         "ISO-8859-1", "UTF-8",
68                         &bytes_read, &bytes_written,
69                         &error);
70
71        g_assert_no_error (error);
72        g_assert_cmpint (bytes_read, ==, 2);
73        g_assert_cmpint (bytes_written, ==, 1);
74        g_assert_cmpstr (out, ==, "\xbd");
75        g_free (out);
(gdb) p error
$62 = (GError *) 0x0
(gdb) p bytes_read 
$63 = 2
(gdb) p bytes_written
$64 = 1
(gdb) 

Does this help?
Comment 3 Philip Withnall 2018-01-22 12:57:09 UTC
Thanks, that’s very helpful. It looks like NetBSD’s iconv() is successfully converting the fraction character from UTF-8 to ISO-8859-1 and ISO-8859-15, even though it’s not representable in ISO-8859-15. It’s being replaced with a ‘?’ in that case.

That’s actually allowed in the specification for the iconv() function, but it’s a case which GLib wasn’t handling, since the glibc implementation of iconv() always returns EILSEQ in such a situation.

I’m about to attach some patches which should fix this. Can you apply them to your copy of GLib and test please?
Comment 4 Philip Withnall 2018-01-22 12:57:38 UTC
Created attachment 367210 [details] [review]
docs: Replace an XML entity with a UTF-8 character instead

Another part of the long tail of converting our documentation from
DocBook to Markdown.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
Comment 5 Philip Withnall 2018-01-22 12:57:44 UTC
Created attachment 367211 [details] [review]
docs: Minor wording improvements in GConvert documentation

Fix capitalisation of GLib, make some text less gender-specific, and add
some missing colons.

Signed-off-by: Philip Withnall <withnall@endlessm.com>
Comment 6 Philip Withnall 2018-01-22 12:57:50 UTC
Created attachment 367212 [details] [review]
tests: Add a missing const to a variable in the GConvert tests

Also rename it to make it clearer how it’s encoded (as UTF-8).

Signed-off-by: Philip Withnall <withnall@endlessm.com>
Comment 7 Philip Withnall 2018-01-22 12:57:56 UTC
Created attachment 367213 [details] [review]
gconvert: Fix error handling for g_iconv() with unrepresentable chars

The behaviour of upstream iconv() when faced with a character which is
valid in the input encoding, but not representable in the output
encoding, is implementation defined:

http://pubs.opengroup.org/onlinepubs/9699919799/

Specifically:

   If iconv() encounters a character in the input buffer that is valid,
   but for which an identical character does not exist in the target
   codeset, iconv() shall perform an implementation-defined conversion
   on this character.

This behaviour was being exposed in our g_iconv() wrapper and also in
g_convert_with_iconv() — but users of g_convert_with_iconv() (both the
GLib unit tests, and the implementation of g_convert_with_fallback())
were assuming that iconv() would return EILSEQ if faced with an
unrepresentable character.

On platforms like NetBSD, this is not the case: NetBSD’s iconv()
finishes the conversion successfully, and outputs a string containing
replacement characters. It signals those replacements in its return
value from iconv(), which is positive (specifically, non-zero) in such a
case.

Let’s codify the existing assumed behaviour of g_convert_with_iconv(),
documenting that it will return G_CONVERT_ERROR_INVALID_SEQUENCE if
faced with an unrepresentable character. As g_iconv() is a thin wrapper
around iconv(), leave the behaviour there implementation-defined (but
document it as such).

Signed-off-by: Philip Withnall <withnall@endlessm.com>
Comment 8 Philip Withnall 2018-01-22 12:58:02 UTC
Created attachment 367214 [details] [review]
tests: Add some documentation to the illegal sequence conversion test

Signed-off-by: Philip Withnall <withnall@endlessm.com>
Comment 9 Philip Withnall 2018-02-01 14:15:35 UTC
*** Bug 317754 has been marked as a duplicate of this bug. ***
Comment 10 Matthias Clasen 2018-02-02 08:42:38 UTC
Review of attachment 367210 [details] [review]:

look fine
Comment 11 Matthias Clasen 2018-02-02 08:43:22 UTC
Review of attachment 367211 [details] [review]:

sure
Comment 12 Matthias Clasen 2018-02-02 08:43:52 UTC
Review of attachment 367212 [details] [review]:

ok
Comment 13 Matthias Clasen 2018-02-02 08:45:56 UTC
Review of attachment 367213 [details] [review]:

probably the best we can do
Comment 14 Matthias Clasen 2018-02-02 08:47:43 UTC
Review of attachment 367214 [details] [review]:

yes
Comment 15 Philip Withnall 2018-02-02 09:04:57 UTC
Thanks for the reviews!

Attachment 367210 [details] pushed as ad6afd0 - docs: Replace an XML entity with a UTF-8 character instead
Attachment 367211 [details] pushed as 19bc03e - docs: Minor wording improvements in GConvert documentation
Attachment 367212 [details] pushed as a19eed4 - tests: Add a missing const to a variable in the GConvert tests
Attachment 367213 [details] pushed as 8abf3a0 - gconvert: Fix error handling for g_iconv() with unrepresentable chars
Attachment 367214 [details] pushed as 3a88ab6 - tests: Add some documentation to the illegal sequence conversion test
Comment 16 Thomas Klausner 2018-02-02 15:34:56 UTC
I tested git head from a couple minutes ago, which includes your patches, on NetBSD 8.99.12/amd64.

I see 4 test failures, one of which is in convert. This is the content of convert.log:

# random seed: R02Se2868f07c5ab293548fcff2fc632b37b
1..12
# Start of conversion tests
ok 1 /conversion/no-conv
PASS: convert 1 /conversion/no-conv
**
ERROR:../../../glib/tests/convert.c:86:test_one_half: assertion failed (bytes_read == 0): (2 == 0)
ok 2 /conversion/iconv-state
PASS: convert 2 /conversion/iconv-state
[1]   Abort trap (core dumped) ${1} -k --tap
Bail out! ERROR:../../../glib/tests/convert.c:86:test_one_half: assertion failed (bytes_read == 0): (2 == 0)
ERROR: convert - Bail out! ERROR:../../../glib/tests/convert.c:86:test_one_half: assertion failed (bytes_read == 0): (2 == 0)

Please let me know what information you need to debug this. Thanks.
Comment 17 Philip Withnall 2018-02-13 14:43:10 UTC
I think I know what the problem is here: in gconvert.c, while we handle (err > 0) to give ERROR_ILLEGAL_SEQUENCE, we don’t calculate a correct value of *bytes_read to output with that error. It’s set to the entire length of the input string (in bytes), because the NetBSD iconv() *did* read the entire string (it just outputted replacement characters for all of it).

The fix here is to set *bytes_read differently if (err > 0). We’d probably want to set it to the offset of the first replacement character in the output string. That could be hard to calculate, though, since the input string might already contain replacement characters (which are typically U+003F ‘?’).

We have to make this change, as g_convert() documents that @bytes_read “will be the byte offset after the last valid input sequence” in that case.

This has fallen down my priority queue, so if someone else wants to pick it up; feel free. Otherwise, I’ll pick it up again at some point in the future.
Comment 18 Philip Withnall 2018-02-16 13:01:34 UTC
*** Bug 674540 has been marked as a duplicate of this bug. ***
Comment 19 GNOME Infrastructure Team 2018-05-24 19:54:28 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1303.