GNOME Bugzilla – Bug 790698
convert: test failure on NetBSD
Last modified: 2019-08-30 15:16:30 UTC
On NetBSD, the convert self test fails: PASS: convert 2 /conversion/iconv-state ERROR: convert - too few tests run (expected 7, got 2) ERROR: convert - exited with status 134 (terminated by signal 6?) Running glib/tests/convert directly, I see: /conversion/no-conv: OK /conversion/iconv-state: OK /conversion/illegal-sequence: ** ERROR:convert.c:82:test_one_half: assertion failed (error == (g_convert_error, 1)): error is NULL Abort When I comment out the assertions in line 82, it fails again later in the same function: # ./convert /conversion/no-conv: OK /conversion/iconv-state: OK /conversion/illegal-sequence: ** ERROR:convert.c:98:test_one_half: assertion failed (out == "a"): ("?" == "a") Abort When I comment out these assertions as well, the test program succeeds. AIU, the test wants to convert the ISO-8859-1 or -15 sequence \xc2\xbd to UTF-8 and expects it to fail. I'm not sure why that is so, but on NetBSD it fails differently than expected: The output buffer contains a single question mark "?". NetBSD does not use libiconv but has its own implementation. Perhaps the return value handling is not identical and this leads to that result. The NetBSD man page for iconv() is at http://netbsd.gw.com/cgi-bin/man-cgi?iconv+3+NetBSD-current
Let’s focus on the first issue (failure at convert.c:82) first. Would you be able to run the test under gdb and step through the calls to find out more information? Specifically, I’m interested in: - Number of iterations through the while loop in g_convert_with_iconv() - Values of inbytes_remainin, outp, outbytes_remaining, err, have_error, done, reset, p before and after each call to g_iconv() - Values of error, bytes_read, bytes_written, out after the g_convert() call My first guess is that iconv() on NetBSD is returning a different errno compared to iconv() on other systems, but that’s a guess. Thanks.
Ok, let's see. I set a breakpoint on g_convert_with_iconv. The 'illegal-sequence' test is the interesting one, so I skipped the ones before. Here's the log: The first log should answer your first two questions: (gdb) c Continuing. /conversion/illegal-sequence: Breakpoint 1, g_convert_with_iconv (str=0x4059f5 "\302\275", len=-1, converter=0x73beefb09080, bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0) at gconvert.c:386 386 { (gdb) n 394 gboolean have_error = FALSE; (gdb) 395 gboolean done = FALSE; (gdb) 396 gboolean reset = FALSE; (gdb) 398 g_return_val_if_fail (converter != (GIConv) -1, NULL); (gdb) 400 if (len < 0) (gdb) 401 len = strlen (str); (gdb) 403 p = str; (gdb) 404 inbytes_remaining = len; (gdb) 405 outbuf_size = len + NUL_TERMINATOR_LENGTH; (gdb) 407 outbytes_remaining = outbuf_size - NUL_TERMINATOR_LENGTH; (gdb) 408 outp = dest = g_malloc (outbuf_size); (gdb) Breakpoint 2, g_convert_with_iconv (str=0x4059f5 "\302\275", len=2, converter=0x73beefb09080, bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0) at gconvert.c:410 410 while (!done && !have_error) (gdb) 412 if (reset) (gdb) 415 err = g_iconv (converter, (char **)&p, &inbytes_remaining, &outp, &outbytes_remaining); (gdb) p inbytes_remaining $1 = 2 (gdb) p outp $2 = (gchar *) 0x73beef701078 "" (gdb) p outbytes_remaining $3 = 2 (gdb) p err $4 = 4217256 (gdb) p have_error $5 = 0 (gdb) p done $6 = 0 (gdb) p reset $7 = 0 (gdb) p p $8 = (const gchar *) 0x4059f5 "\302\275" (gdb) n 417 if (err == (gsize) -1) (gdb) p inbytes_remaining $9 = 0 (gdb) p outp $10 = (gchar *) 0x73beef701079 "" (gdb) p outbytes_remaining $11 = 1 (gdb) p err $12 = 0 (gdb) p have_error $13 = 0 (gdb) p reset $14 = 0 (gdb) p reset $15 = 0 (gdb) p p $16 = (const gchar *) 0x4059f7 "" (gdb) p *p $17 = 0 '\000' (gdb) p *outp $18 = 0 '\000' (gdb) n 455 if (!reset) (gdb) 458 reset = TRUE; (gdb) 459 inbytes_remaining = 0; (gdb) 410 while (!done && !have_error) (gdb) 412 if (reset) (gdb) 413 err = g_iconv (converter, NULL, &inbytes_remaining, &outp, &outbytes_remaining); (gdb) p inbytes_remaining , outp $19 = (gchar *) 0x73beef701079 "" (gdb) p outp $20 = (gchar *) 0x73beef701079 "" (gdb) p inbytes_remaining $21 = 0 (gdb) p outp $22 = (gchar *) 0x73beef701079 "" (gdb) p outbytes_remaining $23 = 1 (gdb) p err $24 = 0 (gdb) p have_error $25 = 0 (gdb) p done $26 = 0 (gdb) p reset $27 = 1 (gdb) p p $28 = (const gchar *) 0x4059f7 "" (gdb) n 417 if (err == (gsize) -1) (gdb) p inbytes_remaining $29 = 0 (gdb) p outp $30 = (gchar *) 0x73beef701079 "" (gdb) p outbytes_remaining $31 = 1 (gdb) p err $32 = 0 (gdb) p done $33 = 0 (gdb) p have_error $34 = 0 (gdb) p reset $35 = 1 (gdb) p p $36 = (const gchar *) 0x4059f7 "" (gdb) n 455 if (!reset) (gdb) 462 done = TRUE; (gdb) 410 while (!done && !have_error) (gdb) 466 memset (outp, 0, NUL_TERMINATOR_LENGTH); (gdb) 468 if (bytes_read) (gdb) 469 *bytes_read = p - str; (gdb) 483 if (bytes_written) (gdb) 484 *bytes_written = outp - dest; /* Doesn't include '\0' */ (gdb) 486 if (have_error) (gdb) 492 return dest; (gdb) 493 } (gdb) g_convert (str=0x4059f5 "\302\275", len=-1, to_codeset=0x4059f8 "ISO-8859-1", from_codeset=0x4059a8 "UTF-8", bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0) at gconvert.c:569 569 close_converter (cd); (gdb) 571 return res; (gdb) 572 } (gdb) test_one_half () at convert.c:71 71 g_assert_no_error (error); (gdb) 72 g_assert_cmpint (bytes_read, ==, 2); (gdb) 73 g_assert_cmpint (bytes_written, ==, 1); (gdb) 74 g_assert_cmpstr (out, ==, "\xbd"); (gdb) 75 g_free (out); (gdb) 77 out = g_convert (in, -1, (gdb) Breakpoint 1, g_convert_with_iconv (str=0x4059f5 "\302\275", len=-1, converter=0x73beef308080, bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0) at gconvert.c:386 386 { (gdb) 394 gboolean have_error = FALSE; (gdb) 395 gboolean done = FALSE; (gdb) 396 gboolean reset = FALSE; (gdb) 398 g_return_val_if_fail (converter != (GIConv) -1, NULL); (gdb) 400 if (len < 0) (gdb) 401 len = strlen (str); (gdb) 403 p = str; (gdb) 404 inbytes_remaining = len; (gdb) 405 outbuf_size = len + NUL_TERMINATOR_LENGTH; (gdb) 407 outbytes_remaining = outbuf_size - NUL_TERMINATOR_LENGTH; (gdb) 408 outp = dest = g_malloc (outbuf_size); (gdb) Breakpoint 2, g_convert_with_iconv (str=0x4059f5 "\302\275", len=2, converter=0x73beef308080, bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0) at gconvert.c:410 410 while (!done && !have_error) (gdb) 412 if (reset) (gdb) 415 err = g_iconv (converter, (char **)&p, &inbytes_remaining, &outp, &outbytes_remaining); (gdb) p inbytes_remaining $37 = 2 (gdb) p outp $38 = (gchar *) 0x73beeeb01078 "" (gdb) p outbby No symbol "outbby" in current context. (gdb) p outbytes_remaining $39 = 2 (gdb) p err $40 = 4217256 (gdb) p have_error $41 = 0 (gdb) p done $42 = 0 (gdb) p reset $43 = 0 (gdb) p p $44 = (const gchar *) 0x4059f5 "\302\275" (gdb) n 417 if (err == (gsize) -1) (gdb) p inbytes_remaining $45 = 0 (gdb) p outp $46 = (gchar *) 0x73beeeb01079 "" (gdb) p outbytes_remaining $47 = 1 (gdb) p err $48 = 1 (gdb) p have_error $49 = 0 (gdb) p done $50 = 0 (gdb) p reset $51 = 0 (gdb) p p $52 = (const gchar *) 0x4059f7 "" (gdb) p out[-1] No symbol "out" in current context. (gdb) p outp[-1] $53 = 63 '?' (gdb) n 455 if (!reset) (gdb) 458 reset = TRUE; (gdb) 459 inbytes_remaining = 0; (gdb) 410 while (!done && !have_error) (gdb) 412 if (reset) (gdb) 413 err = g_iconv (converter, NULL, &inbytes_remaining, &outp, &outbytes_remaining); (gdb) 417 if (err == (gsize) -1) (gdb) p inbytes_remaining $54 = 0 (gdb) p outp $55 = (gchar *) 0x73beeeb01079 "" (gdb) p outbytes_remaining $56 = 1 (gdb) p err $57 = 0 (gdb) p have_error $58 = 0 (gdb) p done $59 = 0 (gdb) p reset $60 = 1 (gdb) p p $61 = (const gchar *) 0x4059f7 "" (gdb) n 455 if (!reset) (gdb) 462 done = TRUE; (gdb) 410 while (!done && !have_error) (gdb) 466 memset (outp, 0, NUL_TERMINATOR_LENGTH); (gdb) 468 if (bytes_read) (gdb) 469 *bytes_read = p - str; (gdb) 483 if (bytes_written) (gdb) 484 *bytes_written = outp - dest; /* Doesn't include '\0' */ (gdb) 486 if (have_error) (gdb) 492 return dest; (gdb) 493 } (gdb) g_convert (str=0x4059f5 "\302\275", len=-1, to_codeset=0x405a36 "ISO-8859-15", from_codeset=0x4059a8 "UTF-8", bytes_read=0x7f7fffecb990, bytes_written=0x7f7fffecb998, error=0x7f7fffecb9a0) at gconvert.c:569 569 close_converter (cd); (gdb) 571 return res; (gdb) 572 } (gdb) test_one_half () at convert.c:82 82 g_assert_error (error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE); (gdb) Breakpoint 1, g_convert_with_iconv (str=0x73beeeb020f0 "**\nERROR:convert.c:82:test_one_half: assertion failed (error == (g_convert_error, 1)): error is NULL\n", len=101, converter=0x73beeeb06080, bytes_read=0x0, bytes_written=0x0, error=0x7f7fffecb650) at gconvert.c:386 386 { (gdb) And this hopefully answers the last one: (gdb) test_one_half () at convert.c:71 71 g_assert_no_error (error); (gdb) l - 66 out = g_convert (in, -1, 67 "ISO-8859-1", "UTF-8", 68 &bytes_read, &bytes_written, 69 &error); 70 71 g_assert_no_error (error); 72 g_assert_cmpint (bytes_read, ==, 2); 73 g_assert_cmpint (bytes_written, ==, 1); 74 g_assert_cmpstr (out, ==, "\xbd"); 75 g_free (out); (gdb) p error $62 = (GError *) 0x0 (gdb) p bytes_read $63 = 2 (gdb) p bytes_written $64 = 1 (gdb) Does this help?
Thanks, that’s very helpful. It looks like NetBSD’s iconv() is successfully converting the fraction character from UTF-8 to ISO-8859-1 and ISO-8859-15, even though it’s not representable in ISO-8859-15. It’s being replaced with a ‘?’ in that case. That’s actually allowed in the specification for the iconv() function, but it’s a case which GLib wasn’t handling, since the glibc implementation of iconv() always returns EILSEQ in such a situation. I’m about to attach some patches which should fix this. Can you apply them to your copy of GLib and test please?
Created attachment 367210 [details] [review] docs: Replace an XML entity with a UTF-8 character instead Another part of the long tail of converting our documentation from DocBook to Markdown. Signed-off-by: Philip Withnall <withnall@endlessm.com>
Created attachment 367211 [details] [review] docs: Minor wording improvements in GConvert documentation Fix capitalisation of GLib, make some text less gender-specific, and add some missing colons. Signed-off-by: Philip Withnall <withnall@endlessm.com>
Created attachment 367212 [details] [review] tests: Add a missing const to a variable in the GConvert tests Also rename it to make it clearer how it’s encoded (as UTF-8). Signed-off-by: Philip Withnall <withnall@endlessm.com>
Created attachment 367213 [details] [review] gconvert: Fix error handling for g_iconv() with unrepresentable chars The behaviour of upstream iconv() when faced with a character which is valid in the input encoding, but not representable in the output encoding, is implementation defined: http://pubs.opengroup.org/onlinepubs/9699919799/ Specifically: If iconv() encounters a character in the input buffer that is valid, but for which an identical character does not exist in the target codeset, iconv() shall perform an implementation-defined conversion on this character. This behaviour was being exposed in our g_iconv() wrapper and also in g_convert_with_iconv() — but users of g_convert_with_iconv() (both the GLib unit tests, and the implementation of g_convert_with_fallback()) were assuming that iconv() would return EILSEQ if faced with an unrepresentable character. On platforms like NetBSD, this is not the case: NetBSD’s iconv() finishes the conversion successfully, and outputs a string containing replacement characters. It signals those replacements in its return value from iconv(), which is positive (specifically, non-zero) in such a case. Let’s codify the existing assumed behaviour of g_convert_with_iconv(), documenting that it will return G_CONVERT_ERROR_INVALID_SEQUENCE if faced with an unrepresentable character. As g_iconv() is a thin wrapper around iconv(), leave the behaviour there implementation-defined (but document it as such). Signed-off-by: Philip Withnall <withnall@endlessm.com>
Created attachment 367214 [details] [review] tests: Add some documentation to the illegal sequence conversion test Signed-off-by: Philip Withnall <withnall@endlessm.com>
*** Bug 317754 has been marked as a duplicate of this bug. ***
Review of attachment 367210 [details] [review]: look fine
Review of attachment 367211 [details] [review]: sure
Review of attachment 367212 [details] [review]: ok
Review of attachment 367213 [details] [review]: probably the best we can do
Review of attachment 367214 [details] [review]: yes
Thanks for the reviews! Attachment 367210 [details] pushed as ad6afd0 - docs: Replace an XML entity with a UTF-8 character instead Attachment 367211 [details] pushed as 19bc03e - docs: Minor wording improvements in GConvert documentation Attachment 367212 [details] pushed as a19eed4 - tests: Add a missing const to a variable in the GConvert tests Attachment 367213 [details] pushed as 8abf3a0 - gconvert: Fix error handling for g_iconv() with unrepresentable chars Attachment 367214 [details] pushed as 3a88ab6 - tests: Add some documentation to the illegal sequence conversion test
I tested git head from a couple minutes ago, which includes your patches, on NetBSD 8.99.12/amd64. I see 4 test failures, one of which is in convert. This is the content of convert.log: # random seed: R02Se2868f07c5ab293548fcff2fc632b37b 1..12 # Start of conversion tests ok 1 /conversion/no-conv PASS: convert 1 /conversion/no-conv ** ERROR:../../../glib/tests/convert.c:86:test_one_half: assertion failed (bytes_read == 0): (2 == 0) ok 2 /conversion/iconv-state PASS: convert 2 /conversion/iconv-state [1] Abort trap (core dumped) ${1} -k --tap Bail out! ERROR:../../../glib/tests/convert.c:86:test_one_half: assertion failed (bytes_read == 0): (2 == 0) ERROR: convert - Bail out! ERROR:../../../glib/tests/convert.c:86:test_one_half: assertion failed (bytes_read == 0): (2 == 0) Please let me know what information you need to debug this. Thanks.
I think I know what the problem is here: in gconvert.c, while we handle (err > 0) to give ERROR_ILLEGAL_SEQUENCE, we don’t calculate a correct value of *bytes_read to output with that error. It’s set to the entire length of the input string (in bytes), because the NetBSD iconv() *did* read the entire string (it just outputted replacement characters for all of it). The fix here is to set *bytes_read differently if (err > 0). We’d probably want to set it to the offset of the first replacement character in the output string. That could be hard to calculate, though, since the input string might already contain replacement characters (which are typically U+003F ‘?’). We have to make this change, as g_convert() documents that @bytes_read “will be the byte offset after the last valid input sequence” in that case. This has fallen down my priority queue, so if someone else wants to pick it up; feel free. Otherwise, I’ll pick it up again at some point in the future.
*** Bug 674540 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/1303.