Bug 674540 - Incorrect assertion in gconvert tests
Status: RESOLVED DUPLICATE of bug 790698
Product: glib
Classification: Platform
Component: general
Version: unspecified
Hardware: Other Linux
Importance: Normal minor
Target Milestone: ---
Assigned To: gtkdev
QA Contact: gtkdev
Depends on:
Blocks:
 
 
Reported: 2012-04-21 19:20 UTC by bugdal
Modified: 2018-02-16 13:01 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description bugdal 2012-04-21 19:20:03 UTC
The second test in glib/tests/convert.c (line 79) asserts that converting a character that does not exist in ISO-8859-15 from UTF-8 to ISO-8859-15 fails with G_CONVERT_ERROR_ILLEGAL_SEQUENCE. It's unclear whether glib *intends* this to be the correct behavior of g_convert. However, g_convert is built on iconv and only fails with G_CONVERT_ERROR_ILLEGAL_SEQUENCE when iconv fails with EILSEQ, and a POSIX-conforming iconv does not fail here, so the conversion in this test case should succeed rather than fail. Per POSIX:

"If iconv() encounters a character in the input buffer that is valid, but for which an identical character does not exist in the target codeset, iconv() shall perform an implementation-defined conversion on this character."

(Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv.html)

Moreover, iconv() reports how many such non-identical conversions it performed through its return value.
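To make the return-value point concrete, here is a minimal C sketch (not glib code; `convert_and_classify` and `conv_outcome` are hypothetical names) that calls iconv() once on a whole string and distinguishes a zero return, a positive count of non-identical conversions, and an EILSEQ failure. It assumes the output buffer is large enough for the whole result.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical result type, for illustration only. */
typedef enum {
    CONV_EXACT,       /* iconv() returned 0: every character converted identically */
    CONV_INEXACT,     /* iconv() returned > 0: that many non-identical conversions */
    CONV_ILLEGAL,     /* iconv() failed with EILSEQ */
    CONV_OTHER_ERROR  /* iconv_open() failed, or iconv() failed with E2BIG/EINVAL */
} conv_outcome;

/* Converts the NUL-terminated string `in` into `out` (outcap must be >= 1) and
   reports how iconv() classified the conversion. *inexact receives iconv()'s
   count of non-identical conversions, or 0 on error. */
static conv_outcome convert_and_classify(const char *to, const char *from,
                                         const char *in, char *out,
                                         size_t outcap, size_t *inexact)
{
    *inexact = 0;
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return CONV_OTHER_ERROR;

    char *inp = (char *)in;
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outcap - 1;          /* reserve one byte for the terminator */

    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    /* Flush any final shift sequence (a no-op for stateless encodings). */
    if (r != (size_t)-1 && iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        r = (size_t)-1;
    int saved_errno = errno;              /* iconv_close() may overwrite errno */
    iconv_close(cd);
    *outp = '\0';

    if (r == (size_t)-1)
        return saved_errno == EILSEQ ? CONV_ILLEGAL : CONV_OTHER_ERROR;
    *inexact = r;
    return r > 0 ? CONV_INEXACT : CONV_EXACT;
}
```

With a POSIX-conforming iconv(), converting U+0105 (LATIN SMALL LETTER A WITH OGONEK, which has no ISO-8859-15 equivalent) should give CONV_INEXACT; with glibc's iconv() it gives CONV_ILLEGAL, which is the behavior the current test asserts.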

The current test in glib/tests/convert.c seems to be testing for buggy, non-conformant GNU iconv behavior.

If glib's g_convert wants to match the GNU iconv behavior in this regard, the iconv wrapper functions in glib/gconvert.c need to be updated to check for a positive return value from iconv (indicating an inexact conversion) and translate it into an illegal-sequence error. Note that if you also want to give the caller the *location* of the inexact conversion, there is no easy way to do that; iconv does not preserve that information when it returns. The easiest solution I know of is to feed iconv one byte at a time until the inexact conversion occurs.
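A rough C sketch of the one-byte-at-a-time approach: the hypothetical helper `find_first_inexact` offers iconv() one more byte whenever it reports an incomplete character (EINVAL), and stops at the first call that either returns a positive count or fails with EILSEQ, so the byte offset of the offending character is known. The return conventions (-1 for no problem found, -2 for other errors) belong to this sketch only, and since it makes at least one iconv() call per character it is only meant to illustrate the idea.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Returns the byte offset of the first character that iconv() either converted
   non-identically (positive return) or rejected with EILSEQ, -1 if the whole
   input converted exactly, or -2 on any other error (including truncated input). */
static long find_first_inexact(const char *to, const char *from,
                               const char *in, size_t inlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -2;

    long result = -1;
    size_t pos = 0;     /* start of the character currently being converted */
    size_t chunk = 1;   /* number of input bytes offered to iconv() */
    while (pos < inlen) {
        char outbuf[64];                  /* scratch output, discarded */
        char *inp = (char *)in + pos;
        size_t inleft = chunk;
        char *outp = outbuf;
        size_t outleft = sizeof outbuf;

        size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
        if (r == (size_t)-1) {
            if (errno == EINVAL && pos + chunk < inlen) {
                chunk++;                  /* incomplete character: offer one more byte */
                continue;
            }
            result = (errno == EILSEQ) ? (long)pos : -2;
            break;
        }
        if (r > 0) {                      /* non-identical conversion in this chunk */
            result = (long)pos;
            break;
        }
        pos = (size_t)(inp - in);         /* whole chunk consumed */
        chunk = 1;
    }
    iconv_close(cd);
    return result;
}
```

For "a", U+0105, "b" encoded as UTF-8, this reports offset 1 whether the underlying iconv() returns a positive count or fails with EILSEQ.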

Personally I think it would be more useful to simply add a new error condition indicating that the text is not representable in the destination charset, and that the contents of the destination string are likely to contain junk. However, doing this consistently would require working around the buggy GNU iconv implementation. The easiest way I know to do that is to check for EILSEQ and retry the conversion with WCHAR_T as the destination charset. If EILSEQ no longer happens, it was a spurious EILSEQ caused by the bug. However, this seems like quite a burden when supporting stateful converters, since there is no way to rewind a conversion descriptor to the shift state it was in before the failed call.
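The EILSEQ double-check could look roughly like this in C. The hypothetical `check_input_valid` decodes the same input again into "WCHAR_T"; if that succeeds, an earlier EILSEQ from the real conversion was caused by an unrepresentable character rather than malformed input. "WCHAR_T" is a charset name accepted by glibc's and musl's iconv_open() but is not required by POSIX, and the sketch always starts from the initial shift state, so it inherits the stateful-converter problem described above.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical classification result, for illustration only. */
typedef enum { INPUT_VALID, INPUT_MALFORMED, CHECK_FAILED } input_check;

/* Decodes `in` (inlen bytes in charset `from`) into WCHAR_T, discarding the
   output, to decide whether an EILSEQ seen elsewhere came from malformed input. */
static input_check check_input_valid(const char *from, const char *in, size_t inlen)
{
    iconv_t cd = iconv_open("WCHAR_T", from);
    if (cd == (iconv_t)-1)
        return CHECK_FAILED;

    input_check result = INPUT_VALID;
    char *inp = (char *)in;
    size_t inleft = inlen;
    while (inleft > 0) {
        char buf[256];                    /* scratch output, reused on E2BIG */
        char *outp = buf;
        size_t outleft = sizeof buf;
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 && errno != E2BIG) {
            result = (errno == EILSEQ) ? INPUT_MALFORMED : CHECK_FAILED;
            break;
        }
    }
    iconv_close(cd);
    return result;
}
```

A caller would only run this after the real conversion failed with EILSEQ: INPUT_VALID then means "not representable in the destination charset", INPUT_MALFORMED means a genuine illegal sequence.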

In any case, even if you don't want to address this ugly issue in glib, the test should be fixed not to assert a condition that will only be true when the underlying system libraries are buggy.
Comment 1 Dan Winship 2012-04-22 13:42:12 UTC
The glibc docs claim that this case is undefined. (Maybe it was undefined in an older version of POSIX and only later got changed to "implementation-defined conversion"?) They also say that glibc's behavior may change in a future release. And the gconvert docs only say that they'll return G_CONVERT_ERROR_ILLEGAL_SEQUENCE in the illegal-input-character case, not in the can't-convert case. So we should probably remove this test, and note in the docs that glib's behavior here is undefined.
Comment 2 bugdal 2012-04-22 15:39:41 UTC
Right now, when the underlying iconv behaves as specified by POSIX and performs the implementation-defined non-exact conversions, glib completely fails to report that to the caller; the conversion appears as a success. This is probably undesirable. I think glib should return _some_ kind of error code in this case to inform the caller. If the caller is trying to choose an output encoding for sending an email or instant message or similar, a false "success" return when using an insufficient encoding is very harmful, as it will result in information loss.

Here's what I would propose:

- Have positive return values from iconv cause g_convert to generate an error (either G_CONVERT_ERROR_ILLEGAL_SEQUENCE or some other code).

- Deprecate the practice of applications using the "input bytes consumed" result value when g_convert has encountered an error. If the input resulted in an early inexact conversion followed by a later encoding error, the fact that the inexact conversion took place at all could be lost, and the application would wrongly assume everything up to the "input bytes consumed" count had been converted correctly.
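A minimal C sketch of the first proposal, written as standalone code rather than as a patch to glib/gconvert.c: the hypothetical `strict_iconv` turns a positive iconv() return into EILSEQ, so glibc-style and POSIX-style implementations fail the same way, and the hypothetical g_convert-like `convert_strict` only reports bytes consumed on success, in line with the second proposal. The fixed output-size bound is a simplification for illustration.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Behaves like iconv(), except that a positive return value (non-identical
   conversions) is reported as failure with errno = EILSEQ. */
static size_t strict_iconv(iconv_t cd, char **in, size_t *inleft,
                           char **out, size_t *outleft)
{
    size_t r = iconv(cd, in, inleft, out, outleft);
    if (r != (size_t)-1 && r > 0) {
        errno = EILSEQ;
        return (size_t)-1;
    }
    return r;
}

/* Returns a malloc'd, NUL-terminated conversion of `in`, or NULL with errno set.
   *bytes_read is written only on success, because after an error the consumed
   count may hide an earlier inexact conversion. */
static char *convert_strict(const char *in, size_t inlen, const char *to,
                            const char *from, size_t *bytes_read)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = inlen * 4 + 16;          /* illustrative bound; E2BIG otherwise */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        errno = ENOMEM;
        return NULL;
    }

    char *inp = (char *)in;
    size_t inleft = inlen;
    char *outp = out;
    size_t outleft = cap - 1;             /* keep one byte for the terminator */

    if (strict_iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
        strict_iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        int saved = errno;                /* free()/iconv_close() may change errno */
        free(out);
        iconv_close(cd);
        errno = saved;
        return NULL;
    }
    *outp = '\0';
    iconv_close(cd);
    *bytes_read = inlen;
    return out;
}
```

With either kind of iconv(), converting "a" followed by U+0105 from UTF-8 to ISO-8859-15 returns NULL with errno set to EILSEQ and leaves the caller's byte count untouched.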

In short, because implementations are inconsistent and the POSIX-specified behavior loses information, any nonzero return value from iconv usually has to be treated as meaning the partial result cannot be trusted; the only reliable way to recover is to restart the conversion from the beginning and perform it byte by byte.

Apologies if the follow-up to my original bug report is not the right place for this discussion; perhaps it should be a new bug report filed against the actual behavior...
Comment 3 Dan Winship 2012-04-25 15:20:10 UTC
Related: g_convert_with_fallback() only works if iconv() has the glibc semantics. (Well... it produces an approximate conversion either way, but if iconv() doesn't have glibc semantics then it uses an implementation-defined fallback string rather than using the provided one.)
Comment 4 Philip Withnall 2018-02-16 13:01:34 UTC
This has been partially fixed as commit 8abf3a04e699abd486c4dcaa57977203584acf0e; see bug #790698.

*** This bug has been marked as a duplicate of bug 790698 ***