After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 311337 - g_convert_with_iconv Converts incorrectly cp1255
g_convert_with_iconv Converts incorrectly cp1255
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
2.6.x
Other All
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2005-07-23 16:01 UTC by duv
Modified: 2011-02-18 16:11 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
a patch (1.16 KB, patch)
2005-08-01 19:51 UTC, Matthias Clasen
none Details | Review
new version (1.71 KB, patch)
2005-08-01 20:42 UTC, Matthias Clasen
none Details | Review
new version (11.26 KB, patch)
2005-08-02 06:30 UTC, Matthias Clasen
none Details | Review
final patch as committed (11.27 KB, patch)
2005-08-02 15:06 UTC, Matthias Clasen
committed Details | Review

Description duv 2005-07-23 16:01:17 UTC
Please describe the problem:
When converting CP1255 to UTF-8, the last letter in the output buffer is
missing, if len is passed without couting the terminating null.

A five letter cp1255 will be converted to 8 utf-8 bytes if the length is 5 or
-1, but will be converted correctly (to 11 bytes) if len is 6 (counting the
terminating null).

Doing the same conversion on cp1253, suppling 5 as the length produces a correct
string, so I assume there's a problem with cp1255 specificly.

Steps to reproduce:
C source is attached as other information.

Actual results:
Output buffer has a missing letter.

Expected results:
Text will be converted correctly.

Does this happen every time?
It does.

Other information:
#include <glib.h>
#include <stdio.h>

void tohex(char *str) {
    char *ch;
    for (ch = str ; *ch ; ch++) {
        printf("%02x ", *ch & 0xff);
    }
    printf("\n");
}

int main() {
    // A five-letter hebrew word (cp1255) - null terminated
    char buf[] = { 0xf4, 0xe5, 0xf8, 0xe5, 0xed, 0x00 };

    // Last char missing
    int buflen = 5;

    // Same thing here
    // int buflen = -1;

    // Works OK...
    // int buflen = 6;

    GIConv giconv = g_iconv_open("UTF8", "CP1255");
    gsize bytes_read, bytes_written;
    GError *err = 0;

    char *outbuf = (char*)g_convert_with_iconv((gchar*)buf,
        buflen, giconv, &bytes_read, &bytes_written, &err);

    printf("buf: ");
    tohex(buf);

    printf("buflen: %d\n", buflen);
    printf("bytes_read: %d\n", bytes_read);

    // outbuf should be 10 bytes long + null (total 11 bytes)
    printf("bytes_written: %d\n", bytes_written);
    printf("outbuf: ");
    tohex(outbuf);

    return 0;
}
Comment 1 Owen Taylor 2005-07-23 16:30:49 UTC
Any problem that's specific to a particular 8-bit encoding is a system
(GNU libc on Linux) problem.
Comment 2 duv 2005-07-23 19:19:44 UTC
I suspect the problem lies in the wrapping of the libc iconv methods and not in
libc itself.

Running the same code above using iconv(3) instead of g_convert_with_iconv will
return correct results.
Comment 3 duv 2005-07-23 19:25:18 UTC
Correcting myself. iconv(3) return invalid unicode, which is probably removed by
g_convert_with_iconv.
Comment 4 Federico Mena Quintero 2005-07-25 15:40:46 UTC
Hmmm, so is bug #138299 a duplicate of this one?
Comment 5 duv 2005-07-25 20:17:23 UTC
It appears so. I've reported this to glibc guys, <a
href="http://sources.redhat.com/bugzilla/show_bug.cgi?id=1124">bug #1124</a>.
Comment 6 duv 2005-08-01 05:45:01 UTC
After certain discussion in libc's bugzilla, it appears that the problem is
cp1255 specific, since it is a stateful encoding, and there's a solution for it
- see comment #4 at http://sources.redhat.com/bugzilla/show_bug.cgi?id=1124#c4
(this time no HTML tags :).

I've tested it using the code in the libc bugreport, and it seems to be working
- which means that g_convert_with_iconv might need a change after all...

Cheers,
Comment 7 Matthias Clasen 2005-08-01 19:51:20 UTC
Created attachment 50096 [details] [review]
a patch
Comment 8 Matthias Clasen 2005-08-01 20:42:26 UTC
Created attachment 50097 [details] [review]
new version

A new version of the patch which reuses the error handling code in
g_convert_with_iconv() and protects against iconv returning something other
than E2BIG in g_convert_with_fallback().
Comment 9 Ilya Konstantinov 2005-08-01 21:06:11 UTC
I don't think the change should be in g_convert_with_iconv(). After all, iconv
*is* a stateful coder so we shouldn't reset its state in every call to iconv but
only when we're sure our entire stream has ended.

It's a possible scenario that a user of g_convert_with_iconv will call it for
every block he receives from the network and he'll want to retain the state
between the calls (which's exactly what the iconv descriptor provides -- a place
to store the state).

Only in a function which does a one-shot at conversion, when its clear to us
when the end of the stream is (and when we afterwards close the iconv
descriptor), we should perform the iconv reset (inbuf = NULL, inbytesleft=NULL).
And that is - only at the end of the g_convert function.
Comment 10 Matthias Clasen 2005-08-02 05:01:30 UTC
Good point.
Comment 11 Matthias Clasen 2005-08-02 06:30:19 UTC
Created attachment 50110 [details] [review]
new version
Comment 12 Matthias Clasen 2005-08-02 06:31:44 UTC
The last patch is a bit larger since it splits g_convert_with_iconv into 
two functions and moves them around a bit.
Comment 13 Ilya Konstantinov 2005-08-02 07:13:14 UTC
Matthias, thanks!
The patch seems solid to me. For the benefit of future generations, I'd also add
a small comment above g_convert_with_iconv_internal explaining its' necessity.

But wait, I think you raised another issue!
g_convert_with_iconv users also need a way to flush the iconv descriptor
properly and the current g_convert_with_iconv doesn't allow them to do it.

So I propose we add a new public API function and also use it ourselves:

static gchar* g_flush_iconv(GIConv converter, gsize *bytes_written, GError **error);
Comment 14 Owen Taylor 2005-08-02 13:10:52 UTC
I think g_convert_with_iconv() was meant to be "g_convert() using a cached
iconv_t" rather than "better interface to iconv()"... may be the latter
is useful though...
Comment 15 Owen Taylor 2005-08-02 13:11:33 UTC
2001-08-24  Alexander Larsson  <alla@lysator.liu.se>

        * glib/gconvert.[ch] (g_convert_with_iconv):
        New function, doing the same as g_convert but taking
        a GIConv argument. The old g_convert is just
        a call to this with a newly opened GIConv.
Comment 16 Matthias Clasen 2005-08-02 13:32:44 UTC
I think a simple fix for the problem raised in comment #13 is to do away
with g_convert_with_iconv_internal and just allow NULL as first argument
to g_convert_with_iconv. That would also considerably shrink the patch.

Owen, do you think g_convert_with_iconv should emit the final shift, then ?
Comment 17 Matthias Clasen 2005-08-02 15:06:36 UTC
Created attachment 50135 [details] [review]
final patch as committed