GNOME Bugzilla – Bug 694843
g_base64_decode_step () produces invalid data
Last modified: 2013-03-05 15:57:11 UTC
Decoding "Cg==" in a single step produces correct data: 0x0a But executing the function twice for "Cg=" and "=" will add an extra \0: 0x0a,0x00 I am attaching also a small program which illustrates the problem.
Created attachment 237582 [details] a C program illustrating the problem
The main problem is that g_base64_decode_step () does not remember if it got padding character in the previous sequence. Submitting a patch which solves the problem.
Created attachment 237596 [details] [review] Remember if we got padding character in the previous sequence
Review of attachment 237596 [details] [review]: Can you add a test case for this to glib/tests/gbase64.c ? ::: glib/gbase64.c @@ +345,3 @@ + + /* check if we got padding character in the previous sequence */ + if (i < 0) { Braces on a new line - GLib is GNU style (just look at the rest of the code).
Created attachment 237620 [details] [review] Remember if we got padding character in the previous sequence added new test to glib/tests/base64.c and fixed indentation
Created attachment 237626 [details] [review] Remember if we got padding character in the previous sequence Updated the new test, the test program is now obsolete
Created attachment 237709 [details] [review] Remember if we got padding character in the previous sequence Yesterday i was running some stab tests where base64 encoded data is exchanged over a network (glib used on both sides) and discovered a problem in my patch. The 'pad' variable which indicated whether we got a padding character was not unset after a group of 4 base64 bytes was successfully converted to 3 bytes. Thus, one byte was always 'lost' from the new group directly following the padding characters. For example, 'Cg==QUJD' was converted to '\nAC' where the correct result should be '\nABC'. This also showed that the proposed unit test was not good enough. I have rewritten my patch and the diff to the original version of the code is much smaller now. No new variables, put bck the ones which were removed, just using the state to remember whether we got padding character in the previous sequence. Also improved the unit test for g_base64_decode_step (). Sorry for my omission when i was submitting the initial version of the patch.
The base64 decoding bits are a bit hairy; this updated patch looks right to me. However your added test case didn't make much sense to me. I've made a new one; does it look better to you as well?
Created attachment 237728 [details] [review] 0001-base64-Add-tests-for-incremental-decoding-with-very-.patch
Created attachment 237729 [details] [review] 0002-base64-Fix-g_base64_decode_step.patch
Note I've revised your patch so that it just uncomments the test cases that would fail, and now pass.
Yes, the test definitely looks better now. If i may suggest a small change to your patch: --- glib/tests/base64.c 2013-03-04 09:51:59.230209350 +0100 +++ glib/tests/base64.c.changed 2013-03-04 09:51:18.201593885 +0100 @@ -337,14 +337,14 @@ { const guint blocksize = GPOINTER_TO_UINT (blocksize_p); guint i; + gint state = 0; + guint save = 0; for (i = 0; ok_100_encode_strs[i]; i++) { const char *str = ok_100_encode_strs[i]; const char *p; gsize len = strlen (str); - gint state = 0; - guint save = 0; guchar *decoded; gsize decoded_size = 0; guchar *decoded_atonce;
(In reply to comment #12) > Yes, the test definitely looks better now. If i may suggest a small change to > your patch: > > --- glib/tests/base64.c 2013-03-04 09:51:59.230209350 +0100 > +++ glib/tests/base64.c.changed 2013-03-04 09:51:18.201593885 +0100 > @@ -337,14 +337,14 @@ > { > const guint blocksize = GPOINTER_TO_UINT (blocksize_p); > guint i; > + gint state = 0; > + guint save = 0; Hmm, I don't see how that change is correct because we need to provide "state" and "save" zero-initialized to *each* call to g_base64_decode.
"state" will be set to zero in g_base64_decode_step () after each 4 base64 byte sequence, so this change won't break anything if the function is correct. Moreover, that way we test also decoding base64 encoded data after 4 bytes long chunk with padding characters. It is fairly Ok to do: g_base64_decode_step ("ab==", &state, &save); g_base64_decode_step ("cdef", &state, &save);
(In reply to comment #14) > It is fairly Ok to do: > > g_base64_decode_step ("ab==", &state, &save); > g_base64_decode_step ("cdef", &state, &save); no, it's not. that's invalid base64, so there are no guarantees on what output bytes the function will produce at that point
ouch, ok then, i thought the intent was to support that case too. Now i see why my test didn't make sense to you. Skip my comment.
Review of attachment 237729 [details] [review]: Makes sense to me.
Ognyan, can you mark the test patch as accepted-commit-now if you agree it makes sense?
Review of attachment 237728 [details] [review]: The patch looks good!
https://git.gnome.org/browse/glib/commit/?id=27b19cee1bb5007f9dd123e171bcf3f978263f15 https://git.gnome.org/browse/glib/commit/?id=06a59f889a8d3c8a63622af64d253632a0530017