After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 694843 - g_base64_decode_step () produces invalid data
g_base64_decode_step () produces invalid data
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2013-02-28 08:53 UTC by Ognyan Tonchev (redstar_)
Modified: 2013-03-05 15:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
a C program illustrating the problem (869 bytes, text/x-csrc)
2013-02-28 08:54 UTC, Ognyan Tonchev (redstar_)
  Details
Remember if we got padding character in the previous sequence (1.67 KB, patch)
2013-02-28 13:29 UTC, Ognyan Tonchev (redstar_)
none Details | Review
Remember if we got padding character in the previous sequence (3.66 KB, patch)
2013-02-28 15:56 UTC, Ognyan Tonchev (redstar_)
none Details | Review
Remember if we got padding character in the previous sequence (3.56 KB, patch)
2013-02-28 16:20 UTC, Ognyan Tonchev (redstar_)
none Details | Review
Remember if we got padding character in the previous sequence (2.73 KB, patch)
2013-03-01 14:07 UTC, Ognyan Tonchev (redstar_)
none Details | Review
0001-base64-Add-tests-for-incremental-decoding-with-very-.patch (2.70 KB, patch)
2013-03-01 18:33 UTC, Colin Walters
accepted-commit_now Details | Review
0002-base64-Fix-g_base64_decode_step.patch (2.08 KB, patch)
2013-03-01 18:33 UTC, Colin Walters
accepted-commit_now Details | Review

Description Ognyan Tonchev (redstar_) 2013-02-28 08:53:54 UTC
Decoding "Cg==" in a single step produces correct data: 0x0a
But executing the function twice for "Cg=" and "=" will add an extra \0: 0x0a,0x00
I am attaching also a small program which illustrates the problem.
Comment 1 Ognyan Tonchev (redstar_) 2013-02-28 08:54:43 UTC
Created attachment 237582 [details]
a C program illustrating the problem
Comment 2 Ognyan Tonchev (redstar_) 2013-02-28 13:13:28 UTC
The main problem is that g_base64_decode_step () does not remember if it got padding character in the previous sequence. Submitting a patch which solves the problem.
Comment 3 Ognyan Tonchev (redstar_) 2013-02-28 13:29:42 UTC
Created attachment 237596 [details] [review]
Remember if we got padding character in the previous sequence
Comment 4 Colin Walters 2013-02-28 13:34:17 UTC
Review of attachment 237596 [details] [review]:

Can you add a test case for this to glib/tests/gbase64.c ?

::: glib/gbase64.c
@@ +345,3 @@
+
+  /* check if we got padding character in the previous sequence */
+  if (i < 0) {

Braces on a new line - GLib is GNU style (just look at the rest of the code).
Comment 5 Ognyan Tonchev (redstar_) 2013-02-28 15:56:53 UTC
Created attachment 237620 [details] [review]
Remember if we got padding character in the previous sequence

added new test to glib/tests/base64.c and fixed indentation
Comment 6 Ognyan Tonchev (redstar_) 2013-02-28 16:20:36 UTC
Created attachment 237626 [details] [review]
Remember if we got padding character in the previous sequence

Updated the new test, the test program is now obsolete
Comment 7 Ognyan Tonchev (redstar_) 2013-03-01 14:07:26 UTC
Created attachment 237709 [details] [review]
Remember if we got padding character in the previous sequence

Yesterday i was running some stab tests where base64 encoded data is exchanged over a network (glib used on both sides) and discovered a problem in my patch.
The 'pad' variable which indicated whether we got a padding character was not unset after a group of 4 base64 bytes was successfully converted to 3 bytes. Thus, one byte was always 'lost' from the new group directly following the padding characters. For example, 'Cg==QUJD' was converted to '\nAC' where the correct result should be '\nABC'. This also showed that the proposed unit test was not good enough.

I have rewritten my patch and the diff to the original version of the code is much smaller now. No new variables, put bck the ones which were removed, just using the state to remember whether we got padding character in the previous sequence. Also improved the unit test for g_base64_decode_step ().

Sorry for my omission when i was submitting the initial version of the patch.
Comment 8 Colin Walters 2013-03-01 18:32:45 UTC
The base64 decoding bits are a bit hairy; this updated patch looks right to me.

However your added test case didn't make much sense to me.  I've made a new one; does it look better to you as well?
Comment 9 Colin Walters 2013-03-01 18:33:06 UTC
Created attachment 237728 [details] [review]
0001-base64-Add-tests-for-incremental-decoding-with-very-.patch
Comment 10 Colin Walters 2013-03-01 18:33:23 UTC
Created attachment 237729 [details] [review]
0002-base64-Fix-g_base64_decode_step.patch
Comment 11 Colin Walters 2013-03-01 18:34:37 UTC
Note I've revised your patch so that it just uncomments the test cases that would fail, and now pass.
Comment 12 Ognyan Tonchev (redstar_) 2013-03-04 08:55:42 UTC
Yes, the test definitely looks better now. If i may suggest a small change to your patch:

--- glib/tests/base64.c	2013-03-04 09:51:59.230209350 +0100
+++ glib/tests/base64.c.changed	2013-03-04 09:51:18.201593885 +0100
@@ -337,14 +337,14 @@
 {
   const guint blocksize = GPOINTER_TO_UINT (blocksize_p);
   guint i;
+  gint state = 0;
+  guint save = 0;
 
   for (i = 0; ok_100_encode_strs[i]; i++)
     {
       const char *str = ok_100_encode_strs[i];
       const char *p;
       gsize len = strlen (str);
-      gint state = 0;
-      guint save = 0;
       guchar *decoded;
       gsize decoded_size = 0;
       guchar *decoded_atonce;
Comment 13 Colin Walters 2013-03-04 15:47:30 UTC
(In reply to comment #12)
> Yes, the test definitely looks better now. If i may suggest a small change to
> your patch:
> 
> --- glib/tests/base64.c    2013-03-04 09:51:59.230209350 +0100
> +++ glib/tests/base64.c.changed    2013-03-04 09:51:18.201593885 +0100
> @@ -337,14 +337,14 @@
>  {
>    const guint blocksize = GPOINTER_TO_UINT (blocksize_p);
>    guint i;
> +  gint state = 0;
> +  guint save = 0;

Hmm, I don't see how that change is correct because we need to provide "state" and "save" zero-initialized to *each* call to g_base64_decode.
Comment 14 Ognyan Tonchev (redstar_) 2013-03-04 16:13:06 UTC
"state" will be set to zero in g_base64_decode_step () after each 4 base64 byte sequence, so this change won't break anything if the function is correct. Moreover, that way we test also decoding base64 encoded data after 4 bytes long chunk with padding characters. It  is fairly Ok to do:

g_base64_decode_step ("ab==", &state, &save);
g_base64_decode_step ("cdef", &state, &save);
Comment 15 Dan Winship 2013-03-04 16:25:08 UTC
(In reply to comment #14)
> It  is fairly Ok to do:
> 
> g_base64_decode_step ("ab==", &state, &save);
> g_base64_decode_step ("cdef", &state, &save);

no, it's not. that's invalid base64, so there are no guarantees on what output bytes the function will produce at that point
Comment 16 Ognyan Tonchev (redstar_) 2013-03-05 06:43:53 UTC
ouch, ok then, i thought the intent was to support that case too. Now i see why my test didn't make sense to you. Skip my comment.
Comment 17 Colin Walters 2013-03-05 15:14:36 UTC
Review of attachment 237729 [details] [review]:

Makes sense to me.
Comment 18 Colin Walters 2013-03-05 15:15:29 UTC
Ognyan, can you mark the test patch as accepted-commit-now if you agree it makes sense?
Comment 19 Ognyan Tonchev (redstar_) 2013-03-05 15:28:41 UTC
Review of attachment 237728 [details] [review]:

The patch looks good!