After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 780066 - g_base64_encode_close() in glib/gbase64.c produces invalid base64 encoding
g_base64_encode_close() in glib/gbase64.c produces invalid base64 encoding
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
2.51.x
Other All
: Normal major
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2017-03-14 21:54 UTC by Rainer Perske
Modified: 2017-03-23 15:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
gbase64: Fix base-64 stepped encoding with small chunks (4.64 KB, patch)
2017-03-17 11:37 UTC, Philip Withnall
none Details | Review
gbase64: Fix base-64 stepped encoding with small chunks (4.89 KB, patch)
2017-03-17 11:53 UTC, Philip Withnall
committed Details | Review

Description Rainer Perske 2017-03-14 21:54:06 UTC
RFC 2045 states in section "6.8.  Base64 Content-Transfer-Encoding":

"Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded.  A full encoding quantum is
always completed at the end of a body.  When fewer than 24 input bits
are available in an input group, zero bits are added (on the right)
to form an integral number of 6-bit groups. [...]"

g_base64_encode_close() violates these rules by padding not with zero bits but bits from an uninitialized storage location!

This happens if the last encoding quantum has to encode not 3 and not 2 but only 1 byte.

Then only c1 has a definite value. Nevertheless, possible nonzero bits from c2 are used to build the final encoding quantum. (This happens in the last line of the fragment cited below.) So the e.g. a final NUL byte in a file of length 3N+1 can be encoded as any of AA==, AB== etc. to AP==. The only correct encoding is AA==.

I consider this a major bug because this routine is used by Evolution Mail to base64 encode attachments and there are e-mail programs in the wild that reject attachment created by Evolution Mail with such invalid base64 encoding. This caused important documents not to reach the intended recipient in time.

If I understand the code correctly, there is a simple bugfix:

In glib/gbase64.c replace

[...]
    case 2:
      outptr [2] = base64_alphabet[ ( (c2 &0x0f) << 2 ) ];
      g_assert (outptr [2] != 0);
      goto skip;
    case 1:  
      outptr[2] = '=';
    skip:
      outptr [0] = base64_alphabet [ c1 >> 2 ];
      outptr [1] = base64_alphabet [ c2 >> 4 | ( (c1&0x3) << 4 )];
[...]

with:

[...]
    case 2:
      outptr [2] = base64_alphabet[ ( (c2 &0x0f) << 2 ) ];
      g_assert (outptr [2] != 0);
      goto skip;
    case 1:  
      outptr[2] = '=';
      c2 = 0;
    skip:
      outptr [0] = base64_alphabet [ c1 >> 2 ];
      outptr [1] = base64_alphabet [ c2 >> 4 | ( (c1&0x3) << 4 )];
[...]

(The line c2 = 0; is added.)

Thank you in advance for reading and fixing this.
Comment 1 Philip Withnall 2017-03-15 11:50:24 UTC
(In reply to Rainer Perske from comment #0)
> RFC 2045 states in section "6.8.  Base64 Content-Transfer-Encoding":
> 
> "Special processing is performed if fewer than 24 bits are available
> at the end of the data being encoded.  A full encoding quantum is
> always completed at the end of a body.  When fewer than 24 input bits
> are available in an input group, zero bits are added (on the right)
> to form an integral number of 6-bit groups. [...]"
> 
> g_base64_encode_close() violates these rules by padding not with zero bits
> but bits from an uninitialized storage location!

Which uninitialised storage location would that be? From a quick read of the code in g_base64_encode_close(), `c2` is initialised from the third byte of `save`, which is provided by the caller.

If the caller is using g_base64_encode(), `save` is correctly initialised to 0.

If the caller is using g_base64_encode_step() manually, they must initialise the `state` and `save` themselves to 0.

How else could uninitialised state creep in there?
Comment 2 Rainer Perske 2017-03-15 11:59:40 UTC
In the situation described, exactly that 3rd byte has no useful value.
The last byte of the file had been stored in the 2nd byte of save and there was no byte left to be stored in the 3rd byte of save. Hence this storage location is either uninitialized (when len(input data)==1) or still contains an old value (when len(input data)==3*N+1 with N>0).

But that all is irrelevant: The 3rd byte of save simply must not be used in this situation. Only zero bits may be used instead.
Comment 3 Rainer Perske 2017-03-15 12:15:31 UTC
Or explained in another way:

const guchar *data={0xff, 0xff, 0xff, 0x00};
return g_base64_encode(data, 4);

This should return "////AA==".

But it will return "////AP==".

Because the line
((char *)save)[0] = 0;
in g_base64_encode_step does NOT reset save[2]. So save[2] contains 0xff and not 0x00 when g_base64_encode_close is called.

You say: If the caller is using g_base64_encode(), `save` is correctly initialised to 0.

I say: Yes. But when g_base64_encode_close is called, it does no longer have this value.

Perhaps my word "uninitialized" is wrong. Please read it as "random" or "invalid" or "unexpected" (Sorry if my English knowledge is not precise enough. There is absolutely no offense intended.)
Comment 4 Rainer Perske 2017-03-15 12:52:26 UTC
I have taken the code and must make a small correction:
  const guchar *data={0xff, 0xff, 0xff, 0x00};
  return g_base64_encode(data, 4);
returns "////AA==".
But passing it in chunks:
  char buf[99]="";
  pos=0;
  len=g_base64_encode_step ("\xff\xff",2,false,buf+pos,&st,&save);  pos+=len;
  len=g_base64_encode_step ("\xff\x00",2,false,buf+pos,&st,&save);  pos+=len;
  len=g_base64_encode_close(false,buf+pos,&st,&save);
  puts(buf);
returns "////AP==".
Comment 5 Rainer Perske 2017-03-15 12:53:53 UTC
Of course st and save were initialized to zero.
Comment 6 Philip Withnall 2017-03-17 11:37:37 UTC
Created attachment 348167 [details] [review]
gbase64: Fix base-64 stepped encoding with small chunks

If encoding with small chunks such that the call to
g_base64_encode_close() has to deal with 0 < x < 24 bits of remaining
state, the encoding code would not correctly use zeroes to pad out the
state — it would use left-over state from an earlier iteration in the
encoding process.

This resulted in invalid base-64 encodings.

Add a unit test for incremental encoding using different sized chunks
too.

Thanks to Rainier Perske for reporting and analysing the bug.

https://bugzilla.gnome.org/show_bug.cgi?id=780066

Signed-off-by: Philip Withnall <withnall@endlessm.com>
Comment 7 Philip Withnall 2017-03-17 11:53:31 UTC
Created attachment 348169 [details] [review]
gbase64: Fix base-64 stepped encoding with small chunks

If encoding with small chunks such that the call to
g_base64_encode_close() has to deal with 0 < x < 24 bits of remaining
state, the encoding code would not correctly use zeroes to pad out the
state — it would use left-over state from an earlier iteration in the
encoding process.

This resulted in invalid base-64 encodings.

Add a unit test for incremental encoding using different sized chunks
too.

Thanks to Rainier Perske for reporting and analysing the bug.

https://bugzilla.gnome.org/show_bug.cgi?id=780066

Signed-off-by: Philip Withnall <withnall@endlessm.com>
Comment 8 Allison Karlitskaya (desrt) 2017-03-23 11:55:37 UTC
Review of attachment 348169 [details] [review]:

This is terrible.
Comment 9 Philip Withnall 2017-03-23 15:34:59 UTC
Attachment 348169 [details] pushed as 35c0dd2 - gbase64: Fix base-64 stepped encoding with small chunks