After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 443648 - MD5 digest support
MD5 digest support
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
unspecified
Other All
: Normal enhancement
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks: 314423
 
 
Reported: 2007-06-03 17:55 UTC by Emmanuele Bassi (:ebassi)
Modified: 2007-12-04 16:37 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
[PATCH] Add MD5 digest support in GLib (20.89 KB, patch)
2007-06-03 18:01 UTC, Emmanuele Bassi (:ebassi)
needs-work Details | Review
[PATCH] Add MD5 API to the gtk-doc build (614 bytes, patch)
2007-06-03 18:01 UTC, Emmanuele Bassi (:ebassi)
none Details | Review
GChecksum implementation (53.27 KB, patch)
2007-08-29 13:42 UTC, Emmanuele Bassi (:ebassi)
none Details | Review
[PATCH] Add a convience function for GChecksum (2.47 KB, patch)
2007-08-29 15:22 UTC, Emmanuele Bassi (:ebassi)
none Details | Review
[PATCH] Add a convience function for GChecksum (4.61 KB, patch)
2007-08-29 15:34 UTC, Emmanuele Bassi (:ebassi)
none Details | Review
GChecksum implementation (reviewed) (53.02 KB, patch)
2007-08-29 17:16 UTC, Emmanuele Bassi (:ebassi)
none Details | Review
GChecksum convenience wrappers (55.28 KB, patch)
2007-08-29 17:18 UTC, Emmanuele Bassi (:ebassi)
none Details | Review
[PATCH] GChecksum implementation (53.78 KB, patch)
2007-08-30 15:49 UTC, Emmanuele Bassi (:ebassi)
none Details | Review
[PATCH] GChecksum implementation (54.97 KB, patch)
2007-10-04 08:52 UTC, Emmanuele Bassi (:ebassi)
committed Details | Review
[PATCH] Use GChecksum in the local implementation of GFileInfo (12.54 KB, patch)
2007-12-03 10:41 UTC, Emmanuele Bassi (:ebassi)
committed Details | Review

Description Emmanuele Bassi (:ebassi) 2007-06-03 17:55:20 UTC
at the moment there are multiple copies of the MD5 digest computing functions in GNOME libraries: gnome-vfs (via libneon), libsoup, libgnomeui; gvfs will probably need MD5 hashing too. if GTK+ will implement the thumbnailing specification it'll too need support for computing MD5 digests. hence, I propose that GLib adds API to compute MD5 digests of arbitrary chunks of data, both in full pass or via streaming using a "state" object.
Comment 1 Emmanuele Bassi (:ebassi) 2007-06-03 18:01:04 UTC
Created attachment 89282 [details] [review]
[PATCH] Add MD5 digest support in GLib


MD5 digests are used in various libraries and programs (libsoup, gnome-vfs,
libgnomeui); these libraries usually cut and paste the MD5 digest code in the
public domain. GLib should provide a common API to obtain a MD5 digest or hash
string from a buffer to remove code duplication.

This patch adds an API for computing MD5 digests from arbitrary chunks
of binary data. The new API allows to compute a MD5 digest from a full
sized buffer or by feeding chunks to a small context object which should
typically be placed on the stack. Two new functions for using MD5 digests
inside GHashTable and GCache have been also provided. The API has been
designed to offer the least difficult migration path for the current MD5
digest users without losing generality.

The API is documented and comes with a test case for the test suite.
---
 glib/Makefile.am  |    2 +
 glib/glib.h       |    1 +
 glib/glib.symbols |   14 ++
 glib/gmd5.c       |  497 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 glib/gmd5.h       |   66 +++++++
 tests/Makefile.am |    2 +
 tests/md5-test.c  |  113 ++++++++++++
 7 files changed, 695 insertions(+), 0 deletions(-)
Comment 2 Emmanuele Bassi (:ebassi) 2007-06-03 18:01:08 UTC
Created attachment 89283 [details] [review]
[PATCH] Add MD5 API to the gtk-doc build

 docs/reference/glib/glib-sections.txt |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)
Comment 3 Behdad Esfahbod 2007-06-03 19:18:48 UTC
I don't like the context-on-stack part.  Is the slight performance gain worth making an internal struct public?  g_slice_new is probably of ignorable overhead compared to MD5 computation, isn't it?
Comment 4 Behdad Esfahbod 2007-06-03 19:19:31 UTC
Oh, other than that, Good job!  Can't believe we don't have md5 in glib right now.
Comment 5 Dan Winship 2007-06-03 21:28:30 UTC
I think you should put warnings in the docs about the fact that MD5 is
basically broken at this point. Something like "This function is provided for
use with existing network protocols and data formats that use MD5 hashes. MD5
is no longer considered cryptographically secure, and should not be used in
new protocols and data formats."

Also, the comment near the top talks about "ThumbMD5Context" structures.
And the phrase "in ASCII form" should be removed from the docs for
g_md5_context_get_digest and g_md5_get_digest, which return raw binary
data, not ASCII.


Comment 6 Emmanuele Bassi (:ebassi) 2007-06-03 22:37:36 UTC
(In reply to comment #3)
> I don't like the context-on-stack part.

yeah, I don't like that either; it also exposes a lot of the context internal data. but since it's mostly arrays of bytes allocating/deallocating I thought it might yield some performance hit. it's also true that the slice allocator should solve that. in my initial coding the context was allocated on the heap (and I had a new/copy/free set of functions); I can rework the patch like that.

(In reply to comment #5)
> I think you should put warnings in the docs about the fact that MD5 is
> basically broken at this point.

it's broken from a crypto-analytic point of view; MD5 hashes are still useful, since the chances of collisions are still low, and for the typical usage it's really not a problem. but I agree: the documentation should make clear that MD5 digests should not be used in cryptographic contexts.

> Also, the comment near the top talks about "ThumbMD5Context" structures.

it was part of the original comment that I retained.

> And the phrase "in ASCII form" should be removed from the docs for
> g_md5_context_get_digest and g_md5_get_digest, which return raw binary
> data, not ASCII.

yep, my bad.
Comment 7 Behdad Esfahbod 2007-06-04 07:06:12 UTC
Humm, no version that returns hex?  I recently used Python's md5 module and the hex digest method was really helpful.
Comment 8 Behdad Esfahbod 2007-06-04 07:08:53 UTC
Also, how do you see g_md5_digest_equal() useful with GHashTable?  I fail to see it.  Normally you have some key and want to use md5 as its hash, right?  g_md5_digest_equal() can only be useful if using md5 hashes as key.

Also, I'd personally rather just use memcmp() and let the compiler deal with the optimization.
Comment 9 Havoc Pennington 2007-06-04 18:17:26 UTC
Why not do SHA also so there's something to use with newly-designed stuff, rather than only the legacy md5 stuff.

In the docs for get_string() I'd mention that it returns a hex string.

As others have suggested it would be more glib-like to have new/free instead of init/close

Comment 10 Dave Benson 2007-06-05 01:31:49 UTC
I also have an implementation of this, which makes a hashing object that you feed data too.  It supports MD5, SHA1, SHA256 and CRC32.

See:
http://gsk.svn.sourceforge.net/viewvc/gsk/trunk/src/hash/gskhash.h?view=markup
http://gsk.svn.sourceforge.net/viewvc/gsk/trunk/src/hash/gskhash.c?view=markup

I can make a simple patch that switches the prefix if that's desired.
Comment 11 Dave Benson 2007-06-05 01:39:45 UTC
We should probably name it something more like 'GChecksum' since 'GHash' would obviously be confusing.
Comment 12 Mathias Hasselmann (IRC: tbf) 2007-06-15 18:28:43 UTC
Also want to see this in glib.
Comment 13 Emmanuele Bassi (:ebassi) 2007-08-29 13:42:46 UTC
Created attachment 94547 [details] [review]
GChecksum implementation

GChecksum is the reimplementation of the MD5 checksum API following the previous comments. it's generic, implementing the MD5, SHA-1 and SHA-256 hashing algorithms; it's opaque, so it can be extended at any later point; it's dynamically allocated; it's simpler, dropping most of the API in favour of a new-update-get cycle.

this patch provides the GChecksum API, implementation and a test suite; the API is also documented.

---
 docs/reference/glib/glib-sections.txt |   13 +
 glib/Makefile.am                      |    2 +
 glib/gchecksum.c                      | 1349 +++++++++++++++++++++++++++++++++
 glib/gchecksum.h                      |   49 ++
 glib/glib.h                           |    1 +
 glib/glib.symbols                     |   12 +
 tests/Makefile.am                     |    2 +
 tests/checksum-test.c                 |  374 +++++++++
 8 files changed, 1802 insertions(+), 0 deletions(-)
Comment 14 Behdad Esfahbod 2007-08-29 15:01:12 UTC
Coolo.  How do people feel about having a convenience function taking a GChecksumType and a string, returning a newly allocated checksum string?
Comment 15 Emmanuele Bassi (:ebassi) 2007-08-29 15:22:09 UTC
Created attachment 94550 [details] [review]
[PATCH] Add a convience function for GChecksum


This patch adds g_checksum_from_data(), a simple convenience function
wrapping a GChecksum. This function takes a binary blob and returns a
string containing the digest in hexadecimal form, taking care of
creating and destroying the GChecksum data structure used to compute
it.
---
 docs/reference/glib/glib-sections.txt |    1 +
 glib/gchecksum.c                      |   38 +++++++++++++++++++++++++++++++++
 glib/gchecksum.h                      |    4 +++
 glib/glib.symbols                     |    1 +
 4 files changed, 44 insertions(+), 0 deletions(-)
Comment 16 Emmanuele Bassi (:ebassi) 2007-08-29 15:25:16 UTC
g_checksum_from_data() takes a binary data blob; g_checksum_from_string() could be added as well:

  gchar *
  g_checksum_from_string (GChecksumType  type,
                          const gchar   *str)
  {
    return g_checksum_from_data (type, (const guchar *) str, strlen (str));
  }

Comment 17 Emmanuele Bassi (:ebassi) 2007-08-29 15:34:52 UTC
Created attachment 94551 [details] [review]
[PATCH] Add a convience function for GChecksum


This patch adds g_checksum_from_string(), a simple convenience function
wrapping a GChecksum. This function takes a NUL-terminated string and
returns a string containing the digest in hexadecimal form, taking care of
creating and destroying the GChecksum data structure used to compute it.
---
 docs/reference/glib/glib-sections.txt |    1 +
 glib/gchecksum.c                      |   29 ++++++++++++++++++++++++++---
 glib/gchecksum.h                      |   30 ++++++++++++++++--------------
 glib/glib.symbols                     |    1 +
 4 files changed, 44 insertions(+), 17 deletions(-)
Comment 18 Havoc Pennington 2007-08-29 15:42:02 UTC
The usual GLib pattern would be that if you pass -1 for the size of the g_checksum_from_data() data, then the data is assumed nul-terminated.

g_checksum_from_data() name implies to me a GChecksum* is returned... maybe something like g_checksum_compute() ?

Other API comments:
- g_checksum_type() should be g_checksum_get_type(), but that creates a problem with the function to return GType... clearer might be to call it GChecksumAlgorithm and have it be get_algorithm() (or GChecksumHash? might be confused with both hash tables and the crypto hash concept, so Algorithm seems nicer, other than it's kind of long)
- get_digest() should have the len as an out param, seems more GLib-typical to me

Looks nice, I hope this gets in soon.
Comment 19 Emmanuele Bassi (:ebassi) 2007-08-29 16:08:25 UTC
(In reply to comment #18)
> The usual GLib pattern would be that if you pass -1 for the size of the
> g_checksum_from_data() data, then the data is assumed nul-terminated.

from_data() takes a binary blob, so we cannot assume NUL-terminated strings.

from_string() would follow this convention, though adding a "length" parameter would be overkill, IMHO.
 
> g_checksum_from_data() name implies to me a GChecksum* is returned... maybe
> something like g_checksum_compute() ?

yeah, I thought about that as well. maybe g_compute_checksum_for_data() and g_compute_checksum_for_string().

> Other API comments:
> - g_checksum_type() should be g_checksum_get_type(), but that creates a problem
> with the function to return GType... clearer might be to call it
> GChecksumAlgorithm and have it be get_algorithm() (or GChecksumHash? might be
> confused with both hash tables and the crypto hash concept, so Algorithm seems
> nicer, other than it's kind of long)

it's more an accessor like "checksum->type", since you cannot change the type of a GChecksum after you created it. I'm not even strongly attached to the function itself - it could be removed altogether.

> - get_digest() should have the len as an out param, seems more GLib-typical to
> me

if you don't care about the length of the digest (because you already know it, for instance) then you'd still have to write:

  digest = NULL;
  g_checksum_get_digest (checksum, &digest, NULL))
  if (!digest)
    {
      g_warning ("Unable to get the digest");
    }

instead of:

  digest = NULL;
  if (!g_checksum_get_digest (checksum, &digest))
    {
      g_warning ("Unable to get the digest");
    }

but, also in this case, I have no strong feelings about this function signature.
Comment 20 Behdad Esfahbod 2007-08-29 16:14:55 UTC
(In reply to comment #18)
> The usual GLib pattern would be that if you pass -1 for the size of the
> g_checksum_from_data() data, then the data is assumed nul-terminated.

This pattern is a can of worm already, but the most common case is that even if you provide a positive length, prcessing stops at the first NUL byte.  That is, a positive length can be used to truncate the string.  Not sure how useful that is btw...  Better to keep string and binary versions separate for new API.
Comment 21 Havoc Pennington 2007-08-29 16:29:48 UTC
> from_data() takes a binary blob, so we cannot assume NUL-terminated strings.

That's the whole point, that's the GLib convention - if you pass -1 for the length, it means the string is nul-terminated. See g_string_*_len(), g_strstr_len(), etc. I don't think there are other GLib APIs where "data" means not nul terminated and "string" means nul terminated. GString is not (necessarily) nul-terminated and "data" is mostly used to mean "a void*"

Part of the point of the -1 convention is to avoid having two functions, instead there's just the one and you pass -1 if you know you have a nul.

On get_digest(), the out param for length is just a matter of consistent convention. See  g_base64_decode(), g_key_file_get_keys(), g_file_get_contents(), g_bookmark_file_to_data(). Though only g_file_get_contents() works as I was thinking with two out params, all the others return the string and have an optional out param for the length, so that would be fine too I would think.

It's not that the "GLib way" is provably better in any absolute sense on either of these points, it's just that it's better to keep things consistent when making arbitrary decisions.

I don't understand the sample code where you have g_warning("Unable to get digest") - the function can't fail, can it? (Other than a bug in the app, but there's already a g_return_if_fail warning for that.)

Comment 22 Havoc Pennington 2007-08-29 16:35:48 UTC
> Better to keep string and binary versions separate for new API.

I really disagree, unless you're going to go back and fix all the old APIs and do a deprecation-fest, the inconsistency (cure) is worse than whatever could be wrong with the old APIs (disease? I don't really see the problem with the -1 thing. A lot of GTK uses it too, not just GLib.)

Comment 23 Havoc Pennington 2007-08-29 16:38:12 UTC
But if you do different versions for string and binary, I would name them like this:

 char* g_checksum_compute     (GChecksumAlgorithm algo, const char *str);
 char* g_checksum_compute_len (GChecksumAlgorithm algo, const char *str, gssize len);

Which does have existing GLib and GTK analogs. The non-len version just calls the _len one with -1, though.

Comment 24 Emmanuele Bassi (:ebassi) 2007-08-29 16:53:15 UTC
(In reply to comment #21)
> > from_data() takes a binary blob, so we cannot assume NUL-terminated strings.
> 
> That's the whole point, that's the GLib convention - if you pass -1 for the
> length, it means the string is nul-terminated. See g_string_*_len(),
> g_strstr_len(), etc. I don't think there are other GLib APIs where "data" means
> not nul terminated and "string" means nul terminated. GString is not
> (necessarily) nul-terminated and "data" is mostly used to mean "a void*"

g_checksum_* take unsigned chars as binary blobs, exactly like the g_base64_* API. for const guchar* GLib and GTK+ always assume that the length of the passed buffer is > 1, and length is a required parameter.

for a string API, such as a convenience function like g_compute_checksum_from_string(), then I agree: having an optional length parameter, which gets automatically set to strlen(str) if it's <0 is the convention and it could be used.

what I want to note is that GChecksum assumes that the passed data are binary blobs in form of unsigned chars, which, by convention in GLib, GTK+ and other G* places, are always bound to a length parameter greater than 1. hence, g_compute_checksum_from_data(), which takes a const guchar* should have a non-optional length parameter which cannot be set to -1 without the function failing; while g_compute_checksum_from_string(), which would take a const gchar*, should have an optional length parameter which, when set to a value of <1, becomes the length of the passed string.

> Part of the point of the -1 convention is to avoid having two functions,
> instead there's just the one and you pass -1 if you know you have a nul.

this is true iff the data passed is a signed char* (gchar*); it's not true if we use unsigne char* (guchar*). see gbase64.h in GLib and the selection API in GTK+.
 
> On get_digest(), the out param for length is just a matter of consistent
> convention. See  g_base64_decode(), g_key_file_get_keys(),

okay, as I said I have no strong feelings about the signature of that function.

  void
  g_checksum_get_digest (GChecksum *checksum,
                         guint8    *digest,
                         gsize     *digest_len)

is perfectly fine for me.
Comment 25 Behdad Esfahbod 2007-08-29 17:01:55 UTC
(In reply to comment #21)

> On get_digest(), the out param for length is just a matter of consistent
> convention. See  g_base64_decode(), g_key_file_get_keys(),
> g_file_get_contents(), g_bookmark_file_to_data(). Though only
> g_file_get_contents() works as I was thinking with two out params, all the
> others return the string and have an optional out param for the length, so that
> would be fine too I would think.

So, why not return the more useful digest as return value and option length as output parameter?
Comment 26 Behdad Esfahbod 2007-08-29 17:08:45 UTC
(In reply to comment #22)
> > Better to keep string and binary versions separate for new API.
> 
> I really disagree, unless you're going to go back and fix all the old APIs and
> do a deprecation-fest, the inconsistency (cure) is worse than whatever could be
> wrong with the old APIs (disease? I don't really see the problem with the -1
> thing. A lot of GTK uses it too, not just GLib.)

I'm just trying to avoid more confusion.  Take this for example:

g_strrstr_len ()

gchar*              g_strrstr_len                       (const gchar *haystack,
                                                         gssize haystack_len,
                                                         const gchar *needle);

Searches the string haystack for the last occurrence of the string needle, limiting the length of the search to haystack_len.

haystack : 	a nul-terminated string.
haystack_len : 	the maximum length of haystack.
needle : 	the nul-terminated string to search for.
Returns : 	a pointer to the found occurrence, or NULL if not found.

haystack is still a nul-terminated string even with a len argument set, and if len > strlen(str), len is ignored.  It's the glib convention.  Most functions work like this.  But there are some that don't.  For example, g_utf8_validate(), if the string ends (via a nul-byte) before len bytes, it returns an error.  There's typically no way to know how each function works until you read the code.  People have built a mental model around (const char *str, int len).  Taking binary data that way is different from that model.

In another note, pango_layout_set_text() does not allow nul-characters in for the same reasons.  I need to add new API to allow text with nul-characters.
Comment 27 Emmanuele Bassi (:ebassi) 2007-08-29 17:16:55 UTC
Created attachment 94558 [details] [review]
GChecksum implementation (reviewed)

like attachment 94547 [details] [review] - but without g_checksum_type() and with the length of the digest as an out parameter on g_checksum_get_digest().
Comment 28 Emmanuele Bassi (:ebassi) 2007-08-29 17:18:36 UTC
Created attachment 94559 [details] [review]
GChecksum convenience wrappers

newly added functions:

  g_compute_checksum_for_data() - for binary data, with a known length
  g_compute_checksum_for_string() - for strings, with an optional length
Comment 29 Emmanuele Bassi (:ebassi) 2007-08-30 15:49:42 UTC
Created attachment 94651 [details] [review]
[PATCH] GChecksum implementation

Full implementation of GChecksum API and convenience wrappers, plus documentation and test suite, against current SVN trunk. This should be the patch to be considered for inclusion, pending a review from GLib maintainers.

---
 docs/reference/glib/glib-docs.sgml     |    2 +
 docs/reference/glib/glib-sections.txt  |   16 +
 docs/reference/glib/tmpl/checksum.sgml |  128 +++
 glib/Makefile.am                       |    2 +
 glib/gchecksum.c                       | 1396 ++++++++++++++++++++++++++++++++
 glib/gchecksum.h                       |   67 ++
 glib/glib.h                            |    1 +
 glib/glib.symbols                      |   13 +
 tests/Makefile.am                      |    2 +
 tests/checksum-test.c                  |  170 ++++
 10 files changed, 1797 insertions(+), 0 deletions(-)
Comment 30 Mathias Hasselmann (IRC: tbf) 2007-09-01 10:36:15 UTC
Seems there is an ISO standard for hash functions: http://www.incits.org/ref-docs/FDIS_10118-3.pdf
Comment 31 Emmanuele Bassi (:ebassi) 2007-10-04 08:52:10 UTC
Created attachment 96616 [details] [review]
[PATCH] GChecksum implementation

Full implementation of GChecksum API and convenience wrappers, plus
documentation and test suite, against current SVN trunk. This should be the
patch to be considered for inclusion, pending a review from GLib maintainers.

This patch contains a change in behaviour from attachment 94651 [details] [review] for g_checksum_copy(): copying a closed checksum will not reopen it. Rationale: when we close the checksums we also destroy the temporary state buffers to avoid security breaches; once the state is lost you cannot safely update the checksum.

This patch also contains ChangeLog updates.
Comment 32 Morten Welinder 2007-10-04 14:06:28 UTC
In attachment 96616 [details] [review], checksum.c calls itself checksum.h in the initial
comment.

Has the code been tested on cpus that trap on non-aligned access?
(Sparc, alpha, i386 if you manage to test the AC flag)
Comment 33 Emmanuele Bassi (:ebassi) 2007-10-04 14:11:31 UTC
(In reply to comment #32)
> In attachment 96616 [details] [review] [edit], checksum.c calls itself checksum.h in the initial
> comment.

yeah, typo from cut-and-paste.

> Has the code been tested on cpus that trap on non-aligned access?
> (Sparc, alpha, i386 if you manage to test the AC flag)

nope, but the actual checksum implementations are present in many projects, most of which are declared as portable; the sha-1 algorithm is used by d-bus, while the md5 is used in libneon and other libraries. the sha-256 implementation is the one relatively without a known "pedigree" but it's similar enough to the sha-1 implementation that I guess both would break in the same ways.

testing would nevertheless be awesome.
Comment 34 Koen Kooi 2007-10-04 16:19:59 UTC
(In reply to comment #33)
> (In reply to comment #32)

> > Has the code been tested on cpus that trap on non-aligned access?
> > (Sparc, alpha, i386 if you manage to test the AC flag)
> 
> nope, but the actual checksum implementations are present in many projects,
> most of which are declared as portable; the sha-1 algorithm is used by d-bus,
> while the md5 is used in libneon and other libraries. the sha-256
> implementation is the one relatively without a known "pedigree" but it's
> similar enough to the sha-1 implementation that I guess both would break in the
> same ways.
> 
> testing would nevertheless be awesome.

ARM cpus need aligned access as well, and I know you have those in your office :)

Comment 35 Richard Purdie 2007-10-05 09:36:34 UTC
I can confirm the checksum-test passes on ARM with this patch (ARM doesn't support unaligned accesses).
Comment 36 Rob Bradford 2007-10-05 13:37:40 UTC
Works on the ARM11 I have here.
Comment 37 Morten Welinder 2007-10-05 14:00:59 UTC
Could you ARM guys please try replacing

g_checksum_update (checksum0, (const guchar *) FIXED_STR, FIXED_LEN);

by

g_checksum_update (checksum0, 1+(const guchar *) FIXED_STR, FIXED_LEN-1);

in the test program and retry.  The code should not crash after that
(but clearly it should complain that the checksums don't match).
Comment 38 Richard Purdie 2007-10-05 14:20:38 UTC
With that change I see "Invalid MD5 checksum for `The quick brown fox jumps over the lazy dog': 3e0b2376d49ba1e0b39761f1fc01d193 (expecting: 9e107d9d372bb6826bd81d3542a419d6)"
Comment 39 Koen Kooi 2007-10-06 09:31:41 UTC
(In reply to comment #36)
> Works on the ARM11 I have here.

ARM11 does support unaligned access in hardware, so that's not a good test
Comment 40 Mathias Hasselmann (IRC: tbf) 2007-10-28 21:50:29 UTC
No serious objections for a while. So I assume it is ok to commit this?
Comment 41 Emmanuele Bassi (:ebassi) 2007-10-28 22:57:31 UTC
I'd like to have a comment from mclasen or rambokid. and has GLib been branched?
Comment 42 Emmanuele Bassi (:ebassi) 2007-12-03 10:41:49 UTC
Created attachment 100104 [details] [review]
[PATCH] Use GChecksum in the local implementation of GFileInfo

Remove the copy-and-paste MD5 hash implementation and use GChecksum instead.

This is an incremental patch that goes on top of attachment 96616 [details] [review].

Now that GIO went in GLib, I'd like a review of the API/code so we can start cutting down the amount of duplicated code in GNOME.
Comment 43 Matthias Clasen 2007-12-03 16:50:30 UTC
Glib has been branched a while ago.

The one "objection" I have to this is that the Red Hat security team are pushing towards consolidating all crypto code to use NSS (since NSS has fips certifications, etc). Adding crypto code in GLib could be seen as a counterproductive step wrt to the consolidation goal. This objection does not apply to the MD5 code, since MD5 clearly is not relevant for serious crypto uses
at this point anymore. It does still apply for SHA, though. 

Consolidating all hash implementations that can be found throughout the desktop stack in glib might still be a win though. I'll ask our security guys to comment
here.
Comment 44 Tomas Mraz 2007-12-03 19:39:01 UTC
Adding a general hash API to glib is good idea. This way the various applications which use glib can use hashes as well without looking for more powerful crypto libraries.

The API seems to be OK to me and the implementation can always be done so the internal implementation of the algorithms is replaced with calls to NSS based on a configure-time option. I'd probably add a g_checksum_get_length(type) call to get length of the binary hash value for given type.
Comment 45 Behdad Esfahbod 2007-12-03 20:32:23 UTC
We can always make glib use NSS.  That's a huge win over having to make individual applications/libraries use NSS directly.
Comment 46 Emmanuele Bassi (:ebassi) 2007-12-03 21:19:03 UTC
(In reply to comment #44)
> I'd probably add a g_checksum_get_length(type) call
> to get length of the binary hash value for given type.

g_checksum_get_digest() already accepts a pointer with the length of the digest.

Comment 47 Emmanuele Bassi (:ebassi) 2007-12-03 21:25:43 UTC
(In reply to comment #43)
> Glib has been branched a while ago.

yep. :-)

> The one "objection" I have to this is that the Red Hat security team are
> pushing towards consolidating all crypto code to use NSS (since NSS has fips
> certifications, etc). Adding crypto code in GLib could be seen as a
> counterproductive step wrt to the consolidation goal. This objection does not
> apply to the MD5 code, since MD5 clearly is not relevant for serious crypto
> uses
> at this point anymore. It does still apply for SHA, though. 

I guess GChecksum should be considered more a utility API for checking short bursts of data, more than a complete crypto subsection of GLib; and while MD5 is still mandated in some specs (the thumbnailing spec, for instance), SHA-1 is getting more used just for unique name generation and validation (git and other VCS, d-bus, etc.) because of the longer string and very little probability of collision with lots of names.

> Consolidating all hash implementations that can be found throughout the desktop
> stack in glib might still be a win though. I'll ask our security guys to
> comment
> here.

we could optionally depend on libnss at compile time, and switch over to the internal implementations if no installed libnss was found.
Comment 48 Tomas Mraz 2007-12-03 22:20:04 UTC
(In reply to comment #46)
> (In reply to comment #44)
> > I'd probably add a g_checksum_get_length(type) call
> > to get length of the binary hash value for given type.
> 
> g_checksum_get_digest() already accepts a pointer with the length of the
> digest.
> 
Yes, but, you'll have to unnecessarily allocate a new GChecksum to query the length. For example when you want to decide which checksum type to choose based on length.
Comment 49 Matthias Clasen 2007-12-03 22:33:50 UTC
hmm,

md5_byte_reverse does indeed look like unaligned access to me. 
It may be ok, because it is only ever called on md5->buf and md5->data,
which look like they will always be sufficiently aligned.

It is probably worth a comment in md5_byte_reverse that buffer is assumed to be
integer-aligned.


Typo: s/contains/contain/

  /* Reset buffers in case they contains sensitive data */


The docs for GChecksumType should probably mention that more types may be
added.
We probably also need some way to check for a checksum type being supported
(ie something better than g_assert_not_reached())
Comment 50 Emmanuele Bassi (:ebassi) 2007-12-04 10:51:37 UTC
(In reply to comment #48)

> > g_checksum_get_digest() already accepts a pointer with the length of the
> > digest.
> > 
> Yes, but, you'll have to unnecessarily allocate a new GChecksum to query the
> length. For example when you want to decide which checksum type to choose based
> on length.

oh, then I misunderstood what you were asking. do you want a function that returns the length of the hash size without any other argument, like:

  guint
  g_checksum_get_type_length (GChecksumType checksum_type)
  {
    switch (checksum_type)
      {
      case G_CHECKSUM_MD5:
        ...
      }
  }

if I understood correctly now? I have no strong feelings about this: for me, it makes sense to get the length of the digest at the same time I get the digest.

(In reply to comment #49)
> hmm,
> 
> md5_byte_reverse does indeed look like unaligned access to me. 
> It may be ok, because it is only ever called on md5->buf and md5->data,
> which look like they will always be sufficiently aligned.
> 
> It is probably worth a comment in md5_byte_reverse that buffer is assumed to be
> integer-aligned.

I'll add a comment.

> 
> Typo: s/contains/contain/
> 
>   /* Reset buffers in case they contains sensitive data */

and fix this as well.

> 
> The docs for GChecksumType should probably mention that more types may be
> added.

okay.

> We probably also need some way to check for a checksum type being supported
> (ie something better than g_assert_not_reached())

at the moment, if IS_VALID_TYPE() fails we get out of the g_checksum_new(). meybe making IS_VALID_TYPE public and calling it G_CHECKSUM_TYPE_IS_VALID()?
Comment 51 Tomas Mraz 2007-12-04 12:27:06 UTC
> oh, then I misunderstood what you were asking. do you want a function that
> returns the length of the hash size without any other argument, like:
> 
>   guint
>   g_checksum_get_type_length (GChecksumType checksum_type)
>   {
>     switch (checksum_type)
>       {
>       case G_CHECKSUM_MD5:
>         ...
>       }
>   }

Yes, exactly that.
Comment 52 Matthias Clasen 2007-12-04 13:47:27 UTC
> at the moment, if IS_VALID_TYPE() fails we get out of the g_checksum_new().
> meybe making IS_VALID_TYPE public and calling it G_CHECKSUM_TYPE_IS_VALID()?

Either that, or make checksum_new() return NULL if it meets an unknown type.
Comment 53 Dan Winship 2007-12-04 15:28:47 UTC
(In reply to comment #48)
> > g_checksum_get_digest() already accepts a pointer with the length of the
> > digest.
> > 
> Yes, but, you'll have to unnecessarily allocate a new GChecksum to query the
> length. For example when you want to decide which checksum type to choose based
> on length.

In all of the use cases that have been mentioned so far in this bug, you
don't actually have a choice anyway. (Eg, the thumbnail spec and Content-MD5
headers demand MD5. DBUS_COOKIE_SHA1 demands SHA-1, etc.)

If you are actually choosing what hash function to use at runtime, wouldn't
you want to choose based on some better criteria than "output string size"?
And at any rate, if you're choosing between multiple hash functions based on
any interesting criteria, you're probably doing something where you ought to
be using a real crypto library anyway.
Comment 54 Matthias Clasen 2007-12-04 16:04:11 UTC
I agree that the length of the checksum is probably not the most imporant piece of information, but we can trivally add a function to get this information later if a real use-case shows up. 

Lets get the existing patches committed first.
Comment 55 Emmanuele Bassi (:ebassi) 2007-12-04 16:37:08 UTC
thanks Matthias for the quick review.

both patches committed to trunk, with the documentation changes of comment 49

2007-12-04  Emmanuele Bassi  <ebassi@gnome.org>

        * gio/glocalfileinfo.c: Replace the copy-and-paste MD5 digest
        generation with GChecksum.

2007-12-04  Emmanuele Bassi  <ebassi@gnome.org>

        * glib/gchecksum.[ch]: Add GChecksum, a generic wrapper around
        various hashing algorithms. At the moment, the MD5, SHA-1 and
        SHA-256 algorithms are supported. (#443648)

        * glib/glib.h:
        * glib/Makefile.am:
        * glib/glib.symbols: Build glue for GChecksum

        * tests/Makefile.am
        * tests/checksum-test.c: Add test suite for GChecksum.