After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 679289 - glib should support encoding and decoding from hexadecimal
glib should support encoding and decoding from hexadecimal
Status: RESOLVED WONTFIX
Product: glib
Classification: Platform
Component: general
unspecified
Other Linux
: Normal enhancement
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2012-07-03 00:10 UTC by Christian Hergert
Modified: 2018-02-16 11:38 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch implementing g_hex_encode(), g_hex_decode(). (8.51 KB, patch)
2012-07-03 00:12 UTC, Christian Hergert
reviewed Details | Review
Patch addressing Colin's review. (8.36 KB, patch)
2012-07-03 05:40 UTC, Christian Hergert
none Details | Review

Description Christian Hergert 2012-07-03 00:10:15 UTC
Occasionally one needs to encode to and decode from hexadecimal format. glib currently lacks such routines. Probably because these are quite trivial or simply hasn't come up. For example, in python, you can do "abcdef".decode("hex").

I think something like the following would be convenient for my needs.

  gchar *g_hex_encode (const guint8*data, gsize data_len);
  guint8 *g_hex_decode (const gchar *encoded, gssize encoded_len, gsize *decoded_len);
Comment 1 Christian Hergert 2012-07-03 00:12:20 UTC
Created attachment 217891 [details] [review]
patch implementing g_hex_encode(), g_hex_decode().

Please review if you think this is worthwhile.
Comment 2 Colin Walters 2012-07-03 01:50:39 UTC
Review of attachment 217891 [details] [review]:

::: glib/ghex.c
@@ +88,3 @@
+    return NULL;
+
+  decoded = g_malloc(encoded_len / 2);

I'm confused by the malloc...

@@ +92,3 @@
+  for (i = 0; i < encoded_len; i += 2)
+    {
+      if (1 != sscanf(&encoded[i], "%02x", &b))

sscanf is locale-sensitive.  If that's your intention, you need to be VERY clear about it in the API documentation.  But very likely, you only want to parse *ASCII* hex characters, in which case, see g_ascii_xdigit_value(), which this function should at least reference.
Comment 3 Colin Walters 2012-07-03 02:10:10 UTC
This bit got lost when I switched from bugzilla comments to splinter:

It would help significantly if you found two (or more) places in GNOME this would be used.

"Occasionally one needs to encode to and decode from hexadecimal format." contains essentially no information.  Something like "While parsing JSON...", "When implementing my own programming language...".

And actually, did you know you can use g_ascii_strtoull() with a base argument?
Comment 4 Colin Walters 2012-07-03 02:11:39 UTC
One other comment:

const guint8*data, gsize data_len

Should use a GBytes.
Comment 5 Christian Hergert 2012-07-03 05:39:02 UTC
Thanks for the review Colin. I have a couple uses for this immediately. I often use hex encoding to print buffers to the console when debugging/logging. Also, where I am using this immediately in my push-glib library, Apple Push Notification identifiers are hex encoded strings and I need to convert between the two.

It has been asked on StackOverflow here:

  http://stackoverflow.com/questions/7060379/glib-api-to-convert-hexadecimal-string-to-ascii-string

Is it okay to still use g_string_append_printf() with "%02x"?

New patch has a few changes:

  * g_ascii_xdigit_value() is used to parse hex encoded string instead of sscanf().
  * GBytes preferred for parameters.
  * Tests cleanup.

As for g_ascii_strtoull() might be useful for doing some sort of multi-byte parsing, but is that really worth it?
Comment 6 Christian Hergert 2012-07-03 05:40:43 UTC
Created attachment 217907 [details] [review]
Patch addressing Colin's review.
Comment 7 Christian Persch 2012-07-03 10:59:15 UTC
Since the encoded string can be quite long, wouldn't it be better to have a gsize out param on g_hex_encode so one can save some strlen() calls later on?

And it's calling g_string_append_printf() a lot, which allocates a buffer to printf two characters, appends to string, frees buffer... would be better to just append the chars directly, ie

  static const gchar hex[16] = "0123456789ABCDEF";
  end = data + size;
  while (data < end) {
    gchar c = *data++;
    g_string_append_c (str, hex[((guchar)c) >> 4]);
    g_string_append_c (str, hex[((guchar)c) & 0xf]);
  }

And since we know the size the string will have in the end, we can avoid growing it again and again by just reating it with g_string_sized_new(). Or even better yet, simply malloc 2*size+1 bytes and write directly to it without going through GString.

Now, looking at places in glib where we do a byte->hex string conversion, I think we may need a couple more functions.

gio/gcharsetconverter.c: writes one byte as hex into a provided buffer; adds '\' as escape char in front of the 2 hex digits

gio/gdummyfile.c:g_string_append_encoded() writes to a provided GString; escapes with '%', but only byte-to-hexes 'invalid'/nonprintable chars. could make use of a 'append one byte as hex into a GString' func.

gio/gfileattribute.c: similar, but escapes with '\x'

gio/glocalfile.c:escape_trash_name(): similar, escapes with '%'

gio/glocalfileinfo.c:hex_escape_string() like gfileattribute

glib/gchecksum: no escaping, but needs to take (data, len) instead of GBytes.

glib/gconvert.c:g_escape_uri_string() like gdummyfile

glib/gstring.c:g_string_append_uri_escaped() like g_escape_uri_string but using GString

So, looking at these, I think we could use at least the following additional functions:

gchar *data_to_hex (guint8 *data, gsize datalen, gsize *outlen);

void g_string_append_as_hex (GString *str, guint8 *data, gsize datalen);

gsize append_as_hex (char *buffer, gsize buflen, guint8 *data, gsize datalen);
(returning #of bytes written)

Not sure what to do with the escaping...
Comment 8 Dan Winship 2012-07-04 11:58:53 UTC
This would make sense as a GConverter, though then it couldn't be used for the glib/ cases...

Also/alternatively, the gbase64 APIs are a disaster and could perhaps be abstracted together with this in some way.


(In reply to comment #4)
> Should use a GBytes.

g_hex_encode() doesn't need to ref or unref the data, so it's nicer to have it take const guint8 *, gsize, because then you can call it with whatever data you have without needing to turn it into a GBytes first. g_hex_decode() makes more sense with a GBytes, though then the APIs are inconsistent... Shrug.
Comment 9 Colin Walters 2012-07-10 21:21:51 UTC
First, my original review was confused because I started it, got called away for dinner, then came back...in the middle I'd somehow convinced myself that this was really basically just g_ascii_strtoull(), but it's more like base64.

So there are four different cases here.

1) Escaping a byte as hex.  This is already pretty easy to do with g_ascii_xdigit_value() or even easier, g_string_append_printf(), though the latter is admittedly fairly inefficient.

2) Unescaping a byte from hex.  Already served by g_ascii_strtoull().

3) Escaping a whole blob of binary data as hex.  This is possibly useful for debugging purposes; there's _g_dbus_hexdump() in gio now.  

4) Unescaping a blob of hex data.  Up until now, no one has requested this. So what's up with Apple having some weird scheme where they have hex-encoded values.  Why aren't they using Base64?  Who knows...

I don't think we should conflate 1) with 3); they're really totally different things.

(In reply to comment #8)
>
> Also/alternatively, the gbase64 APIs are a disaster and could perhaps be
> abstracted together with this in some way.

Yeah.  Worth thinking about...but it seems like once you get sufficiently general you end up at GConverter (and GConverter{Input,Output}Stream).

> (In reply to comment #4)
> > Should use a GBytes.
> 
> g_hex_encode() doesn't need to ref or unref the data, so it's nicer to have it
> take const guint8 *, gsize, because then you can call it with whatever data you
> have without needing to turn it into a GBytes first. g_hex_decode() makes more
> sense with a GBytes, though then the APIs are inconsistent... Shrug.

Yeah, true.
Comment 10 Colin Walters 2012-07-10 21:27:50 UTC
Oh also, the patch needs to be GNU style (space between identifier and paren).
Comment 11 Stef Walter 2012-08-03 19:34:37 UTC
Cool. I'm glad to see some work on this. It was always on my list of 'things that would be handy to upstream into glib someday'.

In gnome-keyring, gcr, libsecret I have an implementation that has served me quite well:

http://git.gnome.org/browse/gcr/tree/egg/egg-hex.h
http://git.gnome.org/browse/gcr/tree/egg/egg-hex.c

Obviously no pressure to use it. But I agree with Christian Persch in comment #7 that the need for delimiters comes up pretty often when dealing with hex. As does the need to group in N bytes.

One thing I've found flawed in my implementation is the use of gchar for the delimiter. The delimiter needs to be a full string in some cases, such as the '\x' delimiter use case mentioned above.

(In reply to comment #9)
> 3) Escaping a whole blob of binary data as hex.  This is possibly useful for
> debugging purposes; there's _g_dbus_hexdump() in gio now.

Yeah this is used all over the place.

> 4) Unescaping a blob of hex data.  Up until now, no one has requested this. So
> what's up with Apple having some weird scheme where they have hex-encoded
> values.  Why aren't they using Base64?  Who knows...

I've used this when parsing openssl PEM headers, and gnupg fingerprints (in a gpg-agent implementation for example).

> I don't think we should conflate 1) with 3); they're really totally different
> things.

Agreed. The former are about integers, the later about data. Although theoretically all data is a (possibly very long) integer, we usually don't operate on integers and data with the same sets of functions.

> (In reply to comment #8)
> >
> > Also/alternatively, the gbase64 APIs are a disaster and could perhaps be
> > abstracted together with this in some way.
> 
> Yeah.  Worth thinking about...but it seems like once you get sufficiently
> general you end up at GConverter (and GConverter{Input,Output}Stream).

Hmmm, this would be overkill for the use cases in glib where the hex encoder would currently be used, no?
Comment 12 Stef Walter 2012-08-03 19:44:40 UTC
Review of attachment 217907 [details] [review]:

::: glib/tests/hex.c
@@ +33,3 @@
+test_encode (void)
+{
+#define TEST_ENCODE(d, s) \

Rather than testing multiple fixtures in a single test case, I would suggest defining the fixtures as structures and passing them to the relevant tests with g_test_add() or g_test_add_data_func(). You can run g_test_add() or g_test_add_data_func() in a loop and format testpath arguments appropriately (using the expected hex output in the testpath, for example). This makes it really clear to see what's failing, no nasty macros needed, all while still reducing repetition.

Obviously this is a trivial test case, but big macros in test cases are a pet peeve of mine, so I couldn't help but comment :)
Comment 13 Philip Withnall 2018-02-16 11:38:23 UTC
I’m not convinced we need these in GLib. The implementing for encoding things as hex is sufficiently trivial that it might be better for callers to open-code it. I’m not sure the implementation for decoding things is sufficiently general: if you’re handling network/file formats, there are typically going to be quirks in the format which mean any implementation in GLib is not going to be appropriate (as people have said, groupings, termination characters, case sensitivity).

Given that, and that this bug hasn’t seen traction for 6 years, I’m going to WONTFIX it. If people can give a good argument why it is appropriate to have this in GLib, please reopen the bug! I’m leaning towards being happier to accept a simple encoding function for debug output, and not accepting a decoding function.