GNOME Bugzilla – Bug 770971
Add a structured g_log() that works with introspection
Last modified: 2016-09-20 04:59:15 UTC
As stated in https://mail.gnome.org/archives/desktop-devel-list/2016-August/msg00002.html
Created attachment 334947 [details] [review] [WIP] Proposed patch This is a WIP that actually works. It's missing doc and test. Please review
Review of attachment 334947 [details] [review]: nice, thanks for working on this. ::: glib/gmessages.c @@ +1648,3 @@ + field.value = g_variant_dup_string (value, NULL); + else if (g_variant_is_of_type (value, G_VARIANT_TYPE_BYTESTRING)) + field.value = g_variant_dup_bytestring (value, NULL); I think you need to pass a size variable here instead of NULL and assign it to field.length. @@ +1652,3 @@ + field.value = g_variant_print (value, FALSE); + + field.length = -1; then this will have to go up top, so it doesn't override.
(In reply to Ray Strode [halfline] from comment #2) > Review of attachment 334947 [details] [review] [review]: > > nice, thanks for working on this. > > ::: glib/gmessages.c > @@ +1648,3 @@ > + field.value = g_variant_dup_string (value, NULL); > + else if (g_variant_is_of_type (value, G_VARIANT_TYPE_BYTESTRING)) > + field.value = g_variant_dup_bytestring (value, NULL); > > I think you need to pass a size variable here instead of NULL and assign it > to field.length. If we pass -1, a string is assumed and a strlen() is done. Otherwise it uses the length but assume data is binary. (See g_log_writer_journald() in this same file) WRT g_variant_get_bytestring(), it's assumed to always return a nul-terminated string. > @@ +1652,3 @@ > + field.value = g_variant_print (value, FALSE); > + > + field.length = -1; > > then this will have to go up top, so it doesn't override.
Review of attachment 334947 [details] [review]: Thanks for the patch! Sorry I didn’t manage to get round to writing it myself yet. ::: glib/gmessages.c @@ +1607,3 @@ +{ + GLogField *field = (GLogField *)data; + g_free((char *)field->key); Nitpick: GLib’s coding style puts a space before `(` in function calls, and after `)` in casts. @@ +1643,3 @@ + while (g_variant_iter_loop (&iter, "{&sv}", &key, &value)) + { + field.key = g_strdup(key); You could avoid the `g_strdup()` here by dropping the `&` from the `g_variant_iter_loop()` format. @@ +1648,3 @@ + field.value = g_variant_dup_string (value, NULL); + else if (g_variant_is_of_type (value, G_VARIANT_TYPE_BYTESTRING)) + field.value = g_variant_dup_bytestring (value, NULL); I don’t think `g_variant_dup_bytestring()` is the right function to call here — it will chop off anything after the first nul byte. If the user wants that behaviour, they should use a string GVariant. I think you should be using `g_variant_get_fixed_array()` instead and, as Ray says, setting `field.length` appropriately.
(In reply to Philip Withnall from comment #4) > You could avoid the `g_strdup()` here by dropping the `&` from the > `g_variant_iter_loop()` format. That was my first thought, too, and I actually had the same suggestion written out but quickly rechecked the docs, and I ultimately decided I don't think it's right. The problem is arguments to g_variant_iter_loop are in-out arguments. Next iteration of the loop, the values from the previous iteration get freed. So using & manages to avoid an internal strdup(), but I don't think skipping the & will let you avoid the external strdup(). Maybe the right answer is to forgo g_variant_iter_loop entirely and use g_variant_iter_next instead. > I don’t think `g_variant_dup_bytestring()` is the right function to call > here — it will chop off anything after the first nul byte. If the user wants > that behaviour, they should use a string GVariant. > > I think you should be using `g_variant_get_fixed_array()` instead and, as > Ray says, setting `field.length` appropriately. yea, indeed, I didn't realize g_variant_dup_bytestring stopped at the first embedded NUL, g_variant_get_fixed_array() is the right way to go.
(In reply to Ray Strode [halfline] from comment #5) > (In reply to Philip Withnall from comment #4) > > You could avoid the `g_strdup()` here by dropping the `&` from the > > `g_variant_iter_loop()` format. > That was my first thought, too, and I actually had the same suggestion > written out but quickly rechecked the docs, and I ultimately decided I don't > think it's right. > > The problem is arguments to g_variant_iter_loop are in-out arguments. Next > iteration of the loop, the values from the previous iteration get freed. That would be very, very odd; 's' maps to g_variant_dup_string(), which means that the returned value is for you to free — and if you don't, you leak it.
Could be I'm misunderstanding these docs: "On the second and later calls, it is assumed that the same pointers will be given and that they will point to the memory as set by the previous call to this function. This allows the previous values to be freed, as appropriate." … "If you break out of a such a while loop using g_variant_iter_loop() then you must free or unreference all the unpacked values as you would with g_variant_get(). Failure to do so will cause a memory leak."
Yep, that's correct. First time I was using {sv} then I changed to {&sv} because of that. To avoid a second dupe. I have your suggestions applied and I'm working on writing a test for it. Should upload a v2 patch in a couple of minutes.
(In reply to Ray Strode [halfline] from comment #7) > Could be I'm misunderstanding these docs: > > "On the second and later calls, it is assumed that the same pointers will be > given and that they will point to the memory as set by the previous call to > this function. This allows the previous values to be freed, as appropriate." > … > "If you break out of a such a while loop using g_variant_iter_loop() then > you must free or unreference all the unpacked values as you would with > g_variant_get(). Failure to do so will cause a memory leak." No, you’re right Ray. g_variant_iter_loop() does have in/out arguments and does free them on the next iteration. I always forget that. I think {sv} would work if you used g_steal_pointer() to grab the key and set it to NULL, to avoid the g_strdup().
I think the best idea is to just avoid g_variant_iter_loop entirely, there are probably some situations where it's easier than g_variant_iter_next but for this case, g_variant_iter_loop doesn't seem like a good fit.
(In reply to Ray Strode [halfline] from comment #10) > I think the best idea is to just avoid g_variant_iter_loop entirely, there > are probably some situations where it's easier than g_variant_iter_next but > for this case, g_variant_iter_loop doesn't seem like a good fit. I still don't see the difference in this context: g_variant_iter_loop with {&sv}: we get the original string and dupe it; don't need to care about second variable ('v') g_variant_iter_next with {sv}: we get the string and use it, no need to dupe because it's already done for us; need to unref the second variable myself so, am I missing something? why one approach is better than the other?
There’s basically no difference, as you say. I mistakenly thought we could eliminate the g_strdup() line, but was wrong. So you could try Ray’s suggestion of using g_variant_iter_next() and see if that simplifies things; or leave them as they are.
well it's subjective, of course, but my point is g_variant_iter_loop has non-conventional memory management. It confused every single one of us on this report. Why confuse future readers of the code, too? Yea, you have to do the unref, but it's explicit and clear. When a casual reader reads it, they'll know why it's there immediately. Anyway, again, it's subjective, so I'm not going to push super hard, but I know which way I'd go!
(In reply to Ray Strode [halfline] from comment #13) > well it's subjective, of course, but my point is g_variant_iter_loop has > non-conventional memory management. It confused every single one of us on > this report. Why confuse future readers of the code, too? > > Yea, you have to do the unref, but it's explicit and clear. When a casual > reader reads it, they'll know why it's there immediately. > > Anyway, again, it's subjective, so I'm not going to push super hard, but I > know which way I'd go! You convinced me :) Indeed g_variant_iter_loop() it's a bit confusing, non-usual.
Created attachment 335006 [details] [review] [WIP] Proposed patch v2 Fixed according review; Added tests
Created attachment 335007 [details] [review] diff from v1 patch
Review of attachment 335006 [details] [review]: ::: glib/gmessages.c @@ +1606,3 @@ +destroy_field (gpointer data) +{ + GLogField *field = (GLogField *)data; space @@ +1627,3 @@ + g_array_set_clear_func (fields, destroy_field); + + field.key = g_strdup("PRIORITY"); space @@ +1628,3 @@ + + field.key = g_strdup("PRIORITY"); + field.value = g_strdup(log_level_to_priority (log_level)); space @@ +1634,3 @@ + if (log_domain) + { + field.key = g_strdup("GLIB_DOMAIN"); space @@ +1635,3 @@ + { + field.key = g_strdup("GLIB_DOMAIN"); + field.value = g_strdup(log_domain); space @@ +1650,3 @@ + else if (g_variant_is_of_type (value, G_VARIANT_TYPE_BYTESTRING)) + { + gconstpointer p = g_variant_get_fixed_array (value, (gsize *) &field.length, sizeof (guchar)); In theory, field.length could overflow. Probably better to use a temporary, and make sure the size won't go negative. (might also prevent aliasing warnings on some compilers, not sure?) ::: glib/gvarianttype.h @@ -26,3 @@ #endif -#include <glib/gmessages.h> i know it's a small change, but personally, i'd do this as a separate commit, since it's a clean up in its own right.
Created attachment 335012 [details] [review] remove the include
Created attachment 335013 [details] [review] [Final?] Proposed patch Proposed final version - Adds the doc - Fixes the issues pointed on review - Rename some variables to improve readability and doc
Review of attachment 335013 [details] [review]: ::: glib/gmessages.c @@ +1620,3 @@ + * + * Log a message with structured data, accepting the data within a #GVariant. This + * version is specially useful for use in other languages, via introspection. especially @@ +1622,3 @@ + * version is specially useful for use in other languages, via introspection. + * + * For details on its usage and about the parameters, see #g_log_structured(). it's a little unfortunately that we're pointing to non-introspectable C convenience api, since that means generated docs for other languages will point to a non-existent function. But that problem is already there in the current code. I think at a minimum the docs for this function should mention that MESSAGE is required item in the dictionary, and we should probably fail if we get to the bottom of the function and its not there. @@ +1640,3 @@ + g_return_if_fail (g_variant_is_of_type (fields, G_VARIANT_TYPE_VARDICT)); + + a_fields = g_array_sized_new (FALSE, FALSE, sizeof (GLogField), 16); The g_array_sized_new(..., 16) is kind of arbitrary, and you end up with that same arbitrary value if you just use g_array_new (the first time you call g_array_append_val it will allocate 16 entries). I think it'd be better to avoid adding one more magic number in the code and just use g_array_new @@ +1668,3 @@ + gsize s; + gconstpointer p = g_variant_get_fixed_array (value, &s, sizeof (guchar)); + field.length = MIN (s, G_MAXSSIZE); is silently truncating the right way to go? i'm not sure. If we're going to warn/fail for no MESSAGE, we might want to warn/fail for this case too? Regardless, it would be good to document the behavior. of course this is a crazy edge case, so maybe it doesn't actually matter
(In reply to Ray Strode [halfline] from comment #20) > Review of attachment 335013 [details] [review] [review]: > > ::: glib/gmessages.c > @@ +1620,3 @@ > + * > + * Log a message with structured data, accepting the data within a > #GVariant. This > + * version is specially useful for use in other languages, via > introspection. > > especially > > @@ +1622,3 @@ > + * version is specially useful for use in other languages, via > introspection. > + * > + * For details on its usage and about the parameters, see > #g_log_structured(). > > it's a little unfortunately that we're pointing to non-introspectable C > convenience api, since that means generated docs for other languages will > point to a non-existent function. But that problem is already there in the > current code. > > I think at a minimum the docs for this function should mention that MESSAGE > is required item in the dictionary, and we should probably fail if we get to > the bottom of the function and its not there. OK, I could add this to the doc, but strictly speaking it's not required. At least currently. The underlying function - g_log_structured_array() - doesn't check for it. And it's a public API. Should we fix this in a new bug? :) > @@ +1640,3 @@ > + g_return_if_fail (g_variant_is_of_type (fields, G_VARIANT_TYPE_VARDICT)); > + > + a_fields = g_array_sized_new (FALSE, FALSE, sizeof (GLogField), 16); > > The g_array_sized_new(..., 16) is kind of arbitrary, and you end up with > that same arbitrary value if you just use g_array_new (the first time you > call g_array_append_val it will allocate 16 entries). > > I think it'd be better to avoid adding one more magic number in the code and > just use g_array_new ACK. > @@ +1668,3 @@ > + gsize s; > + gconstpointer p = g_variant_get_fixed_array (value, &s, sizeof > (guchar)); > + field.length = MIN (s, G_MAXSSIZE); > > is silently truncating the right way to go? i'm not sure. If we're going to > warn/fail for no MESSAGE, we might want to warn/fail for this case too? > Regardless, it would be good to document the behavior. of course this is a > crazy edge case, so maybe it doesn't actually matter Indeed, it's a bit odd to have such a high number of fields :) Anyway, I could print something on the stderr...
Created attachment 335038 [details] [review] [Final?] Proposed patch v2 Better doc Fixes according review
Review of attachment 335038 [details] [review]: so i'm happy with this now for the most part. Thanks again for tackling this. I don't know if Phillip or desrt or ebassi have any further comments. I do have a few suggestions, but they're totally just ideas, nothing really needed. imo. ::: glib/gmessages.c @@ +1678,3 @@ + gsize s; + gconstpointer p = g_variant_get_fixed_array (value, &s, sizeof (guchar)); + if (G_LIKELY (s < G_MAXSSIZE)) should be <= @@ +1682,3 @@ + else + { + _g_fprintf (stderr, "Byte array too large (%lu bytes) passed to g_log_variant(). Truncating to %ld", s, G_MAXSSIZE); I think you need to cast s to (gulong) and G_MAXSSIZE to (glong) or it might give a warning on windows? Chatting with Matthias about truncated messages, he had one idea: You could add an extra field GLIB_TRUNCATED_LOG_FIELDS that gave a list of ones that ended up truncated. anyway, maybe overkill for an edge case. @@ +1691,3 @@ + + g_array_append_val (a_fields, field); + g_variant_unref (value); So, this code does a lot of duping, one improvement might be to forgo the g_variant_unref here and stuff the values on a GSList and free them in bulk at the end. Then you can skip some of the duping, like the the potentially large memdup above (say someone was logging a qr code image)
(In reply to Ray Strode [halfline] from comment #23) > Review of attachment 335038 [details] [review] [review]: > > so i'm happy with this now for the most part. Thanks again for tackling > this. I don't know if Phillip or desrt or ebassi have any further comments. > I do have a few suggestions, but they're totally just ideas, nothing really > needed. imo. > > ::: glib/gmessages.c > @@ +1678,3 @@ > + gsize s; > + gconstpointer p = g_variant_get_fixed_array (value, &s, sizeof > (guchar)); > + if (G_LIKELY (s < G_MAXSSIZE)) > > should be <= fixed. > @@ +1682,3 @@ > + else > + { > + _g_fprintf (stderr, "Byte array too large (%lu bytes) passed > to g_log_variant(). Truncating to %ld", s, G_MAXSSIZE); > > I think you need to cast s to (gulong) and G_MAXSSIZE to (glong) or it might > give a warning on windows? Can I blindly change this? Are you sure? Is there any way I can build it on Windows in order to see if there's any warning? > Chatting with Matthias about truncated messages, he had one idea: You could > add an extra field GLIB_TRUNCATED_LOG_FIELDS that gave a list of ones that > ended up truncated. anyway, maybe overkill for an edge case. Actually it's not the fields that are truncated. It happens when a field of type 'ay' is higher than MAX_LONG. In other words, you can have as many fields you want, since each one is of size MAX_LONG at most. > @@ +1691,3 @@ > + > + g_array_append_val (a_fields, field); > + g_variant_unref (value); > > So, this code does a lot of duping, one improvement might be to forgo the > g_variant_unref here and stuff the values on a GSList and free them in bulk > at the end. Then you can skip some of the duping, like the the potentially > large memdup above (say someone was logging a qr code image) Indeed. I've refactored the patch to not dupe in almost all cases. The new allocation now is when we use g_variant_print().
Created attachment 335136 [details] [review] [Final?] Proposed patch v3 Now we avoid allocating new memory. Ref the variant parameter on the beggining and unref on the end so that we can safely use {&sv} in g_variant_iter_next() Store all variants ('v' in {&sv}) in a list and unref them at the end. The only case where we allocate new memory is when the value in the dictionary is not a string neither an array of bytes. Then we use g_variant_print(). I did some benchmarking with the old version and with the new one, for a large number of binary data. On my machine, execution time went down to 0.8s from 1.2s. Source for this is at https://github.com/jwendell/temp/tree/master/test-log-variant
(In reply to Jonh Wendell from comment #24) > Actually it's not the fields that are truncated. It happens when a field of > type 'ay' is higher than MAX_LONG. In other words, you can have as many > fields you want, since each one is of size MAX_LONG at most. Right, for instance, he was just saying if e.g., three fields "CORRUPT_COREDUMP", "LARGE_SPARSE_FILE", and "EXTREMELY_LARGE_TIF_IMAGE" all were too large to fit, then you would add a new field "GLIB_TRUNCATED_LOG_FIELDS" with the value "CORRUPT_COREDUMP,LARGE_SPARSE_FILE,EXTREMLY_LARGE_TIF_IMAGE" So someone looking at the journal output knows the values are incomplete. Honestly, I wouldn't do it, I don't think it's worth the advantage it gives.
Review of attachment 335136 [details] [review]: oh cool, you're going for extra credit, awesome :-) ::: glib/gmessages.c @@ +1664,3 @@ + while (g_variant_iter_next (&iter, "{&sv}", &key, &value)) + { + values_list = g_slist_append (values_list, value); should be g_slist_prepend, so it doesn't have to iterate the whole list to add a new entry. (order doesn't matter here anyway) @@ +1686,3 @@ + char *s = g_variant_print (value, FALSE); + field.value = s; + print_list = g_slist_append (print_list, s); Again, should be g_slist_prepend. One other thing you could do is call g_slist_prepend (values_list, value) at the end of the loop instead of the beginning, and only if some boolean defer_unref == TRUE (or something like that), then defer_unref = TRUE in all cases but the g_variant_print case. If defer_unref == FALSE then you would unref right away. The main advantage of that approach is you get to free the variants you're finished with right away and recoup some of the memory back up front.
(In reply to Jonh Wendell from comment #25) > Ref the variant parameter on the beggining and unref on the end so that we > can safely use {&sv} in g_variant_iter_next() Can you explain this to me? Is the concern that the fields variant may get unreffed from another thread? If so, I don't think ref/unref actually fixes that race. That's just a "don't do that" scenario. Or is there another reason, and i'm just missing it?
(In reply to Ray Strode [halfline] from comment #28) > (In reply to Jonh Wendell from comment #25) > > Ref the variant parameter on the beggining and unref on the end so that we > > can safely use {&sv} in g_variant_iter_next() > Can you explain this to me? Is the concern that the fields variant may get > unreffed from another thread? If so, I don't think ref/unref actually fixes > that race. That's just a "don't do that" scenario. Or is there another > reason, and i'm just missing it? You are right, this is not necessary.
(In reply to Jonh Wendell from comment #24) > Can I blindly change this? Are you sure? Is there any way I can build it on > Windows in order to see if there's any warning? Well, the problem is pointer size isn't guaranteed to be the same size as long. I believe on windows, long is 32-bit for backward compatibility with win32, and long long is used for pointer size. I don't know about getting access to a windows build environment, to see for yourself. I'm not sure. If you don't want to use the casts, the other way is G_GSIZE_FORMAT and G_GSSIZE_FORMAT. e.g., "Byte array too large (%" G_GSIZE_FORMAT " bytes) passed "
Created attachment 335138 [details] [review] [Final?] Proposed patch v4 Added the defer_unref logic Fixed printf format Dropped unnecessary ref/unref of the function parameter
Review of attachment 335138 [details] [review]: looks good to me. I'm going to add more nits, but I don't think they matter much at all, i just noticed them now, so I'll throw them out there. It would be good if Phillip rubber stamped this too before you pushed. ::: glib/gmessages.c @@ +1637,3 @@ + GVariant *value; + gchar *key; + GArray *a_fields; i personally like the name fields_array better than a_fields, but that's just my own subjective style. It would be more consistent with your use of _list suffixes below in your list variables. @@ +1640,3 @@ + GLogField field; + GSList *values_list, *print_list; + gboolean defer_unref; defer_unref could be declared in the while loop scope for clarity @@ +1675,3 @@ + field.value = g_variant_get_fixed_array (value, &s, sizeof (guchar)); + if (G_LIKELY (s <= G_MAXSSIZE)) + field.length = s; pretty sure glib coding style requires braces on both sides of an if/else if one side has it. (see docs/CODING-STYLE in the gtk tree, which I think glib follows, too) @@ +1681,3 @@ + "Byte array too large (%" G_GSIZE_FORMAT " bytes) " + "passed to g_log_variant(). Truncating to %"G_GSSIZE_FORMAT, + s, G_MAXSSIZE); It occurred to me you could have _g_fprintf do a little less work by changing: …Truncating to %" G_SSIZE_FORMAT, …G_MAXSSIZE); to …Truncating to " G_STRINGIFY (G_MAXSSIZE), …); That would have the preprocessor handle that bit at compile time, instead of handling it at run time.
Created attachment 335141 [details] [review] [Final?] Proposed patch v5 Fixes according latest review
Review of attachment 335141 [details] [review]: ::: glib/gmessages.c @@ +1667,3 @@ + field.length = -1; + + if (g_variant_is_of_type (value, G_VARIANT_TYPE_STRING)) missing braces here, but you could fix that when pushing, probably don't need to post the patch again. (still hang tight for a response from pwithnall)
(In reply to Ray Strode [halfline] from comment #34) > Review of attachment 335141 [details] [review] [review]: > > ::: glib/gmessages.c > @@ +1667,3 @@ > + field.length = -1; > + > + if (g_variant_is_of_type (value, G_VARIANT_TYPE_STRING)) > > missing braces here, but you could fix that when pushing, probably don't > need to post the patch again. (still hang tight for a response from > pwithnall) ok, we need a tool for checking the code style :)
Review of attachment 335141 [details] [review]: Seems I missed the party, sorry. A few minor nitpicks, then I think this is ready to go. Thanks! ::: glib/gmessages.c @@ +1608,3 @@ + * @log_level: log level, either from #GLogLevelFlags, or a user-defined + * level + * @fields: A dictionary (#GVariant of the type %G_VARIANT_TYPE_VARDICT) Nitpick: the ‘A’ should be lower case to match the other argument documentation @@ +1620,3 @@ + * (#G_VARIANT_TYPE_STRING). Array of bytes (#G_VARIANT_TYPE_BYTESTRING) is also + * supported. In this case the message is handled as binary and will be forwarded + * to the journal as such. The size of the array should not be higher than %G_MAXSSIZE. We shouldn’t assume that messages are being sent to systemd-journald. It’s going to be true in 90% of cases, but not always. Say “and will be forwarded to the log writer as such”. @@ +1621,3 @@ + * supported. In this case the message is handled as binary and will be forwarded + * to the journal as such. The size of the array should not be higher than %G_MAXSSIZE. + * Otherwise it will be truncated to this size. For other types #g_variant_print() Use ‘g_variant_print()’ (no leading hash). gtk-doc syntax is ‘#TypeName’ or ‘function_name()’, but not both. @@ +1624,3 @@ + * will be used to convert the value into a string. + * + * For more details on its usage and about the parameters, see #g_log_structured(). Same here. ::: glib/tests/logging.c @@ -536,3 @@ } - - guint some_integer = 123; This should really be removed in a separate commit. Well spotted and thanks for the cleanup.
Created attachment 335179 [details] [review] remove an unsed variable in tests
Created attachment 335180 [details] [review] [Final?] Proposed patch v6 Fixed according latest review; Split the removal of unused variable into another commit
ok to push?
do it! and thanks again. Are you going to do the gjs patch, too?
sure I will, after all that's the reason we are here in this bug :D thanks guys for your reviews and patience :)
Was writing a logging.Handler implementation for Python and it appears its not possible to bypass ERROR being fatal from pygobject. Just wondering if I am missing something.
(In reply to Patrick Griffis (tingping) from comment #42) > Was writing a logging.Handler implementation for Python and it appears its > not possible to bypass ERROR being fatal from pygobject. Just wondering if I > am missing something. Correct, any log level in G_LOG_FATAL_MASK (including G_LOG_LEVEL_ERROR) is always fatal. The ability to change the fatal mask (using g_log_set_fatal_mask()) has been removed because it broke everyone's assumptions about what was and was not fatal.