GNOME Bugzilla – Bug 741779
Documentation tweaks addressing real-world API misuses
Last modified: 2015-08-19 11:57:34 UTC
Here’s a series of patches tweaking various bits of documentation which I’ve seen misused in real-world code which uses GLib. Hopefully by tweaking the documentation we can reduce the prevalence of this in new code. Each commit should give a rationale for its change; let me know if anything’s unclear.
Created attachment 293070 [details] [review] gparamspecs: Recommend use of most specific GParamSpec types It’s quite common to see a g_param_spec_pointer() used for GObject or boxed types which, while not incorrect, does make memory management unsafe, since no copying or reference counting can be performed automatically. Similarly, people often use g_param_spec_boolean() when an enum would be more appropriate, cf. http://blog.ometer.com/2011/01/20/boolean-parameters-are-wrong/ Using enums also means that the set of allowable values can be extended in future if needed. In the hope that people who write code like that read the documentation, mention the more specific types in the documentation.
Created attachment 293071 [details] [review] ghash: Document that g_hash_get_[keys|values]() are expensive And definitely not the right way to iterate over a hash table (as seen in code in the wild).
Created attachment 293072 [details] [review] glist: Clarify how g_list_free_1() handles links It doesn’t, which is fine, but could be unexpected if undocumented.
Created attachment 293073 [details] [review] glist: Clarify that g_list_nth() is expensive Just in case people have forgotten their basic algorithms course. Seen in some pretty terrible code in the wild; hopefully mentioning the cost in the documentation will make people think twice about using a counter variable when iterating over a linked list.
Created attachment 293074 [details] [review] glist: Document how to check a GList for emptiness Don’t use g_list_length(). It can be expensive.
Created attachment 293075 [details] [review] gmem: Clarify that a NULL check is not needed before calling g_free() It was documented before, but wasn’t especially clear. Doing if (X) g_free (X); is apparently quite a pervasive real-world anti-pattern, so perhaps it could be documented more explicitly.
Created attachment 293076 [details] [review] gobject: Mention g_clear_object() in g_object_unref() documentation
Created attachment 293077 [details] [review] gmain: Document memory management best practices for GSources It’s very common to see code where a timeout is scheduled using g_timeout_add(), yet the owning object could be destroyed shortly afterwards, before the timeout is fired, leading to use-after-free. Try and prevent this happening with new code by documenting best practices for memory management of user data for GSource callbacks.
Created attachment 293078 [details] [review] gsignal: Document memory management best practices for signal handlers It’s quite common to see naked g_signal_connect() calls without a paired g_signal_handler_disconnect(). This is commonly a bug which could lead to uses of the callback user data after it’s been freed. Document the best practices for avoiding this kind of bug by properly disconnecting all signal handlers.
Created attachment 293079 [details] [review] gmessages: Mention g_return_if_fail() in g_warning() and g_error() docs It seems to be common for people to use g_warning() or g_error() as pre- and post-condition error reporting functions, which is not really what they’re intended for. Similarly, it is generally a sign of bad API design to use g_warning() to report errors — use GError instead. Try and suggest this to the user in the hope that nice code results.
Created attachment 293080 [details] [review] gmain: Explicitly document the threading behaviour of g_timeout_add() i.e. That calling g_timeout_add() from a thread other than the main one probably doesn’t do what you want. Same for g_idle_add() and the *_full() variants.
Created attachment 293081 [details] [review] gthread: Clarify that GLib functions are not thread-safe Document the convention that nothing’s thread safe unless it says it is.
Created attachment 293082 [details] [review] gthread: Suggest using *_async() functions instead of threads It’s unfortunately common to see worker threads being spawned all over the place to do operations which could be brought into the main thread with an async call, simplifying everything.
Created attachment 293083 [details] [review] gvariant: Use ‘UTF-8’ in docs rather than ‘utf8’ Nitpicky correction.
Created attachment 293084 [details] [review] gvariant: Clarify that nullable strings should use maybe types Otherwise people might try to encode a NULL string as "NULL". I’m not even kidding.
Created attachment 293085 [details] [review] gerror: Minor clarifications to the GError documentation • Clarify that GError** parameters are for the return of _newly allocated_ GError*s. • Clarify that errors may need to be checked for explicitly if the return value of a function doesn’t reliably indicate them. • Give an example of a post-condition assertion on return values and error outputs.
Review of attachment 293070 [details] [review]: ::: gobject/gparamspecs.c @@ +1687,3 @@ * Creates a new #GParamSpecBoolean instance specifying a %G_TYPE_BOOLEAN + * property. In many cases, it may be more appropriate to use an enum with + * g_param_spec_enum(), both to improve code clarity by using explicitly named Interesting general purpose advice.... ...but not really harmful, I guess. @@ +2331,3 @@ * Creates a new #GParamSpecPointer instance specifying a pointer property. + * Where possible, it is better to use g_param_spec_object() or + * g_param_spec_boxed() to expose memory management information. That one is quite useful.
Review of attachment 293071 [details] [review]: Feel free to push with the minor change. ::: glib/ghash.c @@ +1703,3 @@ * + * This iterates over every entry in the hash table to build its return value. + * To iterate over the entries in a #GHashTable efficiently, use a "more efficiently" It's still linear... @@ +1746,3 @@ * + * This iterates over every entry in the hash table to build its return value. + * To iterate over the entries in a #GHashTable efficiently, use a same @@ +1789,3 @@ * + * This iterates over every entry in the hash table to build its return value. + * To iterate over the entries in a #GHashTable efficiently, use a same
Review of attachment 293072 [details] [review]: Commit after the requested change (or upload new text for further debate). ::: glib/glist.c @@ +191,1 @@ * It is usually used after g_list_remove_link(). Maybe more like: take out "so leaves the list in an inconsistent state" and put in "so you should not call this function on a element that is currently part of a list".
Review of attachment 293073 [details] [review]: ::: glib/glist.c @@ +734,3 @@ + * This iterates over the list until it reaches the @n-th position. To iterate + * over every element in a list, use a for-loop as described in the #GList + * introduction. The logic doesn't really follow here unless I know that I am already dealing with a specific misuse of this API. A minor tweak to make the logic a bit more theoretical: replacing "To iterate over every element, use..." with "If you intend to iterate over every element, it is better to use..."
Review of attachment 293074 [details] [review]: ::: glib/glist.c @@ +959,3 @@ + * of items. + * + * To check whether a list is empty, simply compare the #GList pointer with "Don't use this function to check if the list is empty. To do that, simply compare the #GList pointer with %NULL."
Review of attachment 293075 [details] [review]: ::: glib/gmem.c @@ +182,3 @@ * * Frees the memory pointed to by @mem. + * If @mem is %NULL it simply returns, so there is no need to check @mem Can you insert a para break while you're at it?
Review of attachment 293076 [details] [review]: ::: gobject/gobject.c @@ +3060,3 @@ + * + * If the pointer to the #GObject may be reused in future (e.g., if it is a + * private member of another object), it is recommended to clear the pointer to "instance variable" instead of "private member"?
Review of attachment 293077 [details] [review]: ::: glib/gmain.c @@ +201,3 @@ + * invoked while the object is still alive. + * + * The second object is to hold a strong reference to the object in the s/object/option/ @@ +202,3 @@ + * + * The second object is to hold a strong reference to the object in the + * callback, and to release it in the callback’s #GDestroyNotify. This ensures "release it in the callback's #GDestroyNotify" is a bit confusing. Maybe explicit mention the _full() variants and the purpose of the extra parameter. @@ +203,3 @@ + * The second object is to hold a strong reference to the object in the + * callback, and to release it in the callback’s #GDestroyNotify. This ensures + * that the object is kept alive until after the callback is invoked. One s/after the callback is invoked/after the source is finalized/ This may be different in case the sourcefunc returns TRUE... @@ +1591,3 @@ * parameter. + * + * See [memory management of sources][mainloop-memory-management] for details Thanks for adding all these x-refs. Would have been very easily overlooked otherwise.
Review of attachment 293078 [details] [review]: You miss the closure invalidation approach (ie: g_signal_connect_object). ::: gobject/gsignal.c @@ +102,3 @@ + * + * It is typically a program bug to call g_signal_connect() without a paired + * g_signal_handler_disconnect() later in the program. While signal handlers are g_signal_handler_disconnect() or many equivalents...
Review of attachment 293079 [details] [review]: I don't think this adds much. I also think that we could use a more comprehensive cleanup of these. Bug 741026 has a patch that shows the direction that I'm going with that thought...
Review of attachment 293080 [details] [review]: OK.
Review of attachment 293081 [details] [review]: This isn't really true, by the usual definition of what people think when they see "threadsafe". When you consider how all of the "threadsafe" API variants in POSIX work, they all effectively add a pointer to a buffer or context object in order to avoid sharing global state. That looks quite a lot like how almost everything in GLib works. The functions that we do have that work on global state _do_ tend to be threadsafe. I think we need a better explanation that covers this idea in more detail.
Review of attachment 293082 [details] [review]: ::: glib/gthread.c @@ +146,3 @@ + * A common use for #GThreads is to move a long-running blocking operation out + * of the main thread and into a worker thread. For GLib functions, such as + * GIO operations, this is not necessary, and complicates the code. Instead, complicates the code [citation needed] Using a worker thread with a bunch of sync operations is almost always easier to understand than async... @@ +796,3 @@ * If the thread can not be created the program aborts. See * g_thread_try_new() if you want to attempt to deal with failures. + * Alternatively, if potentially running multiple operations, see #GThreadPool. split to its own para and remove "Alternatively, " In fact, maybe reword it to be more like "If you are using threads for the purpose of offloading (potentially many) short-lived tasks, #GThreadPool might be useful."
Review of attachment 293083 [details] [review]: sure
Review of attachment 293084 [details] [review]: ::: glib/gvariant.c @@ +1219,3 @@ * + * @string must be valid UTF-8, and must not be %NULL. To encode + * potentially-%NULL strings, use this with g_variant_new_maybe(). The suggestion of "use this with g_variant_new_maybe()" is not great because the code ends up not being very nice: if (str) return g_variant_new_maybe (NULL, g_variant_new_string (str)); else return g_variant_new_maybe (G_VARIANT_TYPE_STRING, NULL); or GVariant *value; if (str) value = g_variant_new_string (str); else value = NULL; return g_variant_new_maybe (G_VARIANT_TYPE_STRING, value); Two choices; - either say "you must use a maybe type" and leave it at that - direct them at g_variant_new("ms")
Review of attachment 293085 [details] [review]: ::: glib/gerror.c @@ +53,3 @@ * * Functions that can fail take a return location for a #GError as their + * last argument. On error, a new #GError instance will be allocated and this part is OK @@ +164,3 @@ * If the sub-function does not indicate errors other than by + * reporting a #GError, or if its return value does not reliably indicate + * errors, you need to create a temporary #GError This seems redundant with what is already there. If your intent is to clarify, then add a small parenthetical expression instead. @@ +347,3 @@ + * - Similarly, you may want to add a check at the bottom of your function + * that the return value matches whether the error is set (e.g. + * `g_return_val_if_fail (error == NULL || (retval == NULL) == (*error != NULL));`). yuck. do not want. consider multiple return paths, as one example of why not...
Patch #293074 has actually already been obsoleted by the fix for bug #741024. Pushed the other a-c_n patches with the suggested modifications. Will update the other patches shortly. Attachment 293070 [details] pushed as c639b62 - gparamspecs: Recommend use of most specific GParamSpec types Attachment 293071 [details] pushed as 59748c3 - ghash: Document that g_hash_get_[keys|values]() are expensive Attachment 293072 [details] pushed as c6312da - glist: Clarify how g_list_free_1() handles links Attachment 293073 [details] pushed as 984576c - glist: Clarify that g_list_nth() is expensive Attachment 293075 [details] pushed as 18c9a4e - gmem: Clarify that a NULL check is not needed before calling g_free() Attachment 293076 [details] pushed as 4aedc85 - gobject: Mention g_clear_object() in g_object_unref() documentation Attachment 293080 [details] pushed as 8df6e5f - gmain: Explicitly document the threading behaviour of g_timeout_add() Attachment 293083 [details] pushed as 2a581fa - gvariant: Use ‘UTF-8’ in docs rather than ‘utf8’
(In reply to Ryan Lortie (desrt) from comment #26) > Review of attachment 293079 [details] [review] [review]: > > I don't think this adds much. > > I also think that we could use a more comprehensive cleanup of these. Bug > 741026 has a patch that shows the direction that I'm going with that > thought... I could remove the bit about g_return_if_fail(), but I think the mention of GError is valuable.
Created attachment 298468 [details] [review] gmain: Document memory management best practices for GSources It’s very common to see code where a timeout is scheduled using g_timeout_add(), yet the owning object could be destroyed shortly afterwards, before the timeout is fired, leading to use-after-free. Try and prevent this happening with new code by documenting best practices for memory management of user data for GSource callbacks.
Created attachment 298469 [details] [review] gsignal: Document memory management best practices for signal handlers It’s quite common to see naked g_signal_connect() calls without a paired g_signal_handler_disconnect(). This is commonly a bug which could lead to uses of the callback user data after it’s been freed. Document the best practices for avoiding this kind of bug by properly disconnecting all signal handlers.
Created attachment 298470 [details] [review] gthread: Suggest using *_async() functions instead of threads It’s unfortunately common to see worker threads being spawned all over the place to do operations which could be brought into the main thread with an async call, simplifying everything.
Created attachment 298471 [details] [review] gvariant: Clarify that nullable strings should use maybe types Otherwise people might try to encode a NULL string as "NULL". I’m not even kidding.
Created attachment 298472 [details] [review] gerror: Minor clarifications to the GError documentation • Clarify that GError** parameters are for the return of _newly allocated_ GError*s. • Clarify that errors may need to be checked for explicitly if the return value of a function doesn’t reliably indicate them.
(In reply to Ryan Lortie (desrt) from comment #28) > Review of attachment 293081 [details] [review] [review]: > > This isn't really true, by the usual definition of what people think when > they see "threadsafe". > > When you consider how all of the "threadsafe" API variants in POSIX work, > they all effectively add a pointer to a buffer or context object in order to > avoid sharing global state. That looks quite a lot like how almost > everything in GLib works. > > The functions that we do have that work on global state _do_ tend to be > threadsafe. > > I think we need a better explanation that covers this idea in more detail. I’ve dropped this patch, since as you say, it doesn’t really work.
Review of attachment 298468 [details] [review]: ::: glib/gmain.c @@ +199,3 @@ + * remove that source from the main context using g_source_remove() when the + * owning object is finalised. This ensures that the callback can only be + * invoked while the object is still alive. I don't think it is correct to say that this is 'preferred'. It may be preferred over what you describe as alternative, but it is just as good to call g_signal_handlers_disconnect_by_func() or one of the other variants.
Review of attachment 298468 [details] [review]: ::: glib/gmain.c @@ +199,3 @@ + * remove that source from the main context using g_source_remove() when the + * owning object is finalised. This ensures that the callback can only be + * invoked while the object is still alive. Of course, this was nonsense, disregard.
Review of attachment 298472 [details] [review]: This looks good to me.
Comment on attachment 298472 [details] [review] gerror: Minor clarifications to the GError documentation Attachment 298472 [details] pushed as bdfc823 - gerror: Minor clarifications to the GError documentation
Created attachment 298523 [details] [review] gstrfuncs: Add a string formatting note about using G_GUINT64_FORMAT …and friends. The ‘String precision pitfalls’ section is already linked to from all the relevant printf()-style functions, so this documentation should hopefully be easy to find.
Ping? It would be great to get the rest of these reviewed and committed. :-)
Review of attachment 298470 [details] [review]: Is it common ? Do you have evidence for that ?
(In reply to Matthias Clasen from comment #47) > Review of attachment 298470 [details] [review] [review]: > > Is it common ? Do you have evidence for that ? It seems to be a common misconception from the code reviews I've been doing for Collabora clients. That code is not yet released, but will be soon (this year, hopefully), so I can show you the evidence then. Otherwise, I'm afraid I can't.
Review of attachment 298470 [details] [review]: ::: glib/gthread.c @@ +152,3 @@ + * + * However, if multiple blocking operations need to be performed in sequence, + * thread, eliminating the need for locking and synchronisation between multiple I would mention GTask and g_task_run_in_thread(), here.
Created attachment 301568 [details] [review] gthread: Suggest using *_async() functions instead of threads It’s unfortunately common to see worker threads being spawned all over the place to do operations which could be brought into the main thread with an async call, simplifying everything.
Review of attachment 298469 [details] [review]: ::: gobject/gsignal.c @@ +101,3 @@ + * ## Memory management of signal handlers # {#signal-memory-management} + * + * It is typically a program bug to call g_signal_connect() without a paired Ehh, I wouldn't say that. If we go with "typically" then the most typical use of signals are widgets — and their memory management allows you to avoid disconnecting signals, and rely on the normal destruction sequence to get rid of the need to explicitly disconnect signal handlers. It also does not really matter when you're not connecting objects, or if the signal handlers connected are meant to be used for the entire lifetime of the application — which I'd argue is far more typical than connecting handlers to signals on transient object instances. I'd probably reword this sentence as: """ If you are connecting handlers to signals and using an object instance as your signal handler user data, you should remember to pair calls to g_signal_connect() to calls to g_signal_handler_disconnect() or g_signal_handlers_disconnect_by_func(). While signal handlers are automatically… """
Review of attachment 298471 [details] [review]: Looks good.
Review of attachment 298523 [details] [review]: Yep, good point.
Review of attachment 301568 [details] [review]: Looks good. One minor comment that does not block pushing. ::: glib/gthread.c @@ +153,3 @@ + * + * However, if multiple blocking operations need to be performed in sequence, + * thread, eliminating the need for locking and synchronisation between multiple I would actually point to GTask in GIO, here.
Thanks for the reviews; I’ve tweaked attachment #301568 [details] to push GTask a little more strongly in the sequence-of-tasks-in-a-worker-thread case. Attachment 298471 [details] pushed as 8c858a0 - gvariant: Clarify that nullable strings should use maybe types Attachment 298523 [details] pushed as 5ee333e - gstrfuncs: Add a string formatting note about using G_GUINT64_FORMAT Attachment 301568 [details] pushed as d624bf4 - gthread: Suggest using *_async() functions instead of threads
Created attachment 309543 [details] [review] gmain: Document memory management best practices for GSources It’s very common to see code where a timeout is scheduled using g_timeout_add(), yet the owning object could be destroyed shortly afterwards, before the timeout is fired, leading to use-after-free. Try and prevent this happening with new code by documenting best practices for memory management of user data for GSource callbacks.
Created attachment 309544 [details] [review] gsignal: Document memory management best practices for signal handlers It’s quite common to see naked g_signal_connect() calls without a paired g_signal_handler_disconnect(). This is commonly a bug which could lead to uses of the callback user data after it’s been freed. Document the best practices for avoiding this kind of bug by properly disconnecting all signal handlers.
Created attachment 309545 [details] [review] gmessages: Mention g_return_if_fail() in g_warning() and g_error() docs It seems to be common for people to use g_warning() or g_error() as pre- and post-condition error reporting functions, which is not really what they’re intended for. Similarly, it is generally a sign of bad API design to use g_warning() to report errors — use GError instead. Try and suggest this to the user in the hope that nice code results.
Review of attachment 298468 [details] [review]: Looks good to me.
Review of attachment 309543 [details] [review]: Looks good.
Review of attachment 309544 [details] [review]: Okay.
Review of attachment 309545 [details] [review]: Okay.
Fantastic, thanks for the reviews! Attachment 309543 [details] pushed as db8455f - gmain: Document memory management best practices for GSources Attachment 309544 [details] pushed as ef1ba45 - gsignal: Document memory management best practices for signal handlers Attachment 309545 [details] pushed as 5a64265 - gmessages: Mention g_return_if_fail() in g_warning() and g_error() docs