GNOME Bugzilla – Bug 575074
Add a Priority Queue Data Type
Last modified: 2017-10-11 09:19:21 UTC
I believe GLib should include a Priority Queue data type. A priority queue implements a subset of the "sorted set" abstract data type, implemented e.g. by GSequence. If only the subset of operations supported by a priority queue are to be used, implementations much more efficient than GSequence's balanced binary trees become possible. This can significantly improve the performance of selected algorithms.
Created attachment 130511 [details] [review] Proposed API and initial implementation (with docs) Patch against glib-2.19.10. Requires ./autogen.sh before rebuild.
(In reply to comment #1) > Created an attachment (id=130511) [edit] > Proposed API and initial implementation (with docs) > > Patch against glib-2.19.10. Requires ./autogen.sh before rebuild. though I'm not a glib maintainer, and I don't have any decisional power whether this should go in or not, here's a small review: +<!-- ##### SECTION Title ##### --> +Priority Queues + +<!-- ##### SECTION Short_Description ##### --> +a collection of data entries with associated priority values that returns +entries one by one in order of priority don't use a template file. you should embed the description inside the source code; see the code in GIO on how to do that. +struct _GPQueue { + struct _GPQueue *next; + struct _GPQueue *prev; + struct _GPQueue *parent; + struct _GPQueue *child; wrong code style. 2 spaces indentation levels. also GPQueue has already been defined, so you don't need to use "struct _GPQueue". really, the whole code style is a mess. I've singled out the wrong bits, but you should really open another source file and see how it's done. +static inline void +g_pqueue_insert_before (GPQueue *dest, GPQueue *src) wrong code style. the arguments list should be aligned: static inline void g_pqueue_insert_before (GPQueue *dest, GPQueue *src) { + * Inserts a new entry into a #GPQueue. + * + * Return value: The altered priority queue. + **/ missing "Since: 2.x" tag, to be changed when it lands. + if (handle != NULL) *handle = e; wrong code style: indentation. + if (pqueue != NULL) { + g_pqueue_insert_before(pqueue, e); + if (e->priority < pqueue->priority) + return e; + else + return pqueue; + } else { + return e; + } wrong code style: indentation. + return (pqueue != NULL) ? (pqueue->data) : (NULL); drop the parenthesis around the values. + g_pqueue_remove(b); + if (a->child != NULL) { + g_pqueue_insert_before(a->child, b); wrong code style: indentation and a space between function name and arguments. + GPQueue sentinel; + sentinel.next = &sentinel; + sentinel.prev = &sentinel; + g_pqueue_insert_before(pqueue, &sentinel); + + GPQueue *current = pqueue; no C99 the test should use the GTest API and be part of the test suite.
My main comment on the patch is that I think anything like this should be implemented more like GHashTable and GQueue, etc, not GList. That is, one creates a handle to the queue, and adds and removes from it. The queue handle will never change. Regarding the API naming, again, look at GQueue. use the _peek naming instead of _top for example.
(In reply to comment #2) > don't use a template file. you should embed the description inside the source > code; see the code in GIO on how to do that. Everything else in ./glib/ seems to use a template file. I don't have a personal preference, but are you sure it should be done like ./gio/ and not like ./glib/? Is this a rule for new code only? > wrong code style. 2 spaces indentation levels. That's trivial to change. I went with what the GNOME Programming Guidelines say without looking too closely at the existing code. I'll bring it closer to the other files. > missing "Since: 2.x" tag, to be changed when it lands. Should I really tag every single function and type separately? Including a note that the entire structure was added in 2.x seems more reasonable. > + GPQueue *current = pqueue; > > no C99 Declare everything at the beginning of a function? Too strict for my taste, but if that's the GLib way, okay. > the test should use the GTest API and be part of the test suite. Again, the other code in ./glib/ seems to do it like I did, with a simple test tool in ./tests/ that runs on make check. Another rule for new code only? Is the test suite well documented and easy to add to?
(In reply to comment #3) > My main comment on the patch is that I think anything like this should be > implemented more like GHashTable and GQueue, etc, not GList. That is, one > creates a handle to the queue, and adds and removes from it. The queue handle > will never change. I tend to agree. I guess I went with a changeable pointer to the root element because, well, that's what the actual implementation uses, but since I already agreed that the API should be as implementation-agnostic as possible… So I'd move GPQueue to GPQueueNode, and GPQueue becomes a wrapping structure that includes the root pointer. If I'm going to use a wrapping structure anyway, it would be possible to add more to that, for example an associated hash table that can track the validity of GPQueueHandles and return an error for invalid handles instead of undefined behavior. Obviously, this would imply a small performance hit, so maybe it should be optional or not done at all? > Regarding the API naming, again, look at GQueue. use the _peek naming instead > of _top for example. Sounds reasonable.
(In reply to comment #4) > (In reply to comment #2) > > > don't use a template file. you should embed the description inside the source > > code; see the code in GIO on how to do that. > > Everything else in ./glib/ seems to use a template file. I don't have a > personal preference, but are you sure it should be done like ./gio/ and not > like ./glib/? Is this a rule for new code only? we're trying to move away from template files: http://live.gnome.org/GTK%2B/TaskAPIDocMigration gio and gobject are already there; old files should be migrated and new files should start with inline documentation. > > missing "Since: 2.x" tag, to be changed when it lands. > > Should I really tag every single function and type separately? Including a note > that the entire structure was added in 2.x seems more reasonable. it's not a matter of what seems more reasonable to you. the Since tag is used to generate the developer documentation index for the release. so yes: it's really needed. > > + GPQueue *current = pqueue; > > > > no C99 > > Declare everything at the beginning of a function? Too strict for my taste, but > if that's the GLib way, okay. GLib supports multiple platforms and compilers, and not every compiler supports C99. sad, but a fact of life. > > the test should use the GTest API and be part of the test suite. > > Again, the other code in ./glib/ seems to do it like I did, with a simple test > tool in ./tests/ that runs on make check. those are the old tests. look in /glib/glib/tests, /glib/gobject/tests and /glib/gio/tests. even the tests in /glib/tests use the GTest API. > Another rule for new code only? Is > the test suite well documented and easy to add to? yes, the GTest API is documented and there are examples.
Two comments: - Doing prioqs with elements that have pointer references to their siblings/father/children is messy. The classic way to do prioqs is by based on an array. When you have an entry at a certain index i then you find its father at i/2 and its children at i*2 and i*2+1, and so on. That makes a lot of things much easier and the element structs much smaller. (It might even make sense to no add a special prioq type but reuse GArray plus some additional methods. Similar to how Python's prioq support does it.) - I think it is highly advisable to allow arbitrary priority types. i.e. not just a 'gint priority' but any user defined order defined by a callback. One of the most prominent uses cases for prioqs are sorting queued time events -- which are usually ordered by 'struct timeval's, not gints. I don't think it is the prioq code's job to store the priority value at all, that's something the app should do inside of their elements. Disclaimer: I have implemented a couple of prioq's, and I think my implementations are quite awesome... ;-) It might be an idea to port this implementation from PulseAudio to glib: http://git.0pointer.de/?p=pulseaudio.git;a=blob;f=src/pulsecore/prioq.c http://git.0pointer.de/?p=pulseaudio.git;a=blob;f=src/pulsecore/prioq.h
(In reply to comment #7) > Two comments: > > - Doing prioqs with elements that have pointer references to their > siblings/father/children is messy. The classic way to do prioqs is by based on > an array. When you have an entry at a certain index i then you find its father > at i/2 and its children at i*2 and i*2+1, and so on. Please try to understand what my implementation actually does. The classic array representation restricts you to using one binary tree as the underlying data structure <http://en.wikipedia.org/wiki/Binary_heap>. My implementation uses, as I mentioned before, a Fibonacci Heap <http://en.wikipedia.org/wiki/Fibonacci_heap>. This structure has better performance guarantees, but it is not a binary tree, it is a forest of trees which don't have to be binary. I do not believe that this structure can be implemented without having several relationship pointers in the nodes. If we can agree on that, then the only advantage of wrapping everything in an array of structs would be more efficient memory allocation than malloc()ing every element. However, since I am using the Slice Allocator and not malloc() or its derivatives, this should be unnecessary. > - I think it is highly advisable to allow arbitrary priority types. i.e. not > just a 'gint priority' but any user defined order defined by a callback. I have considered this. For correctness, my heap (and even simpler Binary heaps) heavily relies on the assumption that the relationship between elements doesn't change, except when using the change_priority() method. The proposed API obviously enforces this, assuming the user doesn't write directly into opaque structures. A user-provided comparison function would have to strictly follow that rule, but it couldn't be forced to. If the user fiddles with his objects and relationships change without informing the queue of the change, the return order will become undefined. On the one hand, if we simply put a warning in the documentation about this, I think users will still trip over this. On the other hand, some of the other sorted structures in GLib probably require the same assumption, and don't even mention it in the documentation. Any more opinions on this? I can easily make the change, but I am somewhat skeptical of this approach. As a general remark, I have been super-busy lately with other things, but I am still interested in this and am working on a second revision of the patch. Same underlying structure, adjusted API taking several of the comments above into account. I'll submit it soon-ish. When it's done.
Created attachment 133513 [details] [review] Revised Patch Finally, a new revision of the proposed patch, that hopefully addresses most of the comments I received. The basic idea remains the same, but here is an overview of things I've changed: - Documentation embedded and tagged (Since:, Returns:) - No C99 (I think) - Rewritten selftest, now in glib/tests and hopefully using the right API - Simpler API, which - Uses a comparison function, not integer priorities - Is mostly implementation-agnostic One thing I'm not sure about is g_pqueue_priority_decreased(). On the one hand, there is a significant runtime difference (O(1) vs. O(log n)). On the other hand, the split invites programming errors and exposes an implementation detail. I think the latter can be overlooked for two reasons, though: Firstly, it's true for most ways to implement a priority queue that DecreaseKey is faster than the more generic ChangeKey. Secondly, for every possible implementation, simply making g_pqueue_priority_decreased() an alias for g_pqueue_priority_changed() will be semantically valid, so having to provide the function would never strictly prevent one from using a specific underlying implementation.
(In reply to comment #8) > (In reply to comment #7) > > Two comments: > > > > - Doing prioqs with elements that have pointer references to their > > siblings/father/children is messy. The classic way to do prioqs is by based on > > an array. When you have an entry at a certain index i then you find its father > > at i/2 and its children at i*2 and i*2+1, and so on. > > Please try to understand what my implementation actually does. The classic > array representation restricts you to using one binary tree as the underlying > data structure <http://en.wikipedia.org/wiki/Binary_heap>. My implementation > uses, as I mentioned before, a Fibonacci Heap > <http://en.wikipedia.org/wiki/Fibonacci_heap>. Hmm. The question though is whether a the advantage of having a few operations going from logarithmic to constant is advantageous in the light that this comes at the price of higher memory usage. On 64bit five pointers per item are quite expensive memory (and bandwith)-wise. Two pointers are less so. Given that cache pressure is usually the defining characteristic for microoptimization these days it might be worth to reduce mnemory consumption per element a bit at the price that some operations go from O(1) to O(log(n)). Dunno. Benchmarks might be be useful to find that out.
Hi Everyone, I am currently using Glib to develop an application that could heavily benefit from priority queues. I am looking forward to seeing a priority queue in future versions of Glib. Maik Zumstrull I want to say thank you for your contribution and your time. Hopefully this will result in added functionality which everyone can benefit from. Any chance of an asynchronous priority queue for communication between threads? Cheers! Eric des Courtis
Thank-you for your hard work on this, Maik! It seems like quite some time has elapsed since the last look at this, though it would certainly be cool to see it included. What are some possible next steps for this?
(In reply to comment #12) > What are some possible next steps for this? I think the step I've been hoping for from the beginning would be good: an actual GLib Maintainer stepping up and saying either "we're not interested in including this in any form ever, go away" or "we're basically willing to include an addressable priority queue, and here's the specific problems with the design and implementation you need to fix before we might include yours". Personally, I'm not entirely happy with the API I came up with (either version), but I don't see any way of making it more accessible without adding one or two internal GHashTables to keep track of certain associations, which I'd like to avoid for performance reasons. Possible counterarguments are "a streamlined API is more important than a few percentage points worth of performance" and "if you don't include these hashtables, the user will most likely make them himself to keep track of things, in which case we might as well do it for him". A comment on this would be appreciated, too.
Hi Mark, Just checked out the new patch I found here, and can say it seems a lot nicer to use. Just a quick heads-up though - I don't know if it doing it for the sake of internal efficiency, but the compare callback function being called is sometimes called with the a and b parameters pointing to the same object. Is that expected?
(In reply to comment #14) > Just a quick heads-up though - I don't know if it doing it for the sake of > internal efficiency, but the compare callback function being called is > sometimes called with the a and b parameters pointing to the same object. Is > that expected? It's not done intentionally, i.e. there's no explicit cmp(foo, foo) done anywhere, but it could happen. For example, in g_pqueue_push, the new element is compared to the current head of the queue. Those are different nodes in the internal tree structure, but they can point to the same external object, in which case it gets compared to itself. I could put something in static inline gint cmp like "if (a->data == b->data) return 0;", of course, avoiding that external call. I'm not sure if I should, though. I think the best thing would be to do what other GLib data structures do. If they avoid calling comparison callbacks on identical objects, I should, too. I'll check on that.
> I could put something in static inline gint cmp like "if (a->data == b->data) > return 0;", of course, avoiding that external call. I'm not sure if I should, > though. I think the best thing would be to do what other GLib data structures > do. If they avoid calling comparison callbacks on identical objects, I should, > too. I'll check on that. I've looked at sequences, balanced binary trees and doubly-linked lists in GLib. None of them explicitly avoid calling the external comparison function for identical objects, so I think that doing so isn't GLib coding style. If it's a real problem for your code to be asked to compare an object to itself, I believe you should make an adjustment in your code.
No problem, the comparison function can just test a == b first, for a quick return if that is the case.
Created attachment 147288 [details] [review] Revised Patch I've added an implementation that uses a binary heap instead of a Fibonacci heap. They use the same API and source file, two #defines at the top of the file switch between the implementations. The BH implementation is faster than the FH implementation unless the queue is used in a way heavily biased towards changing the priorities of entries (eg graph searching algorithms). It also uses less memory. I chose to use per-entry objects (slice allocated) instead of a flat array of structs to keep the queue fully addressable; there might be a better way to do this. I generated the patch with git this time instead of against a tarball.
See also bug 107082
Created attachment 161932 [details] Simple addressable priority queue To add yet another option, this is what I'm currently using. Seems to work well enough and the API is fairly simple (you can change an entry's key by simply pushing it again). I didn't bother fixing this up for inclusion, since that's obviously not happening anyway, so no documentation at this point. Just in case someone finds it useful.
Just a note to say thanks for your work on this! I am using the simple pqueue in a project with good success.
We’ve had two bugs open about priority queues for a long time. I’m going to close this one as a duplicate of bug #107082, but on the understanding that the patch here might be just as useful as the one there. If any progress is made on bug #107082 in future, this bug will still be around for reference. *** This bug has been marked as a duplicate of bug 107082 ***