After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 612591 - [patch] Please review g_string_ref() and g_string_unref() API additions
[patch] Please review g_string_ref() and g_string_unref() API additions
Status: RESOLVED DUPLICATE of bug 622721
Product: glib
Classification: Platform
Component: general
unspecified
Other All
: Normal enhancement
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2010-03-11 16:35 UTC by Richard Hughes
Modified: 2017-11-16 10:38 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Initial patch for review (3.44 KB, patch)
2010-03-11 16:35 UTC, Richard Hughes
none Details | Review
GRefString, an immutable thread-safe refcounted string type (3.13 KB, patch)
2010-04-23 19:05 UTC, Alexander Larsson
none Details | Review
new patch with comments addressed from review (2.92 KB, patch)
2010-04-26 16:40 UTC, Richard Hughes
none Details | Review

Description Richard Hughes 2010-03-11 16:35:05 UTC
Created attachment 155872 [details] [review]
Initial patch for review

In my various projects, I use refcounted string objects, mostly to save memory and also for to avoid over-zealous strdup'ing. Most projects won't need to do this, but when you have 10 copies of 10,000 strings (typically 100 bytes each) then the refcounting functionality becomes really useful.

In the attached patch I've added this functionality to GString, copying the logic from the recent GPtrArray _ref and _unref API additions.

Please review. Thanks!
Comment 1 Colin Walters 2010-04-23 11:45:08 UTC
Review of attachment 155872 [details] [review]:

Since GString is a mutable type, it seems like a bit of a trap to use it for structural string sharing.  It's also just not anywhere as efficient as a dedicated data type could be.  For example with a dedicated API you can just pack a refcount in front of the string data in one malloc block.

(No objections though to the patch - it may be useful for some other cases to have refcounts on strings).
Comment 2 Alexander Larsson 2010-04-23 13:00:06 UTC
I have such a packed refcounted string in eel:

http://git.gnome.org/browse/nautilus/tree/eel/eel-string.h
http://git.gnome.org/browse/nautilus/tree/eel/eel-string.c
(look for eel_ref_str)

It supports both "normal" refcounted strings and "unique strings" which are internalized in a hash table while there is a reference outstanding to it.

All threadsafe too.
Comment 3 Alexander Larsson 2010-04-23 13:00:59 UTC
eel_ref_str is a very simple and useful, so i wouldn't mind if it was in glib actually.
Comment 4 Colin Walters 2010-04-23 13:07:30 UTC
What about extending the quark API to be refcounted?
Comment 5 Alexander Larsson 2010-04-23 13:33:29 UTC
Thats not ABI compatible. quarks are expected to get the same value next time even if at some point inbetween there was no living string referencing it.
Comment 6 Alexander Larsson 2010-04-23 13:35:30 UTC
I use the unique ref_strs for things like mimetypes and usernames/realnames in nautilus files. These are massively duplicated, but are freeform strings so any static enum would not work. Also, this makes comparison of such strings fast (just compare pointer bits).
Comment 7 Colin Walters 2010-04-23 13:35:41 UTC
Well, you could make it so that if a GQuark was ever created for a given string, its refcount would go to 0xFFFF which would act as a flag meaning "infinite".
Comment 8 Richard Hughes 2010-04-23 13:42:11 UTC
(In reply to comment #3)
> eel_ref_str is a very simple and useful, so i wouldn't mind if it was in glib
> actually.

Yes, that would suit my application too. It looks like a few projects now code their own thing (mine included) and I'm sure they are not all as threadsafe as your code. +1 from me.
Comment 9 Matthias Clasen 2010-04-23 14:33:33 UTC
We don't need any refcounts on quarks. The are immortal.

If your purpose is sharing unique strings, just use g_intern_static_string or g_intern_string. Just make sure to not use the static version from an (un)loadable module.

I can still see that it might be useful to make our string buffer (ie GString) refcounted.
Comment 10 Matthias Clasen 2010-04-23 15:11:30 UTC
Review of attachment 155872 [details] [review]:

::: glib/gstring.c
@@ +101,3 @@
+  volatile gint ref_count;
+};
+

Why not just add the ref_count field to GString ?
It is not like we are supporting stack-allocated GStrings....
Comment 11 Alexander Larsson 2010-04-23 15:13:33 UTC
Matthias:

I don't use g_intern_string, because thes are not 'internal' strings, but
things based on user input, so the set of strings is unbounded and changes over
time. It would not be nice if these things were immortal. Instead my unique
strings are kept in the unique hash only while there is at least one reference
to the string.

Also, the refcounted strings are not only for interned strings. The
non-interned eel_ref_str:s in nautilus is used for e.g. the File names. Each
file object has a few different types of names (display name, file name, edit
name). These are all eel_ref_str:s, and typically (except in weird cases) all
point to the same ref-string (not unique in general, we just compare when
setting these names).
Comment 12 Matthias Clasen 2010-04-23 18:52:44 UTC
Alex: Ah, I see. That is a valid case.

> Well, you could make it so that if a GQuark was ever created for a given
> string, its refcount would go to 0xFFFF which would act as a flag meaning
> "infinite".

I think this might work out. We'd have to keep a separate hash table for refed strings that are not quarks. What would that api look like ?

const gchar *g_ref_string (const gchar *string)
const gchar *g_unref_string (const gchar *string)

The return values would be guaranteed to be == comparable with each other an with interned strings, but while interned strings live forever, these only live as long as there's a reference. 

Need to be a bit careful to get sequences like this right:

a = g_ref_string ("a");
b = g_intern_string ("a"); // probably waste the memory for the refcount here
if (a != b) g_print ("fail!\n");
g_unref_string (a);


Would be great if somebody could try to implement that.
Comment 13 Alexander Larsson 2010-04-23 19:05:53 UTC
Created attachment 159455 [details] [review]
GRefString, an immutable thread-safe refcounted string type

I did a quick port of the eel code to glib to show what i mean.
Comment 14 Alexander Larsson 2010-04-23 19:16:07 UTC
As per comment 12. That "fail" assert won't hit. I don't actually put non-interned strings in the hashtable, they are just normal refcounted objects. Space in the hashtable is only made for interned strings.
Comment 15 Matthias Clasen 2010-04-24 00:33:23 UTC
Oh, but I was talking about the exiting g_intern_string api that is quark-based.
What I was proposing is to arrange things so that g_ref_string ("a") returns the string from the quark string table if there is one, and that
g_intern_string would look for an existing copy in your table of ref'ed strings before duping the string that it got.
Comment 16 Alexander Larsson 2010-04-24 07:53:08 UTC
Hmm, GRefString is basically a const char *, but its an opaque type that you access via a macro (it just has a gint refcount stored at offset -4). It seems risky to have an api that takes "char *" types that are not "normal" c strings. Lots of chances for confusion and wrong kinds of strings ending up in this api.

However, sharing the quark table with the refcounted interned strings table seems like a good idea in general.
Comment 17 Richard Hughes 2010-04-25 21:51:41 UTC
(In reply to comment #10)
> Why not just add the ref_count field to GString ?
> It is not like we are supporting stack-allocated GStrings....

Otherwise ref_count becomes visible (and settable) rather than an internal implementation detail.

Richard.
Comment 18 Alexander Larsson 2010-04-26 13:57:37 UTC
Richard: Its not like we support the app randomly setting the other fields of GString.
Comment 19 Richard Hughes 2010-04-26 16:40:06 UTC
Created attachment 159623 [details] [review]
new patch with comments addressed from review

(In reply to comment #18)
> Richard: Its not like we support the app randomly setting the other fields of
> GString.

I guess, it just seemed odd to put a volatile int into a public header. New patch attached for review.
Comment 20 Emmanuele Bassi (:ebassi) 2010-06-25 14:51:28 UTC
https://bugzilla.gnome.org/show_bug.cgi?id=622721 contains a patch with a refcounted string that overloads C strings.
Comment 21 Richard Hughes 2016-11-14 21:33:58 UTC
It's 6 years later, and now gnome-software needs a refcounted string object. Given we have g_ptr_array_ref() can't we have g_string_ref() as well? Wrapping GString into a GsString means I have to cast everything in an ugly way, add this to the gnome-software plugin API and deal with all the gobject-introspection junk. Having g_string_ref/unref would really make me happy and make our core platform library more consistent. I'm happy to rebase if it helps.
Comment 22 Emmanuele Bassi (:ebassi) 2016-11-15 00:18:45 UTC
I'm sure we can add refcounting to GString without breaking ABI, by having a:

  typedef struct {
    GString base;

    volatile int ref_count;
  } GRealString;

inside gstring.c and then allocating a GRealString internally.

The issue is that GString is *not* a string: it's a string builder. Passing it around only makes sense if you're modifying it, not as a reference counted immutable string.

If gnome-software is using GString already because it's modifying string buffers then it's fine; and passing around GString instances, instead of playing ping pong with char* and GString, is liable to cut down memory use and fragmentation.
Comment 23 Richard Hughes 2016-11-15 09:55:38 UTC
(In reply to Emmanuele Bassi (:ebassi) from comment #22)
> I'm sure we can add refcounting to GString without breaking ABI, by having a:
> 
>   typedef struct {
>     GString base;
> 
>     volatile int ref_count;
>   } GRealString;
> 
> inside gstring.c and then allocating a GRealString internally.

So, I did that initially in https://bugzilla.gnome.org/attachment.cgi?id=155872&action=diff unless I'm mistaken. I just copied the public struct members to avoid using a 'base' indirection everywhere which seemed to be what other objects did in GLib.

> The issue is that GString is *not* a string: it's a string builder. Passing
> it around only makes sense if you're modifying it, not as a reference
> counted immutable string.

Right, and GPtrArray isn't a immutable array either, it's an array builder. Given we can refcount pretty much every other object in GLib (even explicit builders like GKeyFile) I don't think mutability of a refcounted object is a issue here.

> If gnome-software is using GString already because it's modifying string
> buffers then it's fine; and passing around GString instances, instead of
> playing ping pong with char* and GString, is liable to cut down memory use
> and fragmentation.

Using a refcounted GsString rather than strdup()ing 35 localised string properties in 1400 objects saves me ~2.5Mb per process and allows gnome-software to start 200ms faster (and likely a lot more on ARM).
Comment 24 Emmanuele Bassi (:ebassi) 2016-11-15 10:17:27 UTC
(In reply to Richard Hughes from comment #23)
> (In reply to Emmanuele Bassi (:ebassi) from comment #22)
> > I'm sure we can add refcounting to GString without breaking ABI, by having a:
> > 
> >   typedef struct {
> >     GString base;
> > 
> >     volatile int ref_count;
> >   } GRealString;
> > 
> > inside gstring.c and then allocating a GRealString internally.
> 
> So, I did that initially in
> https://bugzilla.gnome.org/attachment.cgi?id=155872&action=diff unless I'm
> mistaken. I just copied the public struct members to avoid using a 'base'
> indirection everywhere which seemed to be what other objects did in GLib.

Yep, that's okay.
 
> > The issue is that GString is *not* a string: it's a string builder. Passing
> > it around only makes sense if you're modifying it, not as a reference
> > counted immutable string.
> 
> Right, and GPtrArray isn't a immutable array either, it's an array builder.

Not really: it's a mutable array. :-)

> Given we can refcount pretty much every other object in GLib (even explicit
> builders like GKeyFile) I don't think mutability of a refcounted object is a
> issue here.

It's not. It's mostly that we don't really use GString inside our API because it's a "heavy" object. In general, our strings are immutable at the point of entry into an API.

Additionally, GString is kind of a C-only API.

> > If gnome-software is using GString already because it's modifying string
> > buffers then it's fine; and passing around GString instances, instead of
> > playing ping pong with char* and GString, is liable to cut down memory use
> > and fragmentation.
> 
> Using a refcounted GsString rather than strdup()ing 35 localised string
> properties in 1400 objects saves me ~2.5Mb per process and allows
> gnome-software to start 200ms faster (and likely a lot more on ARM).

How many of those strings are really mutable, though? I mean: are you taking those strings and modifying them, or are you just avoiding a copy and then using the string somewhere else, like you would with a `char*`?
Comment 25 Richard Hughes 2016-11-15 12:11:07 UTC
(In reply to Emmanuele Bassi (:ebassi) from comment #24)
> > Using a refcounted GsString rather than strdup()ing 35 localised string
> > properties in 1400 objects saves me ~2.5Mb per process and allows
> > gnome-software to start 200ms faster (and likely a lot more on ARM).
> 
> How many of those strings are really mutable, though? I mean: are you taking
> those strings and modifying them, or are you just avoiding a copy and then
> using the string somewhere else, like you would with a `char*`?

A mixture of both really. The majority of strings never get changed (e.g. the localized application name) but some do get modified or "fixed up" for display, for instance the licence field gets hyperlinked, any trailing spaces are removed. The fixing up isn't in the hot path so to speak, so a refcounted char* would certainly get me 99% the way there too.

Certainly g_ref_pointer_alloc0() looks like the more general API, although I don't know if it's too general to actually be obvious.

From my point of view there seem to be a few obvious ways to do a reference counted string:

inline blob at head, i.e. overallocating and putting the refcount in the first few bytes
 + less fragmentation
 + faster as only one alloc
 + fast to ref and unref
 - you can't "steal" the char memory
 - you can't just treat it like a general char* without a silly getter

inline blob at tail:
 + less fragmentation
 + faster
 + can treat like a normal char*
 - you can't just steal the char
 - need to do strlen just to get the refcount
 - bad things happen if the string length is ever changed, e.g. truncated

wrapped blob, e.g. creating a slice and internal malloc like GString
 + more fragmentation
 + slower
 - can't treat like normal char*
 + looks different in code to normal char*
 + fast to refcount
 + possible to add other things to hidden struct
 + possible to change internal string
 + possible to steal internal string

hashed blob, e.g. using a static hash table to keep track of memory refcounts
 + looks, walks and talks like a char*
 + can modify string as long as start address stays the same
 + fast-ish to refcount as just needs a hashtable lookup
 - difficult to clear/destroy hashtable
 - not going to be clear when a function is returning a ref'd "char*"
 - not going to be possible to do g_autoptr() to automatically unref the pointer refcount
 - kinda kludgy returning gpointers everywhere, although with g_ref_pointer_take() it's kinda okay

I guess listing out all the +ves and -ves kinda points to a more general solution like g_ref_pointer_*() although I don't know if this is something that you guys still want to pursue.
Comment 26 Emmanuele Bassi (:ebassi) 2016-11-15 14:10:07 UTC
(In reply to Richard Hughes from comment #25)

For added fun and games, I've updated the refcounted allocations patches; you can see them in the wip/ebassi/rc branch of GLib.

> > How many of those strings are really mutable, though? I mean: are you taking
> > those strings and modifying them, or are you just avoiding a copy and then
> > using the string somewhere else, like you would with a `char*`?
> 
> A mixture of both really. The majority of strings never get changed (e.g.
> the localized application name) but some do get modified or "fixed up" for
> display, for instance the licence field gets hyperlinked, any trailing
> spaces are removed. The fixing up isn't in the hot path so to speak, so a
> refcounted char* would certainly get me 99% the way there too.
> 
> Certainly g_ref_pointer_alloc0() looks like the more general API, although I
> don't know if it's too general to actually be obvious.
 
> From my point of view there seem to be a few obvious ways to do a reference
> counted string:

This is, by the way, how g_ref_* works in the branch above:
 
> inline blob at head, i.e. overallocating and putting the refcount in the
> first few bytes
>  + less fragmentation
>  + faster as only one alloc
>  + fast to ref and unref
>  - you can't "steal" the char memory
>  - you can't just treat it like a general char* without a silly getter

You don't need a getter; if you overallocate and return a pointer to the content then you can treat the allocation as any other char*. We do this all the time in GObject. :-)

Stealing refcounted data is always tricky, unless you have weak references to it. 

For instance:

    char *str = g_ref_string_steal (orig_str);

now what does 'orig_str' points at? The same pointer, with reference count intact? Or do we set `orig_str[0] = '\0'` and assume that every string function will always stop at the first NUL byte? Or do we forcibly drop the reference to 0, and orig_str will now point at invalid memory?

> I guess listing out all the +ves and -ves kinda points to a more general
> solution like g_ref_pointer_*() although I don't know if this is something
> that you guys still want to pursue.

I'd still want it to be available. GSK has started adding a bunch of small data refcounted structures, and having to implement reference counting all the time is error prone.
Comment 27 Richard Hughes 2016-11-15 19:11:23 UTC
(In reply to Emmanuele Bassi (:ebassi) from comment #26)
> You don't need a getter; if you overallocate and return a pointer to the
> content then you can treat the allocation as any other char*. We do this all
> the time in GObject. :-)

Right, makes sense. I guess another negative is you don't know if a random pointer is refcounted or not without documenting it somewhere in an API. I don't know of a way of saying "for this random pointer in an allocated block, give me the start address of this block that we got from malloc" but I'm happy to be proved wrong.

> I'd still want it to be available. GSK has started adding a bunch of small
> data refcounted structures, and having to implement reference counting all
> the time is error prone.

Right, if you want to dupe this bug with #622721 that's fine with me, but I think we need to do /something/ this year. :)
Comment 28 Philip Withnall 2017-11-16 10:38:35 UTC
Let’s dupe this to bug #622721 then. I’ll add it to the 2.56 milestone with the goal of doing *something* this year.

*** This bug has been marked as a duplicate of bug 622721 ***