After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 682345 - Add a static string type
Add a static string type
Status: RESOLVED OBSOLETE
Product: glib
Classification: Platform
Component: general
2.33.x
Other Linux
: Normal enhancement
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks:
 
 
Reported: 2012-08-21 11:13 UTC by iain
Modified: 2018-05-24 14:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Add a GStaticString type (34.93 KB, patch)
2012-08-21 11:13 UTC, iain
none Details | Review
Add a GStaticString type (35.24 KB, patch)
2012-08-21 11:25 UTC, iain
none Details | Review

Description iain 2012-08-21 11:13:13 UTC
Created attachment 221980 [details] [review]
Add a GStaticString type

Attached is a patch that adds an immutable reference counted string type to Glib, similar to QString and NSString. The API is 

G_BEGIN_DECLS

typedef struct _GStaticString GStaticString;

GStaticString *g_static_string_new                  (const char *str);
GStaticString *g_static_string_new_with_buffer      (char  *buffer,
                                                     gssize length);
GStaticString *g_static_string_new_from_format      (const char *format,
                                                     ...);
GStaticString *g_static_string_ref                  (GStaticString *sstr);
void           g_static_string_unref                (GStaticString *sstr);
const char *   g_static_string_get_cstring          (GStaticString *sstr);
gsize          g_static_string_get_length           (GStaticString *sstr);
GStaticString *g_static_string_concat               (GStaticString *sstr_a,
                                                     GStaticString *sstr_b);
GStaticString *g_static_string_append_cstring       (GStaticString *sstr,
                                                     const char    *str);
GStaticString *g_static_string_append_format        (GStaticString *sstr,
                                                     const char    *format,
                                                     ...);
GStaticString *g_static_string_prepend_cstring      (GStaticString *sstr,
                                                     const char    *str);
GStaticString *g_static_string_prepend_format       (GStaticString *sstr,
                                                     const char    *format,
                                                     ...);
GStaticString *g_static_string_canon                (GStaticString *sstr,
                                                     const char    *valid_chars,
                                                     char           substitutor);
GStaticString *g_static_string_delimit              (GStaticString *sstr,
                                                     const char    *delimiters,
                                                     char           new_delimiter);
GStaticString *g_static_string_reverse              (GStaticString *sstr);
gssize         g_static_string_find_string          (GStaticString *haystack_sstr,
                                                     GStaticString *needle_sstr);
gssize         g_static_string_find_cstring         (GStaticString *haystack_sstr,
                                                     const char    *needle_str);
gssize         g_static_string_reverse_find_string  (GStaticString *haystack_sstr,
                                                     GStaticString *needle_sstr);
gssize         g_static_string_reverse_find_cstring (GStaticString *haystack_sstr,
                                                     const char    *needle_str);
GStaticString *g_static_string_copy_range           (GStaticString *sstr,
                                                     gsize          start,
                                                     gsize          length);
gboolean       g_static_string_is_equal_cstring     (GStaticString *sstr,
                                                     const char    *str);
int            g_static_string_compare              (GStaticString *sstr_a,
                                                     GStaticString *sstr_b);
int            g_static_string_compare_cstring      (GStaticString *sstr_a,
                                                     const char    *str_b);
int            g_static_string_casecompare          (GStaticString *sstr_a,
                                                     GStaticString *sstr_b);
int            g_static_string_casecompare_cstring  (GStaticString *sstr_a,
                                                     const char    *str_b);
G_END_DECLS

The point is to reduce the number of short lived copies of strings, for example

char *name;
g_object_get (object, "name", &name, NULL);
g_print ("The object name is %s", name);
g_free (name);

is a fairly common (if simplified) pattern. Changing that to

GStaticString *name;
g_object_get (object, "name", &name, NULL);
g_print ("The object name is %s", g_static_string_get_cstring (name));
g_static_string_unref (name);

would just pass around the same GStaticString that is held in object saving the time needed to allocate and copy the string.

GStaticStrings are also cached, so for the case
a = g_static_string_new ("test");
b = g_static_string_new ("test");

a == b, reducing string comparisons to a simple address comparison.

Open issues:
* It'd be nice if gtk used GStaticString everywhere, but that's a large change for the future. It would still be good to have something like this in glib so people could get some of the benefits of it in their applications.

* NSString has compiler support to make creating NSStrings easy: @"hello world". Something like that to make creating GStaticString would be nice too: G_("hello world") for example. However finding a short macro name that doesn't clash will be tricky. Same for _get_cstring ();

* I'm not sure of g_static_string_new_with_buffer (buffer, length); The GStaticString takes ownership of the buffer, but due to caching it may not need to so it frees the buffer, meaning that after calling _new_with_buffer @buffer may, or may not, be valid memory anymore, the calling function has no way of knowing. It's a very useful function internally and I can see a lot of use for it to external applications, but these semantics are strange. Is it ok to document them or should we just not export the function at all?

* Should it require UTF-8 only strings? What other features would be needed for UTF-8 support?
Comment 1 iain 2012-08-21 11:14:16 UTC
Oh yes,

* What other functions might be needed on the type?
Comment 2 iain 2012-08-21 11:18:36 UTC
Just realised:

* It's not thread safe yet.
Comment 3 iain 2012-08-21 11:25:44 UTC
Created attachment 221982 [details] [review]
Add a GStaticString type

Now with added thread safe locking around cache accesses
Comment 4 Christian Persch 2012-08-21 12:29:42 UTC
Bug 622721 has discussion on this.
Comment 5 iain 2012-08-21 12:50:43 UTC
Some additional numbers:
For 1,000,000 iterations:

Short strings (always "helloworld!")
g_strdup:  0.055898s
g_static_string_new: 0.061688s

Long strings (always ~200byte string)
g_strdup:  0.119591s
g_static_string_new: 0.378108s

In these numbers the slowness of g_static_string_new is the cache lookup as it's always the same string so the GStaticString returned is always the same. 


Short random strings (pregenerated strings of random chars up to 8bytes long)
g_strdup:  0.043155s
g_static_string_new: 0.801872s

Long random strings (pregenerated strings of random chars up to 256bytes long)
g_strdup:  0.113764s
g_static_string_new: 0.679277s

Up until now the cache has always been cleared between tests.

Long random strings without empty cache
g_static_string_new: 2.186603s

For this test the cache was prefilled with 1,000,000 short strings, and then the long string test was run, so there was a 0% hit rate on the cache. Obviously we're hitting issues with GHashTable having so many keys (2,000,000 by the end of the test) and having to do a g_str_equal on a lot of them.
Comment 6 iain 2012-08-21 12:52:15 UTC
Ultimately though, this change is less about speed, and more about lowering memory usage and fragmentation by sharing strings
Comment 7 iain 2012-08-21 13:03:03 UTC
(In reply to comment #4)
> Bug 622721 has discussion on this.

Oddly started by code I wrote 2 years ago, but have no memory of ever doing so.
Comment 8 Colin Walters 2012-08-21 14:30:53 UTC
It'd be significantly more efficient to reuse ELF for readonly data like this.

The tradeoff is that you need to either:

1) Scan all of the source code with an external program (we already have too many of these...)
2) Manually maintain an integer->string lookup table

See http://www.akkadia.org/drepper/dsohowto.pdf Section 2.4.1
Comment 9 iain 2012-08-21 17:08:56 UTC
It's not for data that's fixed at compile time. Maybe static is the wrong word?
Comment 10 Colin Walters 2012-08-21 18:16:23 UTC
Why don't we just add g_bytes_new_from_static_string() ?
Comment 11 iain 2012-08-21 21:26:15 UTC
Well, this gives you more than GBytes does:
- a cache so memory is shared which gives the ability to compare strings as just a pointer comparison
- other functions to do manipulation on the strings, comparisons, string searches and the like.

and while it's a semantic thing, an indication that it's something to use for strings.
a hypothetical gtk_label_new_with_static_string (GStaticString *string) seems logical
gtk_label_new_with_bytes (GBytes *bytes) doesn't seem just so proper.

Given the number of other people who've written something like this (there's now 5 different implementations of a GStaticString type on bugzilla, although 2 of them are written by me), it seems like it's something that is useful to have.
Comment 12 iain 2012-08-21 21:27:19 UTC
Well, this gives you more than GBytes does:
- a cache so memory is shared which gives the ability to compare strings as just a pointer comparison
- other functions to do manipulation on the strings, comparisons, string searches and the like.

and while it's a semantic thing, an indication that it's something to use for strings.
a hypothetical gtk_label_new_with_static_string (GStaticString *string) seems logical
gtk_label_new_with_bytes (GBytes *bytes) doesn't seem just so proper.

Given the number of other people who've written something like this (there's now 5 different implementations of a GStaticString type on bugzilla, although 2 of them are written by me), it seems like it's something that is useful to have.
Comment 13 GNOME Infrastructure Team 2018-05-24 14:29:56 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/597.