GNOME Bugzilla – Bug 682345
Add a static string type
Last modified: 2018-05-24 14:29:56 UTC
Created attachment 221980 [details] [review] Add a GStaticString type Attached is a patch that adds an immutable reference counted string type to Glib, similar to QString and NSString. The API is G_BEGIN_DECLS typedef struct _GStaticString GStaticString; GStaticString *g_static_string_new (const char *str); GStaticString *g_static_string_new_with_buffer (char *buffer, gssize length); GStaticString *g_static_string_new_from_format (const char *format, ...); GStaticString *g_static_string_ref (GStaticString *sstr); void g_static_string_unref (GStaticString *sstr); const char * g_static_string_get_cstring (GStaticString *sstr); gsize g_static_string_get_length (GStaticString *sstr); GStaticString *g_static_string_concat (GStaticString *sstr_a, GStaticString *sstr_b); GStaticString *g_static_string_append_cstring (GStaticString *sstr, const char *str); GStaticString *g_static_string_append_format (GStaticString *sstr, const char *format, ...); GStaticString *g_static_string_prepend_cstring (GStaticString *sstr, const char *str); GStaticString *g_static_string_prepend_format (GStaticString *sstr, const char *format, ...); GStaticString *g_static_string_canon (GStaticString *sstr, const char *valid_chars, char substitutor); GStaticString *g_static_string_delimit (GStaticString *sstr, const char *delimiters, char new_delimiter); GStaticString *g_static_string_reverse (GStaticString *sstr); gssize g_static_string_find_string (GStaticString *haystack_sstr, GStaticString *needle_sstr); gssize g_static_string_find_cstring (GStaticString *haystack_sstr, const char *needle_str); gssize g_static_string_reverse_find_string (GStaticString *haystack_sstr, GStaticString *needle_sstr); gssize g_static_string_reverse_find_cstring (GStaticString *haystack_sstr, const char *needle_str); GStaticString *g_static_string_copy_range (GStaticString *sstr, gsize start, gsize length); gboolean g_static_string_is_equal_cstring (GStaticString *sstr, const char *str); int g_static_string_compare (GStaticString *sstr_a, GStaticString *sstr_b); int g_static_string_compare_cstring (GStaticString *sstr_a, const char *str_b); int g_static_string_casecompare (GStaticString *sstr_a, GStaticString *sstr_b); int g_static_string_casecompare_cstring (GStaticString *sstr_a, const char *str_b); G_END_DECLS The point is to reduce the number of short lived copies of strings, for example char *name; g_object_get (object, "name", &name, NULL); g_print ("The object name is %s", name); g_free (name); is a fairly common (if simplified) pattern. Changing that to GStaticString *name; g_object_get (object, "name", &name, NULL); g_print ("The object name is %s", g_static_string_get_cstring (name)); g_static_string_unref (name); would just pass around the same GStaticString that is held in object saving the time needed to allocate and copy the string. GStaticStrings are also cached, so for the case a = g_static_string_new ("test"); b = g_static_string_new ("test"); a == b, reducing string comparisons to a simple address comparison. Open issues: * It'd be nice if gtk used GStaticString everywhere, but that's a large change for the future. It would still be good to have something like this in glib so people could get some of the benefits of it in their applications. * NSString has compiler support to make creating NSStrings easy: @"hello world". Something like that to make creating GStaticString would be nice too: G_("hello world") for example. However finding a short macro name that doesn't clash will be tricky. Same for _get_cstring (); * I'm not sure of g_static_string_new_with_buffer (buffer, length); The GStaticString takes ownership of the buffer, but due to caching it may not need to so it frees the buffer, meaning that after calling _new_with_buffer @buffer may, or may not, be valid memory anymore, the calling function has no way of knowing. It's a very useful function internally and I can see a lot of use for it to external applications, but these semantics are strange. Is it ok to document them or should we just not export the function at all? * Should it require UTF-8 only strings? What other features would be needed for UTF-8 support?
Oh yes, * What other functions might be needed on the type?
Just realised: * It's not thread safe yet.
Created attachment 221982 [details] [review] Add a GStaticString type Now with added thread safe locking around cache accesses
Bug 622721 has discussion on this.
Some additional numbers: For 1,000,000 iterations: Short strings (always "helloworld!") g_strdup: 0.055898s g_static_string_new: 0.061688s Long strings (always ~200byte string) g_strdup: 0.119591s g_static_string_new: 0.378108s In these numbers the slowness of g_static_string_new is the cache lookup as it's always the same string so the GStaticString returned is always the same. Short random strings (pregenerated strings of random chars up to 8bytes long) g_strdup: 0.043155s g_static_string_new: 0.801872s Long random strings (pregenerated strings of random chars up to 256bytes long) g_strdup: 0.113764s g_static_string_new: 0.679277s Up until now the cache has always been cleared between tests. Long random strings without empty cache g_static_string_new: 2.186603s For this test the cache was prefilled with 1,000,000 short strings, and then the long string test was run, so there was a 0% hit rate on the cache. Obviously we're hitting issues with GHashTable having so many keys (2,000,000 by the end of the test) and having to do a g_str_equal on a lot of them.
Ultimately though, this change is less about speed, and more about lowering memory usage and fragmentation by sharing strings
(In reply to comment #4) > Bug 622721 has discussion on this. Oddly started by code I wrote 2 years ago, but have no memory of ever doing so.
It'd be significantly more efficient to reuse ELF for readonly data like this. The tradeoff is that you need to either: 1) Scan all of the source code with an external program (we already have too many of these...) 2) Manually maintain an integer->string lookup table See http://www.akkadia.org/drepper/dsohowto.pdf Section 2.4.1
It's not for data that's fixed at compile time. Maybe static is the wrong word?
Why don't we just add g_bytes_new_from_static_string() ?
Well, this gives you more than GBytes does: - a cache so memory is shared which gives the ability to compare strings as just a pointer comparison - other functions to do manipulation on the strings, comparisons, string searches and the like. and while it's a semantic thing, an indication that it's something to use for strings. a hypothetical gtk_label_new_with_static_string (GStaticString *string) seems logical gtk_label_new_with_bytes (GBytes *bytes) doesn't seem just so proper. Given the number of other people who've written something like this (there's now 5 different implementations of a GStaticString type on bugzilla, although 2 of them are written by me), it seems like it's something that is useful to have.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/597.