GNOME Bugzilla – Bug 106925
g_new0 performance
Last modified: 2018-05-23 23:12:48 UTC
g_new0 is very often called with a constant object count, notably one. The total size is therefore often known at compile time, and the compiler ought to be able to zero the structure in a smarter way than a plain memset call. Unfortunately, g_new0 is implemented in terms of g_malloc0, so the size knowledge is thrown away. :-(

For gcc, something like this ought to work:

    #define g_new0(struct_type, n_structs) \
      ({ \
        gsize _gnew0_size = sizeof (struct_type) * (n_structs); \
        struct_type *_gnew0_res = g_malloc (_gnew0_size); \
        memset (_gnew0_res, 0, _gnew0_size); \
        _gnew0_res; \
      })
I think something more along the lines of:

    #if (GNUC version check) && defined (__OPTIMIZE__)
      if (__builtin_constant_p (n_structs))
        { [inlined version] }
      else
        /* g_malloc0 version */
    #else
      [...]
    #endif

would be better, to avoid unnecessary code size bloat. Also, g_malloc0() does use calloc(), and some systems may optimize calloc() to avoid unnecessarily clearing already-zeroed memory.
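For illustration, here is a minimal sketch of that shape. It is not the GLib implementation: `my_new0` and `Point` are hypothetical names, and plain malloc()/calloc() stand in for g_malloc()/g_malloc0() so the example builds without GLib. It relies on the GNU statement-expression extension and `__builtin_constant_p`, so the inlined path is only taken with gcc-compatible compilers when optimization is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for g_new0(); malloc()/calloc() replace
 * g_malloc()/g_malloc0() so the sketch does not need GLib. */
#if defined (__GNUC__) && defined (__OPTIMIZE__)
#define my_new0(struct_type, n_structs)                               \
  (__extension__ ({                                                   \
    size_t _my_n = (size_t) (n_structs);                              \
    struct_type *_my_res;                                             \
    if (__builtin_constant_p (n_structs))                             \
      {                                                               \
        /* Count known at compile time: the memset() size is a    */  \
        /* constant, so the compiler may expand it to plain stores. */ \
        _my_res = malloc (sizeof (struct_type) * _my_n);              \
        if (_my_res != NULL)                                          \
          memset (_my_res, 0, sizeof (struct_type) * _my_n);          \
      }                                                               \
    else                                                              \
      {                                                               \
        /* Count only known at run time: let calloc() do the work. */ \
        _my_res = calloc (_my_n, sizeof (struct_type));               \
      }                                                               \
    _my_res;                                                          \
  }))
#else
/* Other compilers, or unoptimized builds: always use calloc(). */
#define my_new0(struct_type, n_structs) \
  ((struct_type *) calloc ((size_t) (n_structs), sizeof (struct_type)))
#endif

/* Hypothetical 24-byte structure (on typical 64-bit targets). */
typedef struct { long x; long y; void *next; } Point;

static int
point_is_zeroed (const Point *p)
{
  return p->x == 0 && p->y == 0 && p->next == NULL;
}

/* Constant count: eligible for the inlined malloc() + memset() path. */
int
demo_constant_count (void)
{
  Point *p = my_new0 (Point, 1);
  int ok;

  assert (p != NULL);
  ok = point_is_zeroed (p);
  free (p);
  return ok;
}

/* Run-time count: falls back to the calloc() path. */
int
demo_variable_count (size_t n)
{
  Point *pts = my_new0 (Point, n);
  int ok = 1;
  size_t i;

  assert (pts != NULL);
  for (i = 0; i < n; i++)
    ok = ok && point_is_zeroed (&pts[i]);
  free (pts);
  return ok;
}
```

Both demo functions return 1 when every field of the allocated memory reads back as zero. Whether the constant-count path actually turns into individual stores depends on the compiler and optimization level, so it is worth checking the generated assembly rather than assuming it.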
I'm not sure if it's still worth it, but FWIW I wrote a patch to do this for g_new0() and g_slice_new0(). The speedup here is ~10% in both cases when doing a couple of million small allocations in a tight loop (no frees).

However, well-written code rarely calls either of these functions frequently enough with constant parameters for it to make a noticeable difference. A sweep of GLib itself indicates that it probably won't benefit much, for instance. tests/gobject/run-performance.sh is, if anything, slightly slower. I suspect it's within the margin of error, though.

Before:

    Millions of constructed objects per second: 24.208
    Millions of constructed objects per second: 24.516
    Millions of constructed objects per second: 24.506

After:

    Millions of constructed objects per second: 24.213
    Millions of constructed objects per second: 24.231
    Millions of constructed objects per second: 24.239

I've verified that the inlining happens, e.g. this:

    0x0000000000401924 <+4>:  mov    $0x18,%edi
    0x0000000000401929 <+9>:  push   %rbx
    0x000000000040192a <+10>: sub    $0x8,%rsp
    0x000000000040192e <+14>: callq  0x4013b0 <g_malloc0@plt>
    0x0000000000401933 <+19>: mov    %rax,%rbx

becomes this:

    0x0000000000401924 <+4>:  mov    $0x18,%edi
    0x0000000000401929 <+9>:  push   %rbx
    0x000000000040192a <+10>: sub    $0x8,%rsp
    0x000000000040192e <+14>: callq  0x4011a0 <g_malloc@plt>
    0x0000000000401933 <+19>: mov    %rax,%rbx
    0x0000000000401936 <+22>: movq   $0x0,(%rax)
    0x000000000040193d <+29>: movq   $0x0,0x8(%rax)
    0x0000000000401945 <+37>: movq   $0x0,0x10(%rax)
Created attachment 302606: 0001-Allow-inlining-of-memset-in-calls-to-g_new0-and-g_sl.patch

Allow inlining of memset() in calls to g_new0() and g_slice_new0().
Is anyone still interested in this? FWIW we could apply it to g_slice_new0() only, so g_new0() could still benefit from calloc().
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/glib/issues/5.