Bug 784211 – Should the many Glib::ustring::compose() and ::format() templates be replaced with single variadic templates?

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 784211 - Should the many Glib::ustring::compose() and ::format() templates be replaced with single variadic templates?


Summary:	Should the many Glib::ustring::compose() and ::format() templates be replaced...


Status:	RESOLVED FIXED

Product:	glibmm
Classification:	Bindings
Component:	strings
Version:	2.55.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkmm-forge
QA Contact:	gtkmm-forge

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2017-06-26 11:52 UTC by Daniel Boles
Modified:	2017-09-05 18:45 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
ustring: Remove unused template arguments (963 bytes, patch) 2017-07-06 21:42 UTC, Daniel Boles	committed	Details \| Review
ustring: #include <cstddef> unconditionally (1.08 KB, patch) 2017-07-06 21:42 UTC, Daniel Boles	committed	Details \| Review
ustring: Use std::initializer_list instead of pointer + size (6.89 KB, patch) 2017-07-06 21:43 UTC, Daniel Boles	committed	Details \| Review
ustring: Replace 9×compose() with 1 variadic template (8.66 KB, patch) 2017-07-06 21:43 UTC, Daniel Boles	committed	Details \| Review
ustring: Replace 8×format() with 1 variadic template (7.06 KB, patch) 2017-07-06 21:43 UTC, Daniel Boles	committed	Details \| Review
ustring: Avoid warning due to type confusion (1.46 KB, patch) 2017-09-04 18:18 UTC, Daniel Boles	committed	Details \| Review

Description Daniel Boles 2017-06-26 11:52:11 UTC

Currently there are separate template functions for each number of arguments to Glib::ustring::compose(). Presumably, this should be replaced with a single variadic template function. Is that correct?

Comment 1 Murray Cumming 2017-07-06 18:59:19 UTC

Yes, please.

Comment 2 Daniel Boles 2017-07-06 20:09:09 UTC

Thanks! I'll resurrect my patch from a couple of weeks ago

I think that we should also replace compose_argc with compose_ilist. That is:     Instead of passing around a pointer to an array and its size so that we do not have to template on the size of the array, we can use std::initializer_list, which pretty much exists for this purpose, as it uses compiler magic to abstract a variable sized array behind iterators. This is a bit more concise and considerably more idiomatic.

If there are no objections, I'll make that part of the same patch set.

Comment 3 Daniel Boles 2017-07-06 20:13:51 UTC

compose_argv(), that is.

I also wonder about that helper class Stringify:

 * I haven't got a proof-of-concept yet, but instinctively I feel like it could
   be replaced with methods that simply return ustrings by value (subject to
   copy elision), instead of what seems like a convoluted mechanism of stashing
   the [copied or format()ted] string in itself, just so that it can give a
   pointer out to compose_argv. It should be possible to construct the various
   overloads in place in the initializer_list and then just have compose_ilist()
   loop over those.

 * Are all the comments about overloads that are required for Visual Studio
   still current? The horrible ambiguity between different variations of char
   continues to mystify me, and I don't have VS to test, but I feel like all
   the added noise of these shouldn't be required, at least not anymore. I guess
   I'll try to catch one of the Windows folk on IRC some time!

Comment 4 Daniel Boles 2017-07-06 21:42:50 UTC

Created attachment 355051 [details] [review]
ustring: Remove unused template arguments

Comment 5 Daniel Boles 2017-07-06 21:42:56 UTC

Created attachment 355052 [details] [review]
ustring: #include <cstddef> unconditionally

It may not be needed for std::ptrdiff_t, as that is only used
conditionally, but we use std::size_t either way in some Stringify
specialisations, and while that is almost certain to be pulled in from
<string> or some other standard header, we might as well be explicit.

Comment 6 Daniel Boles 2017-07-06 21:43:03 UTC

Created attachment 355053 [details] [review]
ustring: Use std::initializer_list instead of pointer + size

Acrobatics are needed so that the main compose function does not have to
be templated on the number of strings due to the bloat that would cause.

Since C++11, instead of passing a pointer to an array & its size, we can
std::initializer_list, which pretty much exists for this purpose, as it
uses compiler magic to abstract a variable sized array behind iterators.

The only drawback is that initializer_list does not have an operator[],
but its elements are guaranteed to be allocated in the order defined, so
we can use (*(begin() + index)) – a bit ugly, but we only need it once!

Comment 7 Daniel Boles 2017-07-06 21:43:09 UTC

Created attachment 355054 [details] [review]
ustring: Replace 9×compose() with 1 variadic template

No more gratuitous repetition, hooray for modern C++!

Comment 8 Daniel Boles 2017-07-06 21:43:17 UTC

Created attachment 355055 [details] [review]
ustring: Replace 8×format() with 1 variadic template

as per the previous commit for Glib::ustring.

Now, users can pass format() as many arguments as their compiler or
stack will allow, rather than being limited to a maximum of 8.

Comment 9 Daniel Boles 2017-07-06 22:33:04 UTC

I should point out that make check, including its tests for ustring::compose and format, is still 100% happy after all this. :)

Comment 10 Daniel Boles 2017-07-06 23:02:08 UTC

(In reply to Daniel Boles from comment #9)
> I should point out that make check, including its tests for ustring::compose
> and format, is still 100% happy after all this. :)

I just looked at those tests, and they don't actually check for any expected output, just that the functions don't throw... Maybe I'll work on those another time.


A weird difference between glibmm-2-52 and master, even before my changes, is that master format()s the pointer address of &i in the test with commas, despite it being a hex address, for example:

    0x7ff,fb2,8d6,b1c

Any ideas why the two branches differ here? A quick search suggests it might be something to do with locales, if anything changed there between them.

Comment 11 Daniel Boles 2017-07-07 17:32:16 UTC

Review of attachment 355054 [details] [review]:

I've since renamed compose_ilist() to compose_private(), and removed the comment about perfect forwarding, since it hardly matters and conflicts with Stringify (ambiguous overloads)

Comment 12 Daniel Boles 2017-09-01 11:41:20 UTC

Ping for a review?

I figured a WIP branch might be nicer to read, so I put one up:

https://git.gnome.org/browse/glibmm/log/?h=wip/dboles/ustring

There are no functional differences from the patches here.

Comment 13 Murray Cumming 2017-09-03 19:21:31 UTC

It looks good to me. Please go ahead and push it to master. Thanks for taking care of this.

Comment 14 Daniel Boles 2017-09-03 19:38:54 UTC

Thanks for the review! I've merged it.

The one thing I still wonder about is the oddity in Comment 10, but since it already occurred before this, I guess we can deal with it separately, if needed.

Comment 15 Kjell Ahlstedt 2017-09-04 14:14:29 UTC

When I build glibmm after configuring with --enable-warnings=fatal, I get

ustring.cc: In static member function ‘static Glib::ustring Glib::ustring::compose_private(const Glib::ustring&, std::initializer_list<const Glib::ustring*>)’:
ustring.cc:1317:17: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
       if (index >= 0 && index < ilist.size())
           ~~~~~~^~~~

This is because of
  const std::size_t index = Ascii::digit_value(stop[1]) - 1;
  if (index >= 0 && index < ilist.size())

Ascii::digit_value() can return 0 or -1. Ascii::digit_value(stop[1]) - 1 can be
negative. A possible fix is

  const int index = Ascii::digit_value(stop[1]) - 1;
  if (index >= 0 && static_cast<std::size_t>(index) < ilist.size())

Or can be rely on the fact that a negative value usually becomes a huge positive
value when a signed int is converted to an unsigned int?

  const std::size_t index = Ascii::digit_value(stop[1]) - 1;
  if (index < ilist.size())

I prefer the fix with a signed index.

Comment 16 Kjell Ahlstedt 2017-09-04 14:51:31 UTC

(In reply to Daniel Boles from comment #10)
> A weird difference between glibmm-2-52 and master, even before my changes,
> is that master format()s the pointer address of &i in the test with commas,
> despite it being a hex address, for example:
> 
>     0x7ff,fb2,8d6,b1c
> 
> Any ideas why the two branches differ here? A quick search suggests it might
> be something to do with locales, if anything changed there between them.

Yes, something changed there. See bug 661588. The git master of the gtkmm
tutorial (gtkmm-documentation) contains

  gtkmm-4.0 is used in combination with glibmm-2.54, which sets the global
  locale for your program. The older glibmm-2.4 does not do that, and gtkmm-3.0
  does it only to some extent. What this means is briefly that if your gtkmm-3.0
  program contains a call to std::locale::global(std::locale("")), you can
  probably remove it. If you don't want glibmm or gtkmm to set the global locale
  for you, you should add a call to
  Glib::set_init_to_users_preferred_locale(false) before any call to
  Glib::init() or Gtk::Application::create(). 

glibmm-2.54 should be changed to glibmm-2.56.

Surprising that thousands separators are used in hexadecimal numbers. When I run
glibmm_ustring_compose with locale sv_SE.UTF-8, it's written
  0x7ff c37 5ed 0b4

Comment 17 Daniel Boles 2017-09-04 14:52:14 UTC

Whoops, I'm usually really paranoid about warnings, so I don't know how that slipped through; apologies.


(In reply to Kjell Ahlstedt from comment #15)
> A possible fix is
> 
>   const int index = Ascii::digit_value(stop[1]) - 1;
>   if (index >= 0 && static_cast<std::size_t>(index) < ilist.size())

I guess it doesn't matter here, since we only accept placeholders 1~9, but personally I don't like using plain ints for indices, as a general rule, because it seems wrong (you know, in case of that really likely, practical scenario of a container with > 2.5 million items... :P)


> Or can be rely on the fact that a negative value usually becomes a huge
> positive
> value when a signed int is converted to an unsigned int?
> 
>   const std::size_t index = Ascii::digit_value(stop[1]) - 1;
>   if (index < ilist.size())

Perhaps weirdly, I like this more, instinctively. However, we would have to cast the result of digit_value() to std::size_t first, then subtract - as while underflow is defined for unsigned values, it is undefined behaviour for integers, AFAIK.


Anything that mixes integers of different signages and widths makes me flinch, and usually is doomed to code full of compromises.

So, while I prefer the latter, I guess all we really need is something that works; please do whatever you prefer, or whatever is most in keeping with how you normally handle such situations in glibmm.

Comment 18 Daniel Boles 2017-09-04 17:26:44 UTC

Ignore the bit about underflow; I have no idea why I thought that was relevant... in my defence, I was thinking about 2 other things at the same time!

Comment 19 Daniel Boles 2017-09-04 18:18:37 UTC

Created attachment 359096 [details] [review]
ustring: Avoid warning due to type confusion

error: comparison of unsigned expression >= 0 is always true
  [-Werror=type-limits]
       if (index >= 0 && index < ilist.size())

As ilist.size() is static_assert()ed in compose() to be in range of int,
we can safely convert to int for use as the bound. For clarity, convert
to a new int, rather than comparing to static_cast<int>(ilist.size()).

Using an int is clearer than e.g. relying on std::size_t underflowing to
its max if digit_value() returns < 0 and therefore being < ilist.size().


--

This seems like the most readable and self-documenting way to do it, but I might
be wrong.

Comment 20 Daniel Boles 2017-09-04 18:19:58 UTC

(In reply to Daniel Boles from comment #19)
> its max if digit_value() returns < 0 and therefore being < ilist.size().

s/therefore/hence not/, already updated locally

Comment 21 Kjell Ahlstedt 2017-09-05 15:21:29 UTC

Your patch in comment 19 looks great. It's okay with me.

Comment 22 Daniel Boles 2017-09-05 18:45:31 UTC

Attachment 359096 [details] pushed as 27309b1 - ustring: Avoid warning due to type confusion