Bug 781829 – Add convenience functions for converting string to number

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 781829 - Add convenience functions for converting string to number


Summary:	Add convenience functions for converting string to number


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2017-04-27 12:18 UTC by Krzesimir Nowak
Modified:	2017-05-10 15:03 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Adds generic error domain and codes (7.34 KB, patch) 2017-04-27 12:18 UTC, Krzesimir Nowak	none	Details \| Review
Adds convenience API for converting a string to a number (11.50 KB, patch) 2017-04-27 12:19 UTC, Krzesimir Nowak	none	Details \| Review
Adds convenience API for converting a string to a number (16.25 KB, patch) 2017-04-27 19:33 UTC, Krzesimir Nowak	none	Details \| Review
Adds an index for 2.54 api (912 bytes, patch) 2017-04-27 19:33 UTC, Krzesimir Nowak	committed	Details \| Review
Adds convenience API for converting a string to a number, v3 (16.25 KB, patch) 2017-04-28 11:05 UTC, Krzesimir Nowak	none	Details \| Review
Adds convenience API for converting a string to a number, v4 (16.13 KB, patch) 2017-04-28 11:15 UTC, Krzesimir Nowak	none	Details \| Review
Adds convenience API for converting a string to a number, v5 (21.44 KB, patch) 2017-05-02 12:52 UTC, Krzesimir Nowak	committed	Details \| Review
Adds convenience API for converting a string to a number, v6 (21.39 KB, patch) 2017-05-10 10:32 UTC, Krzesimir Nowak	committed	Details \| Review
Fixes translation issues in string to number conversion functions (2.18 KB, patch) 2017-05-10 14:14 UTC, Krzesimir Nowak	committed	Details \| Review

Description Krzesimir Nowak 2017-04-27 12:18:28 UTC

Created attachment 350544 [details] [review]
Adds generic error domain and codes

This is a strawman proposal for adding a convenience API for converting strings to numbers. Problems:
- I didn't know where to add it - gstrfuncs.h looks like a place for rather low-level string functions
- Uses different conventions than the other string functions (returns a bool and takes a GError instead of returning an error with errno)
- Should I specify a separate error domain and codes just for these two functions? (I added a generic error API as a substrawman proposal to stop abusing G_IO_ERROR_FAILED, which can't be used here, because it is in Gio. Once this question is resolved the subproposal can possibly be moved into a separate bug.)

Please see commit messages for some explanations.

Comment 1 Krzesimir Nowak 2017-04-27 12:19:05 UTC

Created attachment 350545 [details] [review]
Adds convenience API for converting a string to a number

Comment 2 Matthias Clasen 2017-04-27 12:20:52 UTC

Not sure I'm a fan of adding more convenience api like this. But one question in advance: why do you need signed/unsigned if you have min/max ?

Comment 3 Philip Withnall 2017-04-27 12:38:55 UTC

(In reply to Matthias Clasen from comment #2)
> Not sure I'm a fan of adding more convenience api like this.

I would consider this API a replacement for g_ascii_strto{u,}ll(), whose API is horrendous and almost universally used incorrectly by people. The error reporting from it is hard (there are 3 different error mechanisms you need to check for — errno, (endptr != nptr), (*endptr != '\0')) and the fact it doesn’t do range checking means that people often forget to do range checking. Given that g_ascii_strto{u,}ll() is typically used for parsing untrusted input, we want to nudge people towards range checking everything.

(In reply to Matthias Clasen from comment #2)
> why do you need signed/unsigned if you have min/max ?

I guess because you can’t have a type which can represent the range [G_MININT64, G_MAXUINT64].

Comment 4 Philip Withnall 2017-04-27 12:43:38 UTC

Review of attachment 350544 [details] [review]:

Not sure we want to do this. Developers abuse G_IO_ERROR_FAILED because it’s easier than defining their own error domain, but that’s really what they should be doing. I think we want to make it easier to define error domains — maybe it is possible to reduce the boilerplate a bit by introducing something like G_DECLARE_*_TYPE but for error domains?

Comment 5 Philip Withnall 2017-04-27 12:51:34 UTC

Review of attachment 350545 [details] [review]:

::: glib/gstrfuncs.c
@@ +3172,3 @@
+ * @min: a lower bound
+ * @max: an upper bound
+ * @out_num: (out) (nullable) (optional): a return location for a number

(nullable) is inappropriate here: (optional) is the annotation you want. (nullable) means that the *returned* value can be NULL, which can’t be the case for a returned gint64 value. (nullable) (optional) is appropriate for things like GError** parameters (although in practice we don’t need to add that annotation for GError** parameters, because gobject-introspection adds it implicitly due to the known semantics of GError).

@@ +3179,3 @@
+ * This function assumes that @str contains only a decimal number that
+ * is within inclusive bounds limited by @min and @max. If this is
+ * true, then the converted number is stored in @out_num.

Probably want to mention somewhere that "" is considered invalid input.

@@ +3181,3 @@
+ * true, then the converted number is stored in @out_num.
+ *
+ * See g_ascii_strtoll() if you have more complex needs.

…such as parsing a string which starts with a number, but then has other characters.

@@ +3203,3 @@
+  if (str[0] == '\0')
+    {
+      g_set_generic_error_literal (error, _("Empty string is not a number"));

As far as the error handling goes in this patch, I would consider adding a new error domain with errors for INVALID_NUMBER and NUMBER_OUTSIDE_RANGE (strawman names).

@@ +3207,3 @@
+    }
+
+  errno = 0;

Do we want to save (and later restore) the previous value of errno, so that this function appears to the caller to have no side-effects on errno?

@@ +3217,3 @@
+    }
+  if (number < min ||
+      number > max)

Why put a line break in this if-statement?

::: glib/tests/strfuncs.c
@@ +1536,3 @@
+  { "a",  UNSIGNED,  0, 2, 0, TRUE },
+  { "1a", SIGNED,   -2, 2, 0, TRUE },
+  { "1a", UNSIGNED,  0, 2, 0, TRUE },

I’d add some tests for (min == max) too.

@@ +1544,3 @@
+  gsize idx;
+
+  for (idx = 0; idx < G_N_ELEMENTS(test_data); ++idx)

Need a space before the `(` in `G_N_ELEMENTS`.

Comment 6 Krzesimir Nowak 2017-04-27 17:55:43 UTC

(In reply to Philip Withnall from comment #4)
> Review of attachment 350544 [details] [review] [review]:
> 
> Not sure we want to do this. Developers abuse G_IO_ERROR_FAILED because it’s
> easier than defining their own error domain, but that’s really what they
> should be doing. I think we want to make it easier to define error domains —
> maybe it is possible to reduce the boilerplate a bit by introducing
> something like G_DECLARE_*_TYPE but for error domains?

Correct me if I'm wrong, but I don't think that introducing an error domain is much of a hassle. Basically:

typedef enum
  {
    FOO_CODES_GO_HERE
  } FooError;

#define FOO_ERROR (foo_error_quark ())

and in C file:

G_DEFINE_QUARK (foo-error-quark, foo_error)

So that's already easy enough. There is more hassle if you want FooError to have a GType, because that involves possibly writing template files and hooking glib-mkenums to the build system and GIOErrorEnum already has that. For completeness I'll mention that if you want to send FooError over D-Bus as a D-Bus error then it means writing GDBusErrorEntry and registering that with g_dbus_error_register_error_domain (which you likely do in the foo_error_quark function, thus it's impossible to use the G_DEFINE_QUARK macro). But if you reach that point then you need to bite the bullet and write some boilerplate.

I actually think that what is harder/annoying to do is to come up with the error codes for every possible failure scenario. This is possibly a really boring/demotivating task, so developers tend to use just some error code that is as generic as it gets. So they find G_IO_ERROR_FAILED.

I too think that it is a bad practice, but I have no idea how to overcome it.

Anyway, for this issue, I'll create a separate error domain and codes for the conversion functions and will file a separate issue for the generic errors.

Comment 7 Krzesimir Nowak 2017-04-27 18:03:07 UTC

(In reply to Philip Withnall from comment #5)
> Review of attachment 350545 [details] [review] [review]:
> 
> ::: glib/gstrfuncs.c
> @@ +3172,3 @@
> + * @min: a lower bound
> + * @max: an upper bound
> + * @out_num: (out) (nullable) (optional): a return location for a number
> 
> (nullable) is inappropriate here: (optional) is the annotation you want.
> (nullable) means that the *returned* value can be NULL, which can’t be the
> case for a returned gint64 value. (nullable) (optional) is appropriate for
> things like GError** parameters (although in practice we don’t need to add
> that annotation for GError** parameters, because gobject-introspection adds
> it implicitly due to the known semantics of GError).

I was wondering what is the difference between nullable and optional in context of an output parameter. Now I know, thank you.

> 
> @@ +3179,3 @@
> + * This function assumes that @str contains only a decimal number that
> + * is within inclusive bounds limited by @min and @max. If this is
> + * true, then the converted number is stored in @out_num.
> 
> Probably want to mention somewhere that "" is considered invalid input.

Yeah, I was thinking about it too, but then stated that if function assumes that a string is a number then it rules out an empty string, because nothing is not a number. But I'll add that for clarity.

> 
> @@ +3181,3 @@
> + * true, then the converted number is stored in @out_num.
> + *
> + * See g_ascii_strtoll() if you have more complex needs.
> 
> …such as parsing a string which starts with a number, but then has other
> characters.

Thanks, will add it. And will mention different base (like hexadecimal).

> 
> @@ +3203,3 @@
> +  if (str[0] == '\0')
> +    {
> +      g_set_generic_error_literal (error, _("Empty string is not a
> number"));
> 
> As far as the error handling goes in this patch, I would consider adding a
> new error domain with errors for INVALID_NUMBER and NUMBER_OUTSIDE_RANGE
> (strawman names).

Will do.

> 
> @@ +3207,3 @@
> +    }
> +
> +  errno = 0;
> 
> Do we want to save (and later restore) the previous value of errno, so that
> this function appears to the caller to have no side-effects on errno?
> 

Nah. There are no guarantees that calling any function will preserve the errno, so why should these function be some special snowflakes?

> @@ +3217,3 @@
> +    }
> +  if (number < min ||
> +      number > max)
> 
> Why put a line break in this if-statement?

Ah, that if used to be a bit longer - there wasn't a separate error for the value being out of bounds. Will make it a single line.

> 
> ::: glib/tests/strfuncs.c
> @@ +1536,3 @@
> +  { "a",  UNSIGNED,  0, 2, 0, TRUE },
> +  { "1a", SIGNED,   -2, 2, 0, TRUE },
> +  { "1a", UNSIGNED,  0, 2, 0, TRUE },
> 
> I’d add some tests for (min == max) too.

Right, will do.

> 
> @@ +1544,3 @@
> +  gsize idx;
> +
> +  for (idx = 0; idx < G_N_ELEMENTS(test_data); ++idx)
> 
> Need a space before the `(` in `G_N_ELEMENTS`.

Bah, it's second time you point it out. ;) Will fix it.

Comment 8 Matthias Clasen 2017-04-27 18:35:25 UTC

I'm against the generic error stuff. That is basically invalidating the entire approach of GError.

Comment 9 Matthias Clasen 2017-04-27 18:39:47 UTC

(In reply to Philip Withnall from comment #3)
> (In reply to Matthias Clasen from comment #2)
> > Not sure I'm a fan of adding more convenience api like this.
> 
> I would consider this API a replacement for g_ascii_strto{u,}ll(), whose API
> is horrendous and almost universally used incorrectly by people. The error
> reporting from it is hard (there are 3 different error mechanisms you need
> to check for — errno, (endptr != nptr), (*endptr != '\0')) and the fact it
> doesn’t do range checking means that people often forget to do range
> checking. Given that g_ascii_strto{u,}ll() is typically used for parsing
> untrusted input, we want to nudge people towards range checking everything.

If it is a replacement...

> * See g_ascii_strtoll() if you have more complex needs.

It shouldn't defer back to it for all but the most basic needs.

Comment 10 Krzesimir Nowak 2017-04-27 19:33:09 UTC

Created attachment 350592 [details] [review]
Adds convenience API for converting a string to a number

Comment 11 Krzesimir Nowak 2017-04-27 19:33:39 UTC

Created attachment 350593 [details] [review]
Adds an index for 2.54 api

Comment 12 Krzesimir Nowak 2017-04-27 19:41:28 UTC

Updated the patches. The generic error is gone, review issues are addressed.

(In reply to Matthias Clasen from comment #8)
> I'm against the generic error stuff. That is basically invalidating the
> entire approach of GError.

Alright, I assume that there is no point in opening a separate issue for the generic errors then.

(In reply to Matthias Clasen from comment #9)
> (In reply to Philip Withnall from comment #3)
> > (In reply to Matthias Clasen from comment #2)
> > > Not sure I'm a fan of adding more convenience api like this.
> > 
> > I would consider this API a replacement for g_ascii_strto{u,}ll(), whose API
> > is horrendous and almost universally used incorrectly by people. The error
> > reporting from it is hard (there are 3 different error mechanisms you need
> > to check for — errno, (endptr != nptr), (*endptr != '\0')) and the fact it
> > doesn’t do range checking means that people often forget to do range
> > checking. Given that g_ascii_strto{u,}ll() is typically used for parsing
> > untrusted input, we want to nudge people towards range checking everything.
> 
> If it is a replacement...
> 
> > * See g_ascii_strtoll() if you have more complex needs.
> 
> It shouldn't defer back to it for all but the most basic needs.

Admittedly the new functions are less powerful than the g_ascii_strto{u,}ll functions, since they assume that the entire string is a decimal number and only a decimal number (so strings like "  123  " or "123ZZZ" or plain "a" as hexadecimal 10 will not work). Not sure how big this problem is, though.

Comment 13 Philip Withnall 2017-04-28 09:20:58 UTC

(In reply to Krzesimir Nowak from comment #6)
> (In reply to Philip Withnall from comment #4)
> > Review of attachment 350544 [details] [review] [review] [review]:
> > 
> > Not sure we want to do this. Developers abuse G_IO_ERROR_FAILED because it’s
> > easier than defining their own error domain, but that’s really what they
> > should be doing. I think we want to make it easier to define error domains —
> > maybe it is possible to reduce the boilerplate a bit by introducing
> > something like G_DECLARE_*_TYPE but for error domains?
> 
> Correct me if I'm wrong, but I don't think that introducing an error domain
> is much of a hassle. Basically:
> 
> typedef enum
>   {
>     FOO_CODES_GO_HERE
>   } FooError;
> 
> #define FOO_ERROR (foo_error_quark ())

Also needs a declaration for foo_error_quark(). And some documentation.

> So that's already easy enough. There is more hassle if you want FooError to
> have a GType, …

The boilerplate does build up then. :-)  I think it might be possible to reduce, but not by much.

> I actually think that what is harder/annoying to do is to come up with the
> error codes for every possible failure scenario. This is possibly a really
> boring/demotivating task, so developers tend to use just some error code
> that is as generic as it gets. So they find G_IO_ERROR_FAILED.

True. Maybe this is just something we need to be careful to catch in code review. ⇒ Review comments like “are you sure there isn’t a more appropriate error code for this?”.

> Anyway, for this issue, I'll create a separate error domain and codes for
> the conversion functions and will file a separate issue for the generic
> errors.

Comment 14 Philip Withnall 2017-04-28 09:40:59 UTC

Review of attachment 350592 [details] [review]:

::: glib/gstrfuncs.c
@@ +3171,3 @@
+ * @str: a string
+ * @min: a lower bound
+ * @max: an upper bound

Maybe put ‘ (inclusive)’ on the end of the @min and @max parameter descriptions.

@@ +3180,3 @@
+ * is within inclusive bounds limited by @min and @max. If this is
+ * true, then the converted number is stored in @out_num. An empty
+ * string is not a valid input. A string with a leading or trailing

Nitpick: s/A string with a/A string with/ (‘whitespace’ is uncountable).

@@ +3185,3 @@
+ * See g_ascii_strtoll() if you have more complex needs such as
+ * parsing a string which starts with a number, but then has other
+ * characters, or if a number is not decimal.

Ridiculous nitpick: I’d use a semicolon here instead of the comma. You can use it as a second type of list separator, which is useful if you want to use the comma for separating clauses in a list element, as you have done above. (https://en.wikipedia.org/wiki/Semicolon#Usage, first bullet point)

@@ +3187,3 @@
+ * characters, or if a number is not decimal.
+ *
+ * Returns: %TRUE if @str was a number, otherwise %FALSE.

It would be useful to mention the possible error codes and what causes them somewhere in this documentation comment.

@@ +3212,3 @@
+      return FALSE;
+    }
+  if (g_ascii_isspace (str[0]))

Why handle spaces differently here?

@@ +3232,3 @@
+      return FALSE;
+    }
+  if (errno != 0 || end_ptr == NULL || *end_ptr != '\0')

You could move this case to be the first error handling case (and additionally check that errno != ERANGE)…

@@ +3239,3 @@
+      return FALSE;
+    }
+  if (number < min || number > max)

…then you could merge this case with the (errno == ERANGE) check so the error reporting code isn’t duplicated.

@@ +3251,3 @@
+    {
+      *out_num = number;
+    }

GLib style typically omits the braces for single-line if-blocks.

::: glib/gstrfuncs.h
@@ +323,3 @@
+    G_STRING_TO_NUMBER_INVALID,
+    G_STRING_TO_NUMBER_OUT_OF_BOUNDS,
+  } GStringToNumberError;

Not sure about the naming of this error domain. How about GNumberParserError? Other suggestions welcome.

::: glib/tests/strfuncs.c
@@ +1516,3 @@
+  gint expected;
+  gboolean should_fail;
+  gint error_code;

s/gint/GStringToNumberError/ (type safety)

Comment 15 Philip Withnall 2017-04-28 09:44:54 UTC

Review of attachment 350593 [details] [review]:

This looks fine to commit once it’s needed.

Comment 16 Krzesimir Nowak 2017-04-28 10:47:45 UTC

(In reply to Philip Withnall from comment #14)
> Review of attachment 350592 [details] [review] [review]:
> 
> ::: glib/gstrfuncs.c
> @@ +3171,3 @@
> + * @str: a string
> + * @min: a lower bound
> + * @max: an upper bound
> 
> Maybe put ‘ (inclusive)’ on the end of the @min and @max parameter
> descriptions.

Will do.

> 
> @@ +3180,3 @@
> + * is within inclusive bounds limited by @min and @max. If this is
> + * true, then the converted number is stored in @out_num. An empty
> + * string is not a valid input. A string with a leading or trailing
> 
> Nitpick: s/A string with a/A string with/ (‘whitespace’ is uncountable).

Articles shmarticles. *grumble grumble*

Thanks, will fix it. :)

> 
> @@ +3185,3 @@
> + * See g_ascii_strtoll() if you have more complex needs such as
> + * parsing a string which starts with a number, but then has other
> + * characters, or if a number is not decimal.
> 
> Ridiculous nitpick: I’d use a semicolon here instead of the comma. You can
> use it as a second type of list separator, which is useful if you want to
> use the comma for separating clauses in a list element, as you have done
> above. (https://en.wikipedia.org/wiki/Semicolon#Usage, first bullet point)
> 

Ha, thanks, will do.

> @@ +3187,3 @@
> + * characters, or if a number is not decimal.
> + *
> + * Returns: %TRUE if @str was a number, otherwise %FALSE.
> 
> It would be useful to mention the possible error codes and what causes them
> somewhere in this documentation comment.
> 

Alright.

> @@ +3212,3 @@
> +      return FALSE;
> +    }
> +  if (g_ascii_isspace (str[0]))
> 
> Why handle spaces differently here?
> 

The way we check the end_ptr disallows trailing whitespace. So for consistency we should not allow leading whitespace, but g_strto{u,}ll allows it - it just skips it. So we have two options here - either we allow both leading and trailing whitespace or we ban both. For now I opted for the latter. What do you think?

> @@ +3232,3 @@
> +      return FALSE;
> +    }
> +  if (errno != 0 || end_ptr == NULL || *end_ptr != '\0')
> 
> You could move this case to be the first error handling case (and
> additionally check that errno != ERANGE)…
> 

Right. Will do.

> @@ +3239,3 @@
> +      return FALSE;
> +    }
> +  if (number < min || number > max)
> 
> …then you could merge this case with the (errno == ERANGE) check so the
> error reporting code isn’t duplicated.
> 

Hm, I was wondering if I should also move the leading whitespace checking after the call to g_ascii_strto{u,}ll to avoid duplicating the "invalid input" error reporting at the expense of throwing away the parsing results.

> @@ +3251,3 @@
> +    {
> +      *out_num = number;
> +    }
> 
> GLib style typically omits the braces for single-line if-blocks.

Alright.

> 
> ::: glib/gstrfuncs.h
> @@ +323,3 @@
> +    G_STRING_TO_NUMBER_INVALID,
> +    G_STRING_TO_NUMBER_OUT_OF_BOUNDS,
> +  } GStringToNumberError;
> 
> Not sure about the naming of this error domain. How about
> GNumberParserError? Other suggestions welcome.

Thanks for the name - it is way better than what I came up with.

> 
> ::: glib/tests/strfuncs.c
> @@ +1516,3 @@
> +  gint expected;
> +  gboolean should_fail;
> +  gint error_code;
> 
> s/gint/GStringToNumberError/ (type safety)

Ok.

Comment 17 Philip Withnall 2017-04-28 10:56:29 UTC

(In reply to Krzesimir Nowak from comment #16)
> (In reply to Philip Withnall from comment #14)
> > Review of attachment 350592 [details] [review] [review] [review]:
> > @@ +3212,3 @@
> > +      return FALSE;
> > +    }
> > +  if (g_ascii_isspace (str[0]))
> > 
> > Why handle spaces differently here?
> > 
> 
> The way we check the end_ptr disallows trailing whitespace. So for
> consistency we should not allow leading whitespace, but g_strto{u,}ll allows
> it - it just skips it. So we have two options here - either we allow both
> leading and trailing whitespace or we ban both. For now I opted for the
> latter. What do you think?

Ah, that makes sense. Probably worth a comment in the source.

> > @@ +3239,3 @@
> > +      return FALSE;
> > +    }
> > +  if (number < min || number > max)
> > 
> > …then you could merge this case with the (errno == ERANGE) check so the
> > error reporting code isn’t duplicated.
> > 
> 
> Hm, I was wondering if I should also move the leading whitespace checking
> after the call to g_ascii_strto{u,}ll to avoid duplicating the "invalid
> input" error reporting at the expense of throwing away the parsing results.

That probably makes sense. Parsing is cheap, and that’s an unlikely error case.

Comment 18 Krzesimir Nowak 2017-04-28 11:05:48 UTC

Created attachment 350632 [details] [review]
Adds convenience API for converting a string to a number, v3

Updated the patch.

Comment 19 Krzesimir Nowak 2017-04-28 11:07:09 UTC

Hm, haven't noticed your reply, will update the patch again to update the leading whitespace check.

Comment 20 Krzesimir Nowak 2017-04-28 11:15:36 UTC

Created attachment 350633 [details] [review]
Adds convenience API for converting a string to a number, v4

Updated.

Comment 21 Philip Withnall 2017-04-28 13:17:14 UTC

(In reply to Matthias Clasen from comment #9)
> (In reply to Philip Withnall from comment #3)
> > (In reply to Matthias Clasen from comment #2)
> > > Not sure I'm a fan of adding more convenience api like this.
> > 
> > I would consider this API a replacement for g_ascii_strto{u,}ll(), whose API
> > is horrendous and almost universally used incorrectly by people. The error
> > reporting from it is hard (there are 3 different error mechanisms you need
> > to check for — errno, (endptr != nptr), (*endptr != '\0')) and the fact it
> > doesn’t do range checking means that people often forget to do range
> > checking. Given that g_ascii_strto{u,}ll() is typically used for parsing
> > untrusted input, we want to nudge people towards range checking everything.
> 
> If it is a replacement...
> 
> > * See g_ascii_strtoll() if you have more complex needs.
> 
> It shouldn't defer back to it for all but the most basic needs.

Odd, I wrote a reply to this but Bugzilla lost it.

I think I was wrong to use ‘replacement’. These functions cover the common use cases and are a fair bit easier to use than g_strto{u,}ll(). Apart from parsing hex and octal, I think these functions would cover all the times I’ve ever had to use g_strto{u,}ll(), ever.

Thinking about it, we could add support for other bases to the new functions quite easily with a guint base parameter; it wouldn’t make the error handling any more complex. I think that might be worth it.

(Sorry Krzesimir, I did suggest this after about v2 of the patch, but my Bugzilla comment got eaten somewhere.)

Comment 22 Matthias Clasen 2017-05-01 16:12:41 UTC

I'm still not fully on board with this api addition, so we should get a tie-breaking vote here from someone else before going too much further with encouraging more work on the patches...

Comment 23 Colin Walters 2017-05-01 16:18:04 UTC

I looked at this problem domain recently when https://github.com/ostreedev/ostree/issues/686 went by and realized just how horrible the error checking logic is today.

I only glanced at these patches, they seem sane to me. Agree we should support multiple bases as well, not just decimal so we can be a lot closer toa full replacement.

Comment 24 Krzesimir Nowak 2017-05-02 12:52:41 UTC

Created attachment 350871 [details] [review]
Adds convenience API for converting a string to a number, v5

Updated a patch, adding a base parameter. I reworded the commit message a bit to mention that the functions are a replacement API for g_strto{u,}ll functions.

Discussed a bit about it on IRC, so we agreed to:
1. Disallow passing 0 as a base. In g_strtoull it meant that numbers starting with 0x were treated as hexadecimal, otherwise numbers starting with just 0 were treated as octal otherwise decimal.
2. Disallow passing hexadecimal number starting with 0x or 0X. I still allow passing numbers as 0444.
3. With g_strtoull, you could pass "-1" to parse to get G_MAXUINT64. We don't allow that in new functions. Actually, I don't allow passing any explicitly signed number to g_ascii_string_to_unsigned (so, no "-5" nor "+5"; just "5" is allowed), but I'm not sure about this. If we allow explicitly signed numbers to be passed to the to_unsigned function, should we treat "-0" as a special case? Also, should -5 be an invalid unsigned number or a number out of bounds? Currently it's invalid, because it is not an unsigned number.

Comment 25 Philip Withnall 2017-05-02 13:05:40 UTC

(In reply to Krzesimir Nowak from comment #24)
> Created attachment 350871 [details] [review] [review]
> 3. With g_strtoull, you could pass "-1" to parse to get G_MAXUINT64. We
> don't allow that in new functions. Actually, I don't allow passing any
> explicitly signed number to g_ascii_string_to_unsigned (so, no "-5" nor
> "+5"; just "5" is allowed), but I'm not sure about this.

I agree; no signs at all. The purpose of these functions is not to parse user input (which you would want to parse in the current locale), it’s to parse numbers from file formats and network packets, whose format should be well defined.

> Also, should -5 be an invalid unsigned
> number or a number out of bounds? Currently it's invalid, because it is not
> an unsigned number.

Don’t mind. I’d lean towards reporting it as out of bounds rather than invalid.

Comment 26 Philip Withnall 2017-05-02 13:07:10 UTC

(In reply to Philip Withnall from comment #25)
> (In reply to Krzesimir Nowak from comment #24)
> > Created attachment 350871 [details] [review] [review] [review]
> > 3. With g_strtoull, you could pass "-1" to parse to get G_MAXUINT64. We
> > don't allow that in new functions. Actually, I don't allow passing any
> > explicitly signed number to g_ascii_string_to_unsigned (so, no "-5" nor
> > "+5"; just "5" is allowed), but I'm not sure about this.
> 
> The purpose of these functions is not to parse
> user input (which you would want to parse in the current locale), it’s to
> parse numbers from file formats and network packets, whose format should be
> well defined.

I appreciate I’m taking a fairly hard-line attitude here, which is potentially at odds with the maximum that parsers should be liberal in what they accept (as long as the input is unambiguous — which a sign or ‘0x’ would be). I’d appreciate a second opinion here.

Comment 27 Colin Walters 2017-05-09 15:54:46 UTC

In the ostree case (unix cmdline) I wanted it to parse user input but not *locale* input.  This is a domain I don't know a lot about honestly; for command line tools, is it common to accept non-ASCII numbers?

I don't know whether that changes anything in this discussion or not.  Basically I just want numeric conversion APIs with less shitty error handling.

Comment 28 Philip Withnall 2017-05-09 17:16:12 UTC

(In reply to Colin Walters from comment #27)
> In the ostree case (unix cmdline) I wanted it to parse user input but not
> *locale* input.  This is a domain I don't know a lot about honestly; for
> command line tools, is it common to accept non-ASCII numbers?

If it’s user input, surely you want to parse it in the current locale, so you accept the user’s locale’s decimal separator (‘,’ vs ‘.’, etc.)?

> I don't know whether that changes anything in this discussion or not. 
> Basically I just want numeric conversion APIs with less shitty error
> handling.

I don’t think it does; you either want to use these APIs, or something like sscanf() which *does* operate in the user’s locale.

I’m going to make an executive decision and say that signs and ‘0x’ are not going to be accepted.

Comment 29 Philip Withnall 2017-05-09 17:21:31 UTC

Review of attachment 350871 [details] [review]:

Looks good to push to master with these nitpicks fixed.

::: glib/gstrfuncs.c
@@ +3197,3 @@
+ *
+ * @base can be between 2 and 36 inclusive. Hexadecimal numbers must
+ * not be prefixed with "0x" or "0X". Such problem does not exist for

Nitpick: ‘Such a problem’

@@ +3208,3 @@
+ * See g_ascii_strtoll() if you have more complex needs such as
+ * parsing a string which starts with a number, but then has other
+ * characters; or if a number is not decimal.

Drop ‘or if a number is not decimal’.

Same comments for the other gtk-doc comments.

Comment 30 Krzesimir Nowak 2017-05-10 10:32:36 UTC

Created attachment 351524 [details] [review]
Adds convenience API for converting a string to a number, v6

Adding a patch that got commited. I also fixed one issue I spotted - two strings for errors were not marked for translation.

Comment 31 Piotr Drąg 2017-05-10 11:42:11 UTC

This string extracted to .po files doesn’t look correct:

#: ../glib/gstrfuncs.c:3268 ../glib/gstrfuncs.c:3367
msgid "Number “%"
msgstr ""

Comment 32 Krzesimir Nowak 2017-05-10 12:11:33 UTC

The code is:
g_set_error (error,
             G_NUMBER_PARSER_ERROR, G_NUMBER_PARSER_ERROR_OUT_OF_BOUNDS,
             _("Number “%" G_GUINT64_FORMAT "” is out of bounds"
               " [%" G_GUINT64_FORMAT ", %" G_GUINT64_FORMAT "]"),
             number, min, max);

I don't know what the fix should be. Fix the utility that extracts this stuff?

Comment 33 Philip Withnall 2017-05-10 12:42:13 UTC

(In reply to Krzesimir Nowak from comment #32)
> The code is:
> g_set_error (error,
>              G_NUMBER_PARSER_ERROR, G_NUMBER_PARSER_ERROR_OUT_OF_BOUNDS,
>              _("Number “%" G_GUINT64_FORMAT "” is out of bounds"
>                " [%" G_GUINT64_FORMAT ", %" G_GUINT64_FORMAT "]"),
>              number, min, max);
> 
> I don't know what the fix should be. Fix the utility that extracts this
> stuff?

I think the usual fix is to pre-format the G_GUINT64_FORMAT arguments into strings, then substitute them into the translatable string using %s placeholders, instead of %G_GUINT64_FORMAT.

Comment 34 Krzesimir Nowak 2017-05-10 13:54:47 UTC

Alright, will fix it then.

Comment 35 Krzesimir Nowak 2017-05-10 14:14:51 UTC

Created attachment 351553 [details] [review]
Fixes translation issues in string to number conversion functions

Comment 36 Philip Withnall 2017-05-10 14:48:14 UTC

Review of attachment 351553 [details] [review]:

++