After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 791506 - valac unconditionally uses ints for all array length fields
valac unconditionally uses ints for all array length fields
Status: RESOLVED OBSOLETE
Product: vala
Classification: Core
Component: Arrays
unspecified
Other Linux
: Normal normal
: ---
Assigned To: Vala maintainers
Vala maintainers
Depends on:
Blocks: 796313
 
 
Reported: 2017-12-12 03:44 UTC by bob
Modified: 2018-05-22 15:58 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
0001-codegen-unconditionally-use-gsize-for-array-sizes.patch (1.83 KB, patch)
2017-12-12 03:45 UTC, bob
rejected Details | Review
0002-vala-arraytype-respect-CCode.array_length_type.patch (4.04 KB, patch)
2017-12-12 03:51 UTC, bob
needs-work Details | Review
0003-codegen-replace-fixed-int-array-length-types.patch (10.30 KB, patch)
2017-12-12 03:53 UTC, bob
needs-work Details | Review
0004-codegen-infer-correct-index-type-in-array-foreach.patch (1.20 KB, patch)
2017-12-12 03:54 UTC, bob
none Details | Review
0005-tests-verify-CCode.array_length_type-behaviour.patch (1.23 KB, patch)
2017-12-12 03:56 UTC, bob
needs-work Details | Review

Description bob 2017-12-12 03:44:14 UTC
For arrays defined in Vala source, valac will always generate C code using `int' as the type for the length and _size_ fields. I have some (probably controversial) patches, but I just want to describe where I'm coming from with this:

I assume a signed type was chosen as a good default for the length field so invalid values like -1 can have an out-of-band representation, but I would have thought ssize_t would be a better choice. It might have to remain the default for the sake of API and ABI compatibility, which is perfectly acceptable if the user has means to choose a different type.

On the other hand, using `int' for _size_ seems like a really terrible idea:

 * Being a piece of book-keeping handled statically by the compiler, there shouldn't be any use for invalid values.
 * It introduces a hard-limit for any array operations Vala does, at a measly 2**15 elements in the worst standard-compliant case (of course, the common x86 environments bump this up to 2**31, but we really ought not to be relying on this sort of implementation-defined behaviour).
 * This implementation detail is completely hidden from the user, rendering them unable to correct this behaviour without ugly sed scripts and the like.

I would argue that any type other than size_t for this field creates fundamentally broken code, but there might be a place for it despite whatever I might claim ;)
In this case, I would think it ought to be a user-configurable thing a la CCode.array_length_type.


So, with that out of the way, the basic plan my patches implement is this:

 * Update Vala.ArrayType to honor CCode.array_length_type and introducing a method for this type to be queried by codegen.
 * Update the relevant codegen modules to use an ArrayType's configured length field type and unconditionally use size_t for array _size_ fields.
 * Use an ArrayType's configured length field type as the type for indexes in foreach statements operating on arrays.

A couple of things might be worth looking at that I haven't addressed:

 * Altering the .move() and .resize() method signatures to respect the configured field types. It seems as though valac generates one of these functions per source file that get shared amongst all the calls to these methods for various different array types. I wasn't sure generating several instances of the same function with different signatures was a good thing to do, so I just left this alone.

 * Trying to correctly infer the type of the index in a statement like `for (var i = 0; i < array.length; i++)'. I'm not sure how sophisticated valac's (or any compiler's) type inferencing is, but this seems like something that would become a jungle of corner-cases. Can this be done?

 * I haven't touched the DBus codegen module since I wasn't sure if there were DBus-specific knock-on effects.
Comment 1 bob 2017-12-12 03:45:39 UTC
Created attachment 365406 [details] [review]
0001-codegen-unconditionally-use-gsize-for-array-sizes.patch

I would think this patch is the most critical, but it also may break ABIs.
Comment 2 bob 2017-12-12 03:51:47 UTC
Created attachment 365407 [details] [review]
0002-vala-arraytype-respect-CCode.array_length_type.patch

A couple of things about this patch I'm unsure about:

 * Should get_length_field_type() be a property instead?
 * Is it okay to hijack CCode.array_length_type like this? The type gets looked up in the Vala namespace, which is incongruous to the attribute's behaviour when used in VAPIs.
 * The get_length_field_type() call in ArrayType.accept_children() seems like a hack, but I wasn't sure when attributes get applied to a CodeNode so it seemed like the safest place.
 * Should the compatibility test be relaxed somehow, maybe make it a warning instead of a fatal error?
Comment 3 bob 2017-12-12 03:53:45 UTC
Created attachment 365408 [details] [review]
0003-codegen-replace-fixed-int-array-length-types.patch

I suspect there's bits I missed in here, I only found these with grep. The test suite still passes, but some further testing might be required.
Comment 4 bob 2017-12-12 03:54:55 UTC
Created attachment 365409 [details] [review]
0004-codegen-infer-correct-index-type-in-array-foreach.patch

Not sure if this should be a separate patch, but here it is
Comment 5 bob 2017-12-12 03:56:49 UTC
Created attachment 365410 [details] [review]
0005-tests-verify-CCode.array_length_type-behaviour.patch

I'm super iffy about this one, since the test relies on undefined behaviour (integer overflow)
Comment 6 Jürg Billeter 2017-12-12 05:25:48 UTC
Thanks for the patches. That's indeed a controversial topic.

Regarding signed vs. unsigned: Even if there was no need for the special value -1, I would still have chosen a signed integer type for the array length. The reason is that arithmetic operations are frequently used to calculate an array index and these operations often also involve signed integers and the array length.

Mixing signed and unsigned integers in arithmetic or comparisons is generally a terrible idea, especially in C where there is implicit promotion to unsigned without overflow checking. I suspect that almost no developer would always properly handle the full unsigned range. The reduced bug risk of only using signed integers is well worth the 1 bit of range, in my opinion. Type systems that do not allow silent conversions are slightly better but even there signed integers make more sense. And unfortunately, disallowing this in Vala was impractical due to its C heritage.

int vs. ssize_t: GLib requires int to be 32-bit, so for better or worse Vala does not support systems that use 16-bit int. I expect int to be the most commonly used array length type in C APIs, at least outside byte buffers. Given the reliance of Vala on C APIs, it still seems a sensible default. The array length type matching the default int type of the language is also very useful, again due to array indices often being involved in arithmetic operations.

While using an unsigned integer for the _size_ field would be less of an issue than for the length field, I don't really see a point in having different types for the two fields as the extra bit(s) couldn't be used if the length field was still limited.

The above is my reasoning why int makes sense as default. Allowing the user to override the default type may make sense, though. I would always keep the types for the length field and the _size_ field in sync and I would still avoid unsigned types but ssize_t or int64 may be useful in some cases. However, as this changes the Vala type of the array length, it wouldn't just be a change in generated C code and thus, we shouldn't use a CCode attribute for this.
Comment 7 bob 2017-12-12 06:51:30 UTC
(In reply to Jürg Billeter from comment #6)
> Thanks for the patches. That's indeed a controversial topic.

FWIW, just being being able to set the type would solve my frustration. My discussion below is purely academic.

> Regarding signed vs. unsigned: Even if there was no need for the special
> value -1, I would still have chosen a signed integer type for the array
> length. The reason is that arithmetic operations are frequently used to
> calculate an array index and these operations often also involve signed
> integers and the array length.
> 
> Mixing signed and unsigned integers in arithmetic or comparisons is
> generally a terrible idea, especially in C where there is implicit promotion
> to unsigned without overflow checking. I suspect that almost no developer
> would always properly handle the full unsigned range. The reduced bug risk
> of only using signed integers is well worth the 1 bit of range, in my
> opinion. Type systems that do not allow silent conversions are slightly
> better but even there signed integers make more sense. And unfortunately,
> disallowing this in Vala was impractical due to its C heritage.

Okay, I see your point.

> int vs. ssize_t: GLib requires int to be 32-bit, so for better or worse Vala
> does not support systems that use 16-bit int.

As an aside, could you provide a source for that? I'd be interested in knowing more, but I'm not finding much.

> I expect int to be the most
> commonly used array length type in C APIs, at least outside byte buffers.
> Given the reliance of Vala on C APIs, it still seems a sensible default. The
> array length type matching the default int type of the language is also very
> useful, again due to array indices often being involved in arithmetic
> operations.

As far as a default type for the length field, I'm not terribly bothered about the use of int.

> While using an unsigned integer for the _size_ field would be less of an
> issue than for the length field, I don't really see a point in having
> different types for the two fields as the extra bit(s) couldn't be used if
> the length field was still limited.

I have a pathological and somewhat unlikely scenario in mind:

  int[] some_array = {};
  int32 some_large_number = 1 << 30;
  some_array.resize (some_large_number + 1);
  some_array[some_large_number] = 1234;

The _size_ field will overflow when the resize function tries to allocate 2*n entries, and the array assignment will attempt to write a value someplace far beyond our allotted address space.

I'd also reiterate that the current setup doesn't assign negative values (or shouldn't, at least). I'm not about to argue that I legitimately require the extra sign bit of a 64-bit ssize_t, but I'm not sure why you would reserve a sign bit that's never going to be used when you could get some pretty extreme future-proofing for free ;)

On the other hand, I do on occasion require the extra bit in a 32-bit int index.

> The above is my reasoning why int makes sense as default. Allowing the user
> to override the default type may make sense, though. I would always keep the
> types for the length field and the _size_ field in sync and I would still
> avoid unsigned types but ssize_t or int64 may be useful in some cases.

My gripe is only really that the current system is inconfigurable(?), so I'm not especially concerned with the rest. And whilst I'd still argue that a size_t _size_ is The Right Thing, I agree it mightn't be so bad if that were configurable too.

> However, as this changes the Vala type of the array length, it wouldn't just
> be a change in generated C code and thus, we shouldn't use a CCode attribute
> for this.

Yeah, my thoughts too. I went that route since minimises the scope of my patches; it seemed a bit much for a (very) small-time contributor as myself to be adding in new attributes and the like. Since only the first patch changes any defaults, I'd be happy to scratch that and update the remaining ones if you have suggestions for what said attribute(s) would look like (although you might feel it easier to do it yourself ;)

Thanks!
Comment 8 Jürg Billeter 2017-12-16 09:20:06 UTC
(In reply to bob from comment #7)
> (In reply to Jürg Billeter from comment #6)
> > int vs. ssize_t: GLib requires int to be 32-bit, so for better or worse Vala
> > does not support systems that use 16-bit int.
> 
> As an aside, could you provide a source for that? I'd be interested in
> knowing more, but I'm not finding much.

https://git.gnome.org/browse/glib/tree/glib/glib-init.c#n109

G_STATIC_ASSERT (sizeof (int) == sizeof (gint32));

> > However, as this changes the Vala type of the array length, it wouldn't just
> > be a change in generated C code and thus, we shouldn't use a CCode attribute
> > for this.
> 
> Yeah, my thoughts too. I went that route since minimises the scope of my
> patches; it seemed a bit much for a (very) small-time contributor as myself
> to be adding in new attributes and the like. Since only the first patch
> changes any defaults, I'd be happy to scratch that and update the remaining
> ones if you have suggestions for what said attribute(s) would look like
> (although you might feel it easier to do it yourself ;)

I don't have anything specific in mind at this point. As far as I remember, we don't currently have any attributes that affect a Vala type in a similar way. One option would be something like

    [Array(length_type = "...")]

I don't particularly like specifying Vala types using string literals. We may be able to add support for attributes that take identifiers/types as values. Not sure whether that would be a good idea.

A less flexible but simpler alternative would be to define a boolean attribute that would decide between int (default) and ssize_t. This should affect both fields: length and _size_.
Comment 9 Rico Tzschichholz 2017-12-16 11:19:07 UTC
Comment on attachment 365407 [details] [review]
0002-vala-arraytype-respect-CCode.array_length_type.patch

The CCode(array_length_type) attribute is referencing the actual ctype and not a vala symbol reference. This is currently a bit confusing while in several places "int" is used instead of "gint" which would make it more clear.
Comment 10 GNOME Infrastructure Team 2018-05-22 15:58:07 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/vala/issues/607.