Bug 571000 – add ByteArray

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 571000 - add ByteArray


Summary:	add ByteArray


Status:	RESOLVED FIXED

Product:	gjs
Classification:	Bindings
Component:	general
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gjs-maint
QA Contact:	gjs-maint

URL:
Whiteboard:

Depends on:	622071
Blocks:

Reported:	2009-02-08 22:41 UTC by Havoc Pennington
Modified:	2010-06-23 04:07 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
byte array patch (42.76 KB, patch) 2010-01-03 18:31 UTC, Havoc Pennington	none	Details \| Review
v2 (45.75 KB, patch) 2010-01-04 14:53 UTC, Havoc Pennington	reviewed	Details \| Review
Add a byteArray.ByteArray class. (3.10 KB, patch) 2010-06-18 18:31 UTC, Johan (not receiving bugmail) Dahlin	needs-work	Details \| Review
Add a byteArray.ByteArray class. (45.94 KB, patch) 2010-06-18 19:00 UTC, Johan (not receiving bugmail) Dahlin	none	Details \| Review
[importer] Allow native modules in gjs.so (4.35 KB, patch) 2010-06-22 00:29 UTC, Johan (not receiving bugmail) Dahlin	none	Details \| Review
[gjs] Add a byteArray module. (47.10 KB, patch) 2010-06-22 00:29 UTC, Johan (not receiving bugmail) Dahlin	committed	Details \| Review

Description Havoc Pennington 2009-02-08 22:41:32 UTC

Certain APIs require a byte array type to use. There's a hacky way to store byte arrays in JavaScript strings, but it has a lot of problems; not least, strings are immutable.

We could pretty easily define a custom JSClass in a gjs/gjs/bytearray.c which would globally define a ByteArray type, something like this:
http://wiki.ecmascript.org/doku.php?id=proposals:bytearray

Best I can tell from Google, this was never even final for ECMAScript 4, and of course ES4 was canceled, so it isn't really a standard or anything. But the basic design there looks sensible; we don't need anything complex.

Summary of that page in case it vanishes:

* new ByteArray(optional length)
* ByteArray.toString() converts to string (they are saying there assuming latin 1, I would vote for assuming byte array is utf8, or adding optional encoding arg to toString(), or something)
* ByteArray.fromString() they are saying assign each char in string to 1 byte, so mangling everything above latin 1, I might say add optional encoding arg or assume utf8
* implement "byteArray[0] = 1" for ByteArray (get/set properties sets the byte)
* ByteArray has a "length" property
* ByteArray.fromArray() converts an integer array to byte array

Comment 1 Havoc Pennington 2009-02-08 22:44:11 UTC

One question, a common use of this might be as a buffer for read() type functions; in that case, it's most efficient if "new ByteArray(256)" creates a buffer with undefined contents. But it might be more javascripty to init the byte array to 0.

Comment 2 Johan (not receiving bugmail) Dahlin 2009-02-08 22:52:07 UTC

This makes a lot of sense. 
It would be best if new ByteArray(n) would be a O(1) operation, similar to what the Array does. undefined seems wrong as it's not a valid byte that can be stored, so defaulting to 0 is preferred.

Would byteArray[0] return a string or integer? How would you get the other?

Comment 3 Havoc Pennington 2009-02-08 23:03:13 UTC

byteArray[0] would return an integer; to get a string I think you would do either
fromCharCode:
https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/String/fromCharCode

or else something like byteArray.slice(0,1).toString() (makes sense only if you want a big chunk of the byte array converted to a string, and implies that we'd have to implement slice and other array methods on ByteArray, I guess)

Comment 4 C. Scott Ananian 2009-11-20 22:32:52 UTC

See also bug 581687 and the bugs linked to it, which add support for marshalling the GByteArray class as strings; these should marshall as ByteArray once that's added to gjs.

Comment 5 Colin Walters 2009-11-20 22:42:55 UTC

See also http://wiki.commonjs.org/wiki/Binary

Comment 6 Havoc Pennington 2010-01-03 18:31:40 UTC

Created attachment 150747 [details] [review]
byte array patch

Comment 7 Havoc Pennington 2010-01-03 18:38:58 UTC

Here is a first cut. I went with the simpler ByteArray from ECMA4, rather than the more complex ideas at commonjs. I think our goals are mostly to shovel bytes say from GIO read operation, to the pixbuf parser, things like that. There are other "would be nice" things, but that's the really useful thing for gjs, is to just have a blob of bytes thrown around between C APIs. As such, not sure ByteArray needs much to it. The idea of the ECMA proposal is basically to look like regular Array, but contain bytes as integers.

Some questions and todo items:

* this is done as a module, not a built-in like Array. The gi/* code will need access to some C API to create and manipulate ByteArray. e.g. to get a raw data pointer to use as a buffer. I don't know where the heck we will put this... a simple approach would be to build ByteArray into gjs, instead of having it as an optional module. Possibly the best approach.

* for each and for...in are enormously inefficient, because spidermonkey wants us to JS_DefineProperty() for each value. I'm not sure but I think the "getter" might also end up defining the property. Anyway we end up copying each byte into
hash table entries, if people look at bytes from JS. Pretty horrible. 
Then again, as mentioned, I don't think looking at bytes from JS is the main use-case for now.

* I think fromArray() (and likely a toArray() and toSource()) should be implemented.

* the array ops such as slice() are not implemented. They probably should be eventually, but I don't know if we need to block on it.

* no test cases for toString() / fromString() ; should add at least minimal ones. Somewhat blocks on toArray() which would let us more easily put in the data for weirdly-encoded strings in the source code.

Comment 8 Havoc Pennington 2010-01-04 03:10:36 UTC

* I forgot and left in some excessive debug logging that traces every function, that should come out

Comment 9 Havoc Pennington 2010-01-04 03:49:31 UTC

* fromString and fromArray are supposed to be "static" functions instead of methods

Comment 10 Havoc Pennington 2010-01-04 14:53:01 UTC

Created attachment 150779 [details] [review]
v2

This one is a bit nicer:
* no debug spew
* fromArray / fromString are module functions, not methods
* no extra debug gunk
* couple more test cases

Comment 11 Havoc Pennington 2010-01-04 14:53:40 UTC

v2 also implements fromArray

Comment 12 Johan (not receiving bugmail) Dahlin 2010-03-01 22:14:35 UTC

Review of attachment 150779 [details] [review]:

Looks pretty good. Misses introspection integration though, but perhaps that should go into a separate bug. Dunno.
Third-party modules such as cairo should be allowed to use this, so we'll need a public API. Or maybe we should
just let image libraries use gdk-pixbuf via introspection.

::: doc/gjs-byte-array.txt
@@ +44,3 @@
+
+<code>
+byteArrayObject = new ByteArray(byteArrayLength:uint)

Avoid ActionScript syntax here?

::: modules/byteArray.c
@@ +22,3 @@
+ */
+
+#include <config.h>

Weird header grouping

@@ +102,3 @@
+                     jsval             *value_p)
+{
+    if (v > (gsize) JSVAL_INT_MAX) {

Shouldn't this be inverted, eg < instead of > ?

@@ +149,3 @@
+
+    if (v >= 256) {
+        gjs_throw(context,

Shouldn't the > 255 check be done close to the < 0 check? Eg close as in the same function?

@@ +166,3 @@
+                     jsval             *value_p)
+{
+    int v;

guint8 v; to be consistent with the GByteArray struct.

@@ +176,3 @@
+    }
+
+    v = priv->array->data[idx];

g_array_index() perhaps

@@ +275,3 @@
+
+    /* grow the array if necessary */
+    if (idx >= priv->array->len) {

This check is already done internally in g_byte_array_set_size()

@@ +280,3 @@
+    }
+
+    priv->array->data[idx] = v;

g_array_index(priv->array) = v ?

@@ +504,3 @@
+    JSClass *obj_class;
+    JSClass *proto_class;
+    guint32 preallocated_length;

should we not have a reasonable default preallocated_length? Or just inherit the glib default (which is 16)

@@ +622,3 @@
+                                    retval);
+    } else {
+        JSBool ok;

This should probably be refactored to a public method so it can be used in third-party code.

@@ +771,3 @@
+        g_free(utf8);
+    } else {
+        char *encoded;

Ditto, should be public API.

::: test/js/testByteArray.js
@@ +3,3 @@
+function testEmptyByteArray() {
+const ByteArray = imports.byteArray;
+// tests for imports.lang module

You should probably test more a few error conditions here, such as negative numbers

Comment 13 Dan Winship 2010-04-13 20:08:25 UTC

(In reply to comment #7)
> Here is a first cut. I went with the simpler ByteArray from ECMA4, rather than
> the more complex ideas at commonjs.

http://wiki.commonjs.org/wiki/CommonJS#Implementations suggests that people are going with http://wiki.commonjs.org/wiki/Binary/B, which is not much different from being a superset of your ByteArray.

But just having *something* here is better than having something compatible.

Comment 14 Havoc Pennington 2010-04-13 20:36:04 UTC

could certainly rename some stuff to be more like Binary/B, though note their approach uses Array.toFoo and String.toFoo monkeypatches instead of having ByteArray.fromFoo. Not sure I love that. Also of course they implement the whole pile of array/string operations that I skipped and don't think are especially useful for the use-cases I've seen in gjs. But we could be a subset of Binary/B ignoring those two issues, just rearrange the patch here a bit.

For me the two real open questions are mentioned in comment 7. 
1. How does gi/* get to ByteArray from C
2. for...in and probably [] are outrageously broken - basically you can't look at bytes from JS sanely

1. is probably relatively easy just need to pick something 
2. probably needs punting, but maybe it should be punted in a way where trying to do the broken thing throws an exception for now?

Comment 15 Johan (not receiving bugmail) Dahlin 2010-04-14 12:22:58 UTC

(In reply to comment #14)
> could certainly rename some stuff to be more like Binary/B, though note their
> approach uses Array.toFoo and String.toFoo monkeypatches instead of having
> ByteArray.fromFoo. Not sure I love that. Also of course they implement the
> whole pile of array/string operations that I skipped and don't think are
> especially useful for the use-cases I've seen in gjs. But we could be a subset
> of Binary/B ignoring those two issues, just rearrange the patch here a bit.
> 
> For me the two real open questions are mentioned in comment 7. 
> 1. How does gi/* get to ByteArray from C
> 2. for...in and probably [] are outrageously broken - basically you can't look
> at bytes from JS sanely
> 
> 1. is probably relatively easy just need to pick something 
> 2. probably needs punting, but maybe it should be punted in a way where trying
> to do the broken thing throws an exception for now?

For 1 we can probably just use something similar to what I did for the cairo bindings, just define two public functions:
 byte_array_from_data(context, data)
 byte_array_get_data(context, byte_array)

I thought this patch already did the GI integration, it should probably be done before we land this as it's not very useful without it.

2. Unsure how much work it is, but sure raising an exception is fine, requiring users to do for(i; i < ba.length; ++i) { ba[i]; } or just converting to a string is fine for now.

Comment 16 Havoc Pennington 2010-06-08 03:01:52 UTC

on 2, that loop won't work either because it does ba[i]

I think the only solution for 2 is to not implement [i] or for...in. Either just omit them, or add "int getByte(index)" method instead. Omitting them is fine for all the uses we really have right now for this class - it's just a way to shovel bytes between C APIs, it's kind of crazy to do byte-by-byte parsing in JS anyhow.

Comment 17 Johan (not receiving bugmail) Dahlin 2010-06-18 18:31:06 UTC

Created attachment 164032 [details] [review]
Add a byteArray.ByteArray class.

Unmodified patch re-applied against head.

Comment 18 Colin Walters 2010-06-18 18:58:53 UTC

Review of attachment 164032 [details] [review]:

This looks like it's missing the new files.

Comment 19 Johan (not receiving bugmail) Dahlin 2010-06-18 19:00:14 UTC

Created attachment 164033 [details] [review]
Add a byteArray.ByteArray class.

Updated patch, with new files included.

Comment 20 Havoc Pennington 2010-06-18 19:02:45 UTC

you guys aren't going to push that unmodified are you? should incorporate the earlier review comments, and drop [] and for...in, at least.

Comment 21 Johan (not receiving bugmail) Dahlin 2010-06-18 19:05:12 UTC

No, just pushing the updated code. Still trying to figure out how write gi support for this, either 
- a) making gjs-gi.so use symbols from bytearray.so lazily
- b) moving bytearray into gjs-gi.so

Comment 22 Johan (not receiving bugmail) Dahlin 2010-06-22 00:29:32 UTC

Created attachment 164264 [details] [review]
[importer] Allow native modules in gjs.so

Allow native modules to be registered inside gjs.so.
This will be used by a few modules like sys (needs argv) and
byteArray (exports internal api to gi)

https://bugzilla.gnome.org/show_bug.cgi?id=622071

Comment 23 Johan (not receiving bugmail) Dahlin 2010-06-22 00:29:42 UTC

Created attachment 164265 [details] [review]
[gjs] Add a byteArray module.

docs/gjs-byte-array.txt explains how to use it and its features.

The primary purpose of this for now will be to bind C APIs that use
byte buffers, such as read and write operations. As such, the
ByteArray class is pretty simple and is mostly just a blob you can
shovel between C APIs.

In the initial patch, ByteArray does not even support array operations
such as slice().

The enumeration support relies on defining each byte as a property
on the byte array object, which makes enumerating bytes in the array
even more inefficient than it already would be.

Comment 24 Johan (not receiving bugmail) Dahlin 2010-06-22 00:31:18 UTC

Comment on attachment 164264 [details] [review]
[importer] Allow native modules in gjs.so

Filed as a separate bug.

Comment 25 Lucas Rocha 2010-06-22 17:55:09 UTC

Review of attachment 164265 [details] [review]:

Looks good to me. Considering that accessing/setting indexes is highly inefficient, shouldn't we just drop those?

::: gjs/byteArray.c
@@ +576,3 @@
+{
+    JSObject *array;
+    ByteArrayInstance *priv;

Blank line here, breathing is good :-)

@@ +577,3 @@
+    JSObject *array;
+    ByteArrayInstance *priv;
+    array = JS_NewObject(context, &gjs_byte_array_class, gjs_byte_array_prototype, NULL);

Blank line here, breathing is good :-)

@@ +580,3 @@
+    priv = g_slice_new0(ByteArrayInstance);
+    priv->array = gjs_g_byte_array_new(0);
+    g_assert(priv_from_js(context, array) == NULL);

Blank line here. Anyway, you got the idea :-P

@@ +769,3 @@
+        JS_EvaluateScript(context, JS_GetGlobalObject(context),
+                          "imports.byteArray.ByteArray;", 27,
+                          "<internal>", 1, &rval);

This is to dynamically declare the prototype?

@@ +797,3 @@
+    priv = priv_from_js(context, object);
+    if (priv == NULL)
+        return JS_FALSE; /* wrong class passed in */

Returning JS_FALSE on a function that returns GByteArray*?

::: gjs/byteArray.h
@@ +33,3 @@
+
+JSBool        gjs_define_byte_array_stuff     (JSContext      *context,
+                                               JSObject       *in_object);

Align with the rest of the functions?

Comment 26 Lucas Rocha 2010-06-22 23:43:40 UTC

Wondering: this patch does not seem to cover the case of g_file_replace_contents for example. I guess it should cover this case?

Comment 27 Johan (not receiving bugmail) Dahlin 2010-06-23 03:07:12 UTC

(In reply to comment #26)
> Wondering: this patch does not seem to cover the case of
> g_file_replace_contents for example. I guess it should cover this case?

No, g_file_replace_contents() is not covered by this patch. We need to decide the exact rules for when we'll return a byteArray first.

Comment 28 Johan (not receiving bugmail) Dahlin 2010-06-23 03:09:47 UTC

Attachment 164265 [details] pushed as 25dc86f - [gjs] Add a byteArray module.

Comment 29 Havoc Pennington 2010-06-23 03:24:56 UTC

I thought we were dropping [] and for...in 

Using these is Bad(tm) - think putting every pixel of a pixbuf in a hash table.

I guess we'd have to drop from test suite and docs too so it's sort of a pain. we could just add a gjs_throw() if you try to use them for now, put a note in docs, comment out tests?

Comment 30 Johan (not receiving bugmail) Dahlin 2010-06-23 03:36:56 UTC

(In reply to comment #29)
> I thought we were dropping [] and for...in 

for..in was dropped, but I kept [] around, I actually want to be able to use them for cairo surface/gdkpixbufs.

> Using these is Bad(tm) - think putting every pixel of a pixbuf in a hash table.

Horribly slow yes, but speed is not always important. I think it's still nice to keep them around.
Main use case is to be able to write mandelbrot in gjs or so ;-)

> I guess we'd have to drop from test suite and docs too so it's sort of a pain.
> we could just add a gjs_throw() if you try to use them for now, put a note in
> docs, comment out tests?

Yeah, docs needs to be updated

Comment 31 Havoc Pennington 2010-06-23 04:07:21 UTC

Well not just horribly slow but horribly bloated; each byte becomes at least a 32-bit key and 32-bit value plus whatever spidermonkey's hash bucket overhead is.
Your pixbuf will be using at least 8 times the memory after iterating the pixels.

As long as it's documented that it's a terrible idea to use ...