Bug 547236 – printf don't accept nonlatin string in format string



Summary:	printf don't accept nonlatin string in format string


Status:	RESOLVED FIXED

Product:	vala
Classification:	Core
Component:	general
Version:	0.3.x
Hardware:	Other All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Jürg Billeter
QA Contact:	Vala maintainers


Reported:	2008-08-11 03:17 UTC by Alexey Lubimov
Modified:	2008-09-25 13:30 UTC


Attachments
test case (360 bytes, text/plain) 2008-08-11 03:18 UTC, Alexey Lubimov
test case 2 (562 bytes, text/plain) 2008-08-11 10:53 UTC, Alexey Lubimov
test string.ndup() (349 bytes, text/plain) 2008-08-15 01:39 UTC, Alexey Lubimov
patch to add function utf8_strncpy (942 bytes, patch) 2008-08-16 15:24 UTC, Alexey Lubimov, patch status: none
utf8_ndup function and test case (689 bytes, text/plain) 2008-08-16 15:26 UTC, Alexey Lubimov
Patch for utf-8 compatible function noquotes (385 bytes, patch) 2008-09-01 23:54 UTC, Alexey Lubimov, patch status: committed

Description Alexey Lubimov 2008-08-11 03:17:05 UTC

printf doesn't accept a non-Latin string as the format pattern, for example:

 stdout.printf("Успешно %s", "прошел");

 or

 "Успешно %s".printf("прошел");

 but it does accept the same pattern when it is stored in a variable first!

 For example:

 string pattern = "Успешно %s";
 pattern.printf("прошел");

 ("Успешно прошел" is Russian for "passed successfully".)

 Test case attached:

 valac -o string_bug  string_bug.vala

 string_bug.vala:11.5-11.41: error: Too many arguments for specified format
   stdout.printf("Успешно %s", "прошел");
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Compilation failed: 1 error(s), 0 warning(s)


 But with the workaround:

 valac -o string_bug  string_bug.vala

 ./string_bug
 Успешно прошел

Comment 1 Alexey Lubimov 2008-08-11 03:18:19 UTC

Created attachment 116315
test case

Source to reproduce the bug and the workaround.

Comment 2 xavier.bestel 2008-08-11 08:08:56 UTC

I think the problem is that C strings are supposed to be 7-bit clean, while UTF-8 is 8-bit.

Comment 3 Jürg Billeter 2008-08-11 08:16:31 UTC

It's a valac error message, not a gcc one, and Vala source files are UTF-8 encoded, so this is a bug in valac.

Comment 4 Alexey Lubimov 2008-08-11 10:51:47 UTC

> I think the problem is that C-strings are supposed to be 7-bits clean. UTF-8 is 8-bits.

No, I think this is a different problem:

1) The error is raised only for the pattern string. The arguments accept non-Latin strings without any problem.

2) A single non-Latin character in the pattern is accepted.

I think the problem is the different string length for Latin and non-Latin characters.

A Russian string is twice as long (in bytes).


See test case 2.
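To make the length difference concrete, here is a minimal, GLib-free C sketch that counts characters instead of bytes. The name `utf8_char_count` is hypothetical. It computes roughly what GLib's g_utf8_strlen does, but it is not GLib's code. It assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count the UTF-8 characters (code points) in a NUL-terminated string.
 * Each character has exactly one lead byte; the remaining bytes of a
 * multi-byte character have the form 10xxxxxx, so counting the bytes
 * that are NOT continuation bytes yields the character count.
 * Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Cyrillic letters (U+0400 to U+04FF) encode as two bytes each in UTF-8. For "Успешно", `utf8_char_count` therefore returns 7 while `strlen` returns 14. For ASCII text such as "foo", both return 3.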

Comment 5 Alexey Lubimov 2008-08-11 10:53:04 UTC

Created attachment 116329
test case 2

Variant with an accepted non-Latin string.

Comment 6 Juan Luis Paz 2008-08-12 02:58:49 UTC

Hello,

In valastringliteral.vala, line 58 of the eval method contains this statement:

/* remove quotes */
var noquotes = value.offset (1).ndup ((uint) (value.len () - 2));

When value = "\"foo\"", noquotes = "foo"
but if value = "\"Успешно %s\"", noquotes = "Успеш"

I did some tests:
the len result is ok
the offset result is ok
the ndup result is wrong

Maybe ndup does not support the Unicode encoding used (and maybe offset doesn't either).

This bug affects parts of the code generation process, not only printf format validation.
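The truncation can be reproduced outside valac with a small, GLib-free C sketch. All function names below are hypothetical stand-ins modelled on the `noquotes` line quoted above. They are not valac's code. `bytes_ndup` copies bytes the way g_strndup does. Passing it a character count truncates the Cyrillic literal, while passing the byte length keeps the whole string. The sketch assumes valid UTF-8 and a literal that starts and ends with a one-byte `"`. The byte-length variant is only one possible fix and is not necessarily what the committed patch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count UTF-8 characters: each character has exactly one lead byte,
 * i.e. one byte that is not a 10xxxxxx continuation byte. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    return count;
}

/* Copy at most n BYTES of s into a new buffer (byte semantics, like
 * g_strndup). Returns NULL if malloc fails; the caller frees. */
static char *bytes_ndup(const char *s, size_t n)
{
    size_t len = 0;
    while (len < n && s[len] != '\0')
        len++;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}

/* Modelled on the original noquotes logic: a CHARACTER count is passed
 * where a BYTE count is expected, so multi-byte text gets truncated. */
char *strip_quotes_buggy(const char *literal)
{
    assert(utf8_char_count(literal) >= 2);
    return bytes_ndup(literal + 1, utf8_char_count(literal) - 2);
}

/* Passes the byte length instead. The opening quote is a single byte,
 * so literal + 1 is the start of the contents in both variants. */
char *strip_quotes_fixed(const char *literal)
{
    assert(strlen(literal) >= 2);
    return bytes_ndup(literal + 1, strlen(literal) - 2);
}
```

For "\"Успешно %s\"" (12 characters, 19 bytes), the buggy variant copies 12 - 2 = 10 bytes, which is the five letters "Успеш", exactly as observed above. The fixed variant copies 17 bytes and returns "Успешно %s". For an ASCII literal such as "\"foo\"", both variants return "foo", which is why the bug went unnoticed.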

Comment 7 Alexey Lubimov 2008-08-15 01:39:30 UTC

Created attachment 116629
test string.ndup()

Comment 8 Alexey Lubimov 2008-08-15 01:41:29 UTC

g_strndup copies _bytes_ from the string, not characters!

In glib-2.0.vapi:

    [CCode (cname = "g_strndup")]
    public string ndup (ulong n); /* FIXME: only UTF-8 */

but value.len () - 2 returns a number of characters, not bytes!

Since a Russian character is 2 bytes long in UTF-8, g_strndup copies only
about half of the string. See the attachment string_bug3.


From the GLib documentation:

Note

To copy a number of characters from a UTF-8 encoded string, use
g_utf8_strncpy() instead.


But g_utf8_strncpy() is not wrapped in glib-2.0.vapi. :(
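The note above says g_utf8_strncpy() copies characters rather than bytes. The GLib-free C sketch below illustrates that idea. It is not GLib's implementation, and the names `utf8_prefix_bytes` and `utf8_strncpy_sketch` are hypothetical. It first measures how many bytes the first n characters occupy, then copies exactly those bytes into a caller-supplied buffer and always NUL-terminates it. It assumes valid UTF-8 input and a `dest` large enough for the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the number of bytes occupied by the first n UTF-8 characters
 * of s, or by all of s if it has fewer than n characters.
 * Assumes s is valid UTF-8. */
static size_t utf8_prefix_bytes(const char *s, size_t n)
{
    size_t i = 0;
    while (s[i] != '\0') {
        /* A non-continuation byte starts a new character. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (n == 0)
                break;
            n--;
        }
        i++;
    }
    return i;
}

/* Copy at most n CHARACTERS (not bytes) of src into dest and
 * NUL-terminate the result. dest must have room for
 * utf8_prefix_bytes(src, n) + 1 bytes. Returns dest. */
char *utf8_strncpy_sketch(char *dest, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL);
    size_t bytes = utf8_prefix_bytes(src, n);
    memcpy(dest, src, bytes);
    dest[bytes] = '\0';
    return dest;
}
```

With a 32-byte buffer, copying 7 characters of "Успешно %s" yields "Успешно" (14 bytes). Asking for more characters than the string has copies the whole string.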

Comment 9 Alexey Lubimov 2008-08-16 15:23:25 UTC

This is a dirty workaround:

1) a patch to add g_utf8_strncpy to glib-2.0.vapi

2) a function to replace string.ndup(): utf8_ndup.vala


Questions:

1) I want to add g_utf8_strncpy to the string class, but this function does not have the usual parameter order (the source string is not the first parameter):

gchar*              g_utf8_strncpy   (gchar *dest,
                                      const gchar *src,
                                      gsize n);

I tried the instance_pos attribute, but it has only two values, 0 and -1, and neither is suitable for this case.

instance_pos=0 places the source string first, but the C function needs dest first and source second.

For example:
dest = source.utf8_ndup(string dest, long n)

instance_pos=-1 places the instance in the last position, but that position must hold the long n...


Any ideas?

2) Unlike g_strndup, g_utf8_strncpy does not allocate the result buffer automatically, so it would have to be called like this:

utf8_ndup = g_utf8_strncpy (g_strdup (source), source, n);


Can I map the function "string.utf8_ndup" to this C code in the glib-2.0 vapi file?

Or can Vala map one Vala function to only one C function?
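On question 2: the proposed g_utf8_strncpy (g_strdup (source), source, n) works, because a full copy of source is always large enough to hold any character prefix of it. However, it allocates the whole string even when n is small. A minimal alternative sketch is shown below. It assumes plain malloc/free instead of GLib's allocator, and the names `utf8_prefix_bytes` and `utf8_ndup_sketch` are hypothetical. It measures the prefix first and allocates exactly that many bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bytes occupied by the first n UTF-8 characters of s (or all of s,
 * if it is shorter). Assumes s is valid UTF-8. */
static size_t utf8_prefix_bytes(const char *s, size_t n)
{
    size_t i = 0;
    while (s[i] != '\0') {
        /* A non-continuation byte starts a new character. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (n == 0)
                break;
            n--;
        }
        i++;
    }
    return i;
}

/* Allocating counterpart of a character-based strncpy: returns a newly
 * malloc'd, NUL-terminated copy of the first n characters of src, or
 * NULL if allocation fails. The caller must free() the result. */
char *utf8_ndup_sketch(const char *src, size_t n)
{
    assert(src != NULL);
    size_t bytes = utf8_prefix_bytes(src, n);
    char *copy = malloc(bytes + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, bytes);
    copy[bytes] = '\0';
    return copy;
}
```

This design allocates once, sized to the result. The g_strdup-based one-liner allocates strlen(source) + 1 bytes regardless of n. In real GLib code, memory obtained from g_strdup should be released with g_free rather than free.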

Comment 10 Alexey Lubimov 2008-08-16 15:24:55 UTC

Created attachment 116746
patch to add function utf8_strncpy

Comment 11 Alexey Lubimov 2008-08-16 15:26:11 UTC

Created attachment 116747
utf8_ndup function and test case

Comment 12 rainwoodman 2008-08-22 06:41:11 UTC

See Bug 548897. It may help you solve this bug.

Comment 13 Alexey Lubimov 2008-09-01 23:52:41 UTC

The bug is now resolved.

Patch attached.

Many thanks to Jürg Billeter. :)

Comment 14 Alexey Lubimov 2008-09-01 23:54:36 UTC

Created attachment 117815
Patch for utf-8 compatible function noquotes

Comment 15 Jürg Billeter 2008-09-25 13:30:19 UTC

2008-09-25  Jürg Billeter  <j@bitron.ch>

	* vala/valastringliteral.vala:

	Fix processing of non-ASCII string literals,
	patch by Alexey Lubimov, fixes bug 547236

Fixed in r1781.