GNOME Bugzilla – Bug 547236
printf doesn't accept non-Latin strings in the format string
Last modified: 2008-09-25 13:30:19 UTC
printf doesn't accept a non-Latin string literal as the format string. For example, both of these fail to compile:

    stdout.printf("Успешно %s", "прошел");
    "Успешно %s".printf("прошел");

But the same format is accepted when it is first stored in a variable, for example:

    string pattern = "Успешно %s";
    pattern.printf("прошел");

Test case attached.

    valac -o string_bug string_bug.vala
    string_bug.vala:11.5-11.41: error: Too many arguments for specified format
        stdout.printf("Успешно %s", "прошел");
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Compilation failed: 1 error(s), 0 warning(s)

But with the workaround:

    valac -o string_bug string_bug.vala
    ./string_bug
    Успешно прошел
Created attachment 116315: test case
Source to reproduce the bug and the workaround.
I think the problem is that C strings are supposed to be 7-bit clean. UTF-8 is 8-bit.
It's a valac, not a gcc, error message and Vala source files are UTF-8 encoded, so it is a bug in valac.
> I think the problem is that C strings are supposed to be 7-bit clean. UTF-8 is 8-bit.

No, I think this is a different problem:
1) The bug arises only for the pattern string. Non-Latin strings are accepted as parameters without any problem.
2) A single non-Latin symbol in the pattern is accepted.

I think the problem is the different string length for Latin and non-Latin symbols: a Russian string is twice as long in bytes. See test case 2.
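The length difference described above can be checked with a short C sketch. It is only an illustration, not code from valac or GLib: utf8_char_count is a hypothetical helper name, and the input is assumed to be valid, NUL-terminated UTF-8 in a source file saved as UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GLib or valac): count UTF-8 characters
 * by skipping continuation bytes, which always look like 10xxxxxx.
 * Assumes s is valid, NUL-terminated UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;   /* lead byte or ASCII byte: start of a new character */
    }
    return count;
}
```

For "Успешно %s" this helper reports 10 characters while strlen reports 17 bytes, because each of the seven Cyrillic letters takes two bytes in UTF-8 and the three ASCII characters take one byte each.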
Created attachment 116329: test case 2
Variant with an accepted non-Latin string.
Hello,

In the valastringliteral.vala file, the eval method contains this statement at line 58:

    /* remove quotes */
    var noquotes = value.offset (1).ndup ((uint) (value.len () - 2));

When value = "\"foo\"", noquotes = "foo", but when value = "\"Успешно %s\"", noquotes = "Успеш".

I did some tests:
    the len result is OK
    the offset result is OK
    the ndup result is wrong

Maybe ndup does not support the Unicode encoding in use (maybe offset does not either).

This bug affects other parts of the code generation process, not only the printf validation.
Created attachment 116629: test string.ndup()
g_strndup copies _bytes_ from the string, not characters!

In glib-2.0.vapi:

    [CCode (cname = "g_strndup")]
    public string ndup (ulong n); /* FIXME: only UTF-8 */

But value.len () - 2 returns a number of characters, not bytes! Since a Russian character takes 2 bytes in UTF-8, g_strndup copies only about half of the string. See the attachment string_bug3.

From the GLib documentation:

    Note: To copy a number of characters from a UTF-8 encoded string, use g_utf8_strncpy() instead.

But g_utf8_strncpy() is not wrapped in glib-2.0.vapi. :(
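The mismatch between a character count and a byte count can be reproduced in plain C without GLib. This is a sketch under stated assumptions, not the valac code: strip_quotes_bytewise and utf8_char_count are hypothetical names, malloc and memcpy stand in for g_strndup, and the literals are assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of UTF-8 characters in s
 * (continuation bytes 10xxxxxx are not counted). */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical reproduction of the faulty pattern: skip the opening quote,
 * then copy (character count - 2) BYTES. Correct for ASCII only.
 * Returns a malloc'ed string, or NULL on error; the caller frees it. */
char *strip_quotes_bytewise(const char *literal)
{
    size_t chars = utf8_char_count(literal);
    size_t n;
    char *out;

    if (chars < 2)
        return NULL;            /* not a quoted literal */
    n = chars - 2;              /* character count misused as a byte count */
    out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, literal + 1, n);   /* n never exceeds the bytes available */
    out[n] = '\0';
    return out;
}
```

With "\"foo\"" the result is "foo". With "\"Успешно %s\"" the literal has 12 characters, so 10 bytes are copied, which is exactly the five two-byte letters "Успеш". The truncated text no longer contains %s, which would be consistent with the "Too many arguments" error reported above.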
This is a dirty workaround:
1) a patch to add g_utf8_strncpy to glib-2.0.vapi
2) a function to replace string.ndup(): utf8_ndup.vala

Questions:

1) I want to add g_utf8_strncpy to the string class, but this function does not have the standard parameter order:

    gchar* g_utf8_strncpy (gchar *dest, const gchar *src, gsize n);

I tried the instance_pos attribute, but it has only two values, 0 and -1, and neither is suitable for this case. instance_pos=0 places the source string first, but I need the source and dest strings in reverse order, for example:

    dest = source.utf8_ndup(string dest, long n)

instance_pos=-1 places the instance in the last position, but that position must hold the long n. Any ideas?

2) Unlike g_strndup, g_utf8_strncpy does not allocate the result buffer automatically:

    utf8_ndup = g_utf8_strncpy(g_strdup(source), source, n);

Can I map the function "string.utf8_ndup" to this C code in the glib-2.0 vapi file? Or can Vala map one Vala function to only one C function?
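A character-aware duplicate, as asked for in this comment, can be sketched in plain C. This is not the attached utf8_ndup.vala and not g_utf8_strncpy: the function below is a hypothetical stand-in that allocates its own result buffer and assumes the input is valid, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate at most n_chars UTF-8 CHARACTERS of src.
 * Returns a malloc'ed, NUL-terminated string, or NULL if allocation fails;
 * the caller frees it. */
char *utf8_ndup(const char *src, size_t n_chars)
{
    const char *p = src;
    size_t n_bytes;
    char *out;

    while (*p != '\0' && n_chars > 0) {
        p++;                                        /* lead byte or ASCII byte */
        while (((unsigned char)*p & 0xC0) == 0x80)
            p++;                                    /* continuation bytes */
        n_chars--;
    }
    n_bytes = (size_t)(p - src);    /* byte length of the selected characters */

    out = malloc(n_bytes + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, n_bytes);
    out[n_bytes] = '\0';
    return out;
}
```

Called as utf8_ndup(literal + 1, 10) on "\"Успешно %s\"", it returns the full "Успешно %s", because the character count is converted to a byte length before copying. Allocating inside the helper also avoids the need to pass a pre-sized dest buffer.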
Created attachment 116746: patch to add function utf8_strncpy
Created attachment 116747: utf8_ndup function and test case
See Bug 548897. It can help you to solve the bug.
The bug is now resolved. Patch attached. Many thanks to Jürg Billeter. :)
Created attachment 117815: Patch for UTF-8 compatible function noquotes
2008-09-25  Jürg Billeter  <j@bitron.ch>

	* vala/valastringliteral.vala: Fix processing of non-ASCII string
	literals, patch by Alexey Lubimov, fixes bug 547236

Fixed in r1781.