GNOME Bugzilla – Bug 349765
Mistakes in display of Russian text in comments in UML diagrams.
Last modified: 2006-10-16 20:27:16 UTC
Please describe the problem:
Russian text in the comments of UML diagrams is displayed incorrectly. It is fine in the dialog and in the .dia file, but wrong on screen and in the exported PNG. Notably, the class name does not show a similar bug. The wrong behaviour does not occur when dia.exe is used instead of diaw.exe.

Steps to reproduce:
1. Run diaw.exe
2. Create a diagram
3. Add a UML class to the diagram
4. Type Russian text into the comment
5. Check the checkbox that controls the visibility of comments
6. Click OK

Actual results:
The first characters of the text are skipped or replaced by '?'.

Expected results:
I expect the text to appear as I entered it in the dialog.

Does this happen every time?
Yes

Other information:
Yes. I have a .dia file, a screenshot and two .png files (from dia.exe and diaw.exe), but I don't know how to attach them to this report.
Created attachment 70115 [details] diagram with sample
Created attachment 70116 [details] Screenshot with bug
Created attachment 70117 [details] export from diaw.exe
Created attachment 70118 [details] export from dia.exe
Alas, using dia.exe instead of diaw.exe does not solve all the problems with Russian text. I see a lot of messages like this in the stdout/stderr window:
(dia.exe:1972): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()
Created attachment 70147 [details] new diagram where dia.exe failed
Appears to be a Windows-only issue; I can open the diagrams just fine under Linux.
I'm also unable to confirm the bug (Windows XP). Could you give some details regarding your installation and operating system? Did you have previous dia installations? Are you using other applications using GTK+? Which GTK+ version do you use?
(In reply to comment #8)
> I'm also unable to confirm the bug (Windows XP).
> Could you give some details regarding your installation
> and operating system?

Microsoft Windows 2000 Professional 5.0.2195 Service Pack 4, build 2195, localised (Russian).

> Did you have previous dia installations?

No, this is the first Dia installation on my computer.

> Are you using other applications using GTK+?

Yes, I also use GIMP and Inkscape, but Inkscape uses its own copy of GTK+. Removing Inkscape does not solve the problem.

> Which GTK+ version do you use?

I don't know how to identify the GTK+ version. I see "GTK+-2.8.18-1 runtime environment" in the "Add or Remove Programs" applet of the Control Panel (sorry, I forget its exact name in the English version of Windows). I also see gtk+-2.8.18 in the file C:\Program Files\Common Files\GTK\2.0\bin\libgtk-win32-2.0-0.dll. Removing GTK+ and doing a fresh installation from "gtk+-2.8.18-setup-1.exe" does not solve the problem. Maybe you can point me to some important registry setting?

I am ready to help debug dia, but I need some guidance about the compiler, libraries, sources and debugger. I have experience, but not in this area: mingw/cygwin and GTK+ are "terra incognita" to me.
(In reply to comment #8)
> I'm also unable to confirm the bug (Windows XP).

I just installed GTK+ (gtk+-2.8.18-setup-1.exe) and dia (0.95.1) under Windows XP SP2 (localised, of course), and the bug reproduced.
Could you try to enable "Install files for complex scripts and right-to-left languages"? You'll find this under the Windows language and regional settings in the Control Panel.
(In reply to comment #11)
> Could you try to enable "Install files for complex scripts and right-to-left
> languages"?
> You'll find this under the Windows language and regional settings
> in the Control Panel.

Under W2K the "language and regional settings" applet of the Control Panel does not have the required option.

I venture a guess at what is happening:
1) Dia represents strings as gchar* containing UTF-8 data.
2) The function uml_create_documentation_tag in dia-0.95/objects/UML/class.c is responsible for wrapping comments in UML objects.
3) It uses isspace(gchar) from <ctype.h> to find word boundaries.
4) This is erroneous for two reasons:
Most generally: it is impossible to make a decision based on one byte of a character's code in a multibyte character encoding.
Less generally: other code uses g_ascii_isspace (see sample: http://mail.gnome.org/archives/gtk-devel-list/2002-February/msg00124.html) because "Unlike the standard C library isspace() function, this only recognizes standard ASCII white-space and ignores the locale, returning FALSE for all non-ASCII characters" (see: http://antex.ru:8080/doc/gtk2/glib/glib-String-Utility-Functions.html#g-ascii-isspace). By contrast, the result of isspace() is undefined for non-ASCII characters.

Have I guessed right?
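To illustrate the point about isspace(), here is a minimal sketch in plain C of a locale-independent, ASCII-only white-space test. The names ascii_isspace and count_ascii_spaces are hypothetical stand-ins written for this illustration; they are not taken from Dia or GLib, and only approximate what g_ascii_isspace() is documented to do.

```c
#include <stddef.h>

/* Hypothetical stand-in for g_ascii_isspace(): true only for the six
 * ASCII white-space characters, independent of the current locale.
 * Every byte of a multibyte UTF-8 character is >= 0x80, so none of
 * them can match. */
static int ascii_isspace(char c)
{
    return c == ' '  || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/* Count ASCII white-space bytes in a NUL-terminated UTF-8 string.
 * Unlike isspace(), this never receives a negative argument it cannot
 * handle, because it only compares the byte against fixed values. */
static size_t count_ascii_spaces(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (ascii_isspace(*s))
            n++;
    }
    return n;
}
```

With this helper, a string such as "\xD0\x9F \xD1\x80" (two Cyrillic letters separated by one space) is scanned byte by byte without any byte of a Cyrillic letter being mistaken for white space.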
Hi, now I know how to build dia myself. Replacing isspace with g_ascii_isspace fully corrected the bug under discussion. Unfortunately, anonymous access to CVS does not allow me to correct the source myself. Please be so kind as to make the following change in objects/UML/class.c:
538c538
< while( isspace(comment[CommentIndex])){
---
> while( g_ascii_isspace(comment[CommentIndex])){
569c569
< isspace(comment[CommentIndex+LineLen])){
---
> g_ascii_isspace(comment[CommentIndex+LineLen])){
589c589
< while( isspace(comment[CommentIndex])){
---
> while( g_ascii_isspace(comment[CommentIndex])){
Given that a comment can be in any language, the right fix should involve g_utf8_get_char () and g_unichar_isspace (). There is probably more wrong with the line breaking for UTF-8 strings. All this strcat() and a-string-consists-of-single-chars stuff does not fit the UTF-8 encoding the string is in. Apparently most users simply use English for their comments (or don't use comments at all ;)). BTW: unified diff (diff -u) is the preferred format for patches.
g_unichar_isspace() is not recommended for finding word boundaries, but since I don't know the reasons, let's use it. strcat() is a valid tool for UTF-8 encoded strings, provided whole UTF-8 characters are copied. The string-of-single-chars approach is also valid for ASCII characters such as '\n' and '}'.
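The idea of wrapping by whole UTF-8 characters can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the patch attached to this bug: utf8_seq_len and wrap_offset are hypothetical names, only the ASCII space is treated as a break opportunity, and the input is assumed to be mostly valid UTF-8. A GLib-based version would presumably use g_utf8_get_char() and g_unichar_isspace() instead.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte c.
 * Returns 1 for ASCII and also for invalid lead bytes, so a caller
 * always makes progress. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)           return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Byte offset at which to break s so that the line holds at most
 * max_chars characters (not bytes). Prefers the last ASCII space seen
 * after the start of the line; otherwise falls back to the character
 * boundary reached. Never returns an offset inside a multibyte
 * character of well-formed input. */
static size_t wrap_offset(const char *s, size_t max_chars)
{
    size_t i = 0;          /* current byte offset */
    size_t chars = 0;      /* characters consumed so far */
    size_t last_space = 0; /* offset of last space, 0 means none */

    while (s[i] != '\0' && chars < max_chars) {
        size_t len = utf8_seq_len((unsigned char)s[i]);
        size_t k = 1;

        if (s[i] == ' ' && i > 0)
            last_space = i;
        /* Do not step past the terminator on a truncated sequence. */
        while (k < len && s[i + k] != '\0')
            k++;
        i += k;
        chars++;
    }
    if (s[i] == '\0')
        return i;          /* the whole string fits on one line */
    return last_space != 0 ? last_space : i;
}
```

Because the returned offset always falls on a character boundary, copying the first wrap_offset() bytes of the comment with a byte-oriented routine keeps every Cyrillic letter intact, which matches the point that byte copying is safe as long as whole UTF-8 characters are copied.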
Created attachment 71567 [details] [review] next version of wrapping of UML class comments
Hello,
A week has passed since I sent the last patch. Unfortunately, I see neither the patch in CVS nor any comments on it. Maybe the cause is the heavy load on the developers/maintainers, but it undermines belief in the power of Open Source. How can I obtain write access to CVS?
Comment on attachment 71567 [details] [review] next version of wrapping of UML class comments
To answer your last question first: by providing quality patches which convince the maintainers that it would be easier to let you apply them directly. And maybe some stronger belief in the powers of open source ;) BTW: your patch does not apply cleanly to CVS.
hb@hb-athlon UML $ patch --dry-run < class.diff
patching file class.c
Hunk #1 FAILED at 30.
Hunk #2 FAILED at 516.
2 out of 2 hunks FAILED -- saving rejects to file class.c.rej
(In reply to comment #19)
> hb@hb-athlon UML $ patch --dry-run < class.diff
> patching file class.c
> Hunk #1 FAILED at 30.
> Hunk #2 FAILED at 516.
> 2 out of 2 hunks FAILED -- saving rejects to file class.c.rej

Please describe the requirements for the .diff file. I produce it from WinCvs, checking "unified diff" with no other options. Please tell me how to do it correctly.
Created attachment 74793 [details] [review] next try - now patch --dry-run <class.diff works correctly
"The devil is in the details": there were differences in white space, and some software does not preserve spaces/tabs. :(
Thanks, applied.

2006-10-16 Hans Breuer <hans@breuer.org>

* objects/UML/class.c : the comment wrapping was only working for plain ASCII, now it deals with UTF-8 (Vadim Zelenin, bug #349765)