Bug 349765 – Mistakes in display of Russian text in comments in UML diagrams.

Summary:	Mistakes in display of Russian text in comments in UML diagrams.


Status:	RESOLVED FIXED

Product:	dia
Classification:	Other
Component:	objects
Version:	0.95.1
Hardware:	Other All

Importance:	Normal major
Target Milestone:	0.96
Assigned To:	Dia maintainers
QA Contact:	Dia maintainers

Reported:	2006-08-03 08:56 UTC by Vadim Zelenin
Modified:	2006-10-16 20:27 UTC

Attachments
diagram with sample (24.20 KB, text/xml) 2006-08-03 08:58 UTC, Vadim Zelenin
Screenshot with bug (10.55 KB, image/x-png) 2006-08-03 08:59 UTC, Vadim Zelenin
export from diaw.exe (8.57 KB, image/x-png) 2006-08-03 09:00 UTC, Vadim Zelenin
export from dia.exe (8.12 KB, image/x-png) 2006-08-03 09:01 UTC, Vadim Zelenin
new diagram where dia.exe failed (7.73 KB, text/xml) 2006-08-03 17:26 UTC, Vadim Zelenin
next version of wrapping of UML class comments (4.87 KB, patch) 2006-08-25 07:48 UTC, Vadim Zelenin, needs-work
next try - now patch --dry-run <class.diff works correctly (4.85 KB, patch) 2006-10-16 12:03 UTC, Vadim Zelenin, committed

Description Vadim Zelenin 2006-08-03 08:56:24 UTC

Please describe the problem:
Russian text in the comments of UML classes is displayed incorrectly.
The text is correct in the dialog and in the .dia file, but wrong on screen and in the exported PNG.
Notably, the class name is not affected by a similar bug.
The wrong behaviour does not reproduce when dia.exe is used instead of diaw.exe.

Steps to reproduce:
1. Run diaw.exe
2. Create diagram
3. Add UML class to diagram
4. Type Russian text into the comment field
5. Check the checkbox that controls the visibility of comments
6. Click OK


Actual results:
The first characters of the text are skipped or replaced by '?'.

Expected results:
I expect the text to appear as I entered it in the dialog.

Does this happen every time?
Yes

Other information:
Yes. I have a .dia file, a screenshot and two .png files (exported from dia.exe and diaw.exe),
but I don't know how to attach them to this report.

Comment 1 Vadim Zelenin 2006-08-03 08:58:34 UTC

Created attachment 70115
diagram with sample

Comment 2 Vadim Zelenin 2006-08-03 08:59:34 UTC

Created attachment 70116
Screenshot with bug

Comment 3 Vadim Zelenin 2006-08-03 09:00:52 UTC

Created attachment 70117
export from diaw.exe

Comment 4 Vadim Zelenin 2006-08-03 09:01:39 UTC

Created attachment 70118
export from dia.exe

Comment 5 Vadim Zelenin 2006-08-03 17:21:03 UTC

Alas, using dia.exe instead of diaw.exe does not solve all problems with Russian text. In large 
I see lots of messages in the stdout/stderr window like
(dia.exe:1972): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()

Comment 6 Vadim Zelenin 2006-08-03 17:26:29 UTC

Created attachment 70147
new diagram where dia.exe failed

Comment 7 Lars Clausen 2006-08-03 18:05:38 UTC

Appears to be a Windows-only issue; I can open the diagrams just fine under Linux.

Comment 8 Steffen Macke 2006-08-04 05:20:28 UTC

I'm also unable to confirm the bug (Windows XP).

Could you give some details regarding your installation
and operating system?

Did you have previous dia installations?

Are you using other applications using GTK+?

Which GTK+ version do you use?

Comment 9 Vadim Zelenin 2006-08-04 08:19:01 UTC

(In reply to comment #8)
> I'm also unable to confirm the bug (Windows XP).
> Could you give some details regarding your installation
> and operating system?

Microsoft Windows 2000 Professional
5.0.2195 Service Pack 4 build 2195
localised, Russian

> Did you have previous dia installations?
No, this is the first Dia installation on my computer.

> Are you using other applications using GTK+?
Yes, I also use GIMP and Inkscape,
but Inkscape uses its own copy of GTK+.
Removing Inkscape does not solve the problem.

> Which GTK+ version do you use?
I don't know how to identify the version of GTK+.
I see "GTK+-2.8.18-1 runtime environment" in the "Add and remove programs" applet of the Control Panel (sorry, I forget its exact name in the English version of Windows).
I see gtk+-2.8.18 in the file C:\Program Files\Common Files\GTK\2.0\bin\libgtk-win32-2.0-0.dll
Removing GTK+ and doing a fresh installation from "gtk+-2.8.18-setup-1.exe" does not solve the problem.

Maybe you can point me to some relevant registry setting?

I am ready to help debug Dia,
but I need some guidance about the compiler, libraries, sources and debugger.
I have experience, but not in this area: mingw/cygwin and GTK+ are "terra incognita" to me.

Comment 10 Vadim Zelenin 2006-08-04 15:52:09 UTC

(In reply to comment #8)
> I'm also unable to confirm the bug (Windows XP).

I just installed GTK+ (gtk+-2.8.18-setup-1.exe) and dia (0.95.1) under Windows XP SP2 (localised, of course): the bug is reproduced.

Comment 11 Steffen Macke 2006-08-05 04:19:24 UTC

Could you try to enable "Install files for complex scripts and right-to-left languages"? 

You'll find this under the Windows language and regional settings
in the Control Panel.

Comment 12 Vadim Zelenin 2006-08-07 12:18:10 UTC

(In reply to comment #11)
> Could you try to enable "Install files for complex scripts and right-to-left
> languages"? 
> You'll find this under the Windows language and regional settings
> in the Control Panel.

Under W2K, the "language and regional settings" applet of the Control Panel does not have the required option.

I venture to guess what is happening:
1) Dia represents strings as gchar* containing UTF-8 data.
2) Comments in UML objects are wrapped by the function uml_create_documentation_tag in dia-0.95/objects/UML/class.c.
3) That function uses isspace(gchar) from <ctype.h> to search for word boundaries.
4) This is erroneous for two reasons:

More generally: it is impossible to classify a character by looking at only one byte of a multibyte character encoding.

Less generally: other code samples use g_ascii_isspace (see sample: http://mail.gnome.org/archives/gtk-devel-list/2002-February/msg00124.html)
because "Unlike the standard C library isspace() function, this only recognizes standard ASCII white-space and ignores the locale, returning FALSE for all non-ASCII characters" (see: http://antex.ru:8080/doc/gtk2/glib/glib-String-Utility-Functions.html#g-ascii-isspace). By contrast, the result of isspace is undefined for non-ASCII characters.

Have I guessed right?
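
The guess above can be illustrated with a minimal sketch in plain C. It is not the Dia code: ascii_isspace() and skip_ascii_space() are hypothetical helpers written for this illustration, in the spirit of GLib's g_ascii_isspace(), and the exact set of characters GLib accepts is not asserted here. The relevant fact is that the Cyrillic letter U+0420 is encoded in UTF-8 as the bytes 0xD0 0xA0, and 0xA0 is the no-break space in Windows-1251, so a locale-aware isspace() may (depending on the C runtime and locale) treat the second byte as white space and split the character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ASCII-only white-space test: true for space, tab,
 * newline, form feed and carriage return; false for every byte
 * >= 0x80, whatever the current locale is. */
static int ascii_isspace(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Count the leading white-space bytes of a UTF-8 string with the
 * ASCII-only test, so that lead and continuation bytes of multibyte
 * characters are never mistaken for spaces. */
static size_t skip_ascii_space(const char *s)
{
    assert(s != NULL);
    size_t i = 0;
    while (ascii_isspace((unsigned char)s[i]))
        i++;
    return i;
}
```

With these helpers, skipping white space in front of a string that starts with 0xD0 0xA0 stops at the first byte of the Cyrillic letter instead of in the middle of it.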

Comment 13 Vadim Zelenin 2006-08-22 09:04:31 UTC

Hi,
now I know how to build dia myself.

Replacing isspace with g_ascii_isspace fully fixed the bug discussed here.
Unfortunately, anonymous access to CVS does not allow me to correct the source myself.

Please make the change in objects/UML/class.c

Comment 14 Vadim Zelenin 2006-08-22 15:45:10 UTC

538c538
<   while( isspace(comment[CommentIndex])){
---
>   while( g_ascii_isspace(comment[CommentIndex])){
569c569
<           isspace(comment[CommentIndex+LineLen])){
---
>           g_ascii_isspace(comment[CommentIndex+LineLen])){
589c589
<     while( isspace(comment[CommentIndex])){
---
>     while( g_ascii_isspace(comment[CommentIndex])){

Comment 15 Hans Breuer 2006-08-22 20:31:29 UTC

Given that a comment can be in any language, the right fix should involve 
g_utf8_get_char () and g_unichar_isspace (). There is probably more wrong 
with the line breaking for utf-8 strings.
All this strcat() and a-string-contains-of-single-chars stuff does not 
fit the utf-8 encoding the string is in. Apparently most users simply
use English for their comments (or don't use comments at all;))

BTW: unified diffs (diff -u) are the preferred format for patches.
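
A rough sketch of what scanning by code point rather than by byte looks like, again in plain C so that it compiles without GLib. All three functions are hypothetical stand-ins for this illustration: in real Dia code the decoding would presumably be done by g_utf8_get_char() and the classification by g_unichar_isspace(), and the white-space list below is only a subset, not GLib's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 character at s; store its byte length in *len.
 * Invalid or truncated input yields U+FFFD and a length of 1. */
static uint32_t utf8_decode(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 1;
    if ((p[0] & 0xE0) == 0xC0) n = 2;
    else if ((p[0] & 0xF0) == 0xE0) n = 3;
    else if ((p[0] & 0xF8) == 0xF0) n = 4;
    *len = 1;
    if (n == 1)
        return p[0] < 0x80 ? p[0] : 0xFFFD;
    uint32_t cp = p[0] & (0xFF >> (n + 1));   /* payload bits of lead byte */
    for (size_t i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80)            /* not a continuation byte */
            return 0xFFFD;
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *len = n;
    return cp;
}

/* Simplified white-space test on code points (a subset only). */
static int is_space_cp(uint32_t cp)
{
    if (cp == ' ' || (cp >= 0x09 && cp <= 0x0D))
        return 1;
    return cp == 0x00A0 || cp == 0x1680 || (cp >= 0x2000 && cp <= 0x200A) ||
           cp == 0x2028 || cp == 0x2029 || cp == 0x202F ||
           cp == 0x205F || cp == 0x3000;
}

/* Byte offset of the first white-space character in s, or the
 * length of s when there is none. */
static size_t first_space_offset(const char *s)
{
    assert(s != NULL);
    size_t off = 0, len;
    while (s[off] != '\0') {
        if (is_space_cp(utf8_decode(s + off, &len)))
            return off;
        off += len;
    }
    return off;
}
```

Because the scan advances by whole characters, the byte 0xA0 inside 0xD0 0xA0 is never examined on its own, while a real no-break space (0xC2 0xA0) is still recognized.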

Comment 16 Vadim Zelenin 2006-08-25 07:35:54 UTC

g_unichar_isspace() is not recommended for finding word boundaries.

But since I don't know the reasons, let's use it.

strcat() is a valid tool for utf-8 encoded strings, of course, when copying whole utf-8 chars.

The a-string-contains-of-single-chars stuff is also valid when using ASCII chars, such as '\n' and '}'.
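
The point about copying whole characters can be sketched as follows. This is an illustration only, not the patch attached to this bug; utf8_char_len() and append_utf8_char() are hypothetical names. The length of a character is taken from its lead byte, and all of its bytes are appended in a single strncat() call, so the destination never ends in the middle of a multibyte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes in the UTF-8 character starting at s
 * (0 at the end of the string; shortened if the sequence is truncated). */
static size_t utf8_char_len(const char *s)
{
    unsigned char c = (unsigned char)s[0];
    size_t n = 1;
    if (c == 0) return 0;
    if ((c & 0xE0) == 0xC0) n = 2;
    else if ((c & 0xF0) == 0xE0) n = 3;
    else if ((c & 0xF8) == 0xF0) n = 4;
    for (size_t i = 1; i < n; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            return i;
    return n;
}

/* Append the whole UTF-8 character at src to the NUL-terminated buffer
 * dst of capacity cap. Returns the number of bytes consumed from src,
 * or 0 if src is empty or the character does not fit. */
static size_t append_utf8_char(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = utf8_char_len(src);
    size_t used = strlen(dst);
    if (n == 0 || used + n + 1 > cap)
        return 0;
    strncat(dst, src, n);   /* copies n bytes and adds the terminator */
    return n;
}
```

Single ASCII characters such as '\n' and '}' go through the same path with a length of 1.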

Comment 17 Vadim Zelenin 2006-08-25 07:48:56 UTC

Created attachment 71567
next version of wrapping of UML class comments

Comment 18 Vadim Zelenin 2006-09-03 08:33:48 UTC

Hello,
A week has passed since I sent the last patch.
Unfortunately, I see neither the patch in CVS nor any comments on it.
The cause may be the heavy load on the developers/maintainers, but it undermines belief in the power of Open Source.
How can I obtain write access to CVS?

Comment 19 Hans Breuer 2006-10-14 13:26:20 UTC

Comment on attachment 71567
next version of wrapping of UML class comments

To answer your last question first: by providing quality patches which convince the maintainers that it would be easier to let you apply them directly. And maybe some stronger belief in the powers of open source ;)

BTW: your patch does not apply cleanly to cvs.
hb@hb-athlon UML $ patch --dry-run < class.diff
patching file class.c
Hunk #1 FAILED at 30.
Hunk #2 FAILED at 516.
2 out of 2 hunks FAILED -- saving rejects to file class.c.rej

Comment 20 Vadim Zelenin 2006-10-16 08:45:12 UTC

(In reply to comment #19)
> hb@hb-athlon UML $ patch --dry-run < class.diff
> patching file class.c
> Hunk #1 FAILED at 30.
> Hunk #2 FAILED at 516.
> 2 out of 2 hunks FAILED -- saving rejects to file class.c.rej

Please describe the requirements for the .diff file.
I produced it with WinCVS, checking "unified diff" and no other options.
Please tell me how to do it correctly.

Comment 21 Vadim Zelenin 2006-10-16 12:03:03 UTC

Created attachment 74793
next try - now patch --dry-run <class.diff works correctly

"The devil is in the details": the differences were in white space, and some software does not preserve spaces/tabs. :(

Comment 22 Hans Breuer 2006-10-16 20:27:16 UTC

Thanks, applied.

2006-10-16  Hans Breuer  <hans@breuer.org>

	* objects/UML/class.c : the comment wrapping was only working for
	plain ASCII, now it deals with UTF-8 (Vadim Zelenin, bug #349765)