Bug 656863 - Unicode astral characters corrupt Tomboy notes
Product: tomboy
Classification: Applications
Component: General
Hardware: Other Linux
Importance: Normal major
Target Milestone: ---
Assigned To: Tomboy Maintainers
QA Contact: Tomboy Maintainers
Depends on:
Reported: 2011-08-19 02:48 UTC by Matt Giuca
Modified: 2017-07-31 12:44 UTC
See Also:
GNOME target: ---
GNOME version: ---

Example of a file which becomes corrupted upon re-opening in Tomboy. (801 bytes, application/octet-stream)
2011-08-19 02:48 UTC, Matt Giuca

Description Matt Giuca 2011-08-19 02:48:18 UTC
Created attachment 194182
Example of a file which becomes corrupted upon re-opening in Tomboy.

Entering astral Unicode characters (U+10000 and above) into a Tomboy note can produce strange behaviour when the document is reloaded, causing the note to become permanently corrupted.

To reproduce:
1. Create a new note.
2. Create a bulleted list, with the character
Comment 1 Matt Giuca 2011-08-19 02:50:26 UTC
Umm... it looks like Bugzilla *also* doesn't support astral characters properly, since my bug report was cut off at the first mention of one. The rest of the bug report follows:

2. Create a bulleted list, with the character (U+1D49C) in the first item, and some ordinary text in the second item.
3. Close Tomboy completely.
4. Observe that Tomboy correctly wrote this into the saved .note file (full file attached):

<list><list-item dir="ltr">[U+1D49C -- not shown since Bugzilla is allergic to this character]
</list-item><list-item dir="ltr">Two</list-item></list>

5. Re-open Tomboy, and open the note. Observe that the note has become corrupted. The second item in the bulleted list now has the bullet point in between the "T" and "w", like this: "T• wo". If you had created a longer document, the entire document would be corrupted like this from that point onwards.
6. Observe that Tomboy has written a modified document back to disk. In this case, it has taken the "Two" out of the list entirely, but I have seen other cases in which it moved certain characters out of list items but not others. Generally, the document is corrupted and requires manual work to recover.

I suspect this is due to the .NET framework's handling of astral characters. I don't know much about .NET, but if it is anything like Java (and I believe it is), strings are stored as UTF-16, so each astral character is a single character that occupies two indices in the string (a surrogate pair). Code that deals with string indices manually can then introduce subtle off-by-one bugs, which would explain why the bullet point slipped one character inside the "Two" on the following line.
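
To illustrate the kind of off-by-one suspected above, here is a minimal Python sketch. It is not Tomboy's code, and whether Tomboy actually mixes the two kinds of offsets is an assumption. Python indexes strings by code point, so the UTF-16 indices that .NET and Java strings use are computed explicitly. The helper names (`utf16_units`, `utf16_index_of`, `utf16_index_to_offset`) are hypothetical and exist only for this example.

```python
# Illustration only: none of these helpers come from Tomboy or .NET.

ASTRAL = "\U0001D49C"  # U+1D49C, the character used in the report


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to store `text`."""
    return len(text.encode("utf-16-le")) // 2


def utf16_index_of(text: str, needle: str) -> int:
    """Index of `needle` counted in UTF-16 code units,
    which is what a UTF-16 string API would report."""
    return utf16_units(text[: text.index(needle)])


def utf16_index_to_offset(text: str, index: int) -> int:
    """Convert a UTF-16 index back to an offset counted in code points."""
    units = 0
    for offset, ch in enumerate(text):
        if units >= index:
            return offset
        # Characters above U+FFFF take two code units (a surrogate pair).
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


# Two list items: the astral character, then "Two".
text = ASTRAL + "\nTwo"

# Where "Two" starts, counted in UTF-16 code units.
start_utf16 = utf16_index_of(text, "Two")

# Assumed bug: the UTF-16 index is used directly as a code-point offset,
# so the bullet lands one character too late.
broken = text[:start_utf16] + "• " + text[start_utf16:]

# Converting the index first puts the bullet where it belongs.
start = utf16_index_to_offset(text, start_utf16)
fixed = text[:start] + "• " + text[start:]

print(broken.split("\n")[1])
print(fixed.split("\n")[1])
```

Every astral character before the insertion point adds one more unit of drift, which would be consistent with a longer document being corrupted from that point onwards.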

Tomboy version: 1.6.0
Mono version: 2.6.7
OS version: Ubuntu 11.04
Comment 2 Matt Giuca 2011-08-19 02:52:35 UTC
(Yes, this is a bug in Bugzilla:
Comment 3 Alemann Massho 2012-05-12 20:29:04 UTC
I can confirm this bug in 1.10.1. Another bad thing: insert something before the astral character, press Ctrl-Z, and watch Tomboy segfault.
Comment 4 André Klapper 2017-07-31 12:44:45 UTC
The Tomboy team has moved from GNOME Bugzilla to GitHub for bug reports and feature requests:
Closing this report as NOTGNOME as part of Bugzilla Housekeeping (bug 781054) to keep tasks in one place. Please feel free to transfer this task to GitHub if this task is still valid in a recent Tomboy version. 
We are sorry for the inconvenience.