GNOME Bugzilla – Bug 656863
Unicode astral characters corrupt Tomboy notes
Last modified: 2017-07-31 12:44:45 UTC
Created attachment 194182
Example of a file which becomes corrupted upon re-opening in Tomboy.

Entering astral Unicode characters (U+10000 and above) into a Tomboy note can produce strange behaviour when the document is reloaded, causing the note to become permanently corrupted.

To reproduce:
1. Create a new note.
2. Create a bulleted list, with the character
Umm... it looks like Bugzilla *also* doesn't support astral characters properly, since my bug report was cut off at the first mention of one. The rest of the bug report follows:

2. Create a bulleted list, with the character (U+1D49C) in the first item, and some ordinary text in the second item.
3. Close Tomboy completely.
4. Observe that Tomboy correctly wrote this into the saved .note file (full file attached):
   <list><list-item dir="ltr">[U+1D49C -- not shown since Bugzilla is allergic to this character] </list-item><list-item dir="ltr">Two</list-item></list>
5. Re-open Tomboy, and open the note. Observe that the note has become corrupted. The second item in the bulleted list now has the bullet point between the "T" and the "w", like this: "T• wo". In a longer document, everything from that point onwards would be corrupted in the same way.
6. Observe that Tomboy has written a modified document back to disk. In this case, it has taken the "Two" out of the list entirely, but I have seen other cases in which it moved some characters out of list items but not others. Generally, the document is corrupted and requires manual work to recover.

I suspect this is due to the .NET framework's handling of astral characters. I don't know much about .NET, but if it is anything like Java (and I believe it is), strings are stored as UTF-16. An astral character is then displayed as a single character but occupies two code units (a surrogate pair), and therefore two indices in the string. Code that manually converts between character positions and string indices can introduce subtle off-by-one bugs this way, which would explain why the bullet point slipped one character into the "Two" on the following line.

Tomboy version: 1.6.0
Mono version: 2.6.7
OS version: Ubuntu 11.04
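The suspected off-by-one can be illustrated in Java, which, like .NET, stores strings as UTF-16. This is a hypothetical reconstruction, not Tomboy's actual code. It assumes the corruption comes from a UTF-16 code-unit index being reused as a character (code-point) offset, which is one way an editor could end up placing the bullet inside "Two". The class `AstralOffsets` and the helper `insertAtCodePoint` are illustrative names only.

```java
// Sketch (hypothetical, not Tomboy code): reusing a UTF-16 code-unit index
// as a code-point offset shifts text one position per preceding astral char.
public class AstralOffsets {

    // Inserts `insertion` before the code point at `codePointOffset`.
    static String insertAtCodePoint(String s, int codePointOffset, String insertion) {
        // offsetByCodePoints converts a code-point count into a UTF-16 index.
        int unitIndex = s.offsetByCodePoints(0, codePointOffset);
        return s.substring(0, unitIndex) + insertion + s.substring(unitIndex);
    }

    public static void main(String[] args) {
        // U+1D49C (from the report) is outside the BMP, so Java stores it
        // as a surrogate pair: two chars for one character.
        String astral = new String(Character.toChars(0x1D49C));
        String text = astral + "\nTwo";

        System.out.println(text.length());                         // 6 code units
        System.out.println(text.codePointCount(0, text.length())); // 5 code points

        // indexOf returns a UTF-16 code-unit index: 3, not 2.
        int unitIndex = text.indexOf("Two");
        // Converting it gives the character offset of "Two": 2.
        int charOffset = text.codePointCount(0, unitIndex);

        String bullet = "\u2022 "; // "• "
        // Buggy: the code-unit index is reused as a character offset,
        // so the bullet lands one character late, after the "T".
        String buggy = insertAtCodePoint(text, unitIndex, bullet);
        // Correct: convert the index to a code-point offset first.
        String fixed = insertAtCodePoint(text, charOffset, bullet);

        // Second line of each result: "T• wo" and "• Two" respectively.
        System.out.println(buggy.substring(buggy.indexOf('\n') + 1));
        System.out.println(fixed.substring(fixed.indexOf('\n') + 1));
    }
}
```

With one astral character before the list, the misplaced bullet lands exactly one position late. Each additional astral character earlier in the note would push later positions one more step, which would fit the report that longer documents are corrupted from that point onwards.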
(Yes, this is a bug in Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=545488)
I can confirm this bug in 1.10.1. Another problem: insert something before the astral character, press Ctrl-Z, and watch Tomboy segfault.
The Tomboy team has moved from GNOME Bugzilla to GitHub for bug reports and feature requests: https://github.com/tomboy-notes/tomboy/issues/ Closing this report as NOTGNOME as part of Bugzilla Housekeeping (bug 781054) to keep tasks in one place. Please feel free to transfer this task to GitHub if this task is still valid in a recent Tomboy version. We are sorry for the inconvenience.