Bug 690531 - gedit should work with Unicode noncharacters
Status: RESOLVED DUPLICATE of bug 694669
Product: gedit
Classification: Applications
Component: general
Version: 3.4.x
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Gedit maintainers
Depends on:
Blocks:
Reported: 2012-12-19 23:43 UTC by Markus Scherer
Modified: 2013-11-04 10:40 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Markus Scherer 2012-12-19 23:43:22 UTC
I am editing Unicode and CLDR data files. Some of those files contain noncharacters like U+FDD0 (UTF-8: EF B7 90). When I open one of those files, gedit 3.4.1 complains with a big scary banner at the top: "There was a problem opening the file /home/mscherer/svn.cldr/…k/common/collation/ko.xml." and "The file you opened has some invalid characters. If you continue editing this file you could corrupt this document.
You can also choose another character encoding and try again." [Retry | Edit Anyway | Cancel]

The file displays fine except that U+FDD0 is shown as \x-escaped UTF-8 bytes as if it were ill-formed: <p>\EF\B7\90⼀</p><!-- INDEX 1 -->

Please fix the Unicode in gedit such that files on unicode.org can be edited...

The ko.xml file is available here:
http://unicode.org/cldr/trac/browser/trunk/common/collation/ko.xml

Most noncharacters are permitted in HTML and XML. Editors should not flag them as errors.
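
The byte sequence quoted in the description can be checked independently of gedit. The following is a minimal Python sketch, not gedit's own decoding path; the helper names `utf8_hex` and `is_well_formed_utf8` are hypothetical and exist only for illustration. It shows that U+FDD0 encodes to EF B7 90 and that a strict UTF-8 decoder accepts those bytes, whereas a genuinely ill-formed sequence (here, the same bytes with the last one cut off) is rejected.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated uppercase hex bytes."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The noncharacter from the report.
print(utf8_hex("\uFDD0"))

# The three bytes from the report are well-formed UTF-8.
print(is_well_formed_utf8(bytes.fromhex("EFB790")))

# Dropping the final continuation byte makes the sequence ill-formed.
print(is_well_formed_utf8(b"\xEF\xB7"))
```

A noncharacter is therefore a different case from ill-formed input: the bytes decode cleanly to a single code point, so escaping them as \EF\B7\90 treats valid data as if it were broken.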
Comment 1 Markus Scherer 2013-02-21 19:28:42 UTC
FYI

Unicode Corrigendum #9: Clarification About Noncharacters
http://www.unicode.org/versions/corrigendum9.html

Unicode FAQ
Q: Are there any 16-bit values that are invalid?
http://www.unicode.org/faq/utf_bom.html#utf16-7

Q: What about noncharacters? Are they invalid?
http://www.unicode.org/faq/utf_bom.html#utf16-8

Also

XML 1.1
"XML processors must accept any character in the range specified for Char."
http://www.w3.org/TR/2006/REC-xml11-20060816/#charsets

(XML chose to forbid U+FFFE & U+FFFF but not the other 64 noncharacters. Noncharacters are "discouraged" but so are compatibility characters like full-width ASCII.)
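
The count in the parenthetical above can be reproduced with a short Python sketch. It assumes the standard definition of the 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes) and the Char production from the XML 1.1 recommendation linked above; the function names `is_noncharacter` and `is_xml11_char` are hypothetical helpers, not part of any library.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_xml11_char(cp: int) -> bool:
    """True if cp matches the XML 1.1 Char production:
    [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
allowed_in_xml = [cp for cp in noncharacters if is_xml11_char(cp)]
forbidden_in_xml = [cp for cp in noncharacters if not is_xml11_char(cp)]

print(len(noncharacters))                          # total noncharacters
print(len(allowed_in_xml))                         # accepted by XML 1.1 Char
print([f"U+{cp:04X}" for cp in forbidden_in_xml])  # excluded by XML 1.1 Char
```

Under these assumptions the only noncharacters excluded by Char are U+FFFE and U+FFFF, so U+FDD0 in ko.xml is a legal XML character.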
Comment 2 Sébastien Wilmet 2013-11-03 23:11:51 UTC

*** This bug has been marked as a duplicate of bug 660633 ***
Comment 3 Behdad Esfahbod 2013-11-04 10:40:01 UTC
This was fixed earlier this year.

*** This bug has been marked as a duplicate of bug 694669 ***