Bug 582603 - gedit inserts UTF-16 BOM at the end of the file
Status: RESOLVED FIXED
Product: gedit
Classification: Applications
Component: general
Version: 2.26.x
Hardware: Other All
Importance: Normal minor
Target Milestone: ---
Assigned To: Gedit maintainers
QA Contact: Gedit maintainers
Depends on:
Blocks:
 
 
Reported: 2009-05-14 12:04 UTC by Lucian Chirita
Modified: 2015-04-27 10:45 UTC
See Also:
GNOME target: ---
GNOME version: 2.25/2.26



Description Lucian Chirita 2009-05-14 12:04:40 UTC
Please describe the problem:
When you save a file using UTF-16 as the encoding, gedit inserts a byte order mark (FF FE in little-endian order) at the beginning of the file (which is correct), but also at the end of the file, just before the final newline character.

The BOM at the end of the file has no reason to be there, and it causes certain XML parsers (namely Xerces) to fail when parsing UTF-16 XML files edited with gedit.

Steps to reproduce:
1. Open gedit.
2. Create a new file.
3. Type "a".
4. Click Save.
5. Choose UTF-16 as character encoding.
6. Open the file in a hex editor.


Actual results:
The hex editor shows the file as FF FE 61 00 FF FE 0A 00

Expected results:
The file should be saved as FF FE 61 00 0A 00
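
To check a saved file for this problem without reading a hex dump by eye, the two byte sequences above can be compared with a small script. This is only a sketch: the helper name `stray_bom_offsets` is made up for illustration, and it assumes the file is UTF-16 little-endian, as in this report.

```python
import codecs
from typing import List


def stray_bom_offsets(data: bytes) -> List[int]:
    """Return byte offsets of any FF FE code unit found after the leading BOM.

    Hypothetical helper; assumes UTF-16 little-endian input.
    """
    bom = codecs.BOM_UTF16_LE  # b'\xff\xfe'
    offsets = []
    # Walk the data one 16-bit code unit at a time, skipping offset 0,
    # which is where a legitimate BOM belongs.
    for i in range(2, len(data) - 1, 2):
        if data[i:i + 2] == bom:
            offsets.append(i)
    return offsets


actual = bytes.fromhex("FF FE 61 00 FF FE 0A 00")    # bytes from "Actual results"
expected = bytes.fromhex("FF FE 61 00 0A 00")        # bytes from "Expected results"

print(stray_bom_offsets(actual))    # [4]
print(stray_bom_offsets(expected))  # []
```

For a real file, the same function can be called on `open(path, "rb").read()`.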

Does this happen every time?
Yes

Other information:
Comment 1 nospam.kotarou.dono 2010-08-13 12:14:06 UTC
Confirmed in version 2.28.3 but this version doesn't insert the character at all.
Comment 2 nospam.kotarou.dono 2010-09-08 07:31:43 UTC
Confirmed again in version 2.30.0. This version still doesn't insert the character at all.
Comment 3 Jens Knutson 2012-10-20 00:25:04 UTC
No offense to the GEdit team -- I'm very grateful for all their awesome work -- but this is **not** "minor" severity: THIS IS A DATA CORRUPTION BUG.

Yeah, it's a small one, but it can have a very real impact:  I wasted several hours of work today because of this.  If I hadn't been in love with Gedit for so long, well... I'd be considering saying "so long" to Gedit.  :-\

Clearly the *right* solution here is to start up a "UTF-8 OR DIE!" hit-squad, to go around making threats and declaring shenanigans against anyone or anything using an encoding other than UTF-8.  But until we can organize that, could this just get fixed?  Seems like it couldn't be *that* hard?  (though I don't really know C, so I can't say for sure.)

At the very least, can someone with the permissions to do so bump up the severity on this bug?
Comment 4 Ignacio Casal Quinteiro (nacho) 2012-10-20 09:26:44 UTC
We do indeed add a \n at the end of the file, but other text editors do this too, since tools like cat expect it. Anyway, we have a setting that controls this "feature": ensure-trailing-newline. Just use dconf to deactivate it.
Comment 5 jessevdk@gmail.com 2012-10-20 10:38:50 UTC
Nacho: the new line is ok, but the BOM? That's maybe because the last newline is encoded separately and considered a new document or something?
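
The hypothesis above can be illustrated outside gedit. The sketch below is not gedit's code: `encode_chunk` is a hypothetical stand-in for a converter that writes a BOM at the start of every conversion it is asked to do. Encoding the text and the trailing newline as two separate conversions then yields exactly the bytes from the report, while encoding them in one pass yields the expected bytes.

```python
import codecs


def encode_chunk(text: str) -> bytes:
    """Hypothetical converter: emits a UTF-16LE BOM at the start of each call."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Newline encoded as if it were a separate document: a second BOM appears.
separate = encode_chunk("a") + encode_chunk("\n")

# Text and newline encoded in a single conversion: only the leading BOM.
together = encode_chunk("a\n")

print(separate.hex(" "))  # ff fe 61 00 ff fe 0a 00
print(together.hex(" "))  # ff fe 61 00 0a 00
```

`bytes.hex` with a separator argument needs Python 3.8 or later.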
Comment 6 Ignacio Casal Quinteiro (nacho) 2012-10-20 11:43:46 UTC
Weird... we should encode the new line in the same way, no? If not, that's definitely a bug.
Comment 7 Sébastien Wilmet 2015-04-27 10:45:28 UTC
$ hexdump <file>
0000000 feff 0061 000a
0000006

So this bug seems to be fixed, with gedit 3.14.
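
A note for comparing this output with the byte sequences in the description: by default hexdump prints 16-bit words in host byte order, so on a little-endian machine (an assumption about the system used here) the word `feff` corresponds to the bytes FF FE on disk. The sketch below, with a made-up helper name `as_le_words`, shows that the expected bytes FF FE 61 00 0A 00 render as the words above.

```python
import struct


def as_le_words(data: bytes) -> str:
    """Format bytes as 16-bit little-endian words, the way default hexdump
    output looks on a little-endian host (hypothetical helper)."""
    count = len(data) // 2
    words = struct.unpack("<%dH" % count, data[:count * 2])
    return " ".join("%04x" % w for w in words)


fixed = bytes.fromhex("fffe61000a00")  # expected bytes from the description
print(as_le_words(fixed))  # feff 0061 000a
```

Running `hexdump -C <file>` instead prints the bytes in file order, which avoids the word swap.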