GNOME Bugzilla – Bug 582603
gedit inserts UTF-16 BOM at the end of the file
Last modified: 2015-04-27 10:45:28 UTC
Please describe the problem:
When you save a file using UTF-16 as encoding, gedit inserts a byte-order mark (BOM) character (FF FE in little-endian order) at the beginning of the file (which is correct), but also at the end of the file (before the newline character). The BOM at the end of the file has no reason to be there, and it causes certain XML parsers (namely Xerces) to fail while parsing UTF-16 XML files edited with gedit.

Steps to reproduce:
1. Open gedit.
2. Create a new file.
3. Type "a".
4. Click Save.
5. Choose UTF-16 as character encoding.
6. Open the file in a hex editor.

Actual results:
The hex editor shows the file as FF FE 61 00 FF FE 0A 00

Expected results:
The file should be saved as FF FE 61 00 0A 00

Does this happen every time?
Yes

Other information:
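For anyone who wants to check their own files for this symptom, here is a minimal Python sketch, not part of gedit. The helper name `stray_bom_offsets` is made up for illustration, and it assumes the file is UTF-16 little-endian with a leading BOM, as in the byte sequences above.

```python
import codecs

def stray_bom_offsets(data):
    """Return byte offsets of U+FEFF code units found after the leading BOM.

    Hypothetical helper; assumes UTF-16 little-endian input with a BOM at offset 0.
    """
    if not data.startswith(codecs.BOM_UTF16_LE):
        raise ValueError("expected a UTF-16 little-endian BOM at offset 0")
    offsets = []
    # Walk the payload one 16-bit code unit at a time, skipping the leading BOM.
    for i in range(2, len(data) - 1, 2):
        if data[i:i + 2] == codecs.BOM_UTF16_LE:
            offsets.append(i)
    return offsets

# Byte sequences copied from the "Actual results" and "Expected results" above.
actual = bytes.fromhex("FF FE 61 00 FF FE 0A 00")
expected = bytes.fromhex("FF FE 61 00 0A 00")

print(stray_bom_offsets(actual))    # [4]
print(stray_bom_offsets(expected))  # []
```

To check a real file, pass it the result of `open(path, "rb").read()`.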
Confirmed in version 2.28.3, but this version doesn't insert the character at all.
Confirmed again in version 2.30.0. This version still doesn't insert the character at all.
No offense to the GEdit team -- I'm very grateful for all their awesome work -- but this is **not** "minor" severity: THIS IS A DATA CORRUPTION BUG. Yeah, it's a small one, but it can have a very real impact: I wasted several hours of work today because of this. If I hadn't been in love with Gedit for so long, well... I'd be considering saying "so long" to Gedit. :-\ Clearly the *right* solution here is to start up a "UTF-8 OR DIE!" hit-squad, to go around making threats and declaring shenanigans against anyone or anything using an encoding other than UTF-8. But until we can organize that, could this just get fixed? Seems like it couldn't be *that* hard? (though I don't really know C, so I can't say for sure.) At the very least, can someone with the permissions to do so bump up the severity on this bug?
We do indeed add a \n at the end of the file, but other text editors do the same, since other tools like cat need it. Anyway, we have a setting that can be changed to remove this "feature": just use dconf and deactivate the ensure-trailing-newline key.
Nacho: the new line is ok, but the BOM? That's maybe because the last newline is encoded separately and considered a new document or something?
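To illustrate that guess, here is a small Python sketch. It is not gedit's code; the function names are hypothetical and it only shows how encoding the trailing newline as if it were its own document would produce exactly the bytes from the report.

```python
import codecs

def encode_whole(text):
    # One BOM, then the entire buffer encoded as UTF-16 little-endian.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def encode_in_pieces(pieces):
    # Hypothetical faulty path: every piece is treated as a separate
    # document, so every piece gets its own BOM.
    return b"".join(codecs.BOM_UTF16_LE + p.encode("utf-16-le") for p in pieces)

print(encode_whole("a\n").hex(" "))            # ff fe 61 00 0a 00
print(encode_in_pieces(["a", "\n"]).hex(" "))  # ff fe 61 00 ff fe 0a 00
```

The second output matches the "Actual results" bytes, which is consistent with the guess but does not prove that this is what gedit did.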
Weird... we should encode the newline in the same way, no? If not, that's definitely a bug.
$ hexdump <file>
0000000 feff 0061 000a
0000006
So this bug seems to be fixed with gedit 3.14.
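Note that hexdump without options prints 16-bit words in host byte order, so the dump does not look like the byte sequence from the original report at first glance. A quick Python sketch, assuming the dump was taken on a little-endian machine (the helper name `words_to_bytes` is made up), converts the words back to bytes:

```python
import struct

def words_to_bytes(dump_words):
    # Parse the 16-bit hex words and pack them as little-endian,
    # which assumes the dump came from a little-endian host.
    words = [int(w, 16) for w in dump_words.split()]
    return struct.pack("<%dH" % len(words), *words)

data = words_to_bytes("feff 0061 000a")
print(data.hex(" "))                # ff fe 61 00 0a 00
print(repr(data.decode("utf-16")))  # 'a\n'
```

Under that assumption the file is FF FE 61 00 0A 00, which is the expected result from the report: one BOM at the start and none before the newline.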