GNOME Bugzilla – Bug 554721
UTF-8 encoded files saved with byte-order mark (BOM)
Last modified: 2008-12-30 14:18:48 UTC
Please describe the problem:
When I save subtitles to a file using UTF-8 encoding (which is the default in my environment), the resulting file has a byte-order mark in it.

Steps to reproduce:
1. Create a new subtitle file in Gnome Subtitles.
2. Save the subtitles to a file, choosing UTF-8 for encoding.
3. See what the 'file' command has to say about the saved file.

Actual results:
'file' reports the file has a BOM in it ("UTF-8 Unicode (with BOM) text").

Expected results:
The resulting file not to have a BOM in it.

Does this happen every time?
Yes.

Other information:
The BOM results in Totem not being able to display the subtitles. It *can* display the same subtitles if they're in a UTF-8 encoded file without the BOM, so if I 'bomstrip' the file saved by Gnome Subtitles, the subtitles play back just fine.
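For anyone who wants to check a saved file without 'file' or 'bomstrip', the check and the workaround described above could be sketched roughly as follows in Python. This is only an illustration: the function names `has_utf8_bom` and `strip_utf8_bom` are made up here and are not part of Gnome Subtitles, Totem, or bomstrip. The sketch assumes the file content has already been read as raw bytes.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Reading the saved file in binary mode (`open(path, "rb").read()`) and passing the result to `has_utf8_bom` should agree with what 'file' reports about the BOM.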
Hi, Thanks for the report. The problem isn't straightforward, unfortunately. While in some UTFs the BOM is used to distinguish between little-endian and big-endian byte order, in UTF-8 the BOM is used as a signature. The UTF-8 specification explicitly allows its use, and only prohibits it where the character encoding is known beforehand. That is not the case here, so the use of a BOM in subtitle files is correct. So, my opinion is that Totem should be able to read these files. This is obviously easier said than done, but still, would you mind opening a bug in GStreamer (which is also available in this bugzilla)? (GStreamer is the backend for Totem.)
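To illustrate what "being able to read these files" means on the reader side, here is a minimal Python sketch of a decoder that accepts UTF-8 input with or without the signature. It is not how GStreamer or Totem handle this (those are not written in Python); the function name `decode_subtitles` is hypothetical. It relies on Python's standard `utf-8-sig` codec, which skips a leading BOM when decoding if one is present.

```python
def decode_subtitles(data: bytes) -> str:
    """Decode UTF-8 subtitle bytes, ignoring a leading BOM if present."""
    # "utf-8-sig" drops one leading U+FEFF on decode; the plain "utf-8"
    # codec would instead keep it as the first character of the text.
    return data.decode("utf-8-sig")
```

With a decoder like this, a file saved with the signature and the same file run through 'bomstrip' yield identical text.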
Thanks for responding. I did as you suggested: bug 555257 now covers the Totem/GStreamer side of this.
Fixed in Totem, so I'm resolving this one as not a bug. Thanks again for identifying this problem, though.
*** Bug 565425 has been marked as a duplicate of this bug. ***