GNOME Bugzilla – Bug 85992
should be able to manually specify output encoding in gedit
Last modified: 2004-12-22 21:47:04 UTC
Menus should be added so that a user can specify the input encoding and output encoding for a buffer. The input encoding should override automatic detection of the current file's encoding (useful when automatic detection fails to identify the encoding correctly). The output encoding should determine the encoding in which the file is written out when saved (i.e., there should be more choices than just "current locale" or "UTF-8", as there are now). These settings should be buffer-specific, since a user may be dealing with many differently encoded files in the same gedit window.
I accept suggestions on how to implement "automatic detection of the encoding". I will not implement these features in the near future, since I think they are outside the scope of a simple editor like gedit, but if you want to provide patches, I will eventually accept them.
Well, it doesn't have to be automatic detection, but it would still be useful to be able to specify an input encoding other than UTF-8 or the current locale's encoding... Perhaps I am a bit biased, since Japanese has two major encodings in use (SJIS and EUC-JP), so being able to edit both with one editor seems like an obvious feature. This type of feature is easiest to implement when both encodings can be converted to a common one, Unicode, and then reconverted back at output.
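The convert-to-Unicode-internally approach described above can be sketched as follows. This is a minimal illustration, not gedit's actual code; the `load`/`save` helper names are hypothetical:

```python
# Sketch: decode a file from a user-specified input encoding into a
# Unicode string, edit it in that form, then re-encode on save with
# a possibly different output encoding.

def load(path, input_encoding):
    """Read raw bytes and decode them with the chosen input encoding.
    Raises UnicodeDecodeError if the bytes are not valid in it."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(input_encoding)

def save(path, text, output_encoding):
    """Encode the in-memory Unicode text with the chosen output
    encoding and write the resulting bytes back out."""
    with open(path, "wb") as f:
        f.write(text.encode(output_encoding))
```

With this structure, a buffer loaded as EUC-JP can be saved as Shift JIS (or anything else) without the editor ever handling two encodings at once internally.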
*** Bug 89066 has been marked as a duplicate of this bug. ***
*** Bug 91491 has been marked as a duplicate of this bug. ***
Changing the input encoding is implemented in HEAD AFAICT. Output encoding is not.
Paolo, is it reasonable/possible to specify an output encoding in gedit? If not, this bug can be closed.
Yes, it could be reasonable.
This needs a little thought. Contrast with KDE's Kate (KWrite), which allows input in any input method you like and saving in any encoding. Kate isn't smart about the saving part: it will silently allow data loss to occur, as when saving Cyrillic text as ASCII. I agree that specifying the encoding should be implemented, but it must be done in such a way that data loss does not occur.

Encoding is a matter of how data is viewed, not how it is saved. Note that you can view any data in any encoding (it may be a mess, and some bytes may be undisplayable), but there is no programmatic way of determining whether saved data is text in one encoding or another. So the encoding of the document should be specified as long as the document is open. If an attempt is made to input characters that can't be represented in that encoding, the user should (at least) be informed. If the user attempts to change the encoding to one that can't support characters already present in the document, they should likewise be notified. Saving the document is then a one-step operation.

Alternatively, you could allow saving the internal UTF as another encoding, but first check that all the characters in the document can be represented in that encoding, and inform the user if any can't.
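The representability check suggested above can be as simple as a strict encode attempt. A minimal sketch (the `can_encode` name is my own, not anything in gedit):

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in
    the given encoding, i.e. saving in it would be lossless."""
    try:
        # Strict mode raises UnicodeEncodeError on the first
        # unrepresentable character instead of silently substituting.
        text.encode(encoding, errors="strict")
        return True
    except UnicodeEncodeError:
        return False
```

An editor could run this check just before writing the file and prompt the user (offer a different encoding, or confirm lossy conversion) whenever it returns False.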
Fixed in CVS HEAD.