Bug 768803 - Pasting languages other than english results in utf8 code being pasted.
Status: RESOLVED NOTGNOME
Product: gtk+
Classification: Platform
Component: Backend: Wayland
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: gtk-bugs
QA Contact: gtk-bugs
Depends on:
Blocks:
Reported: 2016-07-14 12:01 UTC by barz621
Modified: 2016-07-16 15:14 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
terminology output (125.31 KB, text/plain)
2016-07-14 12:43 UTC, barz621

Description barz621 2016-07-14 12:01:04 UTC
To reproduce (using weston):

Open a GTK app (e.g. transmission-gtk).

Type something in English and in another language, e.g. Greek, into an entry field.

Select > Copy > and paste in an EFL app or QT app.

You will see the English text, with utf8 codes in place of the Greek text.
Comment 1 Matthias Clasen 2016-07-14 12:20:12 UTC
What makes you think the english text is not utf8 as well ?

Can you describe in some more detail what you are actually seeing ? A screenshot might be helpful
Comment 2 barz621 2016-07-14 12:30:44 UTC
https://www.enlightenment.org/ss/display.php?image=e-5772d8936a3835.76966373.png

Copied from transmission in the background and pasted on terminology.

Also this comment by the guy who dove into it. 

https://phab.enlightenment.org/T3972#60775
Comment 3 Jonas Ådahl 2016-07-14 12:36:04 UTC
Could you reproduce the issue while running the EFL client with the WAYLAND_DEBUG=1 environment variable set, and attach the output here?
Comment 4 barz621 2016-07-14 12:43:39 UTC
Created attachment 331490 [details]
terminology output

This is the output from terminology where the selection is copied from transmission-gtk (on weston).
Comment 5 Jonas Ådahl 2016-07-14 12:45:22 UTC
I tried to reproduce this when running a QT client with -platform wayland. What I can see is that it requests text/plain (and ignores text/plain;charset=utf-8), which I assume is taken to be ASCII. The content that is pasted into the text widget looks an awful lot like UTF-8 code points that have been converted to ASCII, i.e. "åå" was translated into "\u00e5\u00e5", which I guess could be considered correct. If the client actually wanted the real string, without converting UTF-8 to \u... ASCII, it should request text/plain;charset=utf-8.
Comment 6 Jonas Ådahl 2016-07-14 12:51:59 UTC
Looking at the log output from terminology, it looks like terminology also requests "text/plain" instead of "text/plain;charset=utf-8". So it gets what it asks for: some ASCII string. I assume GTK+ tries to be nice and does a loss-less conversion and translates the code points to \uHEX instead of just dropping them.

I tested changing weston-terminal to request the text/plain instead of text/plain;charset=utf-8 and it resulted in the exact same as in the screenshot: \u00e5 was pasted instead of å. Without that change, weston-terminal works fine.
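
The client-side choice discussed in this comment can be sketched roughly as follows. This is only an illustration in Python, not code from EFL, Qt, or weston-terminal; the function name `pick_text_mime` and the preference order are assumptions made for the example.

```python
# Hypothetical helper: choose which offered MIME type a pasting client
# should request. Names here are illustrative, not any toolkit's API.
UTF8_TEXT = "text/plain;charset=utf-8"
PLAIN_TEXT = "text/plain"


def pick_text_mime(offered):
    """Return the preferred text MIME type from `offered`, or None."""
    # Normalize spacing and case so "text/plain; charset=UTF-8" also matches.
    normalized = {m.replace(" ", "").lower(): m for m in offered}
    # Prefer the explicit UTF-8 variant; fall back to bare text/plain.
    for wanted in (UTF8_TEXT, PLAIN_TEXT):
        if wanted in normalized:
            return normalized[wanted]
    return None
```

A client that wants the original characters would request the UTF-8 type whenever the source offers it, and only fall back to bare text/plain otherwise.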
Comment 7 barz621 2016-07-14 13:14:00 UTC
So does this have to be fixed in all the toolkits, or is it a GTK thing?

I also noticed another thing that might be related. When pasting something into GTK from e.g. EFL, and the text has what I assume to be a line break character, it shows up as a square in the place you paste it.

To reproduce: select a whole line in terminology > copy and paste into a GTK field.
Comment 8 Jonas Ådahl 2016-07-14 13:28:27 UTC
If you copy "abcåååxyz" and want some other client to paste that and not some ASCII variant of it (be it lossy or converted to \uHEX), that client needs to get the text/plain;charset=utf-8 content. If the client expects UTF-8 and doesn't request that mime-type, then it's a client bug, not a GTK+ bug.

About the newline in a text entry. Text entries can't have multiple lines, so it'll show the code point symbol thing instead. I'm not sure if that is intended or not, but I suspect it is. Can also be reproduced by copying two lines in gedit then pasting in a text entry.
Comment 9 barz621 2016-07-14 13:36:02 UTC
So if the client doesn't specify the charset (as happens in Qt, as you mention above), shouldn't GTK just send ASCII?
Comment 10 Jonas Ådahl 2016-07-14 13:40:34 UTC
(In reply to barz621 from comment #9)
> So if the client doesn't specify the charset (as happens in Qt, as you
> mention above), shouldn't GTK just send ASCII?

As far as I can tell it does. For å it sends \u00e5, i.e. backslash, u, zero, zero, e, five; those 6 ASCII characters. The alternative is to send an empty string, doing a lossy conversion.
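
The lossless conversion described here can be illustrated with a short Python sketch. It is not GTK+'s actual implementation; the function name `to_ascii_escaped` and the handling of code points above U+FFFF are assumptions for the example.

```python
# Hypothetical illustration of a lossless text-to-ASCII conversion:
# non-ASCII characters become \uXXXX escapes instead of being dropped.
def to_ascii_escaped(text: str) -> bytes:
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            pieces.append(ch)
        elif cp <= 0xFFFF:
            # e.g. "å" (U+00E5) becomes the six characters \u00e5
            pieces.append(f"\\u{cp:04x}")
        else:
            # Assumption: longer escape for code points above U+FFFF.
            pieces.append(f"\\U{cp:08x}")
    return "".join(pieces).encode("ascii")
```

Under these assumptions, `to_ascii_escaped("å")` yields the six ASCII bytes `\u00e5`, which matches what shows up in the pasting client.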
Comment 11 Carlos Garnacho 2016-07-14 13:54:03 UTC
(In reply to Jonas Ådahl from comment #8)
> About the newline in a text entry. Text entries can't have multiple lines,
> so it'll show the code point symbol thing instead. I'm not sure if that is
> intended or not, but I suspect it is. Can also be reproduced by copying two
> lines in gedit then pasting in a text entry.

That is the case, down to pango_layout_set_single_paragraph_mode()
Comment 12 barz621 2016-07-14 13:56:44 UTC
FWIW Qt to EFL (and vice versa) works flawlessly. Only GTK poops the bed. Hence the reason i filed it here. 

But you seem to imply that GTK does the correct thing? I got confused sorry.
Comment 13 Jonas Ådahl 2016-07-14 14:09:46 UTC
The things you reported (terminology getting ASCII instead of UTF-8) look like an EFL/QT issue, since they don't ask for UTF-8 even though GTK+ provides it.

On the other hand, it seems GTK+ fails to paste at all from QT5 when QT5 uses its Wayland backend. weston-terminal handles it fine, and I can't see any missing mime types in the protocol stream. That, on the other hand, looks like a GTK+ bug.
Comment 14 Matthias Clasen 2016-07-15 13:50:52 UTC
So what are qt and efl doing wrt encodings? Just assume that text/plain is utf-8? If you copy from them to gtk+, do we get utf-8 when asking for text/plain?
Comment 15 Jonas Ådahl 2016-07-15 14:11:47 UTC
They get ASCII when asking for text/plain (which without charset specified should be US-ASCII), so regarding what this bug is about, GTK+ is doing nothing wrong. The "utf8 code" mentioned in the report is Unicode code points expanded into ASCII.

We do have a bug though: GTK+ can't paste something copied from QT running via Wayland.
Comment 16 barz621 2016-07-16 10:41:21 UTC
@Jonas

Since this isn't a GTK bug close it and start a new one about the QT to GTK paste issue or change the title.

I filed a bug in Qt regarding the problem. EFL fixed it already.
Comment 17 Jonas Ådahl 2016-07-16 15:14:43 UTC
Sure. Closing this one. Opened bug 768887 for the other discovered issue.