Bug 168108 - Allow Unicode format characters to be injected into source
Status: RESOLVED OBSOLETE
Product: gtksourceview
Classification: Platform
Component: General
Version: unspecified
Hardware: Other All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: GTK Sourceview maintainers
QA Contact: GTK Sourceview maintainers
Depends on: 70399
Blocks: Persian Hebrew
 
 
Reported: 2005-02-22 03:43 UTC by Behdad Esfahbod
Modified: 2021-07-05 11:02 UTC
See Also:
GNOME target: ---
GNOME version: Unversioned Enhancement



Description Behdad Esfahbod 2005-02-22 03:43:34 UTC
We (the Arabic and Hebrew script users) need to be able to inject Unicode
bidirectional format characters such as LRM and RLM, LRE, ... into source code
for proper rendering.  For example, HTML tags should be surrounded by LRE..PDF
for proper rendering of the angle brackets.  Ideally we should be able to
insert these marks in the .lang files.
Comment 1 Paolo Maggi 2005-02-22 08:44:48 UTC
It would be very nice if some Arabic and/or Hebrew developers could help us to
fix this problem and the one reported in bug #122990.

What exactly do you mean by "injecting"?  Can you attach a sample HTML file
with such characters?
Comment 2 Behdad Esfahbod 2005-03-11 00:59:21 UTC
Sorry for the long delay.  I'm not sure about bug #122990, but for this one I
can help myself and will redirect other interested people here too.  Let me try
to give an example.  Say we have the Python source code:

  month_names = ["JANUARY", "FEBURARY", "MARCH"]

where the capitalized strings are the Persian translations of the respective
English words.  Now if you view this line of source code in gedit, you see
something like this:

  month_names = ["HCRAM" ,"YRARUBEF" ,"YRAUNAJ"]

while the desired visual string is:

  month_names = ["YRAUNAJ", "YRARUBEF", "HCRAM"]

This happens because, from a "plain text" point of view, the whole substring:

  JANUARY", "FEBURARY", "MARCH

is a single run of Persian text embedded inside the English "sentence" that is
the line of code.  But source code is not plain text: each string is a separate
sub-text of the line.  The easiest way to handle this is to insert some markup
into the strings such that the line, when rendered as plain text, produces the
expected visual string.
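
As a rough illustration of why the strings merge into one right-to-left run,
the sketch below uses Python's unicodedata module to list the bidi class of
each character between the first and the last strong right-to-left character
of a line.  The sample line and the helper names are made up for this
illustration.  The quotes, comma and space come out as neutral or weak classes
(ON, CS, WS), which the bidi algorithm resolves toward the right-to-left
letters on both sides of them.

```python
import unicodedata

# Made-up sample: two Arabic-script strings inside a Python list.
# The words themselves are arbitrary; only their bidi classes matter.
line = 'names = ["\u0645\u0647\u0631", "\u0622\u0628\u0627\u0646"]'


def bidi_classes(text):
    """Return (character, bidi class) pairs for every character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def strong_rtl_span(text):
    """Return the slice from the first to the last strong RTL character."""
    rtl = [i for i, ch in enumerate(text)
           if unicodedata.bidirectional(ch) in ("R", "AL")]
    return text[rtl[0]:rtl[-1] + 1] if rtl else ""


for ch, cls in bidi_classes(strong_rtl_span(line)):
    print(f"U+{ord(ch):04X} {cls}")
```

Everything printed belongs to the span that a plain-text renderer treats as
one embedded right-to-left phrase, including the punctuation that the
programmer thinks of as Python syntax.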

The Unicode standard defines some control characters that are useful here.
They are typically called directional formatting codes and are defined here:
http://www.unicode.org/reports/tr9/#Directional_Formatting_Codes

These are by no means enough for our purposes; a better model, for example,
is probably the CSS 2.1 properties unicode-bidi and direction, defined here:
http://www.w3.org/TR/CSS21/visuren.html#direction

Currently Pango doesn't support any markup for these purposes, so the only
option right now is to insert Unicode formatting characters.  Take the
example above, which was:

  month_names = ["JANUARY", "FEBURARY", "MARCH"]

If this is passed to Pango as:

  month_names = ["<LRE>JANUARY<PDF>", "<LRE>FEBURARY<PDF>", "<LRE>MARCH<PDF>"]

where the XML-like tags here are Unicode directional formatting characters, then
the desired rendering is achieved in this case (but not all the cases, that's a
limitation of the Unicode directional formatting characters).
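
A minimal sketch of that transformation, assuming a deliberately simplified
notion of a string literal (double-quoted, no escaped quotes).  The function
name and the regular expression are illustrative only and are not part of
gtksourceview or Pango; the code points are U+202A (LRE) and U+202C (PDF).

```python
import re

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Simplifying assumption: double-quoted literals without escaped quotes.
STRING_LITERAL = re.compile(r'"([^"]*)"')


def wrap_string_bodies(line):
    """Surround the body of each string literal with LRE ... PDF."""
    return STRING_LITERAL.sub(
        lambda m: f'"{LRE}{m.group(1)}{PDF}"', line)


display_line = wrap_string_bodies('month_names = ["JANUARY", "MARCH"]')
```

Only the text handed to the renderer would be changed this way; the file on
disk would stay as the user typed it.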


So my personal overview is:

  * gtksourceview should let .lang files specify some markup or Unicode
characters to be put at certain positions, say, around the matched fields (that
should be enough, I guess).  I don't know the gtksourceview implementation,
but my impression is that gtksourceview adds Pango markup to the text to
colorize it.  I guess inserting real Unicode characters would need a bit of
work, since they should not be copied, moved over, etc., but that's your
department.

  * Add bidi markup support to Pango.  This should happen anyway, but if
gtksourceview doesn't want to insert Unicode characters, that work would block
this bug.

  * We bidi-spec people (http://freedesktop.org/wiki/Standards_2fbidi_2dspec)
will write down and provide the actual data for different .lang files.
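
On the point in the first item that injected characters should not be copied:
one possible approach, sketched here under the assumption that the display
text is filtered on its way to the clipboard, is to drop the directional
formatting characters again.  The names below are hypothetical, and the filter
is naive: it would also remove formatting characters that the user typed
deliberately, so a real implementation would need to track which ones it
injected.

```python
# Directional formatting characters named in this report, plus RLE, LRO, RLO.
BIDI_CONTROLS = frozenset({
    "\u200e",  # LRM, LEFT-TO-RIGHT MARK
    "\u200f",  # RLM, RIGHT-TO-LEFT MARK
    "\u202a",  # LRE, LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RLE, RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # PDF, POP DIRECTIONAL FORMATTING
    "\u202d",  # LRO, LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RLO, RIGHT-TO-LEFT OVERRIDE
})


def strip_bidi_controls(text):
    """Return text with all directional formatting characters removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


copied = strip_bidi_controls('["\u202aJANUARY\u202c", "\u202aMARCH\u202c"]')
```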



Hope it helps
behdad
Comment 3 Behdad Esfahbod 2005-11-23 14:00:54 UTC
The required changes in Pango are in bug 70399.
Comment 4 Yevgen Muntyan 2008-08-03 07:40:31 UTC
GtkSourceView doesn't modify text, it uses text tags. Could "direction" property of GtkTextTag be used here? Is this bug still valid?
Comment 5 Behdad Esfahbod 2008-08-05 05:11:01 UTC
(In reply to comment #4)
> GtkSourceView doesn't modify text, it uses text tags. Could "direction"
> property of GtkTextTag be used here? Is this bug still valid?

It's valid, and needs new markup in Pango.  The blocker bug is already set.  Just ignore it for now.
Comment 6 GNOME Infrastructure Team 2021-07-05 11:02:15 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/gtksourceview/-/issues/

Thank you for your understanding and your help.