Bug 782025 - Pango: Workaround broken right to left subtitle
Status: RESOLVED OBSOLETE
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
Version: 1.10.4
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: git master
Assigned To: GStreamer Maintainers
QA Contact: GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2017-05-01 20:03 UTC by Elad
Modified: 2018-11-03 11:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
an example for the incorrect alignment (174.54 KB, image/png)
2017-05-01 20:03 UTC, Elad
a correct alignment of the same subtitle in VLC (154.18 KB, image/png)
2017-05-01 20:04 UTC, Elad
a sample subtitle file (86.24 KB, application/x-subrip)
2017-05-01 23:33 UTC, Elad
a UTF-8 version of the subtitles (114.12 KB, application/x-subrip)
2017-05-02 17:09 UTC, Elad

Description Elad 2017-05-01 20:03:14 UTC
Created attachment 350822
an example for the incorrect alignment

For RTL languages, subtitles should be aligned correctly.
This is a basic feature which I could not detect in the version shipped with Ubuntu 16.04.
Comment 1 Elad 2017-05-01 20:04:58 UTC
Created attachment 350823
a correct alignment of the same subtitle in VLC
Comment 2 Bastien Nocera 2017-05-01 22:21:24 UTC
Please attach the subtitle file (or provide a way to reproduce the problem).
Comment 3 Elad 2017-05-01 23:33:41 UTC
Created attachment 350833
a sample subtitle file
Comment 4 Bastien Nocera 2017-05-02 08:55:05 UTC
Can you please explain how to reproduce this? This file isn't anywhere near UTF-8 encoded.
Comment 5 Elad 2017-05-02 14:16:36 UTC
The attached subtitles are encoded using windows-1255 ("HEBREW" > "HEBREW (WINDOWS-1255)" in the preferences).
The alignment is correct in UTF-8 (but it is the wrong encoding for most Hebrew subtitles, and probably other RTL languages as well).
Comment 6 Bastien Nocera 2017-05-02 16:16:22 UTC
That should be enough information to fix this. Reassigning to GStreamer as that's what's actually showing the subtitles. It might also be a problem in clutter-gst, as it also has a hand in displaying them.
Comment 7 Nicolas Dufresne (ndufresne) 2017-05-02 16:40:29 UTC
To playback this:

gst-launch-1.0 playbin suburi=file://$(pwd)/hp.srt subtitle-encoding=windows-1255 uri=file://$(pwd)/test.mov video-sink="navseek ! autovideosink"

Great, but we'll need a bit more help to understand what "wrong alignment" means. Looking at the image, I understand that punctuation such as . and ? is placed on the right instead of the left (at the start instead of the end, or worse, at the end as if the text were left to right), as if the direction changes when the punctuation is rendered. Maybe some bogus language detection.

It would be nice if you could provide a UTF-8 converted version of this SRT, since that would help us see the behaviour when it renders correctly. Maybe in UTF-8 they don't map punctuation to the same character, which would avoid confusing the renderer about text direction. Also, note this could be an issue with Pango and/or the way we use Pango.
Comment 8 Elad 2017-05-02 17:09:33 UTC
Created attachment 350898
a UTF-8 version of the subtitles

UTF-8 encoded Hebrew subtitles are improperly aligned as well, so it is probably not about the encoding (ignore comment #5).
The wrong placement of the final punctuation mark(s) (to the right of the text) is noticeable.
In addition, combining RTL and LTR text in the same subtitle could also be displayed incorrectly.
(These subtitles are aligned properly in VLC too)
Comment 9 Nicolas Dufresne (ndufresne) 2017-05-02 18:59:33 UTC
Hmm, opening the UTF-8 file in gedit shows the same issue as in GStreamer. This is likely a Pango bug.
Comment 10 Nicolas Dufresne (ndufresne) 2017-05-02 19:15:57 UTC
OK, to help, here's a really short pipeline to test this:

gst-launch-1.0 videotestsrc pattern=18 ! \
  video/x-raw,width=1920,height=1080,framerate=1/1 ! \
  textoverlay text="my text" ! autovideosink

Make sure your terminal is in UTF-8 (it should already be). Then try various minimal UTF-8 strings that render incorrectly. This will help because this bug has to be reported to Pango, and I would probably fail to understand which group of characters should be changed into which. Maybe provide a file containing just the lines (the text) that reproduce the bug.

Does Hebrew work in general in GNOME?
Comment 11 Elad 2017-05-02 20:18:10 UTC
The handling of Hebrew varies between different Pango based programs:
Nautilus - when typing a Hebrew file name it aligns correctly; after pressing Enter to change the name, it aligns LTR.
Firefox - the alignment of the address bar can be manually changed while typing by pressing ctrl+shift+x.
I do not know whether VLC uses Pango or not, but it handles Hebrew well.
So this is possibly related to the usage of Pango rather than a bug in Pango.
Comment 12 Nicolas Dufresne (ndufresne) 2017-05-03 13:52:56 UTC
By default pango uses automatic direction, as documented here:

https://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-set-auto-dir

As this is the default, most applications will use this mode. I suspect that automatic direction can get confused by characters used in both LTR and RTL text. GStreamer uses this mode too. I don't know what VLC uses or why it detects the direction correctly; VLC is a very different project.
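The Pango page linked above describes automatic direction as taking the paragraph's base direction from its first strong directional character. A minimal Python sketch of that idea, using the standard `unicodedata.bidirectional()` lookup, shows why a subtitle line whose first character is "." still ends up right-to-left: punctuation is not strong, so it is skipped. This is an approximation of UAX #9 rules P2/P3, not Pango's own code, and the function name `base_direction` is hypothetical.

```python
import unicodedata

# Bidi classes treated as strong when picking a paragraph direction
# (UAX #9 rules P2/P3). "AL" (Arabic letter) counts as right-to-left.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def base_direction(text: str) -> str:
    """Return "LTR", "RTL", or "NEUTRAL" based on the first strong character.

    Simplified: unlike rule P2, this does not skip text inside directional
    isolates (U+2066..U+2069).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "LTR"
        if bidi_class in STRONG_RTL:
            return "RTL"
    # Only weak or neutral characters (punctuation, digits, spaces) were found.
    return "NEUTRAL"


if __name__ == "__main__":
    # The leading "." is not strong, so a line stored with its period first
    # still gets a right-to-left paragraph direction.
    for sample in [".שלום", "Allo דמבלדור", "?!"]:
        print(repr(sample), "->", base_direction(sample))
```

Because only the first strong character matters, a line mixing Latin and Hebrew text flips direction depending on which script comes first, which fits the mixed-text problems reported in comment 8.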
Comment 13 Nicolas Dufresne (ndufresne) 2017-05-03 18:06:10 UTC
There is a small problem with the UTF-8 version: punctuation like . and ? is the first character. I think RTL strings should appear upside-down (reversed) in UTF-8 streams. If I paste the string itself into Google Translate, it seems happy with it. So what we render in GStreamer from this file is correct in this regard. How did you produce the UTF-8 version?
Comment 14 Elad 2017-05-03 18:24:21 UTC
It looks that way in gedit (as this seems to be an issue with the libraries GNOME uses), but the UTF-8 version is also displayed correctly by VLC.
I have created it from the WINDOWS-1255 version using an online converter.
Comment 15 Elad 2017-05-03 18:24:56 UTC
It looks that way in gedit (as this seems to be an issue with the libraries GNOME uses) but the UTF-8 version is also displayed correctly by VLC.
I have created it from the WINDOWS-1255 version using an online converter.
Comment 16 Nicolas Dufresne (ndufresne) 2017-05-03 18:50:05 UTC
OK, after more research, this is definitely a question to raise with the Pango people; we cannot do anything about it in GStreamer. What I understood is that the text is being itemized. So if I write:

Allo דמבלדור

You read the first item as "Allo", not "ollA", and the second item as "דמבלדור", not "רודלבמד". This is all done internally by Pango; GStreamer really has nothing to do with it. For windows-1255 (or ISO-8859-8 logical order), the text is simply converted to UTF-8 first, hence the same results. It also works if you join the two words together, since Pango decides the itemization for each character based on the Unicode Bidirectional Algorithm [1].
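The point that the windows-1255 file gives the same result as UTF-8 can be checked directly: the conversion maps each byte to one code point and never reorders anything, so a period stored first stays first. The sketch below uses Python's `cp1255` codec as a stand-in for whatever conversion GStreamer actually performs (an assumption), and the byte string is a hypothetical subtitle line, not taken from the attachment.

```python
def windows1255_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1255 text as UTF-8 (Python's codec name is "cp1255")."""
    return raw.decode("cp1255").encode("utf-8")


# Hypothetical line as stored in a "pre-reversed" subtitle file: the
# sentence-final period is the FIRST byte, followed by the Hebrew letters
# shin, lamed, vav, final mem (0xF9 0xEC 0xE5 0xED in windows-1255).
line_1255 = b".\xf9\xec\xe5\xed"
line_utf8 = windows1255_to_utf8(line_1255)

# Decoding back shows the character order is unchanged by the conversion.
decoded = line_utf8.decode("utf-8")
print([f"U+{ord(c):04X}" for c in decoded])
# ['U+002E', 'U+05E9', 'U+05DC', 'U+05D5', 'U+05DD']
```

Since the renderer receives the same logical-order string either way, changing the subtitle encoding cannot move the punctuation. That matches comment 8's observation that the UTF-8 file misbehaves too.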

From my tests, punctuation is neutral, which I believe means it should inherit the direction of the leftmost item in logical order, and overall, items are rendered left to right. So let's say we have "?ר" (logical order); it should render the same. But Pango makes the neutral ? inherit from the next character, and hence places the ? on the right. I didn't read the algorithm, so I won't comment on who is right or wrong. However, I can show a case which is not possible with the way VLC handles it. Let's say you want 2 Hebrew characters with punctuation before and after:

  !<c2><c1>?

Logical order would be:

  ?<c1><c2>!

Pango will render it the way you expect, but VLC would render:

  ?!<c2><c2>

And there would be no way to represent what you want.

[1] http://www.unicode.org/reports/tr9/
Comment 17 Nicolas Dufresne (ndufresne) 2017-05-03 19:01:01 UTC
Reassigning to Pango, as we need a real expert to understand who is right between the VLC renderer and Pango with these ambiguous SRT files.
Comment 18 Behdad Esfahbod 2017-05-03 21:16:38 UTC
My understanding (I've seen this before) is that because most players do NOT support automatic direction, many subtitle files are produced incorrectly to compensate for that! That is, the end-of-sentence period is located at the beginning of the line. When rendered with systems that use automatic direction, this shows wrong...

Nothing to do in Pango.  If gstreamer wants, it can try to detect this, or have an option for users to set from a menu, to force direction right-to-left, left-to-right, or auto (default).
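To see why forcing the direction matters, here is a heavily reduced version of the Unicode Bidirectional Algorithm [1] in Python: it resolves neutrals (rules N1/N2), assigns implicit levels (I1/I2), and reverses right-to-left runs (L2). Digits, explicit embeddings, and bracket mirroring are deliberately ignored, so this is an illustration rather than what Pango or FriBidi actually run, and the function names are hypothetical. In Pango itself, a forced direction would presumably be set with `pango_layout_set_auto_dir(layout, FALSE)` plus `pango_context_set_base_dir()`.

```python
import unicodedata
from typing import List, Optional


def strong_type(ch: str) -> Optional[str]:
    """Collapse a character's bidi class to "L", "R", or None (neutral here)."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):
        return "R"
    return None  # punctuation, spaces and (in this sketch) digits


def visual_order(text: str, base: str) -> str:
    """Reorder one line for left-to-right display, with base "L" or "R"."""
    types = [strong_type(ch) for ch in text]
    resolved: List[str] = []
    i = 0
    while i < len(types):
        if types[i] is not None:
            resolved.append(types[i])
            i += 1
            continue
        # Find the whole run of neutrals starting at i.
        j = i
        while j < len(types) and types[j] is None:
            j += 1
        # Start/end of the line count as the paragraph (base) direction.
        before = resolved[-1] if resolved else base
        after = types[j] if j < len(types) else base
        # N1: same direction on both sides wins; N2: otherwise use base.
        fill = before if before == after else base
        resolved.extend([fill] * (j - i))
        i = j

    # I1/I2: base level 0 puts R at level 1; base level 1 puts L at level 2.
    if base == "L":
        levels = [0 if d == "L" else 1 for d in resolved]
    else:
        levels = [2 if d == "L" else 1 for d in resolved]

    # L2: from the highest level down to 1, reverse every maximal run of
    # characters whose level is at least that high.
    pairs = list(zip(text, levels))
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(pairs):
            if pairs[i][1] < level:
                i += 1
                continue
            j = i
            while j < len(pairs) and pairs[j][1] >= level:
                j += 1
            pairs[i:j] = pairs[i:j][::-1]
            i = j
    return "".join(ch for ch, _ in pairs)


if __name__ == "__main__":
    pre_reversed = ".שלום"  # period stored first, as described above
    well_formed = "שלום."   # period stored last, in logical order
    for line in (pre_reversed, well_formed):
        for base in ("L", "R"):
            print(repr(line), base, "->", repr(visual_order(line, base)))
```

With base "R" (what automatic direction picks for a Hebrew line), the pre-reversed line puts its period on the right, at the start of the sentence for a Hebrew reader. Forcing base "L" puts it on the left, which is what LTR-only players show. A well-formed line behaves the opposite way, so no single forced direction suits both kinds of file.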
Comment 19 Elad 2017-05-03 21:44:53 UTC
The automatic direction detection possibly causes more harm than good, since every RTL subtitle file I tried is displayed correctly on every other media player.
Using LTR direction all the time instead might solve this.
Comment 20 Behdad Esfahbod 2017-05-03 21:46:34 UTC
(In reply to Elad from comment #19)
> The automatic direction detection possibly causes more harm than good,

In the subtitle world, maybe...

> since
> every RTL subtitle file I tried is displayed correctly on every other media
> player.
> Using LTR direction all the time instead might solve this.

I can imagine that.  It shouldn't be horrible to write a detector that counts how many lines start with '.' and how many lines end in '.' and decide.
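A sketch of the detector suggested here, in Python: it counts how many right-to-left lines start with sentence punctuation and how many end with it, and falls back to automatic direction on a tie. The function names, the punctuation set `".?!"`, and the `"LTR"`/`"AUTO"` return values are all hypothetical placeholders for whatever option a player would expose.

```python
import unicodedata
from typing import Iterable


def has_rtl(line: str) -> bool:
    """True if the line contains at least one strong right-to-left letter."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in line)


def looks_prereversed(lines: Iterable[str], marks: str = ".?!") -> bool:
    """Compare RTL lines that start with punctuation against those that end with it."""
    starts = ends = 0
    for raw in lines:
        line = raw.strip()
        if not has_rtl(line):
            continue  # skips SRT indices, timestamps, blank lines, LTR text
        if line[0] in marks:
            starts += 1
        if line[-1] in marks:
            ends += 1
    return starts > ends


def pick_direction(lines: Iterable[str]) -> str:
    """Force LTR for files that look authored for LTR-only players, else auto."""
    return "LTR" if looks_prereversed(lines) else "AUTO"


if __name__ == "__main__":
    srt = """1
00:00:01,000 --> 00:00:02,000
.שלום

2
00:00:03,000 --> 00:00:04,000
?מה שלומך
"""
    print(pick_direction(srt.splitlines()))  # LTR
```

Ignoring lines without RTL letters keeps SRT indices, timestamps, and English lines from diluting the count.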
Comment 21 Nicolas Dufresne (ndufresne) 2017-05-04 11:03:44 UTC
Thanks, that confirms my suspicion that these files are broken: they are tailored to work around issues in other players.
Comment 22 GStreamer system administrator 2018-11-03 11:56:47 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/gstreamer/gst-plugins-base/issues/354.