Bug 763589 - Incorrect line length in ems
Status: RESOLVED FIXED
Product: gaupol
Classification: Other
Component: general
Version: 0.24.3
Hardware: Other Windows
Importance: Normal normal
Target Milestone: ---
Assigned To: gaupol-maint@gnome.bugs
QA Contact: gaupol-maint@gnome.bugs
Depends on:
Blocks:
Reported: 2016-03-14 04:00 UTC by 1nktr4p
Modified: 2016-05-15 21:57 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
HTML test file (5.13 KB, text/html)
2016-04-06 09:16 UTC, 1nktr4p
HTML test file (5.67 KB, text/html)
2016-05-15 03:25 UTC, 1nktr4p

Description 1nktr4p 2016-03-14 04:00:33 UTC
This is the last available version for Windows. I don't know if it was fixed in later (Unix) versions.

When line length is set to ems, it is reported as a few ems more than it actually is for the selected font (it's even longer than it would be if the text were set in Verdana, which is a much wider font than most sans-serifs). Also, it seems to always be the same regardless of the font selected, so I'm guessing it's calculated based on some internally declared font.

For example, the line "How's it looking over there, Watney?" is reported to be 22 em, while in most sans-serif fonts it's around 16 em.
Comment 1 Osmo Salomaa 2016-03-14 17:32:42 UTC
(In reply to 1nktr4p from comment #0)
> When line length is set to ems, it is reported as a few ems more than it
> actually is for selected font (it's even longer than it would be if the text
> was set in Verdana, which is a lot wider font than most sans-serifs). Also,
> it seems to always be the same regardless of the font selected, so I'm
> guessing it's calculated based on some internally declared font.
> 
> For example, the line "How's it looking over there, Watney?" is reported to
> be 22 em, while in most sans-serif fonts it's around 16 em.

I do see the same 22 em as you, so it's not Windows-specific.

The line length is calculated based on the "sans" font, which is whatever GTK+ considers the default sans serif font to be. The editor font obviously cannot be used, as it might be monospace, while actual subtitle fonts rarely are.

Where do you get your comparison em counts from?

I'm using Pango.FontDescription.get_size in Gaupol, which years ago I apparently assumed or verified to be the "em"; I don't remember anymore. It's also very close, but not identical, to the width of a capital M: "MMMMMMMMMM" shows in Gaupol as 11 ems.

https://developer.gnome.org/pango/stable/pango-Fonts.html#pango-font-description-get-size
Comment 2 1nktr4p 2016-04-06 09:14:14 UTC
Whoa, sorry, I completely forgot about filing this.

> The line length is calculated based on the "sans" font, which is whatever GTK+ considers the default sans serif font to be. The editor font cannot obviously be used as it might be monospace, while actual subtitle fonts rarely are.

While I agree it shouldn't be based on the editor font, leaving it to GTK's default setting is not the best alternative. Ideally, it would be user-specified, but the next best thing is to base it on DejaVu Sans for a couple of reasons: it's free, covers most of Unicode and is one of the widest sans-serif fonts (if not *the* widest), which would ensure that the max selected subtitle width doesn't get exceeded regardless of the font used later when watching videos or rendering DVD/BD bitmaps.


> Where do you get your comparison em counts from?

Illustrator, but it works just as well when you make an HTML page and measure the elements in a browser. I've attached a test case with instructions and metrics for common fonts.


> I'm using Pango.FontDescription.get_size in Gaupol, which I have apparently years ago assumed or verified to be the "em", I don't remember anymore. It's also very close but not exact to the width of a capital M, "MMMMMMMMMM" shows in Gaupol as 11 ems.

Sorry, can't comment on Pango 'cause I'm not a programmer. But, while it's true that the width of uppercase "M" is historically the basis of the em unit, most fonts today have an M that's noticeably less than one em wide.

* * *

By the way, why aren't issues tracked on GitHub since the code is already over there? It's much more user-friendly than this blast from the past.
Comment 3 1nktr4p 2016-04-06 09:16:58 UTC
Created attachment 325466 [details]
HTML test file
Comment 4 Osmo Salomaa 2016-04-06 19:08:58 UTC
I tried Pango.FontDescription.get_size with a couple of different fonts and I keep getting the same value. There might be something broken and it's using something other than the em width. I do vaguely remember that at some point I had to bump the value I use for line-breaking from 26 to 30 ems, but didn't bother investigating why I had to do that. I'll take a look at this when I have the time.

> By the way, why aren't issues tracked on GitHub since the code is already over > there? It's much more user-friendly than this blast from the past.

I have just recently decided that I'll migrate to GitHub issues. I'll do that with the rest of the planned infrastructure migrations before the next release: web pages to otsaloma.io/gaupol, Gitter chat to replace mailing lists, releases to GitHub and translations to Transifex.
Comment 5 Osmo Salomaa 2016-05-08 04:32:47 UTC
I went ahead and replaced the Pango metrics with a value of 0.55 em per a-z character from your table for the calculation of the pixel width of em. That pixel width of one em is then used to calculate the em width of a piece of text.

With that change I get 9.48 em for "Lorem ipsum dolor." and 0.90 em for "M". I hope this is accurate enough. It serves the relative purpose just as before, but should now be at a more familiar level.

https://github.com/otsaloma/gaupol/commit/58422911e02efe1009a05af8406b43d211ab9d99
Comment 6 1nktr4p 2016-05-10 20:50:26 UTC
Well it's not ideal, but it's certainly better than it was.

I can't quite figure out what you did exactly. How do you get widths of individual characters from the alphabet average?
Comment 7 Osmo Salomaa 2016-05-10 21:08:50 UTC
(In reply to 1nktr4p from comment #6)
> I can't quite figure out what you did exactly. How do you get widths of
> individual characters from the alphabet average?

I set "abcdefghijklmnopqrstuvwxyz" as the text of a label. That's 26 characters; at the average of 0.55 em/char (roughly DejaVu Sans or Verdana based on your table) that makes 26*0.55=14.3 em. I divide the pixel width of the label by that to get the width of one em in pixels.

Then I set a different text in the label, get the pixel width and divide by the em width to get the length of the text in ems.

>>> from gi.repository import Gtk
>>> label = Gtk.Label(label="abcdefghijklmnopqrstuvwxyz")
>>> label.show()
>>> em = label.get_preferred_width()[1] / (0.55 * 26)
>>> em
12.237762237762237 # px
>>> label.set_text("Lorem ipsum dolor.")
>>> label.get_preferred_width()[1] / em
9.478857142857144 # em
>>> label.set_text("M")
>>> label.get_preferred_width()[1] / em
0.8988571428571429 # em
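
The same arithmetic can be sketched without GTK, which makes it easier to check by hand. The pixel widths used below (175, 116 and 11 px) are not printed in the session above; they are back-calculated from its results and should be treated as assumptions, as should the function names.

```python
# Average advance of a lowercase a-z character, in ems (value from comment 5).
EM_PER_CHAR = 0.55
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def em_in_pixels(alphabet_width_px):
    """Estimate the pixel width of one em from the rendered a-z width."""
    return alphabet_width_px / (EM_PER_CHAR * len(ALPHABET))


def length_in_ems(text_width_px, em_px):
    """Convert a rendered text width in pixels to ems."""
    return text_width_px / em_px


# Assumed pixel widths, inferred from the output of the session above.
em = em_in_pixels(175)
print(round(em, 2))                      # one em in px
print(round(length_in_ems(116, em), 2))  # "Lorem ipsum dolor."
print(round(length_in_ems(11, em), 2))   # "M"
```

With these inputs the sketch gives about 12.24 px per em, 9.48 em and 0.90 em, in line with the session.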
Comment 8 1nktr4p 2016-05-10 21:16:29 UTC
So you're able to query the line width in pixels, but not the font size in pixels? Am I understanding that right?
Comment 9 Osmo Salomaa 2016-05-10 21:54:23 UTC
(In reply to 1nktr4p from comment #8)
> So you're able to query the line width in pixels, but not the font size in
> pixels? Am I understanding that right?

I am reliably able to query the line width, or to be exact, the box width of a rendered label widget.

I thought I was able to also query the font size by introspecting the font description object. That was the previous approach, which did work in the past, but as you have reported, not anymore.

I don't know why getting the font size no longer works. My best guess is that GTK+'s move to CSS-based theming has changed what, how or when fonts are applied, so comparing the label width against font description metrics is no longer comparing the same things.
Comment 10 1nktr4p 2016-05-10 22:28:15 UTC
OK, there are two more things we could try to get the precise measurement, and if those don't work, the current code will be good enough.

If you can print/display the label, post the a-z screenshot and its width (175 px?) and I'll ID the font and calculate em's actual pixel value.

If you can't, try measuring the em-space (U+2003, " ") or the em-dash (U+2014, "—") instead of the alphabet. They should both be exactly one em wide, but the tricky thing is that some fonts don't have the em-space, and in some fonts the em-dash is not a full em wide. So see if either of those characters give a value for "Lorem ipsum" or "M" that matches one of the values in the table.
Comment 11 Osmo Salomaa 2016-05-10 23:06:01 UTC
Some distros will have GTK+ default to their own branded theme, which means we can't assume that the default GTK+ font is the same for all users, which in turn means that the metrics of the font I have in GTK+ don't have much value.

Measuring em-space in the user's GTK+ font could be useful, but I don't think I can tell if the font actually covers that as GTK+ or something lower in the stack probably silently falls back on another font or a regular space.

I really don't want to add a dependency on a particular font on account of this matter either. In my eyes, the robustness of the current solution outweighs its lack of precision.
Comment 12 1nktr4p 2016-05-11 00:07:14 UTC
Ah, got it. What about Windows and Mac? I assume for them GTK is packaged with Gaupol. So, does it come with its own font or does it use the system default?

Still, might be worth using em-dash since *all* fonts have it. I'll check its width in common sans-serif fonts and try to make the list as extensive as possible. If most of them are a full em wide, I think it would be a better solution.

Out of curiosity, what width gets reported for em dash?
Comment 13 Osmo Salomaa 2016-05-11 20:52:06 UTC
(In reply to 1nktr4p from comment #12)
> Ah, got it. What about Windows and Mac? I assume for them GTK is packaged
> with Gaupol. So, does it come with its own font or does it use the system
> default?

GTK+2 used to have a Windows theme, but GTK+3 is (or was when I last looked) still in poor shape on Windows. I don't remember what font it defaulted to, but I remember I had to patch it to use Verdana since it was about the only font that didn't look awful. GTK+ is packaged with Gaupol, but what exactly the theme font is, I'll review again when I try packaging on Windows again.

I don't have a Mac, so I have never done packages for Mac.

> Out of curiosity, what width gets reported for em dash?

I checked. GTK+'s default font is Cantarell, but I've been using an older version of it, since I like the sharper look compared to newer versions. I have 0.0.16, the latest is 0.0.24. I tested both. I have the font size in GNOME set at 11 with 0.97 scaling.

With Cantarell 0.0.16 I get 9 pixels for the em dash, with 0.0.24 I get 13. I checked the fonts with FontForge, which states the em width for both as 1000. In 0.0.16 the em dash is 675 units wide, in 0.0.24 it is 1000. This isn't really encouraging as both versions of Cantarell are still likely to be in use.

Here's at least one commit I found (2015-12-14): 

https://git.gnome.org/browse/cantarell-fonts/commit/?id=b17e3e5d096ae507d471f168d60cc41d08e05d82
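
To make the difference concrete, here is a small sketch of what one em would come out as if it were derived from the measured em dash, using the numbers above. The function name is made up for illustration, and the sketch assumes the rendered pixel width scales linearly with the advance width in font units, which pixel rounding only approximately honours.

```python
def em_px_from_dash(dash_px, dash_units, units_per_em=1000):
    """Pixel size of one em implied by a measured em dash.

    dash_px: rendered width of the em dash in pixels.
    dash_units: advance width of the em dash in font units.
    units_per_em: the font's em size in font units.
    """
    return dash_px * units_per_em / dash_units


# Values reported above for the two Cantarell versions.
true_em_old = em_px_from_dash(9, 675)    # Cantarell 0.0.16
true_em_new = em_px_from_dash(13, 1000)  # Cantarell 0.0.24

# Treating the dash itself as one full em ignores dash_units entirely,
# so with 0.0.16 the em would be taken as 9 px.
naive_em_old = 9
overestimate = true_em_old / naive_em_old
print(round(true_em_old, 2), round(true_em_new, 2), round(overestimate, 2))
```

Both versions imply an em of roughly 13 px, but taking the 0.0.16 dash as a full em would inflate reported line lengths by nearly half.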
Comment 14 1nktr4p 2016-05-15 03:25:06 UTC
Created attachment 327916 [details]
HTML test file
Comment 15 1nktr4p 2016-05-15 03:59:57 UTC
Well I checked the em dashes for the existing list (updated the attachment) and quite a few are less than one em, so no need to look at any other fonts.

When used for width calculation, those narrow ones give greater error than your solution, with values that are smaller than the actual size, so that's a definite no-go.

Given the circumstances, it seems that your fix really is the best compromise. Thanks for being so patient.

Are you planning to make a new release?
Comment 16 Osmo Salomaa 2016-05-15 13:06:23 UTC
Thanks for looking into this.

Note that it's possible to switch to a more accurate solution using the GTK+ CSS theming system. It's already technically possible, just not yet stable or reliable enough for me to actually use, since if it breaks we could easily see values up to maybe 30% off. There have been some big changes again in 3.20 [1].

[1] https://blogs.gnome.org/mclasen/2015/11/20/a-gtk-update/

> Are you planning to make a new release?

I'm currently doing some bigger migrations to newer widgets and design practices and fixing accumulated bugs. The next release will be 1.0. I have no schedule, but maybe late June might be realistic. I'll also try to make a Windows installer, but no promises about the quality of it. It's likely to be better than that 0.24.3, but how much, I don't know.

If you want to keep an eye on things, the current development happens in the "howdoi" branch [2], which is likely to be intentionally broken at times before merging to master. I have also opened the issue tracker on GitHub [3], you can use that if you have more bug reports. I'll close this bugzilla once I deal with the remaining open bugs.

[2] https://github.com/otsaloma/gaupol/tree/howdoi
[3] https://github.com/otsaloma/gaupol/issues
Comment 17 1nktr4p 2016-05-15 18:19:28 UTC
Are there reasons to use an installer for Windows, or could it be made portable?
Comment 18 Osmo Salomaa 2016-05-15 19:09:42 UTC
(In reply to 1nktr4p from comment #17)
> Are there reasons to use installer for Windows, or could it be made portable?

If by portable you mean that it can be run from a USB drive, this probably already works if you use the installer and just choose your USB drive instead of Program Files as the installation directory (I haven't tried it though). It's just not a single exe, but a directory full of files.

If you mean saving config files on the USB drive instead of %APPDATA%, then currently no, but that would be fairly simple to patch into paths.py [1]. It might even already work if you use a custom start script that sets %APPDATA% before launching Gaupol.

[1] https://github.com/otsaloma/gaupol/blob/master/aeidon/paths.py
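
A custom start script along those lines might look roughly like the sketch below. It is untested against Gaupol: whether Gaupol picks up the changed %APPDATA% is exactly the open question above, and the "config" directory name, the function names and the example paths are all made up for illustration.

```python
import os
import subprocess


def portable_env(base_dir, environ=None):
    """Return a copy of the environment with APPDATA pointing inside base_dir."""
    env = dict(os.environ if environ is None else environ)
    env["APPDATA"] = os.path.join(base_dir, "config")  # hypothetical layout
    return env


def launch(base_dir, command):
    """Create the config directory, then run the command with the new APPDATA."""
    env = portable_env(base_dir)
    os.makedirs(env["APPDATA"], exist_ok=True)
    return subprocess.call(command, env=env)


# Example call (hypothetical paths on a USB drive):
# launch(r"E:\gaupol", [r"E:\gaupol\gaupol.exe"])
```

Only the child process sees the changed variable, so the rest of the system keeps using the regular %APPDATA%.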

I expect the installer is what most people want and I don't expect to put effort into figuring out how to make things portable, but if someone else wants to figure it out, I'll accept patches and I'll distribute a portable version if it's straightforward for me to create one.

The right time to look into it would be after I figure out how to bundle the dependencies into an installer, since I might switch technologies for that, which might also affect how to most easily create a portable version. Recent GTK+ apps on Windows seem to have used MSYS and WiX [2].

[2] https://git.gnome.org/browse/gedit/tree/win32/make-gedit-installer
Comment 19 1nktr4p 2016-05-15 21:31:39 UTC
OK. Thanks. Just one more thing and then you can finally close this report.

I noticed you changed the maximum line length in a recent commit to reflect the new calculations (https://github.com/otsaloma/gaupol/commit/3c25d7734ba40f4d3a03a6cb2b29798d1a0cd886). Was the original number based on some real-world data? Because, as far as I'm aware, the guideline mostly used by professionals and fansubbers is 40 characters per line (or 37 when they have to work with teletext), so restricting it to somewhere in the neighbourhood of 21–22 might be a better choice.
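
For reference, a figure of 21 to 22 em is roughly what a 40-character guideline comes to under the 0.55 em per character average from comment 5. A quick sketch of that conversion, assuming the a-z average holds for whole lines (spaces, punctuation and capitals will shift it) and using a made-up function name:

```python
EM_PER_CHAR = 0.55  # a-z average from comment 5; only an approximation for real lines


def chars_to_ems(chars, em_per_char=EM_PER_CHAR):
    """Approximate line length in ems for a character-count limit."""
    return chars * em_per_char


for limit in (37, 40):
    print(limit, "characters is about", round(chars_to_ems(limit), 2), "em")
```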
Comment 20 Osmo Salomaa 2016-05-15 21:53:30 UTC
(In reply to 1nktr4p from comment #19)
> I noticed you changed the maximum line length in a recent commit to reflect
> the new calculations. Was the original number based on
> some real-world data? Because, as far as I'm aware, the guideline mostly
> used by professionals and fansubbers is 40 characters per line (or 37 when
> they have to work with teletext), so restricting it to somewhere in the
> neighbourhood of 21–22 might be a better choice.

The original was not based on much other than my subjective observations and opinion. I'm aware of those 40 character rules, but I have suspected they are somewhat outdated -- written in the times of fullscreen video and screens and outdated technical systems. Most modern commercial subs shipped with widescreen DVDs/Blu-rays tend to be a bit longer, maybe around max. 43-46 characters.

What you see in the commit are the limits for the text correction assistant, which is mostly used to fix badly authored subs you happened to acquire somewhere. Since the automatic line-breaking that the assistant does often gives bad results, I want to be conservative with the limit, so that most decent-quality commercial subs go through unchanged.

There's currently no line length limit in Gaupol for subtitle authors. There's just line lengths visible, but no error display.
Comment 21 1nktr4p 2016-05-15 21:57:31 UTC
Got it.