GNOME Bugzilla – Bug 741728
Group digits by 3 in context menu
Last modified: 2015-10-23 22:49:25 UTC
Recently I've been working with large files/directories a lot. When a utility such as "ls -l" tells me that a file is 21158807929 bytes large, it takes a relatively significant amount of time to figure out the magnitude (200 megs or 2 gigs or 20 gigs...). It would be cool if on mouse hover, similarly to URL highlighting, apostrophe-like pixels would be added grouping by 3.
Created attachment 292994 [details] [review] Low-hanging fruit implementation This is a demo (patch for gnome-terminal), quickly implemented on top of the existing underlining infrastructure. The proper implementation would of course need to go to vte rather than g-t. We'd need some API similarly to dingu to make this configurable.
Afaik groups of 3 aren't universal; some locales use groups of 4.
Yup, and some use 2 or a mix of 2 and 3: http://en.wikipedia.org/wiki/Decimal_mark and glibc's localedata: the output of `locale grouping`, as well as grepping for "grouping" in /usr/share/i18n/locales. Also, the symbols used vary a lot. There's absolutely no chance to squeeze in kanjis. Squeezing in spaces: would require to push the numbers a bit towards each other on hover, which I guess would be way too annoying. I wouldn't want to show a dot-like or comma-like symbol either as that usually interferes with the decimal separator in most of the locales. So I'd go for something hardcoded which resembles the apostrophe or an upper dot. At this point we've already taken a huge step away from proper localization. I guess it's a bit rude to say that I don't care about other groupings too much. Anyway, using glibc's values according to g-t's locale is probably not much harder to implement than hardwired 3. It would be interesting to know that, when folks of these nations think in kilo/mega/giga/terabytes, do they prefer grouping by 3 or grouping according to their traditions. (Of course this feature should be useful for any numbers, not just ones having some computer-related meaning.)
Created attachment 293007 [details] [review] Quick demo hack Here's a quick live demo hack, it shows the behavior I had in mind. Not sure if it's more useful than annoying :)
After using it for a while: It's really useful, yet a tiny little bit annoying. I think it should only be activated if Shift or Ctrl (which?) is held down.
I was heading in the other direction. Can we somehow make that visible with no mouse-over?
I deliberately wouldn't want to do that. It causes visual clutter (e.g. it's too much noise for 4- or 5-digit numbers), might ruin the look of some apps, and there are tons of faulty groupings, e.g. for dates like 20150109. Anyway, proof of concept implementations, photoshopped screenshots etc. of what you have in mind are welcome :)
I really love this feature, it's a big productivity boost for me while working with gigabytes of data... however, it's unclear how the API and internationalization should look like, and of course it's unclear how users will perceive it. How about the following? I add this feature to vte, hardcoded, no API, but behind a configure flag or an environment variable. It would be enabled by default in the development series (this way the feature reaches Rawhide users and we'll see their feedback), but disabled in stable. The default could even depend on vte's version number's parity, so no last-minute code change is necessary. This way we get feedback from Rawhide users. As a rule of thumb people are way more louder if something's wrong than when something's right. So when we enable the feature, people who find it distracting will speak up. Then when we disable those who found it useful will tell us. With this live experiment we can hopefully gather feedback, fine tune the behavior, or even drop it, without ever reaching the stable branch by default.
The iTerm2 counterpart of the same request is here: https://gitlab.com/gnachman/iterm2/issues/18140
Another possible approach, inspired by iTerm2: In iTerm2, if you right-click on a number, the context menu will have an inactive entry converting the number to hex (or from hex to dec), e.g. 12345678 = 0xbc614e We could implement this and extend by adding separators and also dividing by powers of 1024, e.g.: 12'345'678 = 0xbc614e ≈ 11.77 Mi This would work similarly to the context-specific Open Link and Copy Link Address entries, except that, as in iTerm2, it would be inactive. This approach is less intrusive for users who don't need this feature, and is easier to implement. Most of the implementation goes to gnome-terminal. The regexp would catch the complete number, and printf could format it with grouping in a locale-specific way. The only extension to vte would be a new vte_terminal_match_set_is_underlined() in addition to the current vte_terminal_match_set_cursor_{type,name}() since I probably wouldn't want the numbers to be underlined on mouseover.
Sorry, the correct link to the iTerm2 issue is: https://gitlab.com/gnachman/iterm2/issues/1814
Sounds interesting.
I think this will be even easier to do once we use gmenu (bug 745329).
Created attachment 310218 [details] [review] Add info in the menu Here's a nice patch for adding the info in the right-click menu as outlined in a previous comment, e.g. 12'345'678 = 0xbc'614e ≈ 11.77 Mi Features: - handles decimal and hex numbers up to 2^64-1 - doesn't overflow; instead shows nothing for too large numbers - groups decimal digits by 3, hex digits by 4 using locale-specific separator - shows magnitude with powers of 1024, e.g. = 10.25 Mi or ≈ 1023.99 Gi or similar - always rounds downwards (is it better than rounding to the nearest, which is printf's default for floats?) - uses the = symbol instead of ≈ if and only if it's really the exact value Issues that need to be addressed: - don't show error message on ctrl+rightclick, rather start a rectangular selection or send it to the app as appropriate - don't underline on hover - code cleanup :) Long term TODOs: - handle arbitrarily large numbers - do we want to handle octal? - manually split in 3 if locale-specific thousands separator is empty (such as in C.UTF-8 on Ubuntu)
Created attachment 310247 [details] Screenshot, just because it's so cool :)
Created attachment 310248 [details] [review] Add info in the menu, v2 Right-click approach, 2nd version. Issues addressed: - group 3 by even if there's no separator defined in the locale (yes there are such locales) - ctrl+click does its regular action - doesn't change to hand cursor on mouseover - some code cleanup
@chpe What's your overall impression with this approach (the right-click menu)? Would you be happy adding this feature to the 3.19 series? The only functional change I'm still planning is not to underline numbers on hover (this requires a new vte method). This would make the feature less discoverable (but people would discover it sooner or later after a right-click which accidentally happens to be over a number), but in turn would make the UI less cluttered and prevent tons of false positive underlines (when it's not actually a number that makes sense on its own, e.g. in a base64 string. I don't mind such false positives showing up in the right-click menu, but straight away on the UI on hover is IMO unpleasant).
I'm fine with adding this for 3.19 :-)
Bug 754949 is not strictly speaking a dependency, but without that (especially if we turn off underlining) it would be quite annoying that sometimes we just don't recognize a number. So let's fix that as a prerequisite.
Shall we swap the first two menu entries (see screenshot in comment 15)? Or move the number grouping line even further down? I guess it's kinda lame that the first entry is inactive.
Should change FALSE to TRUE in the gtk_menu_shell_select_first() call, that takes care of the keyboard activation. For mouse activation, I'd say it doesn't matter in this specific case since the first item ("Open terminal") isn't really a 'context' action anyway.
Created attachment 312586 [details] [review] vte: allow not to underline a regex match This is a bit ugly due to the confusion about a similarly named but different variable in bug 755997.
Created attachment 312588 [details] [review] Add info in the menu, v3 Changes: - do not underline numbers on hover (requires new vte feature) - don't populate the menu on numbers less than 10 - update to current git Skipped for now: - i18n of "Ki" and friends, wiki article translations suggest that it probably won't be needed. We can add later if people complain. - Focusing the first menu entry. It's not focused right now either, only when you open it via keyboard, in which case dingus are not recognized. Could you please review the patches?
Just wondering... now that I changed the code not to do anything on hover... Is it really the best approach to build on top of the dingu engine? Or shall we just query the terminal contents near the cursor when a right click occurs, to see if there's a number there?
I don't like the 'invisible dingu' thing :-) I was thinking more about an API like vte_terminal_match_check_event() which takes extra regexes and returns the matches. Sth you'd use like this: Regex extra_regexes[3] = { my_regex1, my_regex2, my_regex3 }; char *extra_matches[3]; char *match; int tag; /* returns TRUE if there's any matches, in dingus or extra_regexes */ if (vte_terminal_match_check_event_fuller(terminal, event, &tag, &match, &extra_regexes, &extra_matches)) { /* handle dingu matches */ /* handle extra regex matches */ } where extra_matches[i] would be set to the match for extra_regexes[i] if there is one, or NULL otherwise.
/me takes a deep breath... Any chance you feel like implementing this for me? :) You seem to be quite familiar with these parts, especially with the brand new pcre2 stuff, while it's still a bit of a maze for me... I can do it of course, but probably not in the next couple of weeks.
Sure, I'm going to do that. Was just asking if you think that's the API you want :-)
Yup, absolutely! :) Thanks!
The current vte patch is buggy if you have an underlined (url) and a non-underlined (number) dingu above each other and you move the mouse vertically. One more reason to drop this approach, as we've already agreed :)
Created attachment 312650 [details] [review] Add info in the menu, v4 Basically going back to v2, even dropping the bits that won't be used from there (not changing the hover cursor). This is from where I'll move forward when the extra match feature is available.
Done on vte master; use vte_terminal_event_check_[g]regex_simple(). Not really satisfied with that API nor its name, but it works for this.
(In reply to Christian Persch from comment #31) > Not really satisfied with that API match_flags should also be an array, maybe different regexs require different flags. Or... since you decoupled it from vte_terminal_match_check(), actually the new method could just operate on a single regex, and let the caller call it in a loop if it wants to. That'd simplify the API and the vte implementation; and perhaps the caller as well (no need to fiddle with arrays). > nor its name I guess you deliberately didn't go for vte_terminal_match*(), right? Okay then.
I didn't make match_flags an array since we always pass 0 anyway, and this is the 'simple' version of the API :-) Do you have a use case for passing per-regex flags? Passing a bunch of regexes and doing the loop in vte is cheaper since it only needs to calculate the offset once. And if you do just have 1 regex, you can just call it like check(terminal, event, ®ex, 1, 0 /*flags*/, &match). As for the name, yes, I didn't want to confuse this with the dingu match API.
> I didn't make match_flags an array since we always pass 0 anyway I had CASELESS in mind; just realized that's a compile flag and not a match flag :)
Created attachment 312927 [details] [review] Add info in the menu, v5 First version on top of the new vte API. Too much code duplication (handling of two sets of regexs) -> I'm not sure what we could do to prevent this. Maybe factor out a helper method could help a bit (but variable declarations would still be doubled). Or changing the vte API to take one regex at a time, in that case all regexs could reside in a single array. The overall design that terminal_screen_check_match() returns at most 1 match (from both sets in total) isn't nice either. It should propagate all other matches (perhaps one per flavor kind) via dedicated variables rather than info->string, and so we could even match a URL and a number at the same time (and add both set of menu entries). What do you think?
Created attachment 313012 [details] [review] Add info in the menu, v6 Factored out the biggest code duplication so a standalone precompile_regexes() method. This will also be needed for unittesting the regex strigs (bug 756038). Review please :)
Created attachment 313013 [details] [review] Add info in the menu, v7 (Some minor cosmetical changes only)
Review of attachment 313013 [details] [review]: The refactoring of precompiling regexes is fine modulo the one nit, you can commit that separately now. For the rest, I think you shouldn't store the result of the extra regex match in the place of the dingu match, but in addition to it, in PopupInfo. A few more comments below: ::: src/terminal-screen.c @@ +604,2 @@ n_url_regexes = G_N_ELEMENTS (url_regex_patterns); + precompile_regexes (url_regex_patterns, n_url_regexes, &url_regexes, &url_regex_flavors); I'd just pass G_N_ELEMENTS here, no need for the extra variable holding it since it's only used once. @@ +1934,3 @@ g_free (match); + + matches = g_malloc (n_extra_regexes * sizeof (char *)); g_new(char *,n_extra_regexes) or even g_newa since it's just a few items. @@ +1961,3 @@ + } + } + I don't think this is the right place. match_check is for the dingu matches, you should make this an extra function, and store the result in the PopupInfo alongside the match info? And then you don't need the changes to open_url either. ::: src/terminal-util.c @@ +305,3 @@ uri = g_strdup (orig_url); break; + case FLAVOR_NUMBER: Actually FLAVOR_NUMBER shouldn't occur here, so just remove the break; which makes this g_assert_not_reached too. @@ +1013,3 @@ + + errno = 0; + if (str[1] == 'x' || str[1] == 'X') { Check also str[0] == '0' ? @@ +1017,3 @@ + hex = TRUE; + } else { + num = strtoull(str, NULL, 10); Maybe also support str[0] == '0' and str[1] in 0..7 and use base 8 for the conversion then? @@ +1019,3 @@ + num = strtoull(str, NULL, 10); + } + if (errno) { Actually all the above if () blocks can just be replaced with g_ascii_strtoull(str, NULL, 0) since strtoull will use base 16 if the string starts with 0[xX], base 8 if it starts with 0[0-7] and base 10 otherwise. @@ +1035,3 @@ + decstr = g_strdup_printf("%'llu", num); + } else { + /* If, however, thousep is empty, override it with a space so that we Do we really need this fallback for broken locales? It seems most glibc locales have this set up correctly. @@ +1048,3 @@ + + /* Find out the human-readable magnitude, e.g. 15.99 Mi */ + if (num >= 1024) { Just use g_format_size[_full] ?
(In reply to Christian Persch from comment #38) > Review of attachment 313013 [details] [review] [review]: > ::: src/terminal-screen.c > @@ +604,2 @@ > n_url_regexes = G_N_ELEMENTS (url_regex_patterns); > + precompile_regexes (url_regex_patterns, n_url_regexes, &url_regexes, > &url_regex_flavors); > > I'd just pass G_N_ELEMENTS here, no need for the extra variable holding it > since it's only used once. It's a global variable that we initialize here and another method uses it too. Although that one could also use G_N_ELEMENTS. (I generally don't like this design of separate variables url_regexes, url_regex_flavors and n_url_regexes (and the same for s/url/extra from now on). regex_flavors are copied for no particular use, they could be looked up straight from url_regex_patterns. Or if we wish to have a design where we no longer look at the source then maybe we could define a structure encapsulating the compiled regex and the flavor, and put then in a GArray so that n_ is not needed either. But I really didn't feel like refactoring :-)) > @@ +1934,3 @@ > g_free (match); > + > + matches = g_malloc (n_extra_regexes * sizeof (char *)); > > g_new(char *,n_extra_regexes) or even g_newa since it's just a few items. Okay. > @@ +1961,3 @@ > + } > + } > + > > I don't think this is the right place. match_check is for the dingu matches, > you should make this an extra function, and store the result in the > PopupInfo alongside the match info? And then you don't need the changes to > open_url either. Yup, will do. > ::: src/terminal-util.c > @@ +305,3 @@ > uri = g_strdup (orig_url); > break; > + case FLAVOR_NUMBER: > > Actually FLAVOR_NUMBER shouldn't occur here, so just remove the break; which > makes this g_assert_not_reached too. Same. > @@ +1013,3 @@ > + > + errno = 0; > + if (str[1] == 'x' || str[1] == 'X') { > > Check also str[0] == '0' ? Regex matching guarantees this. > @@ +1017,3 @@ > + hex = TRUE; > + } else { > + num = strtoull(str, NULL, 10); > > Maybe also support str[0] == '0' and str[1] in 0..7 and use base 8 for the > conversion then? I deliberately decided not to handle octal. I think it's rarely used and is confusing. Recognizing zero-padded decimals as decimals is IMO more important. Going for heuristics based on the following digits (e.g. 077 is octal, 089 is decimal) sounds very bad to me. Also, if we recognize octal then for feature parity I think we should also convert dec/hex numbers to octal in the popup, making it way too long. I'm open to revise this though if you really think octal is useful. Maybe doing octal conversions only for numbers up to 0377? I was also wondering about recognizing U+ prefix for hex, unix timestamps and convert them to localtime and UTC... This is really where g-t could launch an external script that the user writes... if Gnome wasn't the environment that's heading the opposite direction :-) > @@ +1019,3 @@ > + num = strtoull(str, NULL, 10); > + } > + if (errno) { > > Actually all the above if () blocks can just be replaced with > g_ascii_strtoull(str, NULL, 0) since strtoull will use base 16 if the string > starts with 0[xX], base 8 if it starts with 0[0-7] and base 10 otherwise. I went with the above code exactly because I didn't want to handle octal. > @@ +1035,3 @@ > + decstr = g_strdup_printf("%'llu", num); > + } else { > + /* If, however, thousep is empty, override it with a space so that we > > Do we really need this fallback for broken locales? It seems most glibc > locales have this set up correctly. These have an empty separator (glibc 2.21): aa_DJ ar_SA bg_BG bs_BA ca_ES ce_RU eo es_CU eu_ES hr_HR ia mg_MG nl_NL pap_AN pap_AN pl_PL POSIX pt_PT rw_RW sr_RS ti_ER It's just a few lines of IMHO clean source code, so I'd like to keep it. > @@ +1048,3 @@ > + > + /* Find out the human-readable magnitude, e.g. 15.99 Mi */ > + if (num >= 1024) { > > Just use g_format_size[_full] ? I didn't know about this method. While there's definitely an advantage in going with stock methods rather than reimplementing the wheel, I'd make an exception here for the following weak reasons: - We wouldn't be able to properly use "=" vs. "≈" which I find really useful. - I prefer rounding downwards rather than rounding to the nearest, since that gives you the additional piece of info whether you're just below a power of two or just above (see the screenshot). - I decided to omit the letter "B" (e.g. show "Ki" instead of "KiB") because we don't know if the number represents bytes or not. I'm really not sure if this is correct or is the right approach, since probably you wouldn't be interested in kibi-mebi-conversion for any other unit. - I prefer 2 decimals to 1, for no particular reason. - This is how iTerm behaves (well, that's because George took my code from this patch :))
> - I decided to omit the letter "B" (e.g. show "Ki" instead of "KiB") because > we don't know if the number represents bytes or not. I'm really not sure if > this is correct or is the right approach, since probably you wouldn't be > interested in kibi-mebi-conversion for any other unit. <pedantic> Also, adding to this: Displaying "1024 = 1 Ki" is mathematically correct, and so would be "1024 B = 1 KiB" (but then how to display the hex value?), whereas "1024 = 1 KiB" would be incorrect as the dimensions don't match. (If it was bits instead of bytes, one could argue that 1 bit is the natural unit of information and it's okay to be sloppy and use "1" and "1 bit" interchangeably, but I don't think it's okay with something that consists of an arbitrary number of (that happens to be 8) bits. Byte is definitely a unit that you can't omit from only one side of an equation. It's probably okay to display something like "1024 (1 KiB)" because this is not an equation, but then there's no chance of displaying "=" or "≈" as appropriate. Maybe then approximation could be denoted by "~", such as "1030 (~1 KiB)" but this doesn't emphasize an exact match (which is pretty much the point of the "="/"≈" feature.) </pedantic>
Alright, I'm fine with not handling octal. And good points, esp. about the missing 'B'.
(In reply to Egmont Koblinger from comment #35) > The overall design that terminal_screen_check_match() returns at most 1 > match (from both sets in total) isn't nice either. It should propagate all > other matches (perhaps one per flavor kind) via dedicated variables rather > than info->string, and so we could even match a URL and a number at the same > time (and add both set of menu entries). It's certainly not optimal that dingue matching stops at the first matching regex. You can see the problem e.g. with the vte test app where depending where you enter the match, the first regex matches a shorter string than the 2nd regex. For underlining purposes, we do need to decide which match to underline for... maybe just pick the longest match? And yes, should still return all the other dingu matches via some new API.
Created attachment 313781 [details] [review] Add info in the menu, v8 I hope that architecturally this is what you had in mind :) I've already submitted factoring out precompile_regexes(), this one goes on top of that. It's really not easy to decide when to use generic variable names such as "match", "string", "flavor" etc., and when to switch to ones that give semantics like "url", "url_flavor", "number_info". I made some changes here: terminal_screen_check_extra() is the one that based on the flavor decides what to do with the match (the only option being number_info right now but it's extendible), hence from this point onwards I think a semantic name is preferable. I made similar changes to the already existing info->string and info->flavor (now they're info->url and info->url_flavor). The variable name "string" is especially confusing and not desirable as soon as you have multiple strings. Nitpicking about the preferred variables or method names is welcome. It's also a great opportunity to unify on American vs. British spelling of flavo(u)r. (I vote for American purely because that's the standard language of informatics. I'll split to two changes once everything's settled: the first one renames variables only, and the second one adds the new functionality.)
Review of attachment 313781 [details] [review]: Looks good! Just a few nits below; ok to commit with those fixed, thanks :-) ::: src/terminal-screen.c @@ +193,3 @@ +static const TerminalRegexPattern extra_regex_patterns[] = { + { "0[Xx][\\dA-Fa-f]+", FLAVOR_NUMBER, FALSE }, Simpler: "0[xX][[:xdigit:]]+" @@ +194,3 @@ +static const TerminalRegexPattern extra_regex_patterns[] = { + { "0[Xx][\\dA-Fa-f]+", FLAVOR_NUMBER, FALSE }, + { "\\d+", FLAVOR_NUMBER, FALSE }, Can't we just combine these 2 regexes into 1? ::: src/terminal-util.c @@ +46,3 @@ #include "terminal-libgsystem.h" +static const char* iec_prefixes[] = { "Ki", "Mi", "Gi", "Ti", "Pi", "Ei" }; Even better to use |static const char iec_prefixes[] = "KMGTPE";| and below use %ci instead of %s for it in the printf. @@ +995,3 @@ + * @str: a dec or hex number as string + * + * Returns: Useful info about @str, or %NULL if it's too large (transfer full) annotation, just for completeness. Maybe change the name of the func to include "dup" or "printf" to make it clear it returns allocated memory?
(In reply to Christian Persch from comment #44) > Even better to use |static const char iec_prefixes[] = "KMGTPE";| and below > use %ci instead of %s for it in the printf. If we make this step away from localization (which we probably won't need anyways) and go for the concise solution, I guess I'll just drop this static variable and inline it: printf(..., "KMGTPE"[pow]) What about s/flavour/flavor, ok to go?
I like the correct spelling better, but ok whatever you decide :-)
> I like the correct spelling better LOL. You mean traditional? http://9gag.com/gag/aDmPrnO/shots-fired Well, we have color throughout the code and not colour :P
Created attachment 313931 [details] [review] Add info in the menu, v9 part 1
Created attachment 313932 [details] [review] Add info in the menu, v9 part 2
v9 submitted.