Bug 686612 - use Script_Extensions for view by-script
Status: RESOLVED OBSOLETE
Product: gucharmap
Classification: Core
Component: general
Version: git master
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: gucharmap maintainers
QA Contact: gucharmap maintainers
Depends on:
Blocks:
 
 
Reported: 2012-10-22 08:50 UTC by Yaron Shahrabani
Modified: 2021-06-02 09:31 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
The map showing the EXTENDED numerals (116.99 KB, image/png)
2012-10-22 08:50 UTC, Yaron Shahrabani

Description Yaron Shahrabani 2012-10-22 08:50:40 UTC
Created attachment 226972
The map showing the EXTENDED numerals

Clicking the Arabic map on the left (the screenshot is RTL) shows the available characters related to Arabic.

The only set of numerals that appears on that map is the EXTENDED ARABIC-INDIC (Eastern Hindu Arabic) numerals, usually used for Persian, Sindhi and Urdu...

The actual ARABIC-INDIC (Hindu-Arabic) numerals are the second row in the following table:
http://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Arabic_numerals-en.svg/500px-Arabic_numerals-en.svg.png

Text version of the top 3 lines:
Western               |0|1|2|3|4|5|6|7|8|9|
Arabic-Indic          |٠|‎١‎|٢‎|٣|٤‎|٥‎|٦‎|٧|٨‎|٩| (<- Lots of LRM there)
Eastern Arabic-Indic  ‎|۰|۱|۲|۳|۴|۵|۶|۷|۸|۹|

I looked into the gucharmap code and noticed that the ranges are fine, so theoretically the characters should appear, but in practice they don't...

Kind regards,
Yaron Shahrabani.
Comment 1 Yaron Shahrabani 2012-10-22 09:05:49 UTC
There's a mistake in the text table.
Please see below.

> The actual ARABIC-INDIC (Hindu-Arabic) numerals are the second row in the
> following table:
> http://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Arabic_numerals-en.svg/500px-Arabic_numerals-en.svg.png
> 
> Text version of the top 3 lines:
> Western               |0|1|2|3|4|5|6|7|8|9|
> Arabic-Indic          |٠|‎١‎|٢‎|٣‎|٤‎|٥‎|٦‎|٧|‎٨‎|٩| (<- Lots of LRM there)
> Eastern Arabic-Indic  ‎|۰|۱|۲|۳|۴|۵|۶|۷|۸|۹|
Oops, not enough LRM apparently, fixed!
Comment 2 Christian Persch 2012-10-24 18:50:57 UTC
What code points do these 'missing' characters have?
Comment 3 Yaron Shahrabani 2012-10-24 21:51:01 UTC
U+0660 to U+0669

When viewing gucharmap by Unicode block they appear under Arabic, but when viewing by script they are not there...
Comment 4 Christian Persch 2012-10-24 21:53:36 UTC
0660..0669    ; Common # Nd  [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE

So that's expected: they're assigned to the Common script, not exclusively to Arabic.
Comment 5 Yaron Shahrabani 2012-10-24 22:08:21 UTC
But they share the same Block and Category with the Extended Arabic-Indic numerals, which do appear on the Arabic map. According to fileformat.info, they differ only in their BIDI class.

For comparison:
http://www.fileformat.info/info/unicode/char/6f5/index.htm
http://www.fileformat.info/info/unicode/char/665/index.htm
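The comparison above can also be reproduced locally. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is made up for illustration. Note that `unicodedata` exposes the General Category and Bidi class but not the Script property, so this only checks the two properties mentioned in this comment.

```python
import unicodedata

def describe(cp):
    """Return (name, general category, bidi class) for a code point."""
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

# U+0665 is the Arabic-Indic five, U+06F5 the Extended Arabic-Indic five.
for cp in (0x0665, 0x06F5):
    name, category, bidi = describe(cp)
    print(f"U+{cp:04X} {name}: category={category}, bidi={bidi}")
```

Both characters report the General Category `Nd`; the Bidi class is `AN` for U+0665 and `EN` for U+06F5.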
Comment 6 Christian Persch 2012-10-24 22:14:23 UTC
No, they don't; the Script values differ:

06F0..06F9    ; Arabic # Nd  [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE

see http://www.unicode.org/Public/6.2.0/ucd/Scripts.txt .
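To make the difference concrete, here is a small sketch that parses the two Scripts.txt entries quoted in this thread and looks up the Script value of a code point. The names `parse_ucd` and `script_of` are hypothetical helpers, not part of gucharmap or any library, and the embedded data is only the two lines quoted above rather than the full file.

```python
import re

# The two Scripts.txt entries quoted in this thread (Unicode 6.2.0).
SCRIPTS_TXT = """\
0660..0669    ; Common # Nd  [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
06F0..06F9    ; Arabic # Nd  [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE
"""

# "XXXX" or "XXXX..YYYY", then ";", then the value, then an optional "#" comment.
LINE_RE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*([^#]+?)\s*(?:#.*)?$")

def parse_ucd(text):
    """Return a list of (first, last, value) tuples from UCD-style lines."""
    ranges = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank line or pure comment line
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)
        ranges.append((first, last, m.group(3)))
    return ranges

def script_of(cp, ranges, default="Unknown"):
    """Linear lookup; code points not listed fall back to the default."""
    for first, last, value in ranges:
        if first <= cp <= last:
            return value
    return default

ranges = parse_ucd(SCRIPTS_TXT)
print(script_of(0x0665, ranges))  # Arabic-Indic five
print(script_of(0x06F5, ranges))  # Extended Arabic-Indic five
```

With only these two entries loaded, U+0665 resolves to Common and U+06F5 to Arabic, which is why a view keyed on the Script property alone lists only the extended digits under Arabic.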
Comment 7 Yaron Shahrabani 2012-10-24 22:20:52 UTC
Odd... The current situation is very confusing: people who see this table and are not that familiar with Arabic will think that the Arabic-Indic numerals are the Eastern (Extended) Arabic numerals...
Comment 8 Khaled Hosny 2012-10-25 00:04:30 UTC
Though I agree it is confusing, Christian’s analysis is correct; it is just another example of the arbitrary behaviors and inconsistencies of Unicode, which are even more apparent in the Arabic block.
Comment 9 Behdad Esfahbod 2012-10-25 00:05:47 UTC
ChPe, can you implement ScriptExtensions.txt support?  That file supplements the Scripts.txt file for characters that are used in more than one script.
Comment 10 Behdad Esfahbod 2012-10-25 00:06:44 UTC
(In reply to comment #8)
> Though I agree it is confusing, Christian’s analysis is correct; it is just
> another example of the arbitrary behaviors and inconsistencies of Unicode,
> which are even more apparent in the Arabic block.

It's not.  Those are used in Arabic script and Thaana.  As such, they are marked Common script.  But the scripts they are used with are listed in ScriptExtensions.txt:

# Script_Extensions=Arab Thaa

0660..0669    ; Arab Thaa # Nd  [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
FDF2          ; Arab Thaa # Lo       ARABIC LIGATURE ALLAH ISOLATED FORM
FDFD          ; Arab Thaa # So       ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
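A by-script view built on Script_Extensions could work roughly as sketched below. This is only an illustration in Python, not gucharmap's actual code (gucharmap is written in C); the table contents are limited to the entries quoted in this thread, and the names `lookup`, `script_extensions` and `chars_for_script` are hypothetical. It assumes the rule from UAX #24 that a character without an explicit Script_Extensions entry has the single-element set containing its Script value.

```python
# Script (sc) values for the ranges quoted from Scripts.txt, written with
# four-letter codes (Zyyy = Common, Arab = Arabic).
SCRIPT = [
    (0x0660, 0x0669, "Zyyy"),
    (0x06F0, 0x06F9, "Arab"),
]

# Script_Extensions (scx) entries quoted from ScriptExtensions.txt.
SCRIPT_EXTENSIONS = [
    (0x0660, 0x0669, {"Arab", "Thaa"}),
    (0xFDF2, 0xFDF2, {"Arab", "Thaa"}),
    (0xFDFD, 0xFDFD, {"Arab", "Thaa"}),
]

def lookup(cp, table, default):
    """Return the value of the first range containing cp, else default."""
    for first, last, value in table:
        if first <= cp <= last:
            return value
    return default

def script_extensions(cp):
    """Explicit scx entry if present, otherwise the set {sc}."""
    sc = lookup(cp, SCRIPT, "Zzzz")  # Zzzz = Unknown
    return lookup(cp, SCRIPT_EXTENSIONS, {sc})

def chars_for_script(code, candidates):
    """Code points that a by-script view for `code` would list."""
    return [cp for cp in candidates if code in script_extensions(cp)]

digits = list(range(0x0660, 0x066A)) + list(range(0x06F0, 0x06FA))
arabic = chars_for_script("Arab", digits)
print(len(arabic), "digits listed under Arabic")
```

Under this scheme both digit sets show up under Arabic, and U+0660..U+0669 additionally show up under Thaana instead of only under Common.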
Comment 11 Christian Persch 2012-10-25 00:14:05 UTC
Ok, I'll have a look at Script_Extensions.
Comment 12 GNOME Infrastructure Team 2021-06-02 09:31:11 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gucharmap/-/issues/426.