GNOME Bugzilla – Bug 686612
use Script_Extensions for view by-script
Last modified: 2021-06-02 09:31:11 UTC
Created attachment 226972 [details]
The map showing the EXTENDED numerals

Clicking the Arabic map on the left (the screenshot is RTL) shows the available characters related to Arabic. The only set of numerals that appears on that map is the EXTENDED ARABIC-INDIC (Eastern Hindu Arabic) numerals, usually used for Persian, Sindhi and Urdu.

The actual ARABIC-INDIC (Hindu-Arabic) numerals are the second row in the following table:
http://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Arabic_numerals-en.svg/500px-Arabic_numerals-en.svg.png

Text version of the top 3 lines:
Eastern |0|1|2|3|4|5|6|7|8|9|
Arabic-Indic |٠|١|٢|٣|٤|٥|٦|٧|٨|٩| (<- Lots of LRM there)
Eastern Arabic-Indic |۰|۱|۲|۳|۴|۵|۶|۷|۸|۹|

I looked into the gucharmap code and noticed that the ranges are fine, so in theory the characters should appear, but in practice they don't.

Kind regards,
Yaron Shahrabani.
There's a mistake in the text table. Please see below.

> The actual ARABIC-INDIC (Hindu-Arabic) numerals are the second row in the
> following table:
> http://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Arabic_numerals-en.svg/500px-Arabic_numerals-en.svg.png
>
> Text version of the top 3 lines:
> Eastern |0|1|2|3|4|5|6|7|8|9|
> Arabic-Indic |٠|١|٢|٣|٤|٥|٦|٧|٨|٩| (<- Lots of LRM there)
> Eastern Arabic-Indic |۰|۱|۲|۳|۴|۵|۶|۷|۸|۹|

Oops, not enough LRM apparently, fixed!
What code points do these 'missing' characters have?
U+0660 to U+0669

When viewing gucharmap by Unicode block they appear under Arabic, but when viewing by script they are not there.
0660..0669 ; Common # Nd [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE

So that's expected: their Script property is Common, so they are not exclusive to Arabic.
But they share the same Block and Category with the Extended Arabic-Indic numerals which appear on the Arabic map. They only have different BIDI definition according to fileformat.info.

For comparison:
http://www.fileformat.info/info/unicode/char/6f5/index.htm
http://www.fileformat.info/info/unicode/char/665/index.htm
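The comparison above can be reproduced locally with Python's standard `unicodedata` module. This is only a sketch for illustration: it reports the general category and bidi class of U+0665 and U+06F5, but `unicodedata` does not expose the Script property, which is the one that differs in Scripts.txt.

```python
import unicodedata

def describe(cp):
    """Return (name, general category, bidi class) for a code point."""
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

arabic_indic_five = describe(0x0665)
extended_five = describe(0x06F5)

for cp, info in ((0x0665, arabic_indic_five), (0x06F5, extended_five)):
    print("U+%04X" % cp, *info)
```

Both digits share the general category Nd; only the bidi class differs (AN versus EN).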
No they don't:

06F0..06F9 ; Arabic # Nd [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE

See http://www.unicode.org/Public/6.2.0/ucd/Scripts.txt .
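To make the difference concrete, here is a minimal sketch that parses the two Scripts.txt lines quoted in this thread and looks up the Script value for a code point. The helper names (`parse_ranges`, `script_of`) are made up for this illustration and are not taken from gucharmap.

```python
import re

# The two Scripts.txt lines quoted in this thread (Unicode 6.2.0).
SCRIPTS_SAMPLE = """\
0660..0669 ; Common # Nd [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
06F0..06F9 ; Arabic # Nd [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE
"""

# "XXXX" or "XXXX..YYYY", then ";", then the value, then an optional "# comment".
LINE_RE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*([^#]+?)\s*(?:#.*)?$"
)

def parse_ranges(text):
    """Return a list of (first, last, script) tuples from UCD-style lines."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and pure comment lines
        m = LINE_RE.match(line)
        if m is None:
            continue
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)  # single code point: last == first
        ranges.append((first, last, m.group(3)))
    return ranges

def script_of(cp, ranges):
    """Return the Script value covering cp, or None if no range matches."""
    for first, last, script in ranges:
        if first <= cp <= last:
            return script
    return None

RANGES = parse_ranges(SCRIPTS_SAMPLE)
print(script_of(0x0665, RANGES))  # ARABIC-INDIC DIGIT FIVE
print(script_of(0x06F5, RANGES))  # EXTENDED ARABIC-INDIC DIGIT FIVE
```

A by-script view built from Scripts.txt alone would therefore list U+06F0..U+06F9 under Arabic and leave U+0660..U+0669 under Common.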
Odd... the current situation is very confusing: people who see this table and are not that familiar with Arabic will think that the Arabic-Indic numerals are the Eastern (Extended) Arabic numerals.
Though I agree it is confusing, Christian’s analysis is correct, it is just another example of the arbitrary behaviors and inconsistencies of Unicode, which is even more apparent in the Arabic block.
ChPe, can you implement ScriptExtensions.txt support? That file supplements the Scripts.txt file for characters that are used in more than one script.
(In reply to comment #8)
> Though I agree it is confusing, Christian’s analysis is correct, it is just
> another example of the arbitrary behaviors and inconsistencies of Unicode,
> which is even more apparent in the Arabic block.

It's not. Those are used in both the Arabic and Thaana scripts. As such, they are marked as Common script. But the scripts they are used with are listed in ScriptExtensions.txt:

# Script_Extensions=Arab Thaa
0660..0669 ; Arab Thaa # Nd [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
FDF2 ; Arab Thaa # Lo ARABIC LIGATURE ALLAH ISOLATED FORM
FDFD ; Arab Thaa # So ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
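A rough sketch of how a by-script view could combine the two files: a character is listed under a script if its Script value matches, or if the script's short code appears in its Script_Extensions. This is an illustration only, not gucharmap's implementation. The sample data is limited to the lines quoted in this thread, and the function names and the small `ALIASES` table (short code to long name) are assumptions made for the example.

```python
# Lines quoted in this thread from Scripts.txt and ScriptExtensions.txt.
SCRIPTS = """\
0660..0669 ; Common # Nd [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
06F0..06F9 ; Arabic # Nd [10] EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED ARABIC-INDIC DIGIT NINE
"""

SCRIPT_EXTENSIONS = """\
# Script_Extensions=Arab Thaa
0660..0669 ; Arab Thaa # Nd [10] ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT NINE
FDF2 ; Arab Thaa # Lo ARABIC LIGATURE ALLAH ISOLATED FORM
FDFD ; Arab Thaa # So ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
"""

# Assumption for this sketch: a hand-written alias table for the two codes used here.
ALIASES = {"Arab": "Arabic", "Thaa": "Thaana"}

def parse(text):
    """Yield (first, last, value) from 'range ; value # comment' lines."""
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        cp_field, value = (part.strip() for part in data.split(";", 1))
        first, _, last = cp_field.partition("..")
        yield int(first, 16), int(last or first, 16), value

def code_points_for_script(script_name):
    """Code points whose Script or Script_Extensions include script_name."""
    members = set()
    for first, last, script in parse(SCRIPTS):
        if script == script_name:
            members.update(range(first, last + 1))
    for first, last, codes in parse(SCRIPT_EXTENSIONS):
        # The value field is a space-separated list of short script codes.
        if any(ALIASES.get(code) == script_name for code in codes.split()):
            members.update(range(first, last + 1))
    return members

arabic = code_points_for_script("Arabic")
thaana = code_points_for_script("Thaana")
print(sorted("U+%04X" % cp for cp in arabic))
```

With this union, U+0660..U+0669 show up under both Arabic and Thaana, while U+06F0..U+06F9 remain Arabic only.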
Ok, I'll have a look at Script_Extensions.
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gucharmap/-/issues/426.