Bug 431531 – Add support for user-configurable character naming



Summary:	Add support for user-configurable character naming


Status:	RESOLVED OBSOLETE

Product:	orca
Classification:	Applications
Component:	speech
Version:	2.19.x
Hardware:	Other All

Importance:	Normal enhancement
Target Milestone:	FUTURE
Assigned To:	Orca Maintainers
QA Contact:	Orca Maintainers

URL:
Whiteboard:	3.0!

Duplicates:	520601 (view as bug list)
Depends on:
Blocks:	Andalucia

Reported:	2007-04-20 00:50 UTC by Joanmarie Diggs (IRC: joanie)
Modified:	2018-02-08 12:57 UTC

See Also:
GNOME target:	---
GNOME version:	2.17/2.18

Attachments
Patch to implement reviewing of Unicode character information when pressing the numpad 2 or numpad down keys three times. (6.03 KB, patch) 2010-01-19 20:03 UTC, Rui Batista	committed

Description Joanmarie Diggs (IRC: joanie) 2007-04-20 00:50:07 UTC

Michael Whapples said on the Orca list:

> [...] On a similar note, there are some symbols not known
> by Orca, so these could do with being added, but as some of them may be
> country specific, or users may have their own preference for a name for
> them (e.g. ¬ and £ are not spoken by Orca (if I have all my character sets
> set up correctly on this Ubuntu machine, the second of those examples is the
> pound sign)), a user configuration system for the naming of symbols
> might be good.

Comment 1 Joanmarie Diggs (IRC: joanie) 2007-04-25 17:11:00 UTC

I've had some additional thoughts on this front:

1. If a user comes across a unicode character that isn't spoken and cannot see what's on the screen to see what it is, how can the user identify it and add it?  We need some way (keystroke, addition to whereAmI, whatever) to communicate the unicode character number to the user.  The user could then look up the specified character in a table (e.g. at unicode.org) and add the character along with the desired name to his/her personal dictionary.

2. It might be cool to have a command which would add the current character to the dictionary.  You'd arrow to the character in question, press the magic keystroke, and a dialog box would come up providing you with an entry in which you could type the character's name and a control in which you could specify the level of punctuation at which it should be spoken.

3. I plan to start generating Orca-ized tables of unicode characters which can be downloaded by end users.  The way things currently are, the user would then copy and paste the contents into ~/.orca/orca-customizations.py.  Is that what we want, or should we have chnames.py look in a  given directory for additional dictionary files?

Thoughts?

Comment 2 Michael Whapples 2007-04-25 21:47:20 UTC

I think that if orca does not know the character, it should speak the unicode number value (obviously this should be configurable), then the user can use this to fill in a table in the orca preferences dialog. It would be nice to have a key stroke to be able to add the current character, but this would be purely extra to improve the usability and I would not like it as a primary/only way to add unicode characters.

I would have thought that user dictionaries would be better than customizations.py (or any Python script), as a dictionary would remove all the extra syntax which Python requires. This hopefully would make the dictionaries smaller. Also, if the dictionary file is convenient for hand editing, then a user who has many characters to add could write it in a text editor rather than needing to use the Orca GUI. E.g., if the file needs the Unicode character, punctuation level and replacement text, could the file simply be in a three-column (tab-separated) format such as
\u22c5    none    Dot product
(NOTE: I used spaces above rather than tabs).
 Would the above be too limiting?
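
[Editorial sketch] The three-column format proposed above could be read with a few lines of Python. This is only an illustration under stated assumptions, not code from Orca: the function names, the accepted level names ("none", "some", "most", "all") and the choice to accept either a \uXXXX escape or a literal character in the first column are all hypothetical.

```python
import re

# Assumed level names; the thread does not specify the accepted values.
PUNCTUATION_LEVELS = ("none", "some", "most", "all")
ESCAPE_PATTERN = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_character(field):
    """Turn a first-column field into a single character."""
    match = ESCAPE_PATTERN.fullmatch(field)
    if match:
        # A textual escape such as \u22c5 names the code point in hex.
        return chr(int(match.group(1), 16))
    if len(field) == 1:
        return field
    raise ValueError("expected one character or a \\uXXXX escape, got %r" % field)


def parse_character_dictionary(lines):
    """Parse lines of: character <TAB> punctuation level <TAB> spoken name.

    Returns a dict mapping each character to a (level, name) tuple.
    """
    entries = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError("line %d: expected 3 tab-separated fields" % number)
        char_field, level, name = (field.strip() for field in fields)
        level = level.lower()
        if level not in PUNCTUATION_LEVELS:
            raise ValueError("line %d: unknown punctuation level %r" % (number, level))
        entries[decode_character(char_field)] = (level, name)
    return entries
```

Because the separator is a real tab, replacement text such as "Dot product" may contain spaces without any quoting, which is one argument for tabs over spaces in a hand-edited file.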
(In reply to comment #1)
> I've had some additional thoughts on this front:
> 
> 1. If a user comes across a unicode character that isn't spoken and cannot see
> what's on the screen to see what it is, how can the user identify it and add
> it?  We need some way (keystroke, addition to whereAmI, whatever) to
> communicate the unicode character number to the user.  The user could then look
> up the specified character in a table (e.g. at unicode.org) and add the
> character along with the desired name to his/her personal dictionary.
> 
> 2. It might be cool to have a command which would add the current character to
> the dictionary.  You'd arrow to the character in question, press the magic
> keystroke, and a dialog box would come up providing you with an entry in which
> you could type the character's name and a control in which you could specify
> the level of punctuation at which it should be spoken.
> 
> 3. I plan to start generating Orca-ized tables of unicode characters which can
> be downloaded by end users.  The way things currently are, the user would then
> copy and paste the contents into ~/.orca/orca-customizations.py.  Is that what
> we want, or should we have chnames.py look in a  given directory for additional
> dictionary files?
> 
> Thoughts?
>

Comment 3 Joanmarie Diggs (IRC: joanie) 2008-02-20 15:50:30 UTC

As commented on the Orca list:
http://mail.gnome.org/archives/orca-list/2008-February/msg00251.html

--------------
I only found a couple of special characters that I encounter on a sporadic basis that Orca didn't already have an entry for.  But what about (in the very unlikely event) that I run into a special character in a document that isn't automatically recognized by Orca?  I noticed that there is no speech output for unknown characters.  Is there a way to obtain the Unicode value of unidentified characters like there is in JAWS?

Comment 4 Willie Walker 2008-03-05 22:34:08 UTC

*** Bug 520601 has been marked as a duplicate of this bug. ***

Comment 5 Joanmarie Diggs (IRC: joanie) 2008-03-05 22:40:53 UTC

(In reply to comment #4)
> *** Bug 520601 has been marked as a duplicate of this bug. ***

But we don't want to lose the answer to the problem Will provided on bug 520601. :-)

> To me it seems this could be done in the getCharacterName() function.  Instead
> of returning character after the try block, it could return something like
> "character u+#### hex".  So the user could at least look up the unicode table
> and find out what the character is supposed to be.
>
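
[Editorial sketch] The fallback described in the quote could look roughly like the following. It is a stand-alone illustration, not Orca's actual chnames module: the CHARACTER_NAMES table, its two entries and the optional user_names argument are hypothetical stand-ins.

```python
# Hypothetical stand-in for a built-in table of spoken character names.
CHARACTER_NAMES = {
    "\u00a3": "pound",
    "\u00ac": "logical not",
}


def get_character_name(character, user_names=None):
    """Return a spoken name for character.

    User-supplied names win over the built-in table. Unknown characters
    are reported by code point so that the user can look them up.
    """
    if user_names and character in user_names:
        return user_names[character]
    try:
        return CHARACTER_NAMES[character]
    except KeyError:
        # Instead of returning the raw character, speak its code point.
        return "character u+%04x hex" % ord(character)
```

With this shape, get_character_name("\u22c5") would yield "character u+22c5 hex", while a user table such as {"\u22c5": "dot product"} would override both the fallback and any built-in name.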

Comment 6 Aleksey Sadovoy 2009-03-18 09:18:30 UTC

can this target also known to orca characters? Renaming character labels will be useful to Russian users, as we have really very long character names by default. Listening to them is quite tedious.

Comment 7 Willie Walker 2009-03-29 21:06:58 UTC

(In reply to comment #6)
> can this target also known to orca characters? Renaming character labels will
> be useful to Russian users, as we have really very long character names by
> default. Listening to them is quite tedious.

I'm not sure what question you are asking by "can this target also known to orca characters?"  Is it that you'd rather hear hex codes and not character names?

Alternatively, will the following solution work for you:

> To me it seems this could be done in the getCharacterName() function.  Instead
> of returning character after the try block, it could return something like
> "character u+#### hex".  So the user could at least look up the unicode table
> and find out what the character is supposed to be.

If not, do you have specific examples of things that won't work and what you'd like to see done to resolve the problem?

Comment 8 Aleksey Sadovoy 2009-03-30 12:07:19 UTC

>I'm not sure what question you are asking by "can this target also known to
>orca characters?"  Is it that you'd rather hear hex codes and not character
>names?
Sorry - there was mistyping.
I mean that redefining the labels of characters already known to Orca might be useful, especially for Russian users. In Russian, we have very long character names (4 syllables and more), and most of us will redefine them to some sort of shorter ones.

Comment 9 Willie Walker 2009-03-30 12:37:35 UTC

(In reply to comment #8)
> >I'm not sure what question you are asking by "can this target also known to
> >orca characters?"  Is it that you'd rather hear hex codes and not character
> >names?
> Sorry - there was mistyping.
> I mean that redefining the labels of characters already known to Orca might
> be useful, especially for Russian users. In Russian, we have very long
> character names (4 syllables and more), and most of us will redefine them to
> some sort of shorter ones.

Thanks!  I'd assume we would provide a table very similar to the one on the pronunciations page for defining words.  So, the answer is that you should be able to redefine them.

Right now, you can define/redefine them, but it's a little more work.  See the FAQ section on Customizing Orca at http://live.gnome.org/Orca/FrequentlyAskedQuestions/CustomizingOrca.

Comment 10 Willie Walker 2009-05-10 00:37:46 UTC

(In reply to comment #2)
> I think that if orca does not know the character, it should speak the unicode
> number value (obviously this should be configurable), then the user can use
> this to fill in a table in the orca preferences dialog.

We should try to do this as a minimum for 2.28.

Comment 11 Willie Walker 2009-08-14 15:43:52 UTC

We are late in the 2.28 release cycle and I want to focus on "high impact"/"low risk" items that also fall within the release team's restrictions in place.  Regretfully, this bug doesn't fit well within those constraints and we'll review it for the 2.29 release cycle.

Comment 12 Rui Batista 2010-01-19 17:07:20 UTC

Hi,

At the least, I think we should provide a quick way to get the Unicode value for a character. I propose pressing the numpad 2 key three times: one press is the simple character review, two presses give the phonetic spelling, and three presses would give the Unicode value.

For speaking and brailling the character value, I propose the standard representation of four hex digits, but maybe we should also provide the base-10 one depending on people's preferences or configuration. The following code can return that representation:

def getCharacterUnicodeValueRepresentation(character):
    """Gets a string representation of the character's value.
    For now it returns a four-digit hex number, with leading zeros when needed.

    Arguments:
    - character: the character to get the value for

    Returns a string with the character's Unicode value.
    """
    # Byte strings are decoded first so that ord() sees a single code point.
    if isinstance(character, bytes):
        character = character.decode('UTF-8')
    return "%04x" % ord(character)

There is also the unicodedata module, where we can find the name for any character (it seems only in English); maybe providing that information can be useful too.
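
[Editorial sketch] Combining the hex value with the unicodedata lookup mentioned above might look like this. The function name and the exact wording of the returned string are hypothetical; unicodedata.name() is the standard-library call, and its second argument is the default returned for characters that have no name.

```python
import unicodedata


def describe_character(character):
    """Return the hex code point of character, plus its Unicode name if any."""
    value = "%04x" % ord(character)
    # Control characters and unassigned code points have no name,
    # so pass a default instead of letting name() raise ValueError.
    name = unicodedata.name(character, "")
    if name:
        return "Character %s, %s" % (value, name.lower())
    return "Character %s" % value
```

For the pound sign, describe_character("\u00a3") gives "Character 00a3, pound sign". The names come from the Unicode Character Database and are in English only, as noted above.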

What do you all think?

Comment 13 Rui Batista 2010-01-19 20:03:58 UTC

Created attachment 151784 [details] [review]
Patch to implement reviewing of Unicode character information when pressing the numpad 2 or numpad down keys three times.

This is the patch to implement the functionality described in my previous comment.

Comment 14 Willie Walker 2010-01-19 22:49:22 UTC

Thanks!  I'm curious about this:

+        vars = {"value" : self.getUnicodeValueRepresentation(character)}
+        # Translators: this is information about an unicode character reported to the user.
+        # value is the unicode number value of this character in hex.
+        speech.speak(_("Character %(value)s" % vars))

Why is the complexity of %(value)s used? I'm never sure the l10n folks get these things right, so it seems like this might be less error prone for translators:

+        # Translators: this is information about an unicode character reported to the user.
+        # value is the unicode number value of this character in hex.
+        speech.speak(_("Character %s") \
+                     % self.getUnicodeValueRepresentation(character))

Comment 15 Rui Batista 2010-01-20 00:33:28 UTC

(In reply to comment #14)
> Thanks!  I'm curious about this:
> 
> +        vars = {"value" : self.getUnicodeValueRepresentation(character)}
> +        # Translators: this is information about an unicode character reported to the user.
> +        # value is the unicode number value of this character in hex.
> +        speech.speak(_("Character %(value)s" % vars))
> 
> Why is the complexity of %(value)s used? I'm never sure the l10n folks get
> these things right, so it seems like this might be less error prone for
> translators:
> 
> +        # Translators: this is information about an unicode character reported to the user.
> +        # value is the unicode number value of this character in hex.
> +        speech.speak(_("Character %s") \
> +                     % self.getUnicodeValueRepresentation(character))

When writing it, I thought it would make it easier to add more information, like the character name from the unicodedata module, but even with that your approach is better for translators anyway. Do you want me to submit another patch, or will you change it when committing (if you will commit to master :))?

Comment 16 Willie Walker 2010-01-20 14:24:29 UTC

Comment on attachment 151784 [details] [review]
Patch to implement reviewing of Unicode character information when pressing the numpad 2 or numpad down keys three times.

Thanks Rui!  I tested/committed a slightly modified version to git master.  I committed your original patch and made changes from there since I wanted to make sure you got credit in the git logs.

Also sent messages to gnome-i18n and gnome-doc per the string change announcement rules.

Comment 17 Joanmarie Diggs (IRC: joanie) 2010-07-05 01:58:14 UTC

Planning spam. Sorry!