GNOME Bugzilla – Bug 131576
The spell checker should not break words on contractions
Last modified: 2016-03-05 21:14:03 UTC
Description of Problem:
gedit spell plugin marks valid words containing an apostrophe as misspelled.

Steps to reproduce the problem:
1. Run gedit.
2. Activate the spell plugin.
3. Select English (American) as the spell-check language.
4. Turn on automatic spell checking.
5. Enter the word "couldn't" in the text area.

Actual Results: the word is underlined in red.
Expected Results: the word is not underlined.
How often does this happen? Always.

Additional Information: The same happens in other languages. For example, the valid Ukrainian word "пам'ять" and many others are marked as misspelled. I think the apostrophe is being treated as a word separator here. The exceptions are words like "can't" and "it's", because their left and right parts are valid words (from aspell's point of view).
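The reporter's diagnosis can be illustrated with a tokenizer that treats every non-letter, including the apostrophe, as a separator. This is a hypothetical ASCII-only sketch of the described behaviour, not gedit's or Pango's actual code, and `naive_nth_word()` is an illustrative name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the n-th (0-based) word of `text` into `out`,
 * treating every non-letter (including the apostrophe) as a separator.
 * Returns 1 if the word exists, 0 otherwise. ASCII only. */
static int naive_nth_word(const char *text, int n, char *out, size_t out_size)
{
    size_t i = 0;
    int count = 0;

    assert(text != NULL && out != NULL && out_size > 0);
    while (text[i] != '\0') {
        /* Skip separators such as spaces and apostrophes. */
        while (text[i] != '\0' && !isalpha((unsigned char) text[i]))
            i++;
        if (text[i] == '\0')
            break;

        /* Collect one run of letters. */
        size_t start = i;
        while (isalpha((unsigned char) text[i]))
            i++;

        if (count == n) {
            size_t len = i - start;
            if (len >= out_size)
                len = out_size - 1;
            memcpy(out, text + start, len);
            out[len] = '\0';
            return 1;
        }
        count++;
    }
    return 0;
}
```

With this rule "couldn't" yields the two words "couldn" and "t", and a dictionary lookup of "couldn" fails, which matches the red underline in the report.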
This is a duplicate of bug #97545. Evan Martin suggested a workaround in bug #97861:
==============
This, at the top of gedit-automatic-spell-checker.c, should do it:

static gboolean
gtkspell_text_iter_forward_word_end(GtkTextIter *i)
{
    GtkTextIter iter;

    /* heuristic:
     * if we're on a single quote/apostrophe and
     * if the next character is a letter,
     * this is an apostrophe. */

    if (!gtk_text_iter_forward_word_end(i))
        return FALSE;

    if (gtk_text_iter_get_char(i) != '\'')
        return TRUE;

    iter = *i;
    if (gtk_text_iter_forward_char(&iter)) {
        if (g_unichar_isalpha(gtk_text_iter_get_char(&iter))) {
            return (gtk_text_iter_forward_word_end(i));
        }
    }

    return TRUE;
}

static gboolean
gtkspell_text_iter_backward_word_start(GtkTextIter *i)
{
    GtkTextIter iter;

    if (!gtk_text_iter_backward_word_start(i))
        return FALSE;

    iter = *i;
    if (gtk_text_iter_backward_char(&iter)) {
        if (gtk_text_iter_get_char(&iter) == '\'') {
            if (gtk_text_iter_backward_char(&iter)) {
                if (g_unichar_isalpha(gtk_text_iter_get_char(&iter))) {
                    return (gtk_text_iter_backward_word_start(i));
                }
            }
        }
    }

    return TRUE;
}

#define gtk_text_iter_backward_word_start gtkspell_text_iter_backward_word_start
#define gtk_text_iter_forward_word_end gtkspell_text_iter_forward_word_end

It is a hack, but not that bad as far as hacks go.
===============
If bug #97861 is not closed before 2.6, I will reconsider applying the above hack. I'm not closing this bug as a duplicate of bug #97861 since I want to keep a reminder in the gedit list of bugs. Setting severity to major since it is a pretty annoying bug. Changing the summary too.
Is anyone willing to test Evan Martin's patch, which I attached to this bug in my previous comment? I really have no time to do it.
This patch works pretty well for Ukrainian and English (at least). I think it should be committed if there is no better way to fix this bug.
Paolo, do you still want to get it in? If you still feel uncomfortable with the level of testing, maybe you could email d-d-l or gnome-love and ask people to try it?
Sorry, Evan, for adding you to the CC list of this bug. In bug #97851 you proposed the workaround you implemented in gtkspell. I'm not so convinced it really works. gtk_text_iter_inside_word, gtk_text_iter_starts_word and the like should probably be wrapped as well. What happens when the current text is "don" and you paste "'t know"? Am I on crack? Maxim: could you test it too?
You can reproduce the problem I'm describing as follows:
0. Apply the patch proposed by Evan.
1. Create an empty document.
2. Activate automatic spell checking (make sure the language is English).
3. Type "do't know" -> gedit marks "do't" as an error.
4. Add "n" before "'t" and press END.
5. gedit marks "'t" as an error -> this is wrong.

I think the only solution is to wait for bug #97545. The patch probably also does not work for languages such as Italian, where the apostrophe is used differently. For example, "un'altra" is a contraction of "una altra" and should be treated as two words (as in English), but "un'altro" is wrong and should be treated as a single word; the correct form in this case is "un altro".
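The Italian example can be illustrated with a toy checker that accepts an apostrophe word either whole or when both halves around the apostrophe are dictionary words. The dictionary and function names below are hypothetical; a real checker would query Enchant or Hunspell rather than a static array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy dictionary. */
static const char *const toy_dict[] = { "un", "una", "altra", "altro", NULL };

/* Look up the first `len` bytes of `word` in the toy dictionary. */
static int in_dict(const char *word, size_t len)
{
    for (size_t i = 0; toy_dict[i] != NULL; i++)
        if (strlen(toy_dict[i]) == len && strncmp(toy_dict[i], word, len) == 0)
            return 1;
    return 0;
}

/* Naive rule: accept the whole token, or accept it if the parts on both
 * sides of the first apostrophe are dictionary words. */
static int naive_check(const char *token)
{
    assert(token != NULL);
    size_t len = strlen(token);
    if (in_dict(token, len))
        return 1;

    const char *apos = strchr(token, '\'');
    if (apos == NULL)
        return 0;

    size_t left = (size_t) (apos - token);
    return in_dict(token, left) && in_dict(apos + 1, len - left - 1);
}
```

The rule accepts "un'altra" but also the incorrect "un'altro", because "un" and "altro" are both valid words. This suggests that telling the two apart needs language-specific knowledge (here, the gender of the following noun), which a purely tokenizer-level heuristic cannot supply.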
I can confirm that this problem also exists in the current version of gtkspell (2.0.5). The newest gtkspell checks words at a different time (when the cursor leaves the word), but still contains the apostrophe hack. I agree with Paolo: the only real solution is to fix bug 97545 (http://bugzilla.gnome.org/show_bug.cgi?id=97545) in Pango. The apostrophe hack we're using in gtkspell really covers just one English-centric case: typing a contraction directly in a language whose apostrophe rules are like English's.
Update summary
*** Bug 304534 has been marked as a duplicate of this bug. ***
Mentioned in https://launchpad.net/distros/ubuntu/+source/gedit/+bug/36227 as well.
Is this still relevant, or has the switch to Enchant solved it?
It is still relevant.
Hello, Are there currently any plans to fix this, or has work really not advanced since 2004? It's a fairly visible and embarrassing bug.
Hello, this is still an issue in gedit 2.30.3 with words such as "didn't" and "couldn't".
*** Bug 596486 has been marked as a duplicate of this bug. ***
*** Bug 335626 has been marked as a duplicate of this bug. ***
Still a problem in gedit 3.10.
*** Bug 621810 has been marked as a duplicate of this bug. ***
(In reply to comment #17)
> Still a problem in gedit 3.10.

Still present in 3.14.2, and in a day this bug will have been open for 11 years.
(In reply to comment #19)
> this bug reaches the 11 years open mark.

Well, this bug is still open because it is a valid complaint and because it makes it easier to mark duplicates against it from time to time. But honestly, I do not see this being fixed at the gedit level in the foreseeable future unless someone shows up and puts in the work. This should really be addressed lower in the stack, either in Pango or in a spell-checking library.
Aspell and, IIRC, Enchant can check hyphenated words if the dictionaries define hyphens as part of words. Maybe you mean to mark this as "won't fix" and point to the 8-year-old bug report #383706 about incorporating spell checking into GTK+?
I think GtkSpell has fixed this bug.
I tried GtkSpell 2.0.16 through Poedit 1.7.5 on Arch Linux, and didn't notice any improvement on this.
Of course you need a more recent version of GtkSpell. See the ChangeLog for the 3.0.5 and 3.0.6 versions: http://gtkspell.sourceforge.net/ChangeLog
Sorry for the confusion; I blame package names :) Anyway, I tried again with Evolution 3.16.1 and GtkSpell 3.0.7. "d'água" (a contraction of "da" and "água", meaning "of water") is flagged as incorrect, even though it was explicitly added to the dictionary. Enchant, on the other hand, correctly accepts it:

    $ echo "d'água" | enchant -a
    @(#) International Ispell Version 3.1.20 (but really Enchant 1.6.0)
    *
    $ echo "dd'água" | enchant -a
    @(#) International Ispell Version 3.1.20 (but really Enchant 1.6.0)
    & dd'água 5 0: d'água, D'Água, d'Água, N'Água, n'Água
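The session above relies on `enchant -a` emulating the ispell pipe protocol, where a line starting with "*" means the word was found and "& original count offset: ..." introduces near misses. The sketch below classifies such response lines; it handles only a subset of the protocol's codes, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Categories for one response line of the ispell -a pipe protocol. */
enum pipe_result {
    PIPE_OK,
    PIPE_MISS_WITH_SUGGESTIONS,
    PIPE_MISS_NO_SUGGESTIONS,
    PIPE_OTHER
};

static enum pipe_result classify_pipe_line(const char *line)
{
    assert(line != NULL);
    switch (line[0]) {
    case '*': /* word found in the dictionary */
    case '+': /* found via a root word */
    case '-': /* found as a compound */
        return PIPE_OK;
    case '&': /* not found; near misses follow the colon */
    case '?': /* not found; guesses follow the colon */
        return PIPE_MISS_WITH_SUGGESTIONS;
    case '#': /* not found, nothing to suggest */
        return PIPE_MISS_NO_SUGGESTIONS;
    default:  /* e.g. the "@(#)" version banner */
        return PIPE_OTHER;
    }
}

/* For an "&" line ("& original count offset: miss, ..."), return count;
 * return -1 for any other line. */
static int suggestion_count(const char *line)
{
    int count = -1, offset = -1;

    if (line[0] != '&' || sscanf(line, "& %*s %d %d:", &count, &offset) != 2)
        return -1;
    return count;
}
```

Applied to the output above, "*" classifies as accepted, and the "& dd'água 5 0: ..." line classifies as a miss with 5 suggestions.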
*** Bug 750336 has been marked as a duplicate of this bug. ***
Re-assign to gspell.
Done. The implementation is probably not perfect, but it's a temporary solution. Once Pango bug #97545 is fixed, it will be possible to simplify the code in gspell and provide a better implementation.