Bug 144322 – Search doesn't handle non-ascii chars correctly

Summary:	Search doesn't handle non-ascii chars correctly


Status:	RESOLVED WONTFIX

Product:	gtranslator
Classification:	Other
Component:	Generally bug'd
Version:	HEAD
Hardware:	Other Linux

Importance:	Normal major
Target Milestone:	1.1.8
Assigned To:	Juan José Sánchez Penas
QA Contact:	Ross Golder

URL:
Whiteboard:

Depends on:	gregex
Blocks:

Reported:	2004-06-14 11:33 UTC by Jordi Mallach
Modified:	2009-02-14 13:43 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Jordi Mallach 2004-06-14 11:33:29 UTC

From the Debian BTS, a bug that probably has a big impact on the program's
usability: http://bugs.debian.org/254339

Gtranslator's search function doesn't work well with non-ASCII text.
When non-ASCII letters are present in a message in which the search
pattern is found, gtranslator highlights the wrong passage of text.  For
example, when searching for "des", it would highlight "es " in the
following message:

Ein paar mögliche Befehle (...) des Bildschirms
                                 ^^^

In this message, when searching for "naho", it highlights "oře.":

Zobrazit vybrané příkazy nahoře...
                            ^^^^

In short, the selection shifts to the right for each non-ASCII letter
in the message.  Looks like a classical bytes vs. characters problem to
me.
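
The shift described above can be reproduced with a minimal sketch. It assumes the text is stored as UTF-8 and that the match position is computed in bytes but then applied as a character index; the function name is hypothetical and this is not gtranslator's actual find.c code.

```python
def highlight_with_byte_offset(message: str, pattern: str):
    """Reproduce the reported symptom (hypothetical helper).

    The pattern is located in the UTF-8 bytes, but the resulting byte
    offset is then used as if it were a character offset.
    """
    start = message.encode("utf-8").find(pattern.encode("utf-8"))
    if start < 0:
        return None
    # Bug: 'start' counts bytes, but str slicing counts characters.
    return message[start:start + len(pattern)]


german = "Ein paar mögliche Befehle (...) des Bildschirms"
czech = "Zobrazit vybrané příkazy nahoře..."

print(repr(highlight_with_byte_offset(german, "des")))
print(repr(highlight_with_byte_offset(czech, "naho")))
```

Each non-ASCII letter before the match takes two bytes in UTF-8 here, so the selection moves one position to the right per such letter: one for the German message, three for the Czech one.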

Comment 1 Ross Golder 2004-08-19 14:37:19 UTC

Yes, the regexp code in find.c is not yet UTF8-aware. It still thinks the world
is flat, and that chars == bytes! I'll be keeping an eye on the following bug in
anticipation of a decent UTF8-aware regexp function to use.

http://bugzilla.gnome.org/show_bug.cgi?id=50075

Comment 2 Morten Welinder 2004-08-19 14:45:56 UTC

While having the UTF-8 regexp engine would be nice, something else is likely
also wrong here.  It sounds like search should work absolutely just fine as
long as ASCII is being searched for.

I.e., the bytes vs. chars is to be fixed independently of the regexp problem.

If you need UTF-8 now, grab Gnumeric's.
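
A rough sketch of that separation, assuming UTF-8 text and an ASCII search pattern: a byte-oriented regexp engine still finds the match, and only the reported offsets need converting from bytes to characters before highlighting. The helper name is hypothetical, and Python's re module stands in for whatever engine the program uses.

```python
import re


def find_char_span(message: str, ascii_pattern: str):
    """Search the UTF-8 bytes, then convert the byte span to character offsets."""
    raw = message.encode("utf-8")
    match = re.search(re.escape(ascii_pattern).encode("ascii"), raw)
    if match is None:
        return None
    # ASCII bytes never occur inside a multi-byte UTF-8 sequence, so both
    # match boundaries fall on character boundaries and the prefixes decode.
    start = len(raw[:match.start()].decode("utf-8"))
    end = len(raw[:match.end()].decode("utf-8"))
    return start, end


german = "Ein paar mögliche Befehle (...) des Bildschirms"
start, end = find_char_span(german, "des")
print(repr(german[start:end]))
```

Decoding the prefix is the simplest way to count characters; in C with GLib, a function such as g_utf8_pointer_to_offset would play the same role.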

Comment 3 Baris Cicek 2008-09-06 04:20:18 UTC

Already fixed? I don't see any problem searching with non-ascii (Turkish) characters.

Comment 4 Pablo Sanxiao 2009-02-14 13:43:51 UTC

I think this didn't work in the 1.1.x series because gettext was not being used. It works fine in the trunk version.