Bug 54166 – glib should have non-locale-sensitive islower/isupper/tolower/toupper/strcasecmp/strncasecmp

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 54166 - glib should have non-locale-sensitive islower/isupper/tolower/toupper/strcasecmp/strncasecmp


Summary:	glib should have non-locale-sensitive islower/isupper/tolower/toupper/strcase...


Status:	RESOLVED FIXED

Product:	glib
Classification:	Platform
Component:	general
Version:	1.3.x
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:	55833 55837

Reported:	2001-05-04 17:17 UTC by Darin Adler
Modified:	2011-02-18 15:47 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
first cut at a patch to do this (5.81 KB, patch) 2001-05-04 17:19 UTC, Darin Adler	none	Details \| Review
patch that uses ascii prefix and deprecates locale-specific versions (12.18 KB, patch) 2001-06-06 16:12 UTC, Darin Adler	none	Details \| Review
improved patch with documentation for all new functions (20.22 KB, patch) 2001-06-06 17:33 UTC, Darin Adler	none	Details \| Review
patch with fixed documentation for strncasecmp (20.25 KB, patch) 2001-06-06 18:02 UTC, Darin Adler	none	Details \| Review

Description Darin Adler 2001-05-04 17:17:32 UTC

It's almost never useful to use the locale-specific versions of these
functions. But the non-locale-specific ones that handle only ASCII "A-Z"
and "a-z" are often useful.

I've taken a quick look through sources on GNOME CVS and find that nearly
all the users of these calls would be better off with a non-locale-specific
version.

Comment 1 Darin Adler 2001-05-04 17:19:09 UTC

Created attachment 518 [details] [review]
first cut at a patch to do this

Comment 2 Darin Adler 2001-05-04 17:22:55 UTC

Additional things that I could do to the patch:

    1) add documentation -- I couldn't find a good example inside glib
to copy, or I would have included documentation with the patch.
    2) make the char functions inline
    3) optimize the char functions in other ways
    4) preserve the (in my opinion useless) locale-specificity of
g_strcasecmp and g_strncasecmp and come up with new names for the
non-locale-specific functions

Note that the functions take gchar instead of taking int the way the
standard islower/isupper/tolower/toupper functions do. If there's some
reason this is a bad choice, I can change it back.

Comment 3 Havoc Pennington 2001-06-04 21:30:40 UTC

Some action would probably be good here. Thoughts:
 - any use of islower(), etc. that assumes locale-sensitive 
   is broken, since it won't work in multibyte locales
 - ergo islower() etc. are only useful if ASCII-only
 - however having functions with the same name as the C 
   version that are a bit different is strange, though
   perhaps not unprecedented in GLib
 - should g_strdown() g_strup() be ASCII-only or UTF-8-only?
 - is it OK to incompatibly change g_strdown(), etc.?
 - how about g_ascii_* for names?

Just some random thoughts, this needs thinking through. Some use-cases
would be helpful, taking into account which are covered by g_unicode_*
and which still are not.

Comment 4 Owen Taylor 2001-06-05 19:47:27 UTC

Definitely need to be g_ascii_*

I think I'd deprecate g_strcasecmp, g_strdown, g_strup instead
of changing them.

If they don't have any encoding specified, then they need to 
handle Unicode, but I think changing existing functions to
act on UTF-8 is dubious, so I'd rather introduce new
g_utf8_strcasecmp, etc. (Jon Trowbridge was looking at this.)

Comment 5 Darin Adler 2001-06-06 16:12:23 UTC

Created attachment 613 [details] [review]
patch that uses ascii prefix and deprecates locale-specific versions

Comment 6 Darin Adler 2001-06-06 16:13:28 UTC

Also included in my patch are two places that Owen missed when he
added isupper guards to tolower calls (and vice versa).

Comment 7 Darin Adler 2001-06-06 16:26:34 UTC

Looking more closely at it, it seems that instead of using
g_ascii_strdown (or g_strdown for that matter),
g_date_prepare_to_parse should instead call g_locale_to_utf8 and then
g_utf8_strdown. And g_date_fill_parse_tokens should use g_utf8_strdown
instead of g_ascii_strdown and use g_utf8_strstr instead of strstr.

But we don't have g_utf8_strdown or g_utf8_strstr yet. Maybe Jon
Trowbridge wants to work on those. Maybe we need some new bug reports
about that part?

Comment 8 Darin Adler 2001-06-06 17:33:56 UTC

Created attachment 615 [details] [review]
improved patch with documentation for all new functions

Comment 9 Darin Adler 2001-06-06 18:02:07 UTC

Created attachment 618 [details] [review]
patch with fixed documentation for strncasecmp

Comment 10 Darin Adler 2001-06-06 18:25:56 UTC

I wrote incorrect documentation for g_ascii_strncasecmp, parroting
what the man pages for strncmp and strncasecmp say. Then I realized
that the man page is wrong and instead documented how it really works.
After looking at the C standard I'm still not sure whether the count
"n" is supposed to limit the number of characters in "s1" compared (as
the man page says) or limit the number of characters in either stirng
compared. But all the implementations say both, so I made the comment
say both.

For example, both the man page and standard seem to say that strncmp
("X", "XY", 1) should return -1, but we all know that it returns 0 in
real life.

Anyway, I fixed the documentation to match reality in the 4th cut at
the patch that I attached earlier.

Comment 11 Owen Taylor 2001-06-24 17:00:16 UTC

Hmm, two thoughts about making g_ascii_* compatible with
Unicode versions:

 - I think g_ascii_islower(), etc, probably need to 
   take a gunichar. Yes, that reintroduces the typical
   pitfall of:

    g_ascii_islower(((guchar)c[i]))

   But I think that's better than 

    c < 255 ? g_ascii_islower (c) : FALSE;

   Since asking whether a unicode character is a lowercase
   ascii is a reasonable operation too. One possibility
   would be too take int and interpret -128:-1 as being
   high ascii, but I think that's probably just too gross.

 - I suspect that g_ascii_strup(), g_ascii_strdown()
   should dup the string, since corresponding g_utf8_*
   functions should do so.
 
And, yes:

 - The date functions indeed need to be changed to 
   covert from locale / unicode normalize and/or casefold 
   rather than g_ascii_strup(), g_ascii_strdown().

   I'm not sure that the change to g_ascii_* is an
   improvement - it is broken either way.

Comment 12 Darin Adler 2001-06-27 23:38:01 UTC

Changing g_ascii_islower to take a gunichar? Seems like a poor idea.
While there might be some use for g_unichar_is_ascii_lowercase, I
don't imagine there are going to be a significant number of places
where we'd need it.

But there is ASCII text in various places, stored in gchar and gchar*
I don't think we should let the existence of gunichar prevent us from
making these useful gchar functions. Making them less convenient for
the callers who really need them just so they can be used on gunichar,
which is rarely going to be right anyway, seems wrong to me.

But I don't think it's worth debating much. Either way, getting this
in soon (not at the last possible moment) so we can get rid of the
mistaken locale dependencies in the various libraries for GNOME 2
seems the priority.

Changing strup/strdown to dup the string? Seems OK to me, but
unnecessary. If it was my decision, I would choose to dup the string,
but I prefer that kind of interface anyway (as you can tell if you
look at eel).

I think the change to the ASCII versions is a slight improvement even
for the date functions, but it's far less important than the rest of
the changes.

Comment 13 Owen Taylor 2001-06-30 16:55:18 UTC

Applied, with the strdup'ing change, but not the gunichar
change.

Sat Jun 30 12:49:26 2001  Owen Taylor  <otaylor@redhat.com>

        Patch from Darin Adler (#54166)
	
	* glib/gstrfuncs.[ch]: Add ascii-only, locale-insensitive
	g_ascii_to[lower/upper], g_ascii_str[down/up],
	g_ascii_is[upper/lower] and deprecate the locale-affected
	versions which break for UTF-8, etc. Make
	g_ascii_strup/strdown duplicating,
	not in-place for consistency with UTF-8 functions.

	* glib/gstring.[ch]: Add ascii-only, locale-insensitive
	g_string_ascii_[down/up], and deprecate the locale-affected
	versions which break for UTF-8, etc.

	* glib/gutils.c glib/gwin32.c test/testglib.c: Use
	the g_ascii_* functions where appropriate.