After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 59388 - g_ascii_is_space etc.
g_ascii_is_space etc.
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: general
1.3.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on:
Blocks: 59387
 
 
Reported: 2001-08-22 19:23 UTC by Alexander Larsson
Modified: 2011-02-18 15:47 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Perl program for producing table of types (1.29 KB, text/plain)
2001-08-24 05:37 UTC, Owen Taylor
  Details
patch to add these calls and fix g_unichar anomaly too (19.39 KB, patch)
2001-08-24 23:21 UTC, Darin Adler
none Details | Review
previous patch accidentally moved a comment away from the includes section (19.34 KB, patch)
2001-08-24 23:24 UTC, Darin Adler
none Details | Review
Same patch, now using macros (27.04 KB, patch)
2001-08-25 21:39 UTC, Alexander Larsson
none Details | Review

Description Alexander Larsson 2001-08-22 19:23:25 UTC
When implementing some stuff i used isascii(). Owen told me to use
g_ascii_is_space(), which doesn't exist.

Do we want to add this function? What about the other g_ascii_is_XXX that
aren't currently implemented.

Some discussion here:
http://lists.gnome.org/archives/gtk-devel-list/2001-August/msg00207.html
Comment 1 Owen Taylor 2001-08-23 23:58:35 UTC
I think we should go ahead and add all of them. It isn't
even obvious to me what ones of these are local dependent
and asking programmers with little experience in i18n
to guess seems like asking too much. (and also because we
have (char c) for the g_ascii_* functions instead of the
sign-cast-bug-inviting signature for the standard functions.)
The full set of missing functions are:

int   iscntrl(int);
int   isdigit(int);
int   isgraph(int);
int   isprint(int);
int   ispunct(int);
int   isspace(int);
int   isxdigit(int);

I would suggest making these and also the ones we already have
macros indexing into a constant array of flag fields, since
performance does matter here. But care should be taken with sign 
problems - you probably want to do something like 
array[((unsigned int)(c)) & 0xff] to get the same effect as
the current isascii (char c).


Comment 2 Darin Adler 2001-08-24 04:32:29 UTC
Alex, let me know if you want me to write the additional functions,
or if you'd like to do them yourself.
Comment 3 Owen Taylor 2001-08-24 05:37:29 UTC
Created attachment 948 [details]
Perl program for producing table of types
Comment 4 Owen Taylor 2001-08-24 05:40:23 UTC
I've attached a perl program that may be useful if implementing
these functions as macros as indicated above - it produces
a 512-byte table that can be used as above for macros without
multiple evaluation. I've checked over the table some and
believe it is correct, but it might be worth uncommenting
the commented section and verifying.
Comment 5 Owen Taylor 2001-08-24 15:26:20 UTC
The program above treats the vertical tab character (\x0b) as 
isspace/isprint, to correspond to the handling for the C
locale on Linux. On second thought, I believe this is wrong,
since the vertical tab is univerally not used, and people
won't expect it to get passed by these functions and won't
be able to handle it.

Also, \r\n\t\f\v are treated as g_unichar_isspace() but
not g_unichar_isprint(), and the script above treats
isspace as a proper subset of isprint. I think we need
to as a testcase assert that 
g_ascii_isXXX(c) <=> 0 <= c < 128 && g_unicode_isXXXX(c)
Comment 6 Darin Adler 2001-08-24 15:28:34 UTC
I was thinking that the test code should run through all the
cases and check that the results match the "C" locale results
from the is* calls, with explicit exceptions for the cases
we want to handle differently, and also run the corresponding
g_unicode_is* calls.
Comment 7 Darin Adler 2001-08-24 21:59:14 UTC
I wrote the test code already and I'm working on a patch.
Comment 8 Darin Adler 2001-08-24 22:58:19 UTC
I am now testing the code.

There was just a misunderstanding about isprint. It's supposed
to include the space character, but not other white-space characters.

I fixed the script to match the C spec. on that one. I also left
the vertical tab character in there. While I agree that it's not
so useful to call it a space character, I also don't think it makes
sense to change g_ascii_isspace to not include it if g_unichar_isspace
and isspace ("C" locale) both include it.

If you want me to change it, that's OK.

I also had to add 0x7F to iscntrl.

Anyway, all the functions now match -- testing g_ascii_isXXX vs.
isXXX and g_unichar_isXXX except for one case: g_unichar_is_punct.

g_unichar_is_punct returns FALSE for:

    DOLLAR SIGN
    PLUS SIGN
    LESS-THAN SIGN
    EQUALS SIGN
    GREATER-THAN SIGN
    SPACING CIRCUMFLEX
    SPACING GRAVE
    VERTICAL BAR
    TILDE

Is this a bug in g_unichar_is_punct?
Comment 9 Darin Adler 2001-08-24 23:05:23 UTC
On further inspection, I think that g_unichar_ispunct should include
symbols, not just punctuation.
Comment 10 Darin Adler 2001-08-24 23:21:41 UTC
Created attachment 956 [details] [review]
patch to add these calls and fix g_unichar anomaly too
Comment 11 Darin Adler 2001-08-24 23:24:17 UTC
Created attachment 957 [details] [review]
previous patch accidentally moved a comment away from the includes section
Comment 12 Owen Taylor 2001-08-25 04:46:19 UTC
Looks in general good. I'm a little uncertain about including
the checks against the C library in the test suite because the
C standard is a little vague about the exact required 
behavior for the C locale ... I think it _probably_ can
be taken as an exact specification, but am a little sure around
the edges for such things as whether DEL is a control character
or not.

I'd actually prefer to take the "what works best for the user"
approach rather than the "exactly the same as the standard C
functions". In particular, I'll stick to my guns and say that
\v should not count as a space.

And I think it _is_ worth implementing the is* functions as
macros. That's the main reason I suggested the table ...
I doubt that as functions, the table significantly improves
speed, but it allows writing macros easily. I don't think
there are any real downsides to macros except for possibly
more confusing error reporting for the user. (The C library
functions are already macros, however.)

In terms of docs, the right thing to do is to add the 
prototypes to docs/refernences/glib/glib-overrides.txt so
they get documented as function not macros. I think you
can actually put the doc comments into the header files
along with the macros now (though I haven't tried that
in this circumstance), or they can go in tmpl/.

I agree with the change to g_unichar_ispunct(), though it
isa little odd for it to be returning TRUE for things
which aren't punctuation in unicode terms, consistency
with what the C library ispunct() means is probably
more important. [ who uses ispunct() anyways? In fact,
who uses isgraph(), isprint()?.. in fact, most uses
of isalnum, etc, are also broken ... you should be using
the pango logical atributes functionality to find 
text boundaries... but anyways... ]

Comment 13 Darin Adler 2001-08-25 17:35:53 UTC
I already "bent to your will" on the '\v' thing, as you'll see
in my patch. I changed both g_ascii_isspace and g_unichar_isspace.

The C library functions are pairs of macros and matching functions.

Making the GLib functions be macros seems overkill to me. You're
welcome to do it, but I won't do that part myself. I understand that
very-high-efficiency versions of these functions can be help
performance, but I guess that sort of thing is not really my cup of
tea.

I agree that many of the <ctype.h> functions, like ispunct, are
pretty useless, and it's almost silly for us to be making the ASCII
and Unicode versions of them.
Comment 14 Alexander Larsson 2001-08-25 21:39:49 UTC
Created attachment 969 [details] [review]
Same patch, now using macros
Comment 15 Alexander Larsson 2001-08-25 21:42:29 UTC
I attached a new patch that implements the functions as macros.
I moved the docs to tmpl/

I had to change the table for vertical tab though, now it is !space
and cntrl.
Comment 16 Darin Adler 2001-08-25 21:45:21 UTC
I think you need to have parentheses around the c in the macros.
Here's how I would have done one:

#define g_ascii_isalnum(c) \
  ((g_ascii_table[(guchar) (c)] & G_ASCII_ALNUM) != 0)
Comment 17 Alexander Larsson 2001-08-25 22:28:14 UTC
Checked in.