GNOME Bugzilla – Bug 79488
include a good portable printf implementation
Last modified: 2011-02-18 15:55:17 UTC
I think it would be nice if glib would guarantee the availability of positional parameters in its printf-like formatting functions. IMO, these are really needed for high-quality localization. Now, as has been pointed out whenever this issue comes up, coding all of this yourself is hard. Therefore I have done a little search on available implementations. Here are pointers to some: http://daniel.haxx.se/trio/ http://daniel.haxx.se/trio/competition.html (list more) http://www.ijs.si/software/snprintf/ (lists even more) Just from glancing at licenses and completeness, trio seems a good candidate for inclusion. The license is MIT and it claims to be a complete implementation of all C99 and SUS2 features. Its not small though (135k C-source). libxml seems to be using it too. Being able to rely on C99 and SUS2 features of printf would allow a number of cleanups in glib, eg getting rid of the format string parsing. This bug is meant as a test balloon, so you can shoot it down before I investigate more time in this idea.
Of course I mean that this should be compiled in only #if !defined(HAVE_C99_VNPRINTF) ||!defined(HAVE_UNIX98_PRINTF) we should e.g. still use glibc2.2 printf.
This strikes me as an excellent idea.
Here is a first attempt. It is not very polished, complete or well-tested. It adds a --enable-trio-printf configure option which can be used to force or forbid the use of trio printf. The default is to use trio printf if either C99 vsnprintf return value semantics or Unix98 positional parameters are missing from the system printf. There is a new private header gprintf.h which maps _g_ prefixed variants of the printf functions to either the system or trio implementation. The _g_ functions are not used everywhere currently, only where we operate on an external (or translated) format string. I have not done any simplifications which would be possible since we can rely on C99 vsnprintf and the availability of all printf family members. The included trio sources are pristine trio-1.6 sources.
Created attachment 7900 [details] [review] the patch
Created attachment 7901 [details] new files
The main areas where this need further work are: 1) Decide what assumptions glib code should make on the printf implementation. Currently we have HAVE_VPRINTF (unused) HAVE_DOPRNT (unused, defined as a side effect of the vprintf test) HAVE_VSNPRINTF (used in gutils.c gmessages.c) HAVE_C99_VSNPRINTF (used in gmessages.c) HAVE_VASPRINTF (used in gstrfuncs.c) Assuming HAVE_VSNPRINTF and HAVE_C99_VSNPRINTF will allow considerable simplification in gutils.c and gmessages.c. Assuming HAVE_VASPRINTF doesn't yield much in terms of simplification and should IMO not be done, since VASPRINTF is not part of any standard (its a GNU invention, both glibc and trio have it) 2) Decide whether glib code should consistently use only _g prefixed wrappers of the printf functions or if we want to restrict this to places where we need it because we assume HAVE_[C99]_VSNPRINTF and where we deal with a format string which comes from the application or from gettext (thus may contain positional parameters). 3) Add the necessary Makefile magic to use trio on win32. 4) Add a README in trio directory
Created attachment 7936 [details] [review] simplifications in gutils.c and gmessages.c
Currently, g_strdup_printf("%s", NULL) returns "(null)" on some OSes (Linux, BSD) and crashes on others (Solaris, HP), which results in "solaris-only" bugs. It would be nice to decide that one of the two behaviors is correct and use the glib-provided printf implementation if the system printf does it the other way, so you get the same behavior everywhere.
Note that the included printf is most likely going to be considerably deficient from the native printf in the area of floating point handling (this was one comment I got about trio from someone who had worked with it before; the other comment he made was that it had lots of non-standard extensions.) So I'm not sure we should be too eager to use it. I tend to consider that places that assume that NULL for %s works should just be fixed.
Can you be more precise about the floating point problems ? The CHANGES file in trio CVS lists a number of floating point fixes in releases 1.8 - 1.0, all of which have been released after I filed this bug report. Regarding the extensions, we don't do anything to prevent the use of extensions which happen to be provided by the native libc printf implementation, so why would this be a problem for adopting trio ?
The person I was talking to about this was James Antill, the author of Vstr - http://www.and.org/vstr/. (So possible bias, when considering competing implementations, as he admitted). A concern about extensions was that they didn't follow the rules that GCC expect, so they couldn't be used with the format specifier attributes that GCC uses to check printf arguments, so probably not a concern for us; since what we'd prefer is _exactly_ C99 + positional parameters, nothing more, nothing less. I'd be more concerned with the extreme amount of crack that trio has in this area, judging from their examples they have on the front page. The concern about the double handling seemed to be that he had some idea about how hard getting the double output right was and the trio code didn't look sufficient. (Apparently, Vstr just uses a cut-and-paste of the glibc code, including multiple-precision code. I think multiple-precision handling is really needed if you want to have read/write round-trip work reliably.) So, I'm not really sure what my conclusion is here, Trio doesn't really inspire me with a lot of confidence in its accuracy or standards compliance, but on the other hand, it may not really matter much.
I asked on the trio mailing list whether it is possible to compile trio in a C99+SUS2 mode without any extensions, and got the answer to "just set the TRIO_EXTENSIONS macro to zero, which is easiest done by adding -DTRIO_EXTENSIONS=0 as a compiler flag." So we should be able to use it without the crack extensions.
Let's go ahead and proceed with this; trio should be "good enough", and we always can swap it out for something better later if such shows up, as long as we are clear about what is supported. I wrote some test cases using GNU MP for printf accuracy; they are a bit more rigid than the C standard, though GNU libc and Solaris libc should pass them. (C99 standard only requires the first DECIMAL_DIG significant digits to be correctly rounded.) Once I get them to match the C standard exactly I'll put them up somewhere. I'm sure that trio will fail them miserably, but well, at least we'll be able to quantify that. :-) One minor thing I noticed in your patch is that you need to add a DIST_SUBDIRS variable including Trio. (See the automake docs.) Also, I'd rename --enable-trio-printf to --enable-included-printf About some of the questions above: - I think we should force the use of the included printf on systems that don't have a) positional parameters b) C99 snprintf I dont' think we should require vasprintf or NULL => (null). - We should add a gprintf.h including public symbols (not macros) g_printf,g_fprintf,g_sprintf,g_snprintf and the v-variants of those 4 functions. The reason for symbols not macros is you don't want thing to break if you recompile GLib differently. This gprintf.h should _not_ be included explicitely by glib.h since it needs to pull in stdio.h. (I think this will be most generally accepted; alternative is just to accept that include <glib.h> pulls in stdio.h.) This header should be used throughout Pango/ATK/GTK+ uniformly, though doing that can wait. - GLib should have use a private header as you have now that switches between the normal stdio and the included variants and we should use it uniformly within GLib. [ This may all be a little optimistic for 2.2 at this point ]
Here is a new patch, based on trio-1.9. I didn't yet add an installed gprintf.h, but I renamed the former gprintf.h to gprintfint.h to keep it as a private header. What I did so far: - change --enable-trio-printf to --enable-included-printf - try to use this for all stdio throughout glib - simplify gmessages.c and gutils.c as above - compile trio with -DTRIO_EXTENSION=0 why do you think DIST_SUBDIRS is necessary, Owen ? Wouldn't it be ok to dist the trio subdir unconditionally ? next steps: - add public gprintf.h (must move some functions currently in gutils.h) - add trio/README - win32 compilation (I can't do this) - configure.in could be simplified a bit if we decide that using vasprintf in g_strdup_printf isn't worth the trouble, given that we can assume a working C99 vsnprintf (which lets us get the allocation right with 2 calls)
Created attachment 12417 [details] [review] patch
Created attachment 12418 [details] new files
There is some garbage 'cd g' in the configure.in patch, otherwise didn't notice any problems in a quick scan. You need to use DIST_SUBDIRS any time a subdirectory is conditionally compiled, or the subdir wont be recursed into in make dist, and so won't be in the final tarball. The point of DIST_SUBDIRS is not to exclude but to include things that wouldn't otherwise be included. I think keeping vasprintf is important; performance of g_strdup_printf() is pretty crucial, and having to parse the format string twice, format doubles twice, etc, could hurt. As far as win32 duplicate goes, I believe the standard procedure is to commit it broken and let Tor fix it up ;-(. I wouldn't be suprised if it "just works" on Win32 though, at least when using autotools. There doesn't seem to be anything in Trio that should be Unix specific. The MSC makefiles will need updating, but is also a "when someone using MSC notices" type issue generally.
I must have misread the automake manual then; I seem to have read that automake defaults to dist everything that *may* occur in SUBDIRS. Regarding the public gprintf header: I don't see why it would have to drag in stdio. After all, we already define g_snprintf and g_vsnprintf in gutils.h and it doesn't have to drag in stdio.h - that is needed only in the private header where the wrappers are #defined. I believe gprintf.h is not really necessary; it should be ok to put the remaining g...printf variants in gutils.h. Comments ?
sprintf() isn't the problem, it's fprintf ,which needs a FILE * in its prototype. It does appear that automake does now try to autogess the right value from DIST_SUBDIRS when you have a conditionally defined SUBDIRS; I'm not sure if this is new, or it just couldn't autoguess successfully what I was using before. In the end ff make dist works with trio disabled without DIST_SUBDIRS, we don't need DIST_SUBDIRS :-)
Here are gprintf.[hc]. Of course, g_snprintf and g_vsnprintf will have to be removed from gutils.[hc] when adding these. What I'm not sure about is whether gprintf.h should be added to glibinclude_HEADERS or glibsubinclude_HEADERS. I'll also attach an attempt at glib/trio/README
Created attachment 12437 [details] gprintf.h
Created attachment 12438 [details] gprintf.c
Created attachment 12439 [details] trio/README
I'll vote for glibsubinclude_HEADERS, on the theory that it isn't the header file for an entire library. We'll need a note in the docs that glib/gprintf.h should be included explicitely. We can't actually remove g_sprintf from gutils.h; that would break API compatibility. But it shouldn't hurt to have a function protoyped identically in two places. (If I'm wrong about that, we can just make gprintf.h include gutils.h.)
BTW, other than the note about gutils.h, everything looks good to to me.
Do we actually guarantee stability of individual headers ? I thought apps were supposed to include glib.h ? Anyway, I'll test if declaring the snprintf variants in two places breaks anything.
I don't think we guarantee particular header stability for GLib (though unlike GObject we don't enforce the non-inclusion of gobject.h, so I bet some people are including individual headers) But it doesn't particularly matter here ... if we don't include gprintf.h in glib.h (what I was discussing above) then moving something into gprintf.h is moving it out of glib.h.
Ugh, you are right. Anyway, I have left the snprintf declarations in gutils.h and added a copy in gprintf.h. A quick test didn't seem to have a problem with that. All committed now. Over to you, Tor.