GNOME Bugzilla – Bug 382852
Win32: Binary package crashes instantly with RegEx-Gnu
Last modified: 2006-12-17 14:54:05 UTC
Please see the following very short demo. Compile it against ftp://ftp.gnome.org/pub/gnome/binaries/win32/gtkhtml/3.12/gtkhtml-{,dev-}3.12.0.zip. When I click the button and wait a bit, a warning appears (see bug 378158) and then the application stops. It simply crashes without printing anything else. Even worse, the behavior under debuggers like gdb or WinDbg is different: there the app seems to work perfectly. I was also unable to reproduce this bug with a self-compiled GtkHTML installation (3.12.0 and 3.13.2). This bug makes GtkHTML pretty unusable for GnuCash, because it disables the whole report machinery. Note: I have checked this against two setups: (1) all dependencies come from binary packages on ftp.gnome.org; (2) all dependencies have been compiled, in their most recent versions, with debugging symbols.
Created attachment 77773: Demo
Tor, are you still looking into the evo-related Windows issues? If not, just leave me a note. Thanks. :-)
I can't reproduce any crash with that sample program when using the latest binaries. I didn't bother trying combinations of older versions. Could you tell me exactly which versions of the dependent libraries you use?
Well, pretty standard; I hope I did not forget anything. Meanwhile I retested from scratch with "minimal" dependencies:
* Microsoft Windows XP Professional Version 2002, Service Pack 2
* MSYS 1.0.10
* MinGW 5.0.3
* ftp://ftp.gnome.org/pub/gnome/platform/2.16/2.16.0/win32/dependencies/
  - libiconv-1.9.1.bin.woe32
  - cairo{,-dev}-1.2.4
  - gettext{,-dev}-0.14.5
  - libpng-1.2.8
  - pkg-config-0.20
  - popt{,-dev}-1.10.2-tml-20050828
  - zlib-1.2.3
* http://www.zlatkovic.com/pub/libxml
  - libxml2-2.6.26
* ftp://ftp.gnome.org/pub/gnome/binaries/win32
  - atk{,-dev}-1.12.3
  - gconf{,-dev}-2.14.0
  - gtk+{,-dev}-2.10.6
  - glib{,-dev}-2.12.4
  - gnome-vfs{,-dev}-2.14.2
  - gtkhtml{,-dev}-3.12.0
  - libart_lgpl{,-dev}-2.3.17
  - libbonobo{,-dev}-2.16.0
  - libbonoboui{,-dev}-2.16.0
  - libglade{,-dev}-2.6.0
  - libgnome-2.16.0-1
  - libgnome-dev-2.16.0
  - libgnomecanvas{,-dev}-2.14.0
  - libgnomeprint{,-dev}-2.12.1
  - libgnomeprintui{,-dev}-2.12.1
  - libgnomeui{,-dev}-2.16.0
  - orbit2{,-dev}-2.14.2
  - pango{,-dev}-1.14.5
  - libbonobo
As far as I can see, those are the same versions I tried with. (Well, you use a slightly newer libxml2, but surely that's completely irrelevant here.) I really have no idea what causes the crash for you... You said you can also reproduce the crash if you recompile gtkhtml (with debugging symbols)? But then, when you run it under the debugger, there is no crash? Could you try installing drmingw and letting it produce a post-mortem stack trace, to see where the crash happens? (Drmingw is useful and reliable only when debugging symbols are present.)
No, I cannot reproduce the bug with debugging symbols, but Dr. Mingw helped nonetheless. Here is an excerpt of its output:

    gtkhtml.exe caused an Access Violation at location 7c9211e0 in module ntdll.dll Reading from location 00000006.

    Registers:
    eax=00ad0300 ebx=00ad0000 ecx=0128edf8 edx=00ad0300 esi=0128edf0 edi=00000002
    eip=7c9211e0 esp=0022ec3c ebp=0022ee5c iopl=0 nv up ei pl nz na po nc
    cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00200206

    Call stack:
    7C9211E0  ntdll.dll:7C9211E0  _RtlAllocateHeap@12
    77BFC3C9  MSVCRT.dll:77BFC3C9  __heap_alloc
    77BFC3E7  MSVCRT.dll:77BFC3E7  __nh_malloc
    77BFC42E  MSVCRT.dll:77BFC42E  _malloc
    661C91B0  regex.dll:661C91B0  regcomp
    00619DDD  libgtkhtml-3.8-15.dll:00619DDD  html_engine_init_magic_links
    005F1C81  libgtkhtml-3.8-15.dll:005F1C81  html_engine_get_type
    6275EAE6  libgobject-2.0-0.dll:6275EAE6  type_class_init_Wm  gtype.c:1874
            static void type_class_init_Wm(
                TypeNode * node = &(indirect),
                GTypeClass * pclass = &{
                    GType g_type = 80
                }
            )
            ...
                  node->data->class.class_init (class, (gpointer) node->data->class.class_data);
            >     G_WRITE_LOCK (&type_rw_lock);
                  node->data->class.init_state = IFACE_INIT;
            ...

===

So finally I tried replacing RegEx-Gnu with RegEx-Spencer, and voilà, the crash disappears. I will soon repeat my test with a full-blown GnuCash setup. If that works, what should be done then?
We discussed this on IRC and found out what is happening. Here is a "regex.README" I will put up on ftp.gnome.org:

My Win32 builds of GNOME software that need a POSIX regular expression (regex) library are built against Henry Spencer's regex library, more specifically the Win32 build of it distributed by the gnuwin32 project (see http://gnuwin32.sourceforge.net/packages.html).

Unfortunately there has been a fatal confusion in the names of the DLLs of the Spencer and GNU regex libraries.

In gnuwin32's version "3.8" of the Spencer regex library, the DLL was called "regex.dll". This version can be found at http://prdownloads.sourceforge.net/gnuwin32/regex-spencer-3.8-bin.zip

In the current gnuwin32 version of Spencer regex, "3.8.g.3", the DLL is called "rxspencer.dll". (It presumably still offers the same API and ABI as the "3.8" regex.dll, though. But in this case the rename is a good thing. Read on.)

Now comes the catch: the name "regex.dll" is also (as of December 2006, at least) used by the gnuwin32 distribution of the GNU regex library, version 0.12. (This is in itself a quite old version of GNU regex, built from sources from 1993. The GNU regex sources were subsequently merged into glibc. It might nowadays be hard to build GNU regex as a standalone library, perhaps especially for Win32, knowing the attitude of the glibc maintainer...)

This clash is horrendous: the two regex libraries are not at all binary compatible, yet this is hard to notice, because they provide the same POSIX regex functions. For starters, the regex_t struct type is larger in the GNU regex library than in the Spencer regex library. Also, some constants defined in the regex.h header, like REG_NEWLINE and REG_NOSUB, have different values in the two implementations. (Both naturally provide a regex.h file, as they should; that as such is not a problem.)
My earlier Win32 builds were against gnuwin32's "3.8" build of Spencer regex, and thus the binaries that use it link to regex.dll, which should be the Spencer regex DLL. If one instead uses the regex.dll from gnuwin32's GNU regex 0.12 package, one will hopefully get a crash, or in the worst case, silent data corruption. It is quite possible, even likely, that somebody who downloads my gtkhtml build, for instance, notices that it requires something called "regex.dll", assumes this is the gnuwin32 build of GNU regex, downloads that, and notices that yes indeed, there is a regex.dll.

From December 2006 on, I build against gnuwin32's "3.8.g.3" version of the Spencer regex library, so my binaries link to rxspencer.dll. This should avoid any confusion.

Note that most software assumes that a separate POSIX regex library is found as -lregex. The gnuwin32 distribution of the Spencer regex library, however, uses -lrxspencer. Thus I made a copy of librxspencer.dll.a as libregex.dll.a so that configure scripts will find it without modification. Recall that the DLL in the current version is still called rxspencer.dll. (I haven't installed the gnuwin32 port of GNU regex at all. If I had, it would have provided a libregex.dll.a. But I don't want it at all, because of the above confusion.)

So, the lesson from this: if you build and distribute Win32 binaries, don't forget this simple rule: never, ever use the same name for two incompatible DLLs. Note that *adding* API to a library (which is quite normal and common as a library evolves) does not make it incompatible. Deleting API or otherwise changing the ABI does. Always change the name of the DLL when it becomes incompatible with earlier versions.

--Tor Lillqvist <tml@novell.com>, <tml@iki.fi>
A message I just sent to the gnuwin32 list:

And whoopee, it turns out that there is a dangerous flaw in the 3.8.g.3 version of the RegEx-Spencer library as distributed by gnuwin32. The obsoleted 3.8 version (which is very hard to find on the gnuwin32 site; I found it last week, but can't find it now...) works correctly, but, as I said in my previous message, confusingly uses the same DLL name as gnuwin32's build of RegEx-GNU does.

Try this simple test program against both versions of the RegEx-Spencer library. Build it in a clean mingw environment or in MSVC.

    #include <stdio.h>
    #include <sys/types.h>
    #include <regex.h>

    int
    main (int argc, char **argv)
    {
      const char *pattern = "[ \t]*<mailto:([^@>]+)@?([^ \n\t\r>]*)";
      const char *string = " <mailto:foobar-zap-list@lists.sourceforge.net>";
      regex_t re;
      int j, rc;
      regmatch_t match[3];

      rc = regcomp (&re, pattern, REG_EXTENDED|REG_ICASE);
      if (rc != 0)
        {
          printf ("regcomp failed, rc=%d\n", rc);
          return 1;
        }
      /* Preset the offsets so untouched entries are recognizable. */
      for (j = 0; j < 3; j++)
        {
          match[j].rm_so = -1;
          match[j].rm_eo = -1;
        }
      rc = regexec (&re, string, 3, match, 0);
      /* Cast to long so the %ld format is correct whatever regoff_t is. */
      printf ("Should have matched, rc=%d, match={(%ld,%ld),(%ld,%ld),(%ld,%ld)}\n",
              rc,
              (long) match[0].rm_so, (long) match[0].rm_eo,
              (long) match[1].rm_so, (long) match[1].rm_eo,
              (long) match[2].rm_so, (long) match[2].rm_eo);
      return 0;
    }

When run against the 3.8.g.3 DLL, you will see that the match array is filled in incorrectly:

    Should have matched, rc=0, match={(0,0),(46,0),(9,0)}

If you build it with MSVC, it will even crash after printing that...

When run against the RegEx-Spencer 3.8 DLL (called regex.dll, but not the same as the regex.dll in RegEx-GNU), the result is correct:

    Should have matched, rc=0, match={(0,46),(9,24),(25,46)}

It turns out that the 3.8.g.3 version of the RegEx-Spencer library seems to have been built in an environment where off_t is long long, not long as it is in the Microsoft C library and mingw. What the heck? This is horrible.
It means that the RegEx-Spencer library thinks the regmatch_t struct is 16 bytes (two long longs), while normal mingw-compiled code thinks it is 8 bytes (two longs). So when regexec() is called, it overruns the match array, scribbling over whatever happens to be allocated after it in memory. (This presumably explains the crash when the above program is built with MSVC: it happens to scribble over main's return address on the stack, or something like that.) To see this horror in action, add some variables before and after the "match" array in the sample code above, and print out their values before and after the regexec() call.

Please fix this as soon as possible. Either:

1) Recompile RegEx-Spencer in a proper mingw environment without any mysterious add-on secret sauce that redefines off_t. (In this case, remember that you *must* also use a different name for the fixed DLL, as it is no longer ABI compatible with the old rxspencer.dll. This will also make it obvious that executables that link to the new rxspencer-1.dll (or whatever you choose to call it) aren't expected to work with the old one.) Or,

2) Modify regex.h to spell out that regoff_t is explicitly typedeffed as long long. In this case no new DLL name or "bin" package is necessary, but a new version of the "lib" package with the modified regex.h is of course needed.

--tml

P.S. I guess the "secret sauce" that redefines off_t as long long is the "libgw32c" library? But surely using that should not be a requirement for users of packages like RegEx-Spencer? If the intention really is that it is a requirement, then some mechanism should be introduced that causes an error if one tries to include <regex.h> in a non-libgw32c-modified environment.
For reference, the mail thread can be found at http://sourceforge.net/mailarchive/forum.php?thread_id=31206370&forum_id=2177. For old releases of RegEx-Spencer, go to http://gnuwin32.sourceforge.net/packages/regex-spencer.htm and click on "files page". This leads to http://sourceforge.net/project/showfiles.php?group_id=23617&package_id=17185, where you should see all released packages.
OK. We have found the source of the problem: my demo no longer crashes, and several users have confirmed that text reports now work in GnuCash with RegEx-Spencer v3.8. Closing as NOTGNOME. Thanks, Tor!