GNOME Bugzilla – Bug 559501
SIGSEGV in xmlNanoHTTPMethodRedir()
Last modified: 2009-08-24 12:19:08 UTC
Steps to reproduce: Code triggering the crash: while (! hostess_stop) { memset(online, 0, sizeof(online)); count = 0; /* parse the feed */ if (! (feed = xmlIOHTTPOpen(url)) ) { debug("HOSTESS: could not download XML feed.\n"); goto _hostess_continue; } r = xmlReaderForIO(xmlIOHTTPRead, xmlIOHTTPClose, feed, "", NULL, 0); if (! r) { debug("HOSTESS: could not parse XML feed.\n"); xmlIOHTTPClose(feed); goto _hostess_continue; } pthread_mutex_lock(& hostess_lock); while (xmlTextReaderRead(r) == 1) { (various parsing) } xmlTextReaderClose(r); xmlFreeTextReader(r); (additional stuff) pthread_mutex_unlock(& hostess_lock); _hostess_continue: /* sleep until next refresh */ if (! hostess_stop) sleep(60); } Stack trace: Stack trace: Valgrind output: ==11611== Syscall param select(writefds) points to uninitialised byte(s) ==11611== at 0x458DCD7: select (in /lib/tls/libc-2.3.6.so) ==11611== by 0x638F463: (within /usr/lib/libxml2.so.2.6.27) ==11611== by 0x639061B: xmlNanoHTTPMethodRedir (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6390B59: xmlNanoHTTPMethod (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6390B99: xmlNanoHTTPOpen (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6353D06: xmlIOHTTPOpen (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6A38FED: ??? ==11611== by 0x46040BC: start_thread (in /lib/tls/libpthread-2.3.6.so) ==11611== by 0x459501D: clone (in /lib/tls/libc-2.3.6.so) ==11611== Address 0x72A9DF4 is on thread 2's stack ==11611== ==11611== Jump to the invalid address stated on the next line ==11611== at 0x0: ??? ==11611== by 0x639061B: xmlNanoHTTPMethodRedir (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6390B59: xmlNanoHTTPMethod (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6390B99: xmlNanoHTTPOpen (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6353D06: xmlIOHTTPOpen (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6A38FED: ??? ==11611== by 0x46040BC: start_thread (in /lib/tls/libpthread-2.3.6.so) ==11611== by 0x459501D: clone (in /lib/tls/libc-2.3.6.so) ==11611== Address 0x0 is not stack'd, malloc'd or (recently) free'd ==11611== ==11611== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==11611== Bad permissions for mapped region at address 0x0 ==11611== at 0x0: ??? ==11611== by 0x639061B: xmlNanoHTTPMethodRedir (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6390B59: xmlNanoHTTPMethod (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6390B99: xmlNanoHTTPOpen (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6353D06: xmlIOHTTPOpen (in /usr/lib/libxml2.so.2.6.27) ==11611== by 0x6A38FED: ??? ==11611== by 0x46040BC: start_thread (in /lib/tls/libpthread-2.3.6.so) ==11611== by 0x459501D: clone (in /lib/tls/libc-2.3.6.so) Other information: libxml2 2.6.27.dfsg-2 etch32 Seems like a NULL dereference, it does not occur systematically, seems the xml feed URL going down helps triggering the crash.
Stacktrace obtained with the coredump generated by valgrind (note: some URL details like the domain name and the passed IP parameter have been edited):
+ Trace 209307
Actually I think this bug results from a memory corruption due to linking with a software using a modified FD_SETSIZE. My software bumps FD_SETSIZE to 32768, and the xmlIOHTTPOpen() starts to fail on load, because it uses select() and thus can not cope with socket descriptors above the system FD_SETSIZE. The crash occurs usually after a few of such failures (error : Operation in progress). You can reproduce easily by using http://cvs.apache.org/~jorton/bumpfd.c to artificially increase the fd numbers. Since a single socket is polled, may I suggest using the FIONREAD ioctl to avoid this issue altogether? It is quite portable (probably not on BeOS or VMS though).
http://securityvulns.com/Hdocument669.html I think this supports the above hypothesis...
I recompiled libxml2 using this hack: #ifdef __linux__ #include <bits/types.h> #undef __FD_SETSIZE #define __FD_SETSIZE 32767 #endif at the beginning of nanohttp.c, and the crash does not happen anymore. This confirms the problem is due to a stack corruption triggered by using select(2) on descriptor numbers > default FD_SETSIZE. As a conservative solution, I suggest to check for descriptors > to the default FD_SETSIZE, and avoid passing the offending sockets to select(2). A probably better fix would be to get rid of select(2) altogether (i.e. using poll(2) or ioctl(FIONREAD) if available). Currently, using nanoHTTP is *dangerous* in any application having to handle more than 1024 simultaneous connections.
Hi, I just wanted to let you know that I'm now using a hand patched libxml2 using poll(2) instead of select(2) and that everything is working fine. It would be great however if this issue was patched upstream.
I discover this now, it's late but if you still have the patch to use poll instead of select, please send to the list or here, thanks, Daniel
Created attachment 140375 [details] [review] patch
Sure, I didn't have the original patch but I rewrote something nicer. If poll() is not available, nanoHTTP will fallback to select but check the socket descriptor. Please review and tell me if it's OK.
Thanks ! That looks fine just out of reviewing the patch, but apparently it just doesn't work :-\ paphio:~/XML -> ./xmllint http://xmlsoft.org/ error : Operation in progress warning: failed to load external entity "http://xmlsoft.org/" paphio:~/XML -> Do you think you will have a chance to debug this ? Seems on EAGAIN or something we need to loop on the operation instead of failing, but I didn't try to gdb it ... If you can't I will try Daniel
Created attachment 140481 [details] [review] fixed patch.....
Ah, sorry, I did a stupid typo while rewriting the patch, wrote POLLERR instead of POLLOUT in a condition. This new patch must work fine.
Please tell me once this patch gets accepted upstream or whether I have to change some other things.
It looks fine now, of course that code is a bad mix of conditional code due to portability issues, so I try to clean it up, but it looks fine and works, so applied and commited to git, thanks a lot ! Daniel P.S.: nanoftp, while less used has the same kind of problems you may want to fix them too if you have the courage :-)
Created attachment 141494 [details] [review] nanoFTP fix Same fix for nanoFTP
Hi, I fixed nanoFTP in the same way, you can test using this command: ./xmllint ftp://ftp.pramberger.at/systems/linux/contrib/rhel5/i386/repodata/repomd.xml I agree it's quite ad-hoc fixes, the best solution would probably be to add a private _xmlWaitOnSocket() function or the like to avoid code duplication and the numerous #ifdef. I'll try to refactor both nanoFTP and HTTP that way when I have the time.
*** Bug 554243 has been marked as a duplicate of this bug. ***