GNOME Bugzilla – Bug 321545
xmlParseDocument crashes on deeply-nested xml
Last modified: 2006-09-19 12:45:15 UTC
Please describe the problem: On deeply-nested xml (say a 50KB file of nothing but "<p><p><p>..."), xmlParseDocument overflows the stack and crashes due to recursive calls to xmlParseElement and xmlParseContent. The xmlParserMaxDepth limit is not enforced on this code path. Steps to reproduce: #include <stdio.h> #include <string.h> #include <sys/resource.h> #include <unistd.h> #include "libxml/parser.h" #include "libxml/parserInternals.h" static void SetStackLimit(int limit) { struct rlimit stack_rlimit; getrlimit(RLIMIT_STACK, &stack_rlimit); if (stack_rlimit.rlim_cur > limit) { printf("limiting stack size to %d\n", limit); stack_rlimit.rlim_cur = limit; setrlimit(RLIMIT_STACK, &stack_rlimit); } else { printf("current stack limited to %d bytes, which is smaller than %d\n", (int)stack_rlimit.rlim_cur, limit); } } const char* CreateInputData(int depth) { char *result = malloc(depth * 3 + 1); int i; for (i = 0; i < depth * 3; i+=3) { result[i] = '<'; result[i+1] = 'p'; result[i+2] = '>'; } result[depth * 3] = '\0'; return result; } void Parse(const char *input_data) { xmlParserCtxt *ctxt = xmlCreateMemoryParserCtxt( input_data, strlen(input_data)); xmlParseDocument(ctxt); } int main() { SetStackLimit(2048 * 1024); const char *input_data = CreateInputData(16000); Parse(input_data); return 0; } Actual results: The test program segfaults (on linux/x86). Expected results: The test program should complete successfully. Does this happen every time? Yes. Other information: This patch fixes the problem, although I don't know if it's the best solution: --- libxml2-2.6.22.orig/parser.c 2005-11-15 12:06:15.000000000 -0800 +++ libxml2-2.6.22/parser.c 2005-11-15 12:10:32.000000000 -0800 @@ -8414,6 +8414,14 @@ xmlNodePtr ret; int nsNr = ctxt->nsNr; + if ((unsigned int) ctxt->nameNr > xmlParserMaxDepth) { + xmlFatalErrMsgInt(ctxt, XML_ERR_INTERNAL_ERROR, + "Excessive depth in document: change xmlParserMaxDepth = %d\n", + xmlParserMaxDepth); + ctxt->instate = XML_PARSER_EOF; + return; + } + /* Capture start position */ if (ctxt->record_info) { node_info.begin_pos = ctxt->input->consumed +
> Expected results: > The test program should complete successfully. By "successfully", of course I mean "run to completion without dying" - it should still fail to parse this file when it hits the depth limit.
The patch I proposed for parser.c doesn't work - for some inputs the parser will go into an infinite loop (minimal example: thousands of "<p>"s followed by "᾿")
I'm a bit busy ATM, this looks like a potential serious problem though. I will try to get this fixed during at the end of the month. I hope it can wait till then. Daniel
Ping. Any updates on this? I have a workaround that has been working for me, but it would be nice to have an official fix. My workaround is a small modification to the patch I proposed last time (diff is to 2.6.26 version of parser.c). Note that I'm not sure that this patch is completely correct, only that it's been working for me. --- parser.c.orig 2006-09-12 18:32:47.000000000 -0700 +++ parser.c 2006-09-12 18:26:06.000000000 -0700 @@ -8349,7 +8349,8 @@ xmlParseContent(xmlParserCtxtPtr ctxt) { GROW; while ((RAW != 0) && - ((RAW != '<') || (NXT(1) != '/'))) { + ((RAW != '<') || (NXT(1) != '/')) && + (ctxt->instate != XML_PARSER_EOF)) { const xmlChar *test = CUR_PTR; unsigned int cons = ctxt->input->consumed; const xmlChar *cur = ctxt->input->cur; @@ -8442,6 +8443,14 @@ xmlNodePtr ret; int nsNr = ctxt->nsNr; + if ((unsigned int) ctxt->nameNr > xmlParserMaxDepth) { + xmlFatalErrMsgInt(ctxt, XML_ERR_INTERNAL_ERROR, + "Excessive depth in document: change xmlParserMaxDepth = %d\n", + xmlParserMaxDepth); + ctxt->instate = XML_PARSER_EOF; + return; + } + /* Capture start position */ if (ctxt->record_info) { node_info.begin_pos = ctxt->input->consumed + Thanks, -Ben
I think that got fixed in some ways, or I'm unable to reproduce with the current CVS head: (gdb) r Starting program: /u/veillard/XML/tst Reading symbols from shared object read from target memory...done. Loaded system supplied DSO at 0xd41000 [Thread debugging using libthread_db enabled] [New Thread -1209034544 (LWP 24537)] limiting stack size to 2097152 Entity: line 1: parser error : Excessive depth in document: change xmlParserMaxDepth = 1024 <p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p><p ^ Program exited normally. (gdb) I don't see a segfault myself, hum ... I also tried to compile and link against the installed library on Fedora Core (Linux/i386) without seing the problem. Now that said, the patch looks just sane to me and it's probably better to be safe than sorry and make the check twice. So I applied the patch (manually) to CVS head, ran verious regression tests and since it looks fine, commited it. So fixed, thanks a lot ! Daniel