Bug 321632 - htmlReadMemory broken if LIBXML_LEGACY_ENABLED not set
Status: RESOLVED INCOMPLETE
Product: libxml2
Classification: Platform
Component: general
Version: 2.6.22
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Daniel Veillard
QA Contact: libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2005-11-16 17:41 UTC by Gary Coady
Modified: 2008-07-19 11:20 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description Gary Coady 2005-11-16 17:41:12 UTC
Distribution/Version: Ubuntu/5.10

Revision 1.46 of globals.c introduced a change so that
inithtmlDefaultSAXHandler(&gs->htmlDefaultSAXHandler);
is only called if both LIBXML_HTML_ENABLED and LIBXML_LEGACY_ENABLED are defined.
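
As a rough illustration of that preprocessor gate, here is a minimal, self-contained sketch. It does not use libxml2 itself: the DEMO_* macros, the DemoHandler and DemoGlobalState types, and all demo_* functions are hypothetical stand-ins, and the real xmlInitializeGlobalState() may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Build-time switches for this sketch only. They stand in for
 * LIBXML_HTML_ENABLED and LIBXML_LEGACY_ENABLED. */
#define DEMO_HTML_ENABLED
/* #define DEMO_LEGACY_ENABLED */

typedef void (*demo_cb)(void);

/* Hypothetical, heavily reduced SAX handler. */
typedef struct {
    demo_cb startElement;
    demo_cb endElement;
    unsigned int initialized;
} DemoHandler;

/* Hypothetical, heavily reduced per-thread state. */
typedef struct {
    DemoHandler xml_handler;
    DemoHandler html_handler;
} DemoGlobalState;

static void demo_start(void) {}
static void demo_end(void) {}

static void demo_init_handler(DemoHandler *h) {
    h->startElement = demo_start;
    h->endElement = demo_end;
    h->initialized = 1;
}

/* Fills a fresh per-thread state. The HTML handler is only filled in
 * when both switches are defined, mirroring the condition described above. */
void demo_init_global_state(DemoGlobalState *gs) {
    static const DemoHandler empty = { NULL, NULL, 0 };

    assert(gs != NULL);
    gs->xml_handler = empty;
    gs->html_handler = empty;
    demo_init_handler(&gs->xml_handler);
#if defined(DEMO_HTML_ENABLED) && defined(DEMO_LEGACY_ENABLED)
    demo_init_handler(&gs->html_handler);
#endif
}
```

With DEMO_LEGACY_ENABLED left undefined, as here, the html_handler member stays all zero after demo_init_global_state(), which is the shape of the per-thread htmlDefaultSAXHandler in the gdb output in this report.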

ctxt->sax is set up by htmlReadMemory along the following call path (only
when SAX1 is compiled in):
htmlReadMemory at HTMLparser.c:5941
xmlCreateMemoryParserCtxt at parser.c:12360
xmlNewParserCtxt at parserInternals.c:1807
xmlInitParserCtxt at parserInternals.c:1553
xmlDefaultSAXHandlerInit at SAX2.c:2754
xmlSAXVersion at SAX2.c

Then the ctxt->sax pointer is initialized by xmlInitParserCtxt().

So at HTMLparser.c:5944, ctxt->sax != NULL, and it has been initialized to the
values of xmlDefaultSAXHandler.

Breakpoint 1, htmlReadMemory (
    buffer=0x8dfa710 "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0
Transitional//EN\">\r\n<HTML><HEAD>\r\n<META HTTP-EQUIV=\"Content-Type\"
CONTENT=\"text/html; charset=us-ascii\">\r\n<TITLE>Message</TITLE>\r\n\r\n<META
content=\"MSHTML 6."..., size=4951, URL=0x848d7d1 "html:", encoding=0x8df5568
"utf-8",
    options=65633) at HTMLparser.c:5945
5945            memcpy(ctxt->sax, &htmlDefaultSAXHandler, sizeof(xmlSAXHandlerV1));
(gdb) print *ctxt->sax
$1 = {internalSubset = 0x821ecf0 <xmlSAX2InternalSubset>,
  isStandalone = 0x821ec60 <xmlSAX2IsStandalone>,
  hasInternalSubset = 0x821ec90 <xmlSAX2HasInternalSubset>,
  hasExternalSubset = 0x821ecc0 <xmlSAX2HasExternalSubset>,
  resolveEntity = 0x821efd0 <xmlSAX2ResolveEntity>,
  getEntity = 0x821f040 <xmlSAX2GetEntity>,
  entityDecl = 0x821f230 <xmlSAX2EntityDecl>,
  notationDecl = 0x821f5b0 <xmlSAX2NotationDecl>,
  attributeDecl = 0x821f3c0 <xmlSAX2AttributeDecl>,
  elementDecl = 0x821f520 <xmlSAX2ElementDecl>,
  unparsedEntityDecl = 0x821f660 <xmlSAX2UnparsedEntityDecl>,
  setDocumentLocator = 0x821f7b0 <xmlSAX2SetDocumentLocator>,
  startDocument = 0x821f7c0 <xmlSAX2StartDocument>,
  endDocument = 0x821f8c0 <xmlSAX2EndDocument>, startElement = 0,
  endElement = 0, reference = 0x8221120 <xmlSAX2Reference>,
  characters = 0x82211a0 <xmlSAX2Characters>,
  ignorableWhitespace = 0x82211a0 <xmlSAX2Characters>,
  processingInstruction = 0x82213f0 <xmlSAX2ProcessingInstruction>,
  comment = 0x8221520 <xmlSAX2Comment>,
  warning = 0x81d4340 <xmlParserWarning>, error = 0x81d41b0 <xmlParserError>,
  fatalError = 0x81d41b0 <xmlParserError>,
  getParameterEntity = 0x821f200 <xmlSAX2GetParameterEntity>,
  cdataBlock = 0x8221650 <xmlSAX2CDataBlock>,
  externalSubset = 0x821eda0 <xmlSAX2ExternalSubset>,
  initialized = 3740122799, _private = 0x0,
  startElementNs = 0x82209c0 <xmlSAX2StartElementNs>,
  endElementNs = 0x82210b0 <xmlSAX2EndElementNs>, serror = 0}


This is then overwritten at HTMLparser.c:5945 with the value of
htmlDefaultSAXHandler. However, htmlDefaultSAXHandler has not been filled in
in the per-thread data:

(gdb) print *(xmlGlobalState *) pthread_getspecific(globalkey)
$2 = {xmlParserVersion = 0x849a7a2 "20622", xmlDefaultSAXLocator = {
    getPublicId = 0x821ebd0 <xmlSAX2GetPublicId>,
    getSystemId = 0x821ebe0 <xmlSAX2GetSystemId>,
    getLineNumber = 0x821ec00 <xmlSAX2GetLineNumber>,
    getColumnNumber = 0x821ec30 <xmlSAX2GetColumnNumber>},
  xmlDefaultSAXHandler = {internalSubset = 0x821ecf0 <xmlSAX2InternalSubset>,
    isStandalone = 0x821ec60 <xmlSAX2IsStandalone>,
    hasInternalSubset = 0x821ec90 <xmlSAX2HasInternalSubset>,
    hasExternalSubset = 0x821ecc0 <xmlSAX2HasExternalSubset>,
    resolveEntity = 0x821efd0 <xmlSAX2ResolveEntity>,
    getEntity = 0x821f040 <xmlSAX2GetEntity>,
    entityDecl = 0x821f230 <xmlSAX2EntityDecl>,
    notationDecl = 0x821f5b0 <xmlSAX2NotationDecl>,
    attributeDecl = 0x821f3c0 <xmlSAX2AttributeDecl>,
    elementDecl = 0x821f520 <xmlSAX2ElementDecl>,
    unparsedEntityDecl = 0x821f660 <xmlSAX2UnparsedEntityDecl>,
    setDocumentLocator = 0x821f7b0 <xmlSAX2SetDocumentLocator>,
    startDocument = 0x821f7c0 <xmlSAX2StartDocument>,
    endDocument = 0x821f8c0 <xmlSAX2EndDocument>,
    startElement = 0x821ffb0 <xmlSAX2StartElement>,
    endElement = 0x8220710 <xmlSAX2EndElement>,
    reference = 0x8221120 <xmlSAX2Reference>,
    characters = 0x82211a0 <xmlSAX2Characters>,
    ignorableWhitespace = 0x82211a0 <xmlSAX2Characters>,
    processingInstruction = 0x82213f0 <xmlSAX2ProcessingInstruction>,
    comment = 0x8221520 <xmlSAX2Comment>,
    warning = 0x81d4340 <xmlParserWarning>,
    error = 0x81d41b0 <xmlParserError>,
    fatalError = 0x81d41b0 <xmlParserError>,
    getParameterEntity = 0x821f200 <xmlSAX2GetParameterEntity>,
    cdataBlock = 0x8221650 <xmlSAX2CDataBlock>,
    externalSubset = 0x821eda0 <xmlSAX2ExternalSubset>, initialized = 1},
  docbDefaultSAXHandler = {internalSubset = 0, isStandalone = 0,
    hasInternalSubset = 0, hasExternalSubset = 0, resolveEntity = 0,
    getEntity = 0, entityDecl = 0, notationDecl = 0, attributeDecl = 0,
    elementDecl = 0, unparsedEntityDecl = 0, setDocumentLocator = 0,
    startDocument = 0, endDocument = 0, startElement = 0, endElement = 0,
    reference = 0, characters = 0, ignorableWhitespace = 0,
    processingInstruction = 0, comment = 0, warning = 0, error = 0,
    fatalError = 0, getParameterEntity = 0, cdataBlock = 0,
    externalSubset = 0, initialized = 0}, htmlDefaultSAXHandler = {
    internalSubset = 0, isStandalone = 0, hasInternalSubset = 0,
    hasExternalSubset = 0, resolveEntity = 0, getEntity = 0, entityDecl = 0,
    notationDecl = 0, attributeDecl = 0, elementDecl = 0,
    unparsedEntityDecl = 0, setDocumentLocator = 0, startDocument = 0,
    endDocument = 0, startElement = 0, endElement = 0, reference = 0,
    characters = 0, ignorableWhitespace = 0, processingInstruction = 0,
    comment = 0, warning = 0, error = 0, fatalError = 0,
    getParameterEntity = 0, cdataBlock = 0, externalSubset = 0,
    initialized = 0}, xmlFree = 0x8091420 <free>,
  xmlMalloc = 0x8090ca0 <malloc>, xmlMemStrdup = 0x821cb70 <xmlStrdup>,
  xmlRealloc = 0x8091130 <realloc>,
  xmlGenericError = 0x81b334e <xml_generic_error_handler>,
  xmlStructuredError = 0x81b33ce <xml_structured_error_handler>,
  xmlGenericErrorContext = 0xb7b94584, oldXMLWDcompatibility = 0,
  xmlBufferAllocScheme = XML_BUFFER_ALLOC_EXACT, xmlDefaultBufferSize = 4096,
  xmlSubstituteEntitiesDefaultValue = 0,
  xmlDoValidityCheckingDefaultValue = 0, xmlGetWarningsDefaultValue = 1,
  xmlKeepBlanksDefaultValue = 1, xmlLineNumbersDefaultValue = 0,
  xmlLoadExtDtdDefaultValue = 0, xmlParserDebugEntities = 0,
  xmlPedanticParserDefaultValue = 0, xmlSaveNoEmptyTags = 0,
  xmlIndentTreeOutput = 1, xmlTreeIndentString = 0x849a790 "  ",
  xmlRegisterNodeDefaultValue = 0, xmlDeregisterNodeDefaultValue = 0,
  xmlMallocAtomic = 0x8090ca0 <malloc>, xmlLastError = {domain = 0, code = 0,
    message = 0x0, level = XML_ERR_NONE, file = 0x0, line = 0, str1 = 0x0,
    str2 = 0x0, str3 = 0x0, int1 = 0, int2 = 0, ctxt = 0x0, node = 0x0},
  xmlParserInputBufferCreateFilenameValue = 0,
  xmlOutputBufferCreateFilenameValue = 0}

Compare that with the static version of htmlDefaultSAXHandler:
(gdb) print htmlDefaultSAXHandler
$3 = {internalSubset = 0x821ecf0 <xmlSAX2InternalSubset>, isStandalone = 0,
  hasInternalSubset = 0, hasExternalSubset = 0, resolveEntity = 0,
  getEntity = 0x821f040 <xmlSAX2GetEntity>, entityDecl = 0, notationDecl = 0,
  attributeDecl = 0, elementDecl = 0, unparsedEntityDecl = 0,
  setDocumentLocator = 0x821f7b0 <xmlSAX2SetDocumentLocator>,
  startDocument = 0x821f7c0 <xmlSAX2StartDocument>,
  endDocument = 0x821f8c0 <xmlSAX2EndDocument>,
  startElement = 0x821ffb0 <xmlSAX2StartElement>,
  endElement = 0x8220710 <xmlSAX2EndElement>, reference = 0,
  characters = 0x82211a0 <xmlSAX2Characters>,
  ignorableWhitespace = 0x82213e0 <xmlSAX2IgnorableWhitespace>,
  processingInstruction = 0x82213f0 <xmlSAX2ProcessingInstruction>,
  comment = 0x8221520 <xmlSAX2Comment>,
  warning = 0x81d4340 <xmlParserWarning>, error = 0x81d41b0 <xmlParserError>,
  fatalError = 0x81d41b0 <xmlParserError>, getParameterEntity = 0,
  cdataBlock = 0x8221650 <xmlSAX2CDataBlock>, externalSubset = 0,
  initialized = 1}

So after the call to memcpy, *ctxt->sax has been mostly cleared (as much of it
as is defined by xmlSAXHandlerV1 anyway):
(gdb) print *ctxt->sax
$11 = {internalSubset = 0, isStandalone = 0, hasInternalSubset = 0,
  hasExternalSubset = 0, resolveEntity = 0, getEntity = 0, entityDecl = 0,
  notationDecl = 0, attributeDecl = 0, elementDecl = 0,
  unparsedEntityDecl = 0, setDocumentLocator = 0, startDocument = 0,
  endDocument = 0, startElement = 0, endElement = 0, reference = 0,
  characters = 0, ignorableWhitespace = 0, processingInstruction = 0,
  comment = 0, warning = 0, error = 0, fatalError = 0, getParameterEntity = 0,
  cdataBlock = 0, externalSubset = 0, initialized = 0, _private = 0x0,
  startElementNs = 0x82209c0 <xmlSAX2StartElementNs>,
  endElementNs = 0x82210b0 <xmlSAX2EndElementNs>, serror = 0}

If the #ifdef around inithtmlDefaultSAXHandler from revision 1.46 of
globals.c is changed so that the call no longer depends on LIBXML_LEGACY_ENABLED,
this area of the per-thread information is copied from the static
version, and the SAX pointers are present.

The effect is that the buffer fails to be parsed, due to lack of SAX callbacks.
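
The effect of the memcpy described above can be reproduced in isolation. The following is a minimal sketch, not libxml2 code: DemoSAXHandlerV1 and DemoSAXHandler are hypothetical, reduced stand-ins for xmlSAXHandlerV1 and xmlSAXHandler, and the demo_* functions are invented for illustration. It only shows that copying a V1-sized prefix from an all-zero source clears the SAX1 callbacks while leaving the trailing SAX2 fields untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*demo_cb)(void);

/* Hypothetical stand-in for xmlSAXHandlerV1: SAX1 fields only. */
typedef struct {
    demo_cb characters;
    demo_cb comment;
    unsigned int initialized;
} DemoSAXHandlerV1;

/* Hypothetical stand-in for xmlSAXHandler: same prefix, then SAX2 fields. */
typedef struct {
    demo_cb characters;
    demo_cb comment;
    unsigned int initialized;
    void *_private;
    demo_cb startElementNs;
    demo_cb endElementNs;
} DemoSAXHandler;

static void demo_noop(void) {}

/* Mimics the copy at HTMLparser.c:5945: only the V1-sized prefix is replaced. */
void demo_copy_v1_prefix(DemoSAXHandler *dst, const DemoSAXHandlerV1 *src) {
    assert(sizeof(DemoSAXHandlerV1) <= offsetof(DemoSAXHandler, _private));
    memcpy(dst, src, sizeof(DemoSAXHandlerV1));
}

/* Starts from a fully populated handler and overwrites it with defaults
 * that were never filled in, as in the per-thread data shown above. */
DemoSAXHandler demo_run(void) {
    /* 0xDEEDBEAF is the value gdb prints as 3740122799 for `initialized`. */
    DemoSAXHandler sax = { demo_noop, demo_noop, 0xDEEDBEAFu,
                           NULL, demo_noop, demo_noop };
    const DemoSAXHandlerV1 unset_html_defaults = { NULL, NULL, 0 };

    demo_copy_v1_prefix(&sax, &unset_html_defaults);
    return sax;
}
```

After demo_run(), the characters and comment callbacks and the initialized field are zero, while startElementNs and endElementNs still point at their original functions. This matches the pattern in the last gdb dump of *ctxt->sax.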
Comment 1 Daniel Veillard 2006-10-13 22:08:13 UTC
I tried to look at this, but I'm a bit lost.
First, you can't remove the LEGACY ifdef, because if libxml2 is not compiled
with legacy support, inithtmlDefaultSAXHandler just doesn't exist!
My understanding is that:
  - your program is multithreaded (xmllint isn't, so
    xmllint --html doesn't show the problem)
  - your program doesn't call xmlInitParser() (it should;
    see the page about thread support)
  - xmlInitParser() calls htmlDefaultSAXHandlerInit () which
    calls xmlSAX2InitHtmlDefaultSAXHandler() which sets up
    the default handler.

But honestly, without a test case reproducing the problem I cannot
really understand and fix it. Note that *none* of the default SAXv1
handlers are set up in xmlInitializeGlobalState() if LIBXML_LEGACY_ENABLED
is not defined. It's not specific to the HTML parser ...

Daniel
Comment 2 André Klapper 2008-07-19 11:20:14 UTC
Closing this bug report as no further information has been provided. Please feel free to reopen this bug if you can provide the information asked for.
Thanks!