GNOME Bugzilla – Bug 319716
HTMLparser bug with space around <img> tags
Last modified: 2017-06-17 10:50:08 UTC
HTMLparser removes the space between elements in some situations. For example, this input:

    <p><img src="foo"> <img src="bar"></p>

is output as:

    <p><img src="foo"><img src="bar"></p>

This does not seem to be correct; at least it is not how web browsers parse this kind of HTML. (If <span> elements are used instead of <img> elements, HTMLparser correctly preserves the space.)
There is a heuristic in areBlanks(), which is called when a blank string has been parsed, to decide whether it should be ignored or not:

    lastChild = xmlGetLastChild(ctxt->node);
    while ((lastChild) && (lastChild->type == XML_COMMENT_NODE))
        lastChild = lastChild->prev;
    if (lastChild == NULL) {
        if ((ctxt->node->type != XML_ELEMENT_NODE) &&
            (ctxt->node->content != NULL)) return(0);
        /* keep ws in constructs like ...<b> </b>...
           for all tags "b" allowing PCDATA */
        for ( i = 0; i < sizeof(allowPCData)/sizeof(allowPCData[0]); i++ ) {
            if ( xmlStrEqual(ctxt->name, BAD_CAST allowPCData[i]) ) {
                return(0);
            }
        }
    } else if (xmlNodeIsText(lastChild)) {
        return(0);
    } else {
        /* keep ws in constructs like <p><b>xy</b> <i>z</i><p>
           for all tags "p" allowing PCDATA */
        for ( i = 0; i < sizeof(allowPCData)/sizeof(allowPCData[0]); i++ ) {
            if ( xmlStrEqual(lastChild->name, BAD_CAST allowPCData[i]) ) {
                return(0);
            }
        }
    }
    return(1);

I have no idea where this comes from, or what the theoretical or practical behaviour should be. It seems that the theoretical one should be just to check for ctxt->name ("p" in this case) in allowPCData and, if it is found, return 0. But since I don't know where this comes from, I don't want to change it unilaterally; that should be discussed on the mailing list, I guess.

Daniel
This input:

    <body><a>a</a> <b>b</b></body>

results in (wrong):

    <body><a>a</a><b>b</b></body>

while this input:

    <p><a>a</a> <b>b</b></p>

results in (ok):

    <p><a>a</a> <b>b</b></p>

It looks like an extra check is needed, or "body" should be added to the allowPCData array.

PHP, libxml Version 2.6.26
*** This bug has been marked as a duplicate of bug 681822 ***