GNOME Bugzilla – Bug 760183
REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8 encoding error when multi-byte character in large CDATA section is split across buffer
Last modified: 2016-04-29 17:30:10 UTC
When the XML push parser encounters a multi-byte UTF-8 character that is split across a buffer boundary, it now reports an encoding error instead of returning the characters parsed up to that point and letting the next parsing pass validate the multi-byte UTF-8 character. This regressed in v2.9.3 with the fix for Bug 754947: Heap-buffer overread in push mode, parser.c xmlParseTryOrFinish <https://bugzilla.gnome.org/show_bug.cgi?id=754947> <https://git.gnome.org/browse/libxml2/commit/?id=4a5d80aded1da94cd55294e7207109712201b75b>
Created attachment 318298
Patch v1

* parser.c:
(xmlCheckCdataPush): Add 'complete' argument to describe whether the buffer passed in is the whole CDATA buffer, or if there is more data to parse. If there is more data to parse, don't return a negative value for an invalid multi-byte UTF-8 character that is split between buffers.
(xmlParseTryOrFinish): Pass 'complete' argument to xmlCheckCdataPush() as appropriate.
* result/cdata-2-byte-UTF-8.xml: Added.
* result/cdata-2-byte-UTF-8.xml.rde: Added.
* result/cdata-2-byte-UTF-8.xml.rdr: Added.
* result/cdata-2-byte-UTF-8.xml.sax: Added.
* result/cdata-2-byte-UTF-8.xml.sax2: Added.
* result/cdata-3-byte-UTF-8.xml: Added.
* result/cdata-3-byte-UTF-8.xml.rde: Added.
* result/cdata-3-byte-UTF-8.xml.rdr: Added.
* result/cdata-3-byte-UTF-8.xml.sax: Added.
* result/cdata-3-byte-UTF-8.xml.sax2: Added.
* result/cdata-4-byte-UTF-8.xml: Added.
* result/cdata-4-byte-UTF-8.xml.rde: Added.
* result/cdata-4-byte-UTF-8.xml.rdr: Added.
* result/cdata-4-byte-UTF-8.xml.sax: Added.
* result/cdata-4-byte-UTF-8.xml.sax2: Added.
* result/noent/cdata-2-byte-UTF-8.xml: Added.
* result/noent/cdata-3-byte-UTF-8.xml: Added.
* result/noent/cdata-4-byte-UTF-8.xml: Added.
* test/cdata-2-byte-UTF-8.xml: Added.
* test/cdata-3-byte-UTF-8.xml: Added.
* test/cdata-4-byte-UTF-8.xml: Added.
- Add tests and results. Only 'make Readertests XMLPushtests' fails prior to the fix.
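For readers who want to see the idea behind the 'complete' argument in isolation, here is a minimal standalone sketch. It is not the code from the patch or from libxml2: the names utf8_seq_len and check_utf8_prefix are hypothetical, the return convention is invented for this example, and the validation is simplified (it checks lead and continuation byte patterns only, not overlong forms or surrogates). It only illustrates that a truncated trailing sequence should count as an error when the buffer is known to be complete, and otherwise should just mark where the deliverable data ends.

```c
#include <stddef.h>

/* Hypothetical helper: expected length of a UTF-8 sequence from its lead
 * byte, or 0 if the byte cannot start a sequence. */
static int utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0; /* continuation byte or invalid lead byte */
}

/* Hypothetical checker (not libxml2's xmlCheckCdataPush).
 * Returns the number of leading bytes of buf that form whole UTF-8
 * sequences, or -1 on a definite encoding error.
 * complete != 0: buf is the whole data, so a truncated trailing
 *                sequence is an error.
 * complete == 0: more data may follow, so a truncated trailing
 *                sequence only ends the usable prefix. */
static long check_utf8_prefix(const unsigned char *buf, size_t len,
                              int complete)
{
    size_t i = 0;

    while (i < len) {
        int n = utf8_seq_len(buf[i]);
        size_t k;

        if (n == 0)
            return -1;

        if (i + (size_t)n > len) {
            /* Sequence runs past the end: the bytes that are present
             * must still look like continuation bytes. */
            for (k = i + 1; k < len; k++)
                if ((buf[k] & 0xC0) != 0x80)
                    return -1;
            return complete ? -1 : (long)i;
        }

        for (k = 1; k < (size_t)n; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return -1;

        i += (size_t)n;
    }
    return (long)i;
}
```

With the bytes 'a', 'b', 0xC3 (the first half of a 2-byte character), this sketch reports 2 usable bytes when complete is 0 and an error when complete is 1, which mirrors the distinction the patch description draws.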
This bug affects PHP since 5.5.32 and 5.6.18 (see https://bugs.php.net/bug.php?id=71805) and seems to have fallen by the wayside... Any chance of having the patch applied in the near future?
This was fixed in 4f8606c13cb7f2684839f850b83de5ce647d3ca7. <https://git.gnome.org/browse/libxml2/commit/?id=4f8606c13cb7f2684839f850b83de5ce647d3ca7>