GNOME Bugzilla – Bug 719822
Parsing text nodes with <cr><lf> line endings leads to additional <lf> in result
Last modified: 2021-07-05 13:23:46 UTC
When text nodes with <cr><lf> line endings are parsed, the line endings in the result are normalized to <lf> only. If the input buffer of such a text node has been read only up to the <cr> (the <lf> is not yet in the input buffer), the resulting text in the text node contains an additional <lf>.

I've created a workaround for this in parser.c which works, although I'm not sure this is the correct way to fix it. In the function "xmlParseCharData", at the location where <cr> characters are skipped when followed by an <lf> character, I've made a small change: test whether the <cr> character is at the end of the input buffer and, if so, do not skip it but force reading of the next part of the input buffer:

    if (*in == 0xD) {
        in++;
        if (*in == 0xA) {
            ctxt->input->cur = in;
            in++;
            ctxt->input->line++; ctxt->input->col = 1;
            continue; /* while */
        }
        if (*in == 0x0) {
            in--;
            return;
        }
        in--;
    }
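To illustrate the underlying idea outside of libxml2, here is a minimal, self-contained sketch of line-ending normalization over chunked input. It is not the libxml2 code and not a proposed patch: the names `eol_state`, `eol_normalize_chunk` and `eol_normalize_finish` are hypothetical and exist only for this example. The assumption is that a <cr> at the very end of a chunk has to be deferred until the next chunk arrives, so that a following <lf> can be merged with it instead of producing a second <lf>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state carried between chunks; not a libxml2 type. */
typedef struct {
    int pending_cr; /* nonzero if the previous chunk ended with 0xD */
} eol_state;

/*
 * Normalize <cr><lf> and lone <cr> to <lf> for one chunk of input.
 * `out` must have room for len + 1 bytes (one extra for a <cr> that
 * was deferred from the previous chunk). Returns the bytes written.
 */
size_t eol_normalize_chunk(eol_state *st, const char *in, size_t len,
                           char *out)
{
    size_t i = 0, n = 0;

    assert(st != NULL && out != NULL);

    /* Resolve a <cr> left over from the end of the previous chunk. */
    if (st->pending_cr && len > 0) {
        out[n++] = '\n';
        if (in[0] == '\n')
            i = 1; /* <cr><lf> was split across chunks: emit one <lf> */
        st->pending_cr = 0;
    }

    while (i < len) {
        char c = in[i++];
        if (c == '\r') {
            if (i == len) {
                /* <cr> is the last byte: defer until more data arrives. */
                st->pending_cr = 1;
                break;
            }
            if (in[i] == '\n')
                i++; /* swallow the <lf> of a <cr><lf> pair */
            out[n++] = '\n';
        } else {
            out[n++] = c;
        }
    }
    return n;
}

/* Call at end of input: a deferred <cr> becomes a single <lf>. */
size_t eol_normalize_finish(eol_state *st, char *out)
{
    assert(st != NULL && out != NULL);
    if (!st->pending_cr)
        return 0;
    st->pending_cr = 0;
    out[0] = '\n';
    return 1;
}
```

In this sketch, feeding "a<cr>" and then "<lf>b" as two separate chunks produces "a" followed by "<lf>b", i.e. exactly one <lf>, which is the behaviour the workaround above tries to achieve by returning early and waiting for the rest of the buffer.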
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.