GNOME Bugzilla – Bug 139931
Potential bug in regexp module - access to freed memory
Last modified: 2021-07-05 13:22:38 UTC
I found that regexp module has access to memory that was previously freed in xmlRegFreeState(xmlRegStatePtr state). I modified this function to demonstrate this potential bug as follows: static void xmlRegFreeState(xmlRegStatePtr state) { if (state == NULL) return; if (state->trans != NULL) { memset (state->trans, 0x7f, sizeof (*state->trans)); xmlFree(state->trans); } memset (state, 0x7f, sizeof (*state)); xmlFree(state); } I added memsets before xmlFree. Now compile and run testRegexp: testRegexp.exe -debug "([a-zA-Z0-9]|_|\+|\?|=|;){10,}" aaaaaaaaaa Testing ([a-zA-Z0-9]|_|\+|\?|=|;){10,}: regexp: '([a-zA-Z0-9]|_|\+|\?|=|;){10,}' 7 atoms: 00 atom: ranges once 3 entries range: charval a - z range: charval A - Z range: charval 0 - 9 01 atom: charval once char _ 02 atom: charval once char + 03 atom: charval once char ? 04 atom: charval once char = 05 atom: charval once char ; 06 atom: subexpr once start 2139062143 end 2 9 states: state: START 0, 7 transitions: trans: removed trans: atom 0, to 2 trans: char _ atom 1, to 3 trans: char + atom 2, to 4 trans: char ? atom 3, to 5 trans: char = atom 4, to 6 trans: char ; atom 5, to 7 state: NULL state: 2, 8 transitions: trans: removed trans: count based 0, epsilon to 8 trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 state: 3, 14 transitions: trans: removed trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 trans: count based 0, epsilon to 8 trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 state: 4, 14 transitions: trans: removed trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 trans: count based 0, epsilon to 8 trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 state: 5, 14 transitions: trans: removed trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 trans: count based 0, epsilon to 8 trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 state: 6, 14 transitions: trans: removed trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 trans: count based 0, epsilon to 8 trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 state: 7, 14 transitions: trans: removed trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 trans: count based 0, epsilon to 8 trans: counted 0, atom 0, to 2 trans: counted 0, char _ atom 1, to 3 trans: counted 0, char + atom 2, to 4 trans: counted 0, char ? atom 3, to 5 trans: counted 0, char = atom 4, to 6 trans: counted 0, char ; atom 5, to 7 state: FINAL 8, 0 transitions: 1 counters: 0: min 9 max 123456788 aaaaaaaaaa: Ok Pay attention to atom 06: subexpr once start 2139062143 end 2. The "start" value 2139062143 is gained from freed state. It may cause crash someday. Though, currently it does work regardless to this fact.
Running in the Linux OS, using Valgrind, confirms the problem: 05 atom: charval once char ; ==7273== Invalid read of size 4 ==7273== at 0x80984F2: xmlRegPrintAtom (xmlregexp.c:976) ==7273== by 0x809C06A: xmlRegexpPrint (xmlregexp.c:3972) ==7273== by 0x8049DEB: main (testRegexp.c:137) ==7273== by 0x402C1DCA: __libc_start_main (in /lib/libc-2.3.2.so) ==7273== Address 0x4134126C is 44 bytes inside a block of size 60 free'd ==7273== at 0x4002AD5B: free (in /usr/lib/valgrind/vgskin_memcheck.so) ==7273== by 0x8067D61: xmlMemFree (xmlmemory.c:422) ==7273== by 0x8097AC5: xmlRegFreeState (xmlregexp.c:775) ==7273== by 0x80993A2: xmlFAEliminateEpsilonTransitions (xmlregexp.c:1656) 06 atom: subexpr once start -1 end 2 I'm not sure yet whether this is just a debug/print problem, or whether the main logic is also affected. In any event, it will (eventually) be fixed....
I have traced it further, and have determined that the only "bug" is in the fact that the debug-trace switch tries to print the details of an 'atom' which has been "orphaned" because the transition using it has been removed. I'll grant the fact that this shouldn't be allowed to occur, but the only easy solution is to inhibit the attempt to print details of atoms. If you are interested in pursuing how this happens, try tracing through a subcase of your reported problem: testRegexp "(a)" a which exhibits the same "problem". Basically this automata begins with three states , two atoms and two transitions. During processing, it is determined that the second state (which involves an epsilon transition) can be replaced by a single transition from the first state to the third one, and the middle (second) state can be eliminated. However, this elimination "orphans" the second atom, and when the "debug print" is attempted an illegal memory reference (a pointer to the removed transition) occurs. I'll continue looking into the possibility of "cleaning up" the debug print, but I am revising this bug status to be a request for an "enhancement", since its priority must be relatively low. Thanks for the report. Bill
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new ticket at https://gitlab.gnome.org/GNOME/libxml2/-/issues/ Thank you for your understanding and your help.