After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 139931 - Potential bug in regexp module - access to freed memory
Potential bug in regexp module - access to freed memory
Status: RESOLVED OBSOLETE
Product: libxml2
Classification: Platform
Component: regexp
git master
Other All
: Low enhancement
: ---
Assigned To: Daniel Veillard
libxml QA maintainers
Depends on:
Blocks:
 
 
Reported: 2004-04-13 15:16 UTC by abolgar
Modified: 2021-07-05 13:22 UTC
See Also:
GNOME target: ---
GNOME version: ---



Description abolgar 2004-04-13 15:16:16 UTC
I found that regexp module has access to memory that was previously freed in
xmlRegFreeState(xmlRegStatePtr state). I modified this function to demonstrate
this potential bug as follows:

static void
xmlRegFreeState(xmlRegStatePtr state) {
    if (state == NULL)
	return;

    if (state->trans != NULL) {
	memset (state->trans, 0x7f, sizeof (*state->trans));
	xmlFree(state->trans);
    }	
    memset (state, 0x7f, sizeof (*state));
    xmlFree(state);
}

I added memsets before xmlFree. Now compile and run testRegexp:

testRegexp.exe -debug "([a-zA-Z0-9]|_|\+|\?|=|;){10,}" aaaaaaaaaa

Testing ([a-zA-Z0-9]|_|\+|\?|=|;){10,}:
 regexp: '([a-zA-Z0-9]|_|\+|\?|=|;){10,}' 
7 atoms:
 00  atom: ranges once 3 entries
  range: charval a - z
  range: charval A - Z
  range: charval 0 - 9
 01  atom: charval once char _
 02  atom: charval once char +
 03  atom: charval once char ?
 04  atom: charval once char =
 05  atom: charval once char ;
 06  atom: subexpr once start 2139062143 end 2
9 states:
 state: START 0, 7 transitions:
  trans: removed
  trans: atom 0, to 2
  trans: char _ atom 1, to 3
  trans: char + atom 2, to 4
  trans: char ? atom 3, to 5
  trans: char = atom 4, to 6
  trans: char ; atom 5, to 7
 state: NULL
 state: 2, 8 transitions:
  trans: removed
  trans: count based 0, epsilon to 8
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
 state: 3, 14 transitions:
  trans: removed
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
  trans: count based 0, epsilon to 8
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
 state: 4, 14 transitions:
  trans: removed
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
  trans: count based 0, epsilon to 8
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
 state: 5, 14 transitions:
  trans: removed
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
  trans: count based 0, epsilon to 8
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
 state: 6, 14 transitions:
  trans: removed
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
  trans: count based 0, epsilon to 8
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
 state: 7, 14 transitions:
  trans: removed
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
  trans: count based 0, epsilon to 8
  trans: counted 0, atom 0, to 2
  trans: counted 0, char _ atom 1, to 3
  trans: counted 0, char + atom 2, to 4
  trans: counted 0, char ? atom 3, to 5
  trans: counted 0, char = atom 4, to 6
  trans: counted 0, char ; atom 5, to 7
 state: FINAL 8, 0 transitions:
1 counters:
 0: min 9 max 123456788
aaaaaaaaaa: Ok

Pay attention to atom 06: subexpr once start 2139062143 end 2. The "start" value
2139062143 is gained from freed state. It may cause crash someday. Though,
currently it does work regardless to this fact.
Comment 1 William M. Brack 2004-04-16 07:28:37 UTC
Running in the Linux OS, using Valgrind, confirms the problem:
 05  atom: charval once char ;
==7273== Invalid read of size 4
==7273==    at 0x80984F2: xmlRegPrintAtom (xmlregexp.c:976)
==7273==    by 0x809C06A: xmlRegexpPrint (xmlregexp.c:3972)
==7273==    by 0x8049DEB: main (testRegexp.c:137)
==7273==    by 0x402C1DCA: __libc_start_main (in /lib/libc-2.3.2.so)
==7273==    Address 0x4134126C is 44 bytes inside a block of size 60 free'd
==7273==    at 0x4002AD5B: free (in /usr/lib/valgrind/vgskin_memcheck.so)
==7273==    by 0x8067D61: xmlMemFree (xmlmemory.c:422)
==7273==    by 0x8097AC5: xmlRegFreeState (xmlregexp.c:775)
==7273==    by 0x80993A2: xmlFAEliminateEpsilonTransitions (xmlregexp.c:1656)
 06  atom: subexpr once start -1 end 2

I'm not sure yet whether this is just a debug/print problem, or whether the 
main logic is also affected.  In any event, it will (eventually) be fixed....
Comment 2 William M. Brack 2004-04-18 06:39:49 UTC
I have traced it further, and have determined that the only "bug" is in the 
fact that the debug-trace switch tries to print the details of an 'atom' which 
has been "orphaned" because the transition using it has been removed.  I'll 
grant the fact that this shouldn't be allowed to occur, but the only easy 
solution is to inhibit the attempt to print details of atoms.

If you are interested in pursuing how this happens, try tracing through a 
subcase of your reported problem:
  testRegexp "(a)" a
which exhibits the same "problem".  Basically this automata begins with three 
states , two atoms and two transitions.  During processing, it is determined 
that the second state (which involves an epsilon transition) can be replaced 
by a single transition from the first state to the third one, and the middle 
(second) state can be eliminated.  However, this elimination "orphans" the 
second atom, and when the "debug print" is attempted an illegal memory 
reference (a pointer to the removed transition) occurs.

I'll continue looking into the possibility of "cleaning up" the debug print, 
but I am revising this bug status to be a request for an "enhancement", since 
its priority must be relatively low.

Thanks for the report.

Bill
Comment 3 GNOME Infrastructure Team 2021-07-05 13:22:38 UTC
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org.
As part of that, we are mass-closing older open tickets in bugzilla.gnome.org
which have not seen updates for a longer time (resources are unfortunately
quite limited so not every ticket can get handled).

If you can still reproduce the situation described in this ticket in a recent
and supported software version, then please follow
  https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines
and create a new ticket at
  https://gitlab.gnome.org/GNOME/libxml2/-/issues/

Thank you for your understanding and your help.