GNOME Bugzilla – Bug 110991
gedit crashes on a long language file
Last modified: 2004-12-22 21:47:04 UTC
I'm in the process of creating a php .lang file and it seems gedit crashes on a very long .lang file. Attached are the .lang file and a backtrace of the crash.
Created attachment 15788: php .lang file
Created attachment 15789: backtrace of crash
Here is another backtrace I got running gedit directly from gdb using another long .lang file:

[aldug@astrolinux simpleurl-admin]$ gdb gedit
GNU gdb Red Hat Linux (5.2.1-4)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) run update_expiration.php
Starting program: /usr/local/gnome2/bin/gedit update_expiration.php
[New Thread 8192 (LWP 19746)]
Program received signal SIGSEGV, Segmentation fault.
Trace 36004
Thread 8192 (LWP 19746)
Apparently this is a GLIBC bug triggered by the huge regular expression generated for "Keywords". Splitting that pattern into 4 patterns fixes the problem. I'll check to see what is causing it, but we'll surely need a workaround in gtksourcelanguage.c.
Gustavo, is there a temporary workaround that I can apply in the .lang file so I can use php syntax highlighting in gedit? If not, how long do you think it will be until a "fix" is implemented in gtksourceview?
Alex: yes, you can split the keyword-list into 4 separate keyword-lists. Just name them distinctly (i.e. "Keywords 1", "Keywords 2", etc). If you assign them the same style, there will be no visible difference from a single keyword-list. In fact, I'm thinking the workaround will have to be implemented along these lines... transparently of course.
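For illustration, the manual split described above could be scripted along the following lines. This is only a sketch: the function name `split_keyword_list` and its parameters are hypothetical, and it only computes the named groups. Writing them back into the .lang file is left out.

```python
def split_keyword_list(name, keywords, parts=4):
    """Split one keyword list into up to `parts` distinctly named lists.

    Hypothetical helper: returns a dict mapping names such as
    "Keywords 1", "Keywords 2", ... to consecutive slices of `keywords`.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size = -(-len(keywords) // parts)  # ceiling division
    lists = {}
    for i in range(parts):
        chunk = keywords[i * size:(i + 1) * size]
        if chunk:  # skip empty trailing groups for short lists
            lists[f"{name} {i + 1}"] = chunk
    return lists
```

Each resulting group would then become its own keyword-list in the .lang file, all assigned the same style.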
As I suspected, it was a GLIBC bug. I have reported the problem to bugs.gnu.org (http://bugs.gnu.org/cgi-bin/gnatsweb.pl?debug=&database=glibc&cmd=view+audit-trail&cmd=view&pr=5006 for interested parties). So, until somebody fixes that, a new glibc is released and it's widely deployed so that we can depend on it :-) I have committed a workaround.

The solution is to split the generated regex into subgroups. I.e., instead of:

    \b\(key1\|key2\|key3\|key4...\)

generate:

    \b\(key1\|key2\)\|\b\(key3\|key4\)\|...

Anyway, long keyword lists are strongly discouraged because:
- I think they degrade performance, even more than having the same number of keywords in separate lists (I still have to back this up with performance measurements)
- the resulting .lang file is harder to read
- they are evil in general ;-)

Nevertheless it should not crash anymore. Please reopen if you still have issues.
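The subgroup splitting can be sketched as follows. This is not the code committed to gtksourcelanguage.c; `build_split_regex` and `group_size` are hypothetical names, and the sketch assumes the keywords contain no regex metacharacters, so no escaping is done. It only builds the pattern string in the GNU Emacs-style syntax shown in this comment.

```python
def build_split_regex(keywords, group_size=2):
    """Build one alternation per group of keywords, joined by top-level \\|.

    Hypothetical helper: each group becomes \\b\\(k1\\|k2\\...\\), and the
    groups are then joined with \\| instead of forming one huge group.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    groups = []
    for start in range(0, len(keywords), group_size):
        chunk = keywords[start:start + group_size]
        groups.append(r"\b\(" + r"\|".join(chunk) + r"\)")
    return r"\|".join(groups)
```

With `["key1", "key2", "key3", "key4"]` and a group size of 2, this produces the second pattern quoted in this comment, without the trailing `\|...`.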
I'm getting this crash again with long keyword lists. It's probably due to the regular expression syntax change (we used GNU Emacs syntax before, and extended POSIX now).
I just committed a temporary "fix" for this crasher. Long keyword lists are truncated at 250 elements automatically now and a warning console message is produced. I also added a note about this in the README file. There is no easy and clean way to solve this with the current highlighting implementation. In the future, when we separate regular expression patterns from tag objects, we will be able to transparently split the keyword lists.
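The truncation behaviour described here can be pictured roughly as below. This is a sketch, not the committed code: the name `truncate_keywords` and the warning text are made up for illustration, and only the limit of 250 elements comes from this comment.

```python
import sys

MAX_KEYWORDS = 250  # limit stated in the comment above


def truncate_keywords(keywords, limit=MAX_KEYWORDS):
    """Return at most `limit` keywords, warning on stderr when truncating."""
    if len(keywords) > limit:
        # Hypothetical warning text; the real console message may differ.
        print(
            f"warning: keyword list has {len(keywords)} elements; "
            f"truncating to {limit}",
            file=sys.stderr,
        )
        return list(keywords[:limit])
    return list(keywords)
```

Keywords past the limit are simply dropped, which is why the split-list workaround remains preferable for very long lists.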