GNOME Bugzilla – Bug 602782
'make check' for 'glib-2.22.2', 'glib-2.20.5' is stuck
Last modified: 2010-05-28 18:19:11 UTC
Created attachment 148353 [details] screen output of 'make check' I've been building 'glib' for year in the framework of my AppsFromScratch - the last successfully built one 'glib-2.20.5'. Today I tried to build 'glib-2.22.2' and its 'make check' is stuck -see the attached screen output. Maybe my wait was too short - two-three minutes; OTOH there were no active processes in 'top'. Output of 'uname -a': " Linux amdam2 2.6.27.37-0.1-default #1 SMP 2009-10-15 14:56:58 +0200 i686 athlon i386 GNU/Linux ", it's SUSE-11.1. If such a check should take several minutes, a message about the wait would be helpful.
Created attachment 148500 [details] output of 'clear;ps auwx | grep qemu | grep -v root > stuck_test_ps_output.txt'
Since 'make check' for 'glib-2.22.2' is stuck I've reverted to 'glib-2.20.5' - it used to build fine, i.e. 'make check' used to work fine. I haven't changed its build environment, but now it appears to be stuck too, though in a different place: " 1422 Cleaning target testing structure in 'testdir_live-g-file'... 1423 deleting 'dir1' 1424 deleting 'subdir' 1425 deleting 'dir-target' 1426 deleting 'dir2' 1427 deleting 'target-file' 1428 deleting 'normal_file-symlink' 1429 deleting 'pattern_file' 1430 deleting 'executable_file' 1431 deleting 'private_file' 1432 deleting 'normal_file2' 1433 deleting 'UTF_pr?i?lis? z' 1434 deleting 'lost_symlink' 1435 deleting 'dir_no-access' 1436 deleting 'dir_no-write' 1437 deleting 'target-file' 1438 deleting 'normal_file' 1439 deleting 'normal_file-symlink' 1440 deleting 'executable_file~' 1441 deleting 'private_file' 1442 deleting 'normal_file2~' 1443 deleting 'UTF_pr?i?lis? z' 1444 deleting 'pattern_file' 1445 deleting 'executable_file' 1446 deleting 'normal_file2' 1447 deleting 'not_exists2' 1448 deleting 'not_exists3' 1449 deleting 'not_exists4' 1450 OK 1451 PASS: live-g-file 1452 TEST: unix-streams... (pid=24067) 1453 /unix-streams/pipe-io-test: OK ". In the attached 'stuck_test_ps_output.txt' file one can see " qemu 24067 0.0 0.0 0 0 pts/5 Z 04:46 0:00 [unix-streams] <defunct> " is the way it should be ? Is it supposed to be this way ?
I am increasing importance since not only the "bleeding edge" is affected by this phenomenon.
The command line used to run 'make check' is actually this: ./make_check.sh 1>make_check.log 2>&1 - the point is screen output redirection to 'make_check.log' file. Sometimes I monitor 'make_check.log' with 'less' - as a different user if it matters. It looks like if I do _not_ monitor 'make_check.log' file, 'make check' isn't stuck. I.e. it was stuck in a different place when I restarted the build, so I killed the build, restarted it again and this time I wasn't monitoring - 'make check' passed. I do not have a lot of statistics on this though, but the original failure of 'glib-2.22.2' also happened when I was monitoring 'make_check.log' with 'less'. I build a lot of targets and I haven't noticed such a problem with them, i.e. I randomly monitor 'make_check.log' files of theirs with 'less' and have no problem.
Can you provide backtraces for the hangs?
(In reply to comment #5) > Can you provide backtraces for the hangs? I can't because I don't know how. And, actually, I was hit by the same problem rebuilding glib-2.20.5. Had to kill the whole thing and start over again.
So, what is the status of this? Does 2.22 or 2.24 works for you now?
(In reply to comment #7) > So, what is the status of this? > > Does 2.22 or 2.24 works for you now? The last one I tried was glib-2.22.5 and occasionally its 'make check' is stuck.
OK, I can confirm this bug for 2.22 but It's fixed in 2.24. Latest affected version -> 2.22
(In reply to comment #9) > OK, I can confirm this bug for 2.22 but It's fixed in 2.24. > > Latest affected version -> 2.22 No, I don't think it's fixed. At the moment it's stuck - glib-2.24.1. The last few lines of 'make check' output: " 1229 Making check in tests 1230 make[3]: Entering directory `/mnt/sdb8/sergei/AFSWD_debug/build/glib-2.24.1/gthread/tests' 1231 make check-local 1232 make[4]: Entering directory `/mnt/sdb8/sergei/AFSWD_debug/build/glib-2.24.1/gthread/tests' 1233 TEST: 1bit-mutex... (pid=6056) 1234 /glib/1bit-mutex: OK 1235 PASS: 1bit-mutex 1236 TEST: 1bit-emufutex... (pid=6172) 1237 /glib/1bit-mutex-emufutex: "' snapshot of 'top': " top - 00:49:40 up 1:39, 26 users, load average: 1.43, 2.36, 2.28 Tasks: 219 total, 2 running, 216 sleeping, 0 stopped, 1 zombie Cpu(s): 38.3%us, 2.1%sy, 0.0%ni, 59.4%id, 0.0%wa, 0.2%hi, 0.0%si, 0.0%st Mem: 2007284k total, 1806120k used, 201164k free, 111960k buffers Swap: 4200956k total, 268k used, 4200688k free, 682216k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3839 qemu 20 0 4448 1340 1156 S 0 0.1 0:00.00 sh -c cd /mnt/sdb8/sergei/AFSWD_debug/build/glib-2.24.1; ./make_check.sh 1>make_check.log 2>&1 3840 qemu 20 0 4448 1380 1172 S 0 0.1 0:00.00 /bin/sh ./make_check.sh 3841 qemu 20 0 3452 1064 792 S 0 0.1 0:00.00 make check 3842 qemu 20 0 3448 1096 816 S 0 0.1 0:00.00 make check-recursive 3843 qemu 20 0 2900 876 752 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c fail= failcom='exit 1'; \?for f in x $MAKEFLAGS; do \? 5770 qemu 20 0 2900 512 388 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c fail= failcom='exit 1'; \?for f in x $MAKEFLAGS; do \? 5771 qemu 20 0 3448 1096 816 S 0 0.1 0:00.00 make check 5772 qemu 20 0 2900 876 752 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c fail= failcom='exit 1'; \?for f in x $MAKEFLAGS; do \? 6037 qemu 20 0 2900 516 388 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c fail= failcom='exit 1'; \?for f in x $MAKEFLAGS; do \? 6038 qemu 20 0 3448 1160 816 S 0 0.1 0:00.00 make check 6039 qemu 20 0 3448 1160 816 S 0 0.1 0:00.00 make check-local 6040 qemu 20 0 2896 832 716 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c test -z "1bit-mutex 1bit-emufutex" || ../../glib/gteste 6041 qemu 20 0 3804 896 788 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/build/glib-2.24.1/glib/.libs/gtester --verbose 1bit-mutex 1bit-emufutex 6172 qemu 20 0 0 0 0 Z 0 0.0 0:03.87 [1bit-emufutex] <defunct> 8483 qemu 20 0 5312 2396 1516 S 0 0.1 0:00.24 bash 8802 qemu 20 0 5312 1260 376 S 0 0.1 0:00.00 bash 8811 qemu 20 0 5312 1252 368 S 0 0.1 0:00.00 bash 8814 qemu 20 0 18332 14m 2464 S 0 0.7 0:00.58 /usr/bin/perl -w /home/sergei/AppsFromScratch/20100508/bin/build.pl -install_prefix install -make_like -check_wit 8838 qemu 20 0 3464 988 776 S 0 0.0 0:00.04 less build.install.log ".
And now it is stuck this way - the last few lines of 'make check' output: " 2669 Starting 8 threads for concurrent GSlice usage... 2670 fdfcdccageeeffbhcdfbfhddbeeabffdbbdhfddehhffaegcfcffcbdechfgehfafdhaabcddfaddgbgahededeegdfbbcaaaecgccchgfgdebbccfeecfhghbhhfeecheehdhhhghhh 2671 Thread 1: 46035 blocks freed, 3809 blocks not freed 2672 Thread 2: 44165 blocks freed, 5840 blocks not freed 2673 Thread 3: 45769 blocks freed, 4223 blocks not freed 2674 Thread 4: 45236 blocks freed, 5014 blocks not freed 2675 Thread 5: 47477 blocks freed, 2244 blocks not freed 2676 Thread 6: 43340 blocks freed, 6496 blocks not freed 2677 Thread 7: 49224 blocks freed, 902 blocks not freed 2678 Thread 8: 48116 blocks freed, 2110 blocks not freed 2679 PASS: slice-concurrent 2680 PASS: slice-threadinit 2681 PASS: spawn-test 2682 PASS: thread-test 2683 PASS: threadpool-test 2684 PASS: tree-test 2685 PASS: type-test 2686 PASS: unicode-caseconv 2687 PASS: unicode-encoding 2688 PASS: utf8-validate 2689 PASS: utf8-pointer 2690 PASS: uri-test 2691 failed (got 1, expected: 0) 2692 FAIL: regex-test 2693 PASS: run-markup-tests.sh 2694 PASS: run-collate-tests.sh 2695 PASS: run-bookmark-test.sh " - by the way, see a failure above; snapshot of 'top': " top - 00:49:40 up 1:39, 26 users, load average: 1.43, 2.36, 2.28 Tasks: 219 total, 2 running, 216 sleeping, 0 stopped, 1 zombie Cpu(s): 38.3%us, 2.1%sy, 0.0%ni, 59.4%id, 0.0%wa, 0.2%hi, 0.0%si, 0.0%st Mem: 2007284k total, 1806120k used, 201164k free, 111960k buffers Swap: 4200956k total, 268k used, 4200688k free, 682216k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3839 qemu 20 0 4448 1340 1156 S 0 0.1 0:00.00 sh -c cd /mnt/sdb8/sergei/AFSWD_debug/build/glib-2.24.1; ./make_check.sh 1>make_check.log 2>&1 3840 qemu 20 0 4448 1380 1172 S 0 0.1 0:00.00 /bin/sh ./make_check.sh 3841 qemu 20 0 3452 1064 792 S 0 0.1 0:00.00 make check 3842 qemu 20 0 3448 1096 816 S 0 0.1 0:00.00 make check-recursive 3843 qemu 20 0 2900 876 752 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c fail= failcom='exit 1'; \?for f in x $MAKEFLAGS; do \? 5770 qemu 20 0 2900 512 388 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c fail= failcom='exit 1'; \?for f in x $MAKEFLAGS; do \? 5771 qemu 20 0 3448 1096 816 S 0 0.1 0:00.00 make check 5772 qemu 20 0 2900 876 752 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c fail= failcom='exit 1'; \?for f in x $MAKEFLAGS; do \? 6037 qemu 20 0 2900 516 388 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c fail= failcom='exit 1'; \?for f in x $MAKEFLAGS; do \? 6038 qemu 20 0 3448 1160 816 S 0 0.1 0:00.00 make check 6039 qemu 20 0 3448 1160 816 S 0 0.1 0:00.00 make check-local 6040 qemu 20 0 2896 832 716 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/install/bash-4.1/bin/bash -c test -z "1bit-mutex 1bit-emufutex" || ../../glib/gteste 6041 qemu 20 0 3804 896 788 S 0 0.0 0:00.00 /mnt/sdb8/sergei/AFSWD_debug/build/glib-2.24.1/glib/.libs/gtester --verbose 1bit-mutex 1bit-emufutex 6172 qemu 20 0 0 0 0 Z 0 0.0 0:03.87 [1bit-emufutex] <defunct> 8483 qemu 20 0 5312 2396 1516 S 0 0.1 0:00.24 bash 8802 qemu 20 0 5312 1260 376 S 0 0.1 0:00.00 bash 8811 qemu 20 0 5312 1252 368 S 0 0.1 0:00.00 bash 8814 qemu 20 0 18332 14m 2464 S 0 0.7 0:00.58 /usr/bin/perl -w /home/sergei/AppsFromScratch/20100508/bin/build.pl -install_prefix install -make_like -check_wit 8838 qemu 20 0 3464 988 776 S 0 0.0 0:00.04 less build.install.log ". I'm wondering whether this "[1bit-emufutex] <defunct>" is OK.
Created attachment 161959 [details] [review] [gtester] Don't play mainloop games, avoid hangs Handling spawning a child is really not complicated; run a mainloop until it's both exited and we don't have any more I/O to process. It's very unclear to me why the gtester code thought it necessary to play games with different priorities, iterating the loop manually, etc. (There was some race condition in the old code I'm not going to bother to debug)
In the commit log for the original code, I see: 2007-12-19 20:30:18 Tim Janik <timj@imendio.com> * glib/gtester.c: force child poll loop to abort if waitpid() signaled child exit, eventhough the child's report file descriptor wasn't closed. Which has me a bit mystified; the report descriptor has to close if the child exited...unless, well, it was leaked. Which then, now that I think about it here's the bug - we need to set the pipe file descriptors to be close on exec.
Created attachment 162048 [details] [review] [gtester] Fix hangs in test suite Handling spawning a child is really not complicated; run a mainloop until it's both exited and we don't have any more I/O to process. It's very unclear to me why the gtester code thought it necessary to play games with different priorities, iterating the loop manually, etc. One thing that we do need to cover is ensuring that our log fd doesn't end up being propagated into any child processes, which would cause it not to close, and thus cause a hang. We fix this by setting it close on exec as soon as we get it.
*** This bug has been marked as a duplicate of bug 578295 ***