GNOME Bugzilla – Bug 687926
EIGEN fails to find eigenvalues/eigenvectors
Last modified: 2013-01-09 16:05:06 UTC
Created attachment 228477 [details]
dataset on which PCA fails

The menu item Statistics --> Dependent Observations --> Principal Components Analysis seems to fail on some datasets. The covariance matrix is fine, but all eigenvectors and eigenvalues are "#NUM!". Please find attached a dataset on which it fails (grouped by columns).
The problem is really that EIGEN does not find any eigenvalues/eigenvectors of the symmetric covariance matrix; the matrix is singular (its determinant is 0), but a symmetric matrix always has a full eigendecomposition, so that alone should not make EIGEN fail. Principal component analysis is only involved because it uses the EIGEN function.
I have asked Maxima to find the eigenvalues for the matrix in question. After about an hour on a pretty speedy machine, I still have no answer.
Maxima does not even give an answer after 3 hours. If I try this in Mathematica I get some reasonably sized eigenvalues but then also a bunch of distinct eigenvalues in the 10^-16 to 10^18 range.
I extracted the gnumeric source code for computing eigenvalues and ran it separately so I could see what was going on; it did indeed give up after 400000 iterations. I then looked at the matrix and it seemed to be already diagonalized (at least the upper right-hand half; the loop seemed to be ignoring the other half, since of course it's a duplicate). So I assume what's going on is that the matrix is singular, round-off error introduces some spurious tiny non-zero eigenvalues which are really 0, and the iteration then never converges.

In any event, I was able to work around it by patching plugins/fn-math/functions.c:gnumeric_eigen() to ignore the return value of gnm_matrix_eigen(). This way I get the meaningful eigen{vectors,values} quickly, followed by some obviously roundoff-error-induced garbage at the end. I don't know if you want to incorporate this in the official version; it depends on whether you want to give an imprecise answer or no answer. But for PCA an imprecise answer is (to my eyes, at least) much better, since the whole point is that the first few eigenvectors are the important ones and the rest are probably noise.
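For reference, the kind of iteration being described is a Jacobi eigenvalue sweep with an iteration cap. Below is a hedged sketch in Python (not Gnumeric's actual C code; the function name, cap, and tolerance are illustrative). It returns the diagonal together with a convergence flag, which is the situation the patch exploits: the diagonal can still be usable even when the flag reports failure.

```python
import math

def jacobi_eigenvalues(a, max_sweeps=50, tol=1e-12):
    """Cyclic Jacobi iteration on a symmetric matrix (list of lists).
    Returns (eigenvalues, converged).  Illustrative stand-in for a
    routine like gnm_matrix_eigen(), which similarly reports failure
    after an iteration cap."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            return [a[i][i] for i in range(n)], True
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    # As in the patch described above: the diagonal is returned even
    # when the convergence flag is False.
    return [a[i][i] for i in range(n)], False
```

On a well-behaved symmetric matrix such as [[2,1,0],[1,2,0],[0,0,1]] this converges quickly to the eigenvalues {3, 1, 1}; the bug report concerns the case where the flag comes back false yet the leading diagonal entries are already the answer.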
In comment #3 that should have been "10^-16 to 10^-18 range". Elsewhere we have successfully managed to limit round-off errors, so we should probably see what we can do here.
Just to confirm, octave also produces 17 non-zero eigenvalues and 13 values smaller than 10^-16 (which should be 0). I don't think you can eliminate round-off errors there unless you work symbolically like maxima. I wouldn't ignore the return value from gnm_matrix_eigen() but rather make sure gnm_matrix_eigen() can handle rank-deficient matrices well (maybe compare how octave/gsl does it?).

Also, just looking at the example data, 0-eigenvalues are entirely expected. You only have 20 observations for each of the 30 different random inputs, so the rank of the covariance matrix can be 20 (or rather 19, after mean-centering) at most. The actual rank is 17 (there are loads of identical columns, e.g. A=F=K=Q=S=U=-V=W), so you only get 17 non-zero eigenvalues.
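The rank bound is easy to check on a toy version of the dataset: with m observations of n variables, the centered rows satisfy one linear constraint (they sum to zero), so the sample covariance has rank at most m-1. A small Python sketch (helper names `sample_cov` and `rank` are illustrative, not Gnumeric API), using 3 observations of 5 variables as a miniature of the attached 20-by-30 data:

```python
import random

def sample_cov(data):
    """Sample covariance of the columns of data (rows = observations)."""
    m, n = len(data), len(data[0])
    means = [sum(row[j] for row in data) / m for j in range(n)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (m - 1)
             for j in range(n)] for i in range(n)]

def rank(mat, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    m, n = len(a), len(a[0])
    r = 0
    for col in range(n):
        if r == m:
            break
        piv = max(range(r, m), key=lambda i: abs(a[i][col]))
        if abs(a[piv][col]) < tol:
            continue  # numerically zero column below row r
        a[r], a[piv] = a[piv], a[r]
        for i in range(r + 1, m):
            f = a[i][col] / a[r][col]
            for j in range(col, n):
                a[i][j] -= f * a[r][j]
        r += 1
    return r

random.seed(0)
data = [[random.random() for _ in range(5)] for _ in range(3)]  # 3 obs, 5 vars
C = sample_cov(data)   # 5x5 covariance matrix, rank at most 3-1 = 2
```

Generic random data hits the bound exactly (rank 2 here); duplicated columns, as in the attachment, drop it further, which is why only 17 of the 30 eigenvalues are non-zero.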
What is happening is that the pivot element underflows to zero. Thereafter, nothing is going to change. I have put in a test for that, but I am pretty sure my simple "break;" isn't the right action. Andreas?
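The underflow mechanism is easy to reproduce in any IEEE-754 double arithmetic (a generic illustration, not Gnumeric's exact code path):

```python
# 1e-200 is a perfectly representable double, but its square, 1e-400,
# is far below the smallest subnormal double (~4.9e-324), so the
# product flushes to exactly 0.0.
x = 1e-200
print(x != 0.0)      # True
print(x * x == 0.0)  # True
# Once a pivot has become exactly 0.0, the rotations it drives are
# no-ops, so subsequent sweeps can no longer change the matrix and
# the iteration stalls instead of converging.
```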
I don't really see what we can do there.
Over in goffice we have QR factorization. I think we should be using that for eigenvectors and eigenvalues. A=QR, where Q is orthogonal and R is upper triangular. Ax=ex, i.e., QRx=ex, i.e., Rx=Q^t(ex), i.e., Rx=e(Q^tx), so the eigenvalues of A are the same as those of R and the eigenvectors are just different by a multiplication of Q.
> so the eigenvalues of A are the same as those of R and the eigenvectors
> are just different by a multiplication of Q.

Sigh. That's just wrong.
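A tiny numeric check makes the retraction concrete (minimal Python sketch): since R is upper triangular its eigenvalues are just its diagonal entries, and for a simple symmetric A they do not match A's eigenvalues.

```python
import math

# Counterexample: A = QR, but eig(R) != eig(A).
A = [[2.0, 1.0], [1.0, 2.0]]           # symmetric; eigenvalues are 3 and 1

# QR of the 2x2 via Gram-Schmidt.
r11 = math.hypot(A[0][0], A[1][0])
q1 = (A[0][0] / r11, A[1][0] / r11)
r12 = q1[0] * A[0][1] + q1[1] * A[1][1]
u2 = (A[0][1] - r12 * q1[0], A[1][1] - r12 * q1[1])
r22 = math.hypot(u2[0], u2[1])

# R is upper triangular, so its eigenvalues are its diagonal entries.
eig_R = sorted([r11, r22])             # approx [1.342, 2.236], i.e. 3/sqrt(5), sqrt(5)

# Eigenvalues of a symmetric 2x2 in closed form.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
d = math.sqrt(tr * tr / 4.0 - det)
eig_A = sorted([tr / 2.0 - d, tr / 2.0 + d])   # [1.0, 3.0]
```

The two spectra differ, which is the point: a single QR factorization preserves neither the eigenvalues nor (up to a rotation) the eigenvectors.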
http://mathreview.uwaterloo.ca/archive/voli/1/panju.pdf describes an iterative method to obtain eigenvalues and eigenvectors using the QR decomposition, but as they point out it requires refinements for any practical use.
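The basic scheme from that paper, the unshifted QR iteration A_{k+1} = R_k Q_k, can be sketched in a few lines of Python. This is a hedged demo: classical Gram-Schmidt and the unshifted iteration work for small well-conditioned symmetric matrices but are exactly what needs the refinements (shifts, deflation, a more stable QR) for practical use, and the Gram-Schmidt step divides by r[j][j], so it breaks down on singular input like the matrix in this bug.

```python
import math

def qr_gram_schmidt(a):
    """QR of a square nonsingular matrix via classical Gram-Schmidt
    (fine for a demo, numerically fragile in general)."""
    n = len(a)
    q = [[0.0] * n for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [a[i][j] for i in range(n)]
        for k in range(j):
            r[k][j] = sum(q[i][k] * a[i][j] for i in range(n))
            for i in range(n):
                v[i] -= r[k][j] * q[i][k]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        for i in range(n):
            q[i][j] = v[i] / r[j][j]
    return q, r

def qr_iteration(a, iters=200):
    """Unshifted QR iteration: factor A_k = Q_k R_k, then set
    A_{k+1} = R_k Q_k (a similarity transform, so the spectrum is
    preserved).  For symmetric A with distinct eigenvalue magnitudes
    the diagonal converges to the eigenvalues."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(iters):
        q, r = qr_gram_schmidt(a)
        a = [[sum(r[i][k] * q[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return sorted(a[i][i] for i in range(n))
```

For A = [[2,1],[1,2]] the iterates converge to diag(3,1), recovering the eigenvalues 1 and 3.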
Yeah, it's complicated. The Jacobi method we now use is praised for accuracy even if it can be slow. The break-at-zero I added is correct. So I think we're good for now.
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.