GNOME Bugzilla – Bug 687926
EIGEN fails to find eigenvalues/eigenvectors
Last modified: 2013-01-09 16:05:06 UTC
Created attachment 228477 [details]
dataset on which PCA fails

The menu item Statistics --> Dependent Observations --> Principal Components Analysis seems to fail on some datasets. The covariance matrix is fine, but all eigenvectors and eigenvalues are "#NUM!". Please find attached a dataset on which it fails (grouped by columns).
The problem is really that EIGEN does not find any eigenvalues/eigenvectors of the symmetric covariance matrix; the matrix is singular (its determinant is 0), but a symmetric matrix always has a full eigendecomposition, so that alone should not make EIGEN fail. Principal component analysis is only involved because it uses the EIGEN function.
I have asked Maxima to find the eigenvalues for the matrix in question. After about an hour on a pretty speedy machine, I still have no answer.
Maxima does not even give an answer after 3 hours. If I try this in Mathematica I get some reasonably sized eigenvalues but then also a bunch of distinct eigenvalues in the 10^-16 to 10^18 range.
I extracted the gnumeric source code for computing eigenvalues and ran it separately so I could see what was going on; it did indeed give up after 400000 iterations. I then looked at the matrix and it seemed to be already diagonalized (at least the upper right-hand half; the loop seemed to be ignoring the other half, since of course it's a duplicate). So I assume what's going on is that the matrix is singular, round-off error introduces some spurious tiny non-zero eigenvalues which are really 0, and the iteration then never converges.

In any event, I was able to work around it by patching plugins/fn-math/functions.c:gnumeric_eigen() to ignore the return value of gnm_matrix_eigen(). This way I get the meaningful eigen{vectors,values} quickly, followed by some obviously roundoff-error-induced garbage at the end. I don't know if you want to incorporate this in the official version; it depends on whether you want to give an imprecise answer or no answer. But for PCA an imprecise answer is (to my eyes, at least) much better, since the whole point is that the first few eigenvectors are the important ones and the rest are probably noise.
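For reference, the kind of iteration being described is a Jacobi eigenvalue sweep with an iteration cap. Below is a hedged sketch in Python (not Gnumeric's actual C code; the function name, cap, and tolerance are illustrative). It returns the diagonal together with a convergence flag, which is the situation the patch exploits: the diagonal can still be usable even when the flag reports failure.

```python
import math

def jacobi_eigenvalues(a, max_sweeps=50, tol=1e-12):
    """Cyclic Jacobi iteration on a symmetric matrix (list of lists).
    Returns (eigenvalues, converged).  Illustrative stand-in for a
    routine like gnm_matrix_eigen(), which similarly reports failure
    after an iteration cap."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            return [a[i][i] for i in range(n)], True
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    # As in the patch described above: the diagonal is returned even
    # when the convergence flag is False.
    return [a[i][i] for i in range(n)], False
```

On a well-behaved symmetric matrix such as [[2,1,0],[1,2,0],[0,0,1]] this converges quickly to the eigenvalues {3, 1, 1}; the bug report concerns the case where the flag comes back false yet the leading diagonal entries are already the answer.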
In comment #3 that should have been "10^-16 to 10^-18 range". Elsewhere we have successfully managed to limit round-off errors, so we should probably see what we can do here.
Just to confirm, octave also produces 17 non-zero eigenvalues and 13 values smaller than 10^-16 (which should be 0). I don't think you can eliminate round-off errors there unless you work symbolically like maxima. I wouldn't ignore the return value from gnm_matrix_eigen() but rather make sure gnm_matrix_eigen() can handle rank-deficient matrices well (maybe compare how octave/gsl does it?).

Also, just looking at the example data, 0-eigenvalues are entirely expected. You only have 20 observations for each of the 30 different random inputs, so the rank of the covariance matrix can be 20 (or rather 19, after mean-centering) at most. The actual rank is 17 (there are loads of identical columns, e.g. A=F=K=Q=S=U=-V=W), so you only get 17 non-zero eigenvalues.
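The rank bound is easy to check on a toy version of the dataset: with m observations of n variables, the centered rows satisfy one linear constraint (they sum to zero), so the sample covariance has rank at most m-1. A small Python sketch (helper names `sample_cov` and `rank` are illustrative, not Gnumeric API), using 3 observations of 5 variables as a miniature of the attached 20-by-30 data:

```python
import random

def sample_cov(data):
    """Sample covariance of the columns of data (rows = observations)."""
    m, n = len(data), len(data[0])
    means = [sum(row[j] for row in data) / m for j in range(n)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (m - 1)
             for j in range(n)] for i in range(n)]

def rank(mat, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    m, n = len(a), len(a[0])
    r = 0
    for col in range(n):
        if r == m:
            break
        piv = max(range(r, m), key=lambda i: abs(a[i][col]))
        if abs(a[piv][col]) < tol:
            continue  # numerically zero column below row r
        a[r], a[piv] = a[piv], a[r]
        for i in range(r + 1, m):
            f = a[i][col] / a[r][col]
            for j in range(col, n):
                a[i][j] -= f * a[r][j]
        r += 1
    return r

random.seed(0)
data = [[random.random() for _ in range(5)] for _ in range(3)]  # 3 obs, 5 vars
C = sample_cov(data)   # 5x5 covariance matrix, rank at most 3-1 = 2
```

Generic random data hits the bound exactly (rank 2 here); duplicated columns, as in the attachment, drop it further, which is why only 17 of the 30 eigenvalues are non-zero.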
What is happening is that the pivot element underflows to zero. Thereafter, nothing is going to change. I have put in a test for that, but I am pretty sure my simple "break;" isn't the right action. Andreas?
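The underflow mechanism is easy to reproduce in any IEEE-754 double arithmetic (a generic illustration, not Gnumeric's exact code path):

```python
# 1e-200 is a perfectly representable double, but its square, 1e-400,
# is far below the smallest subnormal double (~4.9e-324), so the
# product flushes to exactly 0.0.
x = 1e-200
print(x != 0.0)      # True
print(x * x == 0.0)  # True
# Once a pivot has become exactly 0.0, the rotations it drives are
# no-ops, so subsequent sweeps can no longer change the matrix and
# the iteration stalls instead of converging.
```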
I don't really see what we can do there.
Over in goffice we have QR factorization. I think we should be using that for eigenvectors and eigenvalues. A=QR, where Q is orthogonal and R is upper triangular. Ax=ex, i.e., QRx=ex, i.e., Rx=Q^t(ex), i.e., Rx=e(Q^tx), so the eigenvalues of A are the same as those of R and the eigenvectors are just different by a multiplication of Q.
> so the eigenvalues of A are the same as those of R and the eigenvectors
> are just different by a multiplication of Q.

Sigh. That's just wrong.
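A tiny numeric check makes the retraction concrete (minimal Python sketch): since R is upper triangular its eigenvalues are just its diagonal entries, and for a simple symmetric A they do not match A's eigenvalues.

```python
import math

# Counterexample: A = QR, but eig(R) != eig(A).
A = [[2.0, 1.0], [1.0, 2.0]]           # symmetric; eigenvalues are 3 and 1

# QR of the 2x2 via Gram-Schmidt.
r11 = math.hypot(A[0][0], A[1][0])
q1 = (A[0][0] / r11, A[1][0] / r11)
r12 = q1[0] * A[0][1] + q1[1] * A[1][1]
u2 = (A[0][1] - r12 * q1[0], A[1][1] - r12 * q1[1])
r22 = math.hypot(u2[0], u2[1])

# R is upper triangular, so its eigenvalues are its diagonal entries.
eig_R = sorted([r11, r22])             # approx [1.342, 2.236], i.e. 3/sqrt(5), sqrt(5)

# Eigenvalues of a symmetric 2x2 in closed form.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
d = math.sqrt(tr * tr / 4.0 - det)
eig_A = sorted([tr / 2.0 - d, tr / 2.0 + d])   # [1.0, 3.0]
```

The two spectra differ, which is the point: a single QR factorization preserves neither the eigenvalues nor (up to a rotation) the eigenvectors.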
http://mathreview.uwaterloo.ca/archive/voli/1/panju.pdf describes an iterative method to obtain eigenvalues and eigenvectors using the QR decomposition, but as they point out it requires refinements for any practical use.
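The basic scheme from that paper, the unshifted QR iteration A_{k+1} = R_k Q_k, can be sketched in a few lines of Python. This is a hedged demo: classical Gram-Schmidt and the unshifted iteration work for small well-conditioned symmetric matrices but are exactly what needs the refinements (shifts, deflation, a more stable QR) for practical use, and the Gram-Schmidt step divides by r[j][j], so it breaks down on singular input like the matrix in this bug.

```python
import math

def qr_gram_schmidt(a):
    """QR of a square nonsingular matrix via classical Gram-Schmidt
    (fine for a demo, numerically fragile in general)."""
    n = len(a)
    q = [[0.0] * n for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [a[i][j] for i in range(n)]
        for k in range(j):
            r[k][j] = sum(q[i][k] * a[i][j] for i in range(n))
            for i in range(n):
                v[i] -= r[k][j] * q[i][k]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        for i in range(n):
            q[i][j] = v[i] / r[j][j]
    return q, r

def qr_iteration(a, iters=200):
    """Unshifted QR iteration: factor A_k = Q_k R_k, then set
    A_{k+1} = R_k Q_k (a similarity transform, so the spectrum is
    preserved).  For symmetric A with distinct eigenvalue magnitudes
    the diagonal converges to the eigenvalues."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(iters):
        q, r = qr_gram_schmidt(a)
        a = [[sum(r[i][k] * q[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return sorted(a[i][i] for i in range(n))
```

For A = [[2,1],[1,2]] the iterates converge to diag(3,1), recovering the eigenvalues 1 and 3.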
Yeah, it's complicated. The Jacobi method we now use is praised for accuracy even if it can be slow. The break-at-zero I added is correct. So I think we're good for now.
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.