GNOME Bugzilla – Bug 644197
Importing a 73 MB CSV file (4+ million lines) fails with memory allocation error
Last modified: 2011-05-01 20:29:09 UTC
2 GB memory on PC.

GLIB-error **: gmem.c 157: failed to allocate 50331648 bytes
aborting ...

50,331,648 = c. 50 MB. Physical memory availability was reported as 403 kB, so memory exhaustion is plausible. But 5 million records at 2 fields per record (both numeric) = 10 million cells. Assuming 10 bytes per cell entry, that is about 100 million bytes, which should still not stress 2 GB of RAM.
10 bytes/cell doesn't come close to the amount of data we need per cell. It's more like 100 bytes/cell. But that's still only ~1GB. A quick look shows that, on my 64-bit Linux, something happens during parsing that causes Gnumeric to grow to ~2.4G. After parsing, we drop down to something like 1.4G. I'm pretty sure it's not too hard to fix the parsing part. However, Gnumeric is not optimized for sheets this size so it isn't a pleasant experience.
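To make the two estimates above concrete, here is a small stand-alone C snippet that simply reproduces the arithmetic from the report and the reply (10 million cells at 10 bytes/cell versus 100 bytes/cell); it is illustrative only and says nothing about Gnumeric's actual cell representation:

    /* Back-of-the-envelope memory estimates for the CSV in this report.
     * The cell count and bytes-per-cell figures come from the comments
     * above; nothing here touches Gnumeric itself. */
    #include <stdio.h>

    int main(void)
    {
        const double cells     = 5e6 * 2;      /* 5 million records, 2 fields each */
        const double naive     = cells * 10;   /* report's assumption: ~10 bytes/cell  */
        const double realistic = cells * 100;  /* reply's figure:      ~100 bytes/cell */

        printf("naive estimate:     ~%.0f MB\n", naive     / (1024 * 1024));
        printf("realistic estimate: ~%.0f MB\n", realistic / (1024 * 1024));
        return 0;
    }

This prints roughly 95 MB for the naive estimate and roughly 954 MB (about 1 GB) for the 100 bytes/cell figure, matching the "~1GB" mentioned above; the additional growth to ~2.4G is the separate parsing-time overhead described in the same comment.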
Assuming that the file attached to bug 644189 contains the same information (4000000 rows, 2 columns), Gnumeric seems to handle the data just fine: after loading my machine uses a total of 1.4GB and Gnumeric is still responding fine (well, ctrl-end takes two seconds to get to the end of the data and calculations involving a whole column take a (long) while, but that is expected.)
(In reply to comment #2)
> Assuming that the file attached to bug 644189 contains the same information
> (4000000 rows, 2 columns), Gnumeric seems to handle the data just fine: after
> loading my machine uses a total of 1.4GB and Gnumeric is still responding fine
> (well, ctrl-end takes two seconds to get to the end of the data and
> calculations involving a whole column take a (long) while, but that is
> expected.)

It does contain the same information. We tried to get around the dbf import issue by loading the data into Access 2000, exporting to CSV, and reading the result into Gnumeric. On machines with 2 GB of RAM we then hit the memory allocation error.
(In reply to comment #1)
> 10 bytes/cell doesn't come close to the amount of data we need per cell.
> It's more like 100 bytes/cell.
>
> But that's still only ~1GB.
>
> A quick look shows that, on my 64-bit Linux, something happens during
> parsing that causes Gnumeric to grow to ~2.4G. After parsing, we drop
> down to something like 1.4G. I'm pretty sure it's not too hard to fix
> the parsing part.
>
> However, Gnumeric is not optimized for sheets this size so it isn't a
> pleasant experience.

We are OK with it not being a pleasant experience! We anticipated that processing would be slow using a spreadsheet, and Gnumeric was the only one (Excel 2010 and OpenOffice were our two alternatives) that could even theoretically accommodate the problem size.
I've checked in a small change that lowers the memory high water mark to 2G. Memory usage for a 32-bit build will be slightly lower, but since this puts severe constraints on the memory allocator, it's anyone's guess what will happen over on win32. Loading takes 8 minutes. Ctrl-Down, once loaded, takes ~30s. I can't imagine what you want to do with the data that will not drive you crazy.
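For context, "high water mark" here means a hard cap on total allocation. Below is a minimal sketch of that idea only; the names (xmalloc_capped, MEM_HIGH_WATER) are invented for illustration and are not the actual Gnumeric change:

    /* Illustrative sketch of a memory high-water-mark cap.  A real wrapper
     * would also decrement the counter on free; this is just the concept. */
    #include <stdio.h>
    #include <stdlib.h>

    #define MEM_HIGH_WATER (2UL * 1024 * 1024 * 1024)  /* 2G cap */

    static size_t mem_in_use = 0;

    static void *xmalloc_capped(size_t n)
    {
        if (mem_in_use + n > MEM_HIGH_WATER) {
            fprintf(stderr, "refusing %zu bytes: would exceed cap\n", n);
            return NULL;                /* caller must handle the failure */
        }
        mem_in_use += n;
        return malloc(n);
    }

    int main(void)
    {
        void *p = xmalloc_capped(50331648);  /* the 48 MB request from the report */
        printf("allocation %s\n", p ? "succeeded" : "failed");
        free(p);
        return 0;
    }

The point of the cap is to fail predictably once the sheet approaches the limit instead of letting a 32-bit address space run out of room mid-allocation.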
See also bug 644437. glib is responsible for ~600M.
Created attachment 184172 [details] [review]
Patch

This possible patch creates our own data structure for the cell set.
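For readers curious what a dedicated cell-set structure buys over a generic per-cell container, here is a deliberately simplified, hypothetical sketch in C. The CellStore/cell_store_* names are invented for illustration and are not what the attached patch implements; a real store also has to hold strings, formulas, styles, and sparse regions rather than plain doubles:

    /* Hypothetical sketch: a flat, row-major array of numeric cells,
     * showing why a purpose-built store can keep per-cell overhead
     * far below ~100 bytes for large, dense, numeric sheets. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int     n_rows;
        int     n_cols;
        double *cells;    /* n_rows * n_cols values, row-major */
    } CellStore;

    static CellStore *cell_store_new(int n_rows, int n_cols)
    {
        CellStore *cs = malloc(sizeof *cs);
        if (!cs)
            return NULL;
        cs->n_rows = n_rows;
        cs->n_cols = n_cols;
        cs->cells  = calloc((size_t) n_rows * n_cols, sizeof (double));
        if (!cs->cells) {
            free(cs);
            return NULL;
        }
        return cs;
    }

    static void cell_store_set(CellStore *cs, int row, int col, double v)
    {
        cs->cells[(size_t) row * cs->n_cols + col] = v;
    }

    static double cell_store_get(const CellStore *cs, int row, int col)
    {
        return cs->cells[(size_t) row * cs->n_cols + col];
    }

    int main(void)
    {
        /* 4,000,000 rows x 2 numeric columns, as in this report:
         * 8 million doubles = 64 MB of cell payload. */
        CellStore *cs = cell_store_new(4000000, 2);
        if (!cs)
            return 1;
        cell_store_set(cs, 3999999, 1, 42.0);
        printf("%g\n", cell_store_get(cs, 3999999, 1));
        free(cs->cells);
        free(cs);
        return 0;
    }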
Created attachment 184223 [details] [review]
Updated patch

Fewer bugs. (Possibly a duplicate due to bugzilla issues.)
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report. (The fix requires an as-yet-unreleased glib.)