GNOME Bugzilla – Bug 617801
[Win32] Copy/Paste crashes Gnumeric
Last modified: 2010-06-26 00:20:13 UTC
A simple copy/paste of the TSX index from my broker's web site into a Gnumeric sheet hangs or crashes the software. The problem is reproducible and happens on every attempt.
You're going to have to give us some info so we can reproduce this. Otherwise we will not be able to work on it. If you could paste the relevant text into Excel and show us what the data is, that would be a good start. Better yet: give us a link where we can try ourselves.
I was able to replicate with GNumeric 1.20.2 on MS Windows XP:
    1. Use IE to open http://ca.finance.yahoo.com/q/hp?s=^GSPTSE
    2. Select the data table, starting with "Prices" and ending with the last row of numbers.
    3. Copy.
    4. Paste into Gnumeric.
Gnumeric appears to freeze and doesn't even close the Edit menu. Note that this was the first page I tried to copy/paste.
Any output from gnumeric if you start it from the command line and first do set GNM_DEBUG=clipboard ?
Received offline. The relevant part is this:
    Clipboard target 0 is DataObject
    Clipboard target 1 is text/html
    Clipboard target 2 is HTML Format
    Clipboard target 3 is text/_moz_htmlcontext
    Clipboard target 4 is text/_moz_htmlinfo
    Clipboard target 5 is UTF8_STRING
    Clipboard target 6 is text/x-moz-url-priv
    Clipboard target 7 is Ole Private Data
If I read that right, we should be using text/html based on this.
Is there a way of seeing what we are really receiving as text/html? According to bug #143084 it has additional info preceding the html.
We also need to know whether the html plugin is in fact installed and enabled. You can see that in tools->PlugIns. The required plugin is called HTML&TeX. Note that bug #609236 indicates that not all file filters are installed by default.
According to the reporter: "Yes, the HTML&TeX option is enabled by installation default."
I can't reproduce, but I have added more debug code.
Morten, I don't know how to compile Gnumeric for Windows, so I can't try it out with the debug code. I did notice one thing though: Gnumeric does not crash or hang when I copy the table in the webpage from a Firefox window into Gnumeric. Gnumeric does hang if I copy it from an Internet Explorer window. So the obvious workaround is to use Firefox as the browser.
Andreas, Firefox was the browser used to make the initial report of the bug.
Andreas, in Firefox, the problematic code from the website you mentioned (http://ca.finance.yahoo.com/q/hp?s=^GSPTSE) is this:
    </td></tr></table>
    <table cellpadding="0" cellspacing="0" border="0"><tr>
    <td height="2" nowrap="nowrap"><spacer type="block" width="1" height="2"/></td>
    </tr></table>
    <table width="100%" cellpadding="2" cellspacing="0" border="0"><tr class="yfnc_modtitle1" valign="top"><td><small><b>PRICES</b></small></td></tr></table>
Notice there is a non-standard construct (an error) in the line:
    height="2"/></td>
The orphaned backslash after "2" may be the culprit.
(In reply to comment #11)
> <snip>
>
> The orphaned backslash after "2" may be the culprit.
Never mind. I was trying to help, but after more testing, the problem is not in that code specifically. I copied the page source into Kompozer, and from the preview page of Kompozer I could reproduce the problem. You can copy and paste the table below "Prices" and even include "Prices". But after deleting more and more code in Kompozer, I eventually got to:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html>
    <head>
    <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
    <title></title>
    <meta content="Hoyt" name="author">
    </head>
    <body>
    test
    </body>
    </html>
And even this could not be copy-pasted into a Gnumeric cell from the Kompozer preview. I could paste the word "test" into the cell description window (or whatever you call the editing window for each cell next to the "+" sign).
Which "+" sign are you talking about?
Unfortunately I still can't replicate on Linux. Do you get any more information with set GNM_DEBUG=clipboard and a newer version of Gnumeric?
I meant the "=" sign. Some info (no solutions):
The problem is replicated on another computer with a fresh install of Gnumeric, using a freshly made HTML page from KompoZer that consists of:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <html>
    <head>
    <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
    <title>test</title>
    </head>
    <body>
    testr<br>
    <br>
    </body>
    </html>
I could not copy the word "test" from KompoZer (in Preview mode) to Gnumeric without a hang. I could not even copy the letters "es" from within the word "test" without Gnumeric hanging.
In Gnumeric, if I go into EDIT > PREFERENCES and uncheck the box "Prefer CLIPBOARD Over PRIMARY Selection", I still cannot paste into a cell, but Gnumeric no longer hangs. I can paste into the formula line next to the "=" sign with no problem.
Also, I can copy the word "test" from KompoZer to a text editor (I used Metapad), and then immediately copy and paste it from there into a Gnumeric cell with no problems.
I will try Gnumeric on my CentOS5.4 build later to see whether the Linux build has this problem.
The "Prefer CLIPBOARD Over PRIMARY Selection" setting isn't relevant for Win32. Changing it probably disables the structured-text import.
*** Bug 622610 has been marked as a duplicate of this bug. ***
I still suspect that what we are receiving on paste is not the straight html code, but likely has some additional text at the beginning that causes this crash. We need to see what Gnumeric really receives.
Created attachment 164576 [details]
Debug log
Reproduced. We learn here:
    1. The target is "HTML Format".
    2. The target is being recognized as a "table". (Due to a bug in the debug code the entire object is printed in the log. Sometimes bugs are good for something!)
    3. Encoding is ascii or similar, or else something already decoded it by the time we get our hands on it.
    4. There are initial headers of the form "<name>:<value><CRLF>". I do not see any explicit end marker for these headers.
    5. The value for the StartHTML header is the byte offset where the actual HTML starts. The value for StartFragment is the byte offset where the actual relevant table starts.
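For anyone who wants to poke at such a payload outside Gnumeric, here is a minimal Python sketch of the header handling described in points 4 and 5. It is not Gnumeric's code. The function names parse_clipboard_html and build_payload are made up for illustration, and the EndHTML/EndFragment headers are an assumption taken from Microsoft's description of the "HTML Format" clipboard format, not from the debug log above.

```python
import re

# One header line: "<name>:<value>" terminated by CRLF (bare LF tolerated).
HEADER_RE = re.compile(rb"([A-Za-z]+):([^\r\n]*)\r?\n")


def parse_clipboard_html(payload: bytes):
    """Split an "HTML Format" payload into (headers, html, fragment).

    Hypothetical helper: headers are read until the first line that does
    not look like "<name>:<value>", since there is no explicit end marker.
    """
    headers = {}
    pos = 0
    while True:
        match = HEADER_RE.match(payload, pos)
        if match is None:
            break
        headers[match.group(1).decode("ascii")] = match.group(2).decode("ascii")
        pos = match.end()

    # StartHTML / StartFragment are byte offsets from the start of the payload.
    start_html = int(headers.get("StartHTML", pos))
    if start_html < 0:  # assumption: a negative offset means "not given"
        start_html = pos
    start_fragment = int(headers.get("StartFragment", start_html))
    # EndFragment is assumed from the format description; default to the end.
    end_fragment = int(headers.get("EndFragment", len(payload)))
    return headers, payload[start_html:], payload[start_fragment:end_fragment]


def build_payload(prefix: bytes, fragment: bytes, suffix: bytes) -> bytes:
    """Build a synthetic payload for testing (illustrative only)."""
    template = (
        "Version:0.9\r\nStartHTML:{:08d}\r\nEndHTML:{:08d}\r\n"
        "StartFragment:{:08d}\r\nEndFragment:{:08d}\r\n"
    )
    # Offsets are fixed-width, so the header length does not depend on them.
    start_html = len(template.format(0, 0, 0, 0))
    start_fragment = start_html + len(prefix)
    end_fragment = start_fragment + len(fragment)
    end_html = end_fragment + len(suffix)
    header = template.format(start_html, end_html, start_fragment, end_fragment)
    return header.encode("ascii") + prefix + fragment + suffix
```

Feeding only the returned html (or fragment) to an HTML parser keeps the leading headers away from it, which is the part that would otherwise precede the DOCTYPE.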
Created attachment 164577 [details]
Emacs lisp emulator
To emulate the same clipboard setting, evaluate this file in Emacs. I don't get a crash under Linux, but I do get parse errors and everything gets put into one line.
    HTML parser error : Misplaced DOCTYPE declaration
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    ^
    HTML parser error : htmlParseStartTag: misplaced <html> tag
    <HTML><HEAD><TITLE>AKAM: Historical Prices for Akamai Technologies, Inc. - Yahoo
    ^
    HTML parser error : htmlParseStartTag: misplaced <head> tag
    <HTML><HEAD><TITLE>AKAM: Historical Prices for Akamai Technologies, Inc. - Yahoo
    ^
    HTML parser error : htmlParseStartTag: misplaced <body> tag
    <BODY>
    ^
These seem to be the headers mentioned by Jon-Kare in bug #143084.
We now skip the offending HTML headers and the scary warnings are gone. I don't know if that improves the win32 situation, but I don't see why it would. Andreas: things get pasted into a single cell. Somehow this table is not parsed right.
Which table? There is a table inside a table. The outer table (which we parse) has a single cell. We don't handle the inner table (where should it go?) This is bug #594789. I am not marking this as a duplicate of #594789 since we don't know the cause of the crash in Windows.
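To check whether a given page has this table-inside-a-table shape, a small Python sketch along the following lines can help. It is only a diagnostic built on the standard html.parser module and has nothing to do with how Gnumeric's importer works. The class and function names are invented for this example.

```python
from html.parser import HTMLParser


class TableDepthScanner(HTMLParser):
    """Track how deeply <table> elements nest and count cells per level."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # current <table> nesting level
        self.max_depth = 0        # deepest nesting seen so far
        self.cells_by_depth = {}  # nesting level -> number of <td>/<th> cells

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
        elif tag in ("td", "th") and self.depth > 0:
            count = self.cells_by_depth.get(self.depth, 0)
            self.cells_by_depth[self.depth] = count + 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def scan_tables(html: str):
    """Return (max nesting depth, cells per nesting level) for the markup."""
    scanner = TableDepthScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.max_depth, scanner.cells_by_depth
```

A result with a depth of 2 and a single cell at level 1 matches the situation described here: the outer table is just a wrapper and the real data sits in the inner table.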
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.
Specifically we now paste the table correctly and see no crash. Most people will take that and smile. I still don't know what caused the crash, but I doubt Gnumeric is to blame.