Bug 617801 - [Win32] Copy/Paste crashes Gnumeric
Status: RESOLVED FIXED
Product: Gnumeric
Classification: Applications
Component: import/export HTML
Version: 1.10.x
Hardware: Other Windows
Importance: Normal normal
Target Milestone: ---
Assigned To: Andreas J. Guelzow
QA Contact: Jody Goldberg
Duplicates: 622610
Depends on:
Blocks:
 
 
Reported: 2010-05-05 19:12 UTC by Pete
Modified: 2010-06-26 00:20 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Debug log (150.30 KB, text/plain)
2010-06-25 02:39 UTC, Morten Welinder
Emacs lisp emulator (182.41 KB, text/plain)
2010-06-25 03:34 UTC, Morten Welinder

Description Pete 2010-05-05 19:12:07 UTC
A simple copy/paste from my broker's web site of the TSX index into a gnumeric sheet hangs or crashes the software.  The problem is reproducible and happens every attempt.
Comment 1 Morten Welinder 2010-05-05 19:34:59 UTC
You're going to have to give us some info so we can reproduce this.
Otherwise we will not be able to work on it.

If you could paste the relevant text into Excel and show us what the
data is, that would be a good start.  Better yet: give us a link where
we can try ourselves.
Comment 2 Andreas J. Guelzow 2010-05-06 03:40:38 UTC
I was able to replicate with Gnumeric 1.20.2 on MS Windows XP:

Use IE to open http://ca.finance.yahoo.com/q/hp?s=^GSPTSE
Select the data table, starting with "Prices" and ending with the last row of numbers.
Copy
Paste into Gnumeric.
Gnumeric appears to freeze and doesn't even close the Edit menu.

Note that this was the first page I tried to copy/paste.
Comment 3 Morten Welinder 2010-05-06 12:43:58 UTC
Any output from gnumeric if you start it from the command line and
first do

    set GNM_DEBUG=clipboard

?
Comment 4 Morten Welinder 2010-05-06 20:09:46 UTC
received offline.  The relevant part is this:

Clipboard target 0 is DataObject
Clipboard target 1 is text/html
Clipboard target 2 is HTML Format
Clipboard target 3 is text/_moz_htmlcontext
Clipboard target 4 is text/_moz_htmlinfo
Clipboard target 5 is UTF8_STRING
Clipboard target 6 is text/x-moz-url-priv
Clipboard target 7 is Ole Private Data

If I read that right, we should be using text/html based on this.
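
The choice among the offered targets can be sketched as a simple preference lookup. This is only an illustration in Python, not Gnumeric's actual code: the function name `pick_target` and the preference order in `PREFERRED_TARGETS` are assumptions made for the example.

```python
# Hypothetical preference order, chosen for illustration only;
# it is not taken from Gnumeric's source.
PREFERRED_TARGETS = ["text/html", "HTML Format", "UTF8_STRING"]


def pick_target(offered, preferred=PREFERRED_TARGETS):
    """Return the first preferred target that the clipboard owner offers."""
    offered_set = set(offered)
    for name in preferred:
        if name in offered_set:
            return name
    return None  # nothing we know how to import


# The targets listed in the debug output above.
offered = [
    "DataObject",
    "text/html",
    "HTML Format",
    "text/_moz_htmlcontext",
    "text/_moz_htmlinfo",
    "UTF8_STRING",
    "text/x-moz-url-priv",
    "Ole Private Data",
]

chosen = pick_target(offered)
print(chosen)
```

With this assumed order, a source that offers only "HTML Format" and plain text would fall through to "HTML Format", which is the case examined later in this report.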
Comment 5 Andreas J. Guelzow 2010-05-06 20:19:51 UTC
Is there a way of seeing what we are really receiving as text/html? According to bug #143084, it has additional info preceding the HTML.
Comment 6 Andreas J. Guelzow 2010-05-06 20:24:22 UTC
We also need to know whether the HTML plugin is in fact installed and enabled. You can see that in tools->PlugIns. The required plugin is called HTML&TeX.

Note that bug #609236 indicates that not all file filters are installed by default.
Comment 7 Andreas J. Guelzow 2010-05-06 21:35:49 UTC
According to the reporter: "Yes, the HTML&TeX option is enabled by installation default."
Comment 8 Morten Welinder 2010-05-11 01:05:35 UTC
I can't reproduce, but I have added more debug code.
Comment 9 Andreas J. Guelzow 2010-05-12 22:25:10 UTC
Morten, I don't know how to compile Gnumeric for Windows, so I can't try it out with the debug code. I did notice one thing though:

Gnumeric does not crash or hang when I copy the table in the webpage from a Firefox window into Gnumeric. Gnumeric does hang if I copy it from an Internet Explorer window.

So the obvious workaround is to use Firefox as browser.
Comment 10 Pete 2010-05-13 01:55:33 UTC
Andreas,  Firefox was the browser used to make the initial report of the bug.
Comment 11 Hoyt 2010-06-03 18:59:36 UTC
Andreas, in Firefox, the problematic code from the website you mentioned (http://ca.finance.yahoo.com/q/hp?s=^GSPTSE) is this:

</td></tr></table>
<table cellpadding="0" cellspacing="0" border="0"><tr>
        <td height="2" nowrap="nowrap"><spacer type="block" width="1" height="2"/></td>
        </tr></table>
<table width="100%" cellpadding="2" cellspacing="0" border="0"><tr class="yfnc_modtitle1" valign="top"><td><small><b>PRICES</b></small></td></tr></table>


Notice there is a non-standard construct (an error) in the line:

height="2"/></td>

The orphaned backslash after "2" may be the culprit.
Comment 12 Hoyt 2010-06-03 19:22:55 UTC
(In reply to comment #11)
> <snip>
> 
> The orphaned backslash after "2" may be the culprit.

Never mind. I was trying to help, but after more testing, the problem is not in that code specifically. I copied the page source into Kompozer and, from the preview page of Kompozer, I could reproduce the problem. You can copy and paste the table below "Prices" and even include "Prices". But after deleting more and more code from Kompozer, I eventually got to:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <meta content="text/html; charset=ISO-8859-1"
 http-equiv="content-type">
  <title></title>
  <meta content="Hoyt" name="author">
</head>
<body>
test
</body>
</html>

And even this could not be copy-pasted into a gnumeric cell from Kompozer preview. I could paste the word "test" into the cell description window (or whatever you call the editing window for each cell next to the "+" sign).
Comment 13 Andreas J. Guelzow 2010-06-03 21:39:54 UTC
which "+" sign are you talking about?
Comment 14 Andreas J. Guelzow 2010-06-03 21:43:32 UTC
unfortunately I still can't replicate on Linux.

Do you get any more information with 
set GNM_DEBUG=clipboard
and a newer version of Gnumeric?
Comment 15 Hoyt 2010-06-06 01:24:39 UTC
I meant the "=" sign.
Some info (no solutions):
Problem is replicated on another computer with a fresh install of Gnumeric.
Using a freshly made HTML page from KompoZer that consists of:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <meta content="text/html; charset=ISO-8859-1"
 http-equiv="content-type">
  <title>test</title>
</head>
<body>
testr<br>
<br>
</body>
</html>

Could not copy the word "test" from KompoZer (In Preview mode) to Gnumeric without a hang. I could not copy the letters "es" from within the word "test" without Gnumeric hanging. 

In Gnumeric, if I went into EDIT > PREFERENCES, and unclicked the box "Prefer CLIPBOARD Over PRIMARY Selection" you still cannot paste into a cell, but Gnumeric no longer hangs. You can paste into the Formula line next to the "=" sign, with no problem. 

Also, you can copy the word "test" from KompoZer to a text editor (I used Metapad), and then immediately copy and paste it into Gnumeric cell with no problems. 
I will try Gnumeric on my CentOS5.4 build later to see if the Linux build does not have this problem.
Comment 16 Morten Welinder 2010-06-24 12:42:46 UTC
The "Prefer CLIPBOARD Over PRIMARY Selection" setting isn't relevant for
Win32.  Changing it probably disables the structured-text import.
Comment 17 Morten Welinder 2010-06-24 15:04:30 UTC
*** Bug 622610 has been marked as a duplicate of this bug. ***
Comment 18 Andreas J. Guelzow 2010-06-24 17:35:57 UTC
I still suspect that what we are receiving when pasting is not the straight HTML code but likely has some additional text at the beginning that causes this crash. We need to see what Gnumeric really receives.
Comment 19 Morten Welinder 2010-06-25 02:39:24 UTC
Created attachment 164576 [details]
Debug log

Reproduced.

We learn here:

1. The target is "HTML Format"
2. The target is being recognized as a "table".  (Due to a
   bug in the debug code the entire object is printed in the log.
   Sometimes bugs are good for something!)
3. Encoding is ASCII or similar, or else something already decoded
   it by the time we get our hands on it.
4. There are initial headers of the form "<name>:<value><CRLF>".  I do
   not see any explicit end marker for these headers.
5. The value for the StartHTML header is the byte offset where
   the actual HTML starts.  The value for StartFragment is the
   byte offset where the actual relevant table starts.
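
The header layout described in points 4 and 5 can be sketched in Python as follows. This is an illustrative parser, not the fix that went into Gnumeric. The helper names `parse_html_format` and `build_sample` are hypothetical, and the EndHTML and EndFragment headers are assumed from Microsoft's published description of the "HTML Format" clipboard format rather than taken from this log.

```python
import re

# A header line looks like "<name>:<value>"; HTML lines start with "<"
# and therefore never match.
HEADER_RE = re.compile(rb"^([A-Za-z]+):(.*)$")


def parse_html_format(payload: bytes):
    """Split an "HTML Format" payload into (headers, html, fragment)."""
    headers = {}
    pos = 0
    while True:
        end = payload.find(b"\n", pos)
        if end == -1:
            break
        match = HEADER_RE.match(payload[pos:end].rstrip(b"\r"))
        if match is None:
            break  # no explicit end marker: stop at the first non-header line
        name = match.group(1).decode("ascii")
        headers[name] = match.group(2).decode("ascii", "replace").strip()
        pos = end + 1

    # StartHTML is a byte offset into the whole payload.
    start_html = int(headers.get("StartHTML", pos))
    if start_html < 0:
        start_html = pos
    html = payload[start_html:]

    fragment = None
    if "StartFragment" in headers:
        start_frag = int(headers["StartFragment"])
        end_frag = int(headers.get("EndFragment", len(payload)))
        fragment = payload[start_frag:end_frag]
    return headers, html, fragment


def build_sample(fragment: bytes) -> bytes:
    """Build a small synthetic payload with consistent byte offsets."""
    prefix = b"<html><body><!--StartFragment-->"
    suffix = b"<!--EndFragment--></body></html>"
    template = (
        "Version:0.9\r\nStartHTML:{:08d}\r\nEndHTML:{:08d}\r\n"
        "StartFragment:{:08d}\r\nEndFragment:{:08d}\r\n"
    )
    # Offsets are zero-padded to a fixed width, so the header length
    # does not depend on the values filled in.
    start_html = len(template.format(0, 0, 0, 0))
    start_frag = start_html + len(prefix)
    end_frag = start_frag + len(fragment)
    end_html = end_frag + len(suffix)
    header = template.format(start_html, end_html, start_frag, end_frag)
    return header.encode("ascii") + prefix + fragment + suffix


table = b"<table><tr><td>1</td></tr></table>"
headers, html, fragment = parse_html_format(build_sample(table))
print(headers["StartHTML"], fragment)
```

Handing only `html` (or `fragment`) to an HTML parser avoids feeding it the header lines, which is the kind of input that would otherwise appear as stray text before the DOCTYPE.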
Comment 20 Morten Welinder 2010-06-25 03:34:43 UTC
Created attachment 164577 [details]
Emacs lisp emulator

To emulate the same clipboard setting, evaluate this file
in Emacs.

I don't get a crash under Linux, but I do get parse errors and
everything gets put into one line.

HTML parser error : Misplaced DOCTYPE declaration
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
^
HTML parser error : htmlParseStartTag: misplaced <html> tag
<HTML><HEAD><TITLE>AKAM: Historical Prices for Akamai Technologies, Inc. - Yahoo
     ^
HTML parser error : htmlParseStartTag: misplaced <head> tag
<HTML><HEAD><TITLE>AKAM: Historical Prices for Akamai Technologies, Inc. - Yahoo
           ^
HTML parser error : htmlParseStartTag: misplaced <body> tag
<BODY>
     ^
Comment 21 Andreas J. Guelzow 2010-06-25 04:57:17 UTC
These seem to be the headers mentioned by Jon-Kare in bug #143084.
Comment 22 Morten Welinder 2010-06-25 13:32:30 UTC
We now skip the offending HTML headers and the scary warnings are gone.
I don't know if that improves the win32 situation, but I don't see why
it would.

Andreas: things get pasted into a single cell.  Somehow this table is
not parsed right.
Comment 23 Andreas J. Guelzow 2010-06-25 16:54:28 UTC
Which table? There is a table inside a table. The outer table (which we parse) has a single cell. We don't handle the inner table (where should it go?)

This is bug #594789.

I am not marking this as a duplicate of #594789 since we don't know the cause of the crash in Windows.
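
The table-inside-a-table situation can be illustrated with a small Python sketch. It uses the standard library's `html.parser` rather than the parser Gnumeric relies on, and the class name `CellCounter` and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Count <td> cells separately for each table nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.cells_by_depth = {}

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        elif tag == "td" and self.depth > 0:
            count = self.cells_by_depth.get(self.depth, 0)
            self.cells_by_depth[self.depth] = count + 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


# Synthetic markup: an outer table with one cell wrapping a 2x2 inner table.
sample = (
    "<table><tr><td>"
    "<table><tr><td>a</td><td>b</td></tr>"
    "<tr><td>c</td><td>d</td></tr></table>"
    "</td></tr></table>"
)

counter = CellCounter()
counter.feed(sample)
print(counter.cells_by_depth)
```

An importer that only walks the outermost table sees a single cell here, which matches the symptom of everything being pasted into one cell.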
Comment 24 Morten Welinder 2010-06-26 00:15:05 UTC
This problem has been fixed in our software repository. The fix will go into the next software release. Thank you for your bug report.
Comment 25 Morten Welinder 2010-06-26 00:20:13 UTC
Specifically we now paste the table correctly and see no crash.
Most people will take that and smile.

I still don't know what caused the crash, but I doubt Gnumeric is to blame.