After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 678941 - /contenttype/guess test case failure
/contenttype/guess test case failure
Status: RESOLVED FIXED
Product: glib
Classification: Platform
Component: gio
2.33.x
Other Linux
: Normal normal
: ---
Assigned To: gtkdev
gtkdev
Depends on: 674452
Blocks:
 
 
Reported: 2012-06-27 07:47 UTC by Martin Pitt
Modified: 2012-06-28 13:58 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
improve contenttype tests (1.44 KB, patch)
2012-06-27 08:20 UTC, Martin Pitt
none Details | Review
improve contenttype tests (1.30 KB, patch)
2012-06-27 13:47 UTC, Martin Pitt
none Details | Review
Fix /contenttype/guess test (2.21 KB, patch)
2012-06-27 14:35 UTC, Martin Pitt
committed Details | Review

Description Martin Pitt 2012-06-27 07:47:17 UTC
When building glib 2.33.3, on some architectures (happens on the Debian and Ubuntu builders, but I can reproduce on my Panda board) I get a test case failure (after fixing bug 674452):

TEST: contenttype... (pid=11058)
  /contenttype/guess:                                                  FAIL

This particularly fails this check, which I amended with the g_printf():

  res = g_content_type_guess ("test.pot", (guchar *)"ABC abc", -1, &uncertain);
  expected = g_content_type_from_mime_type ("application/vnd.ms-powerpoint");
  g_printf("expected: %s res: %s uncertain: %i\n", expected, res, uncertain);
  g_assert (g_content_type_equals (expected, res));

Running the test manually I get

martin@panda:~/glib-2.33.3/gio/tests$ ./contenttype
/contenttype/guess: expected: application/vnd.ms-powerpoint res: text/x-gettext-translation-template uncertain: 0
**
ERROR:contenttype.c:61:test_guess: assertion failed: (g_content_type_equals (expected, res))

This does not happen on my x86_64 machine. I'm not sure whether it is really architecture specific or just by accident. In either case, "ABC abc" is not a valid line in a PO template, but PowerPoint templates look very binary to me as well, so I guess by that content data neither guess is particularly correct.
Comment 1 Martin Pitt 2012-06-27 08:13:17 UTC
Ah, it actually happens on x86_64 as well now. The fix for bug 674452 stops reading beyond the size of "data"; that usually is some random binary junk (whatever is in memory), and thus suspiciously looks like the binary stuff PowerPoint templates have :)

Stepping through with a debugger:

925	          n_name_mimetypes = xdg_mime_get_mime_types_from_file_name (basename, name_mimetypes, 10);
(gdb) p n_name_mimetypes 
$1 = 2
(gdb) p name_mimetypes 
$2 = {0x7ffff7fb5cc4 "application/vnd.ms-powerpoint", 0x7ffff7fb5a34 "text/x-gettext-translation-template", 
  0x0, 
[...]

It correctly identifies both PowerPoint and PO template as possible types.

939	      sniffed_mimetype = xdg_mime_get_mime_type_for_data (data, data_size, &sniffed_prio);
(gdb) p sniffed_mimetype 
$1 = 0x7ffff7601d30 "application/octet-stream"
(gdb) p sniffed_prio 
$4 = 0

So it indeed cannot make much out of "ABC abc".

(gdb) call looks_like_text (data, data_size)
$3 = 1
[...]
943	        sniffed_mimetype = "text/plain";

This also seems correct.

978	                  if ( xdg_mime_mime_type_subclass (name_mimetypes[i], sniffed_mimetype))

This fails for name_mimetypes[0] (powerpoint), but succeeds for "text/x-gettext-translation-template". I guess this is okay, as .pot is a plain text format.

The test already has

  g_assert (uncertain);

so I propose to check that either powerpoint or PO template as a result is okay, and to add two more tests which uniquely identify either.
Comment 2 Martin Pitt 2012-06-27 08:20:10 UTC
Created attachment 217348 [details] [review]
improve contenttype tests

I propose to adjust the tests like in the attached patch.

The first one now uses

  res = g_content_type_guess ("test.pot", (guchar *)"msgid \"", 7, &uncertain);

which ought to be unanimously a PO template. This works fine with current glib.

The third uses the old test

  res = g_content_type_guess ("test.pot", (guchar *)"ABC abc", 7, &uncertain);

which should be uncertain (as current master already asserts), so I modified it to accept either result. Current glib does not recognize it as uncertain, though, the test fails with

ERROR:contenttype.c:79:test_guess: assertion failed: (uncertain)


The second test gives a binary magic (which I've got from some example PowerPoint template that I googled):

  res = g_content_type_guess ("test.pot", (guchar *)"\xCF\xD0\xE0\x11", 4, &uncertain);

I think this also ought to be unanimous, as this is not valid ASCII or UTF-8. res is indeed PO template, but this time uncertain is unexpectedly True:

ERROR:contenttype.c:69:test_guess: assertion failed: (!uncertain)
Comment 3 Dan Winship 2012-06-27 13:15:34 UTC
(In reply to comment #2)
> The third uses the old test
> 
>   res = g_content_type_guess ("test.pot", (guchar *)"ABC abc", 7, &uncertain);
> 
> which should be uncertain (as current master already asserts), so I modified it
> to accept either result. Current glib does not recognize it as uncertain,
> though, the test fails with
> 
> ERROR:contenttype.c:79:test_guess: assertion failed: (uncertain)

It's not uncertain because it's plain-text-y, and so it can't be powerpoint. It was only apparently uncertain before because it was looking at garbage past the end of the string.

> The second test gives a binary magic (which I've got from some example
> PowerPoint template that I googled):
> 
>   res = g_content_type_guess ("test.pot", (guchar *)"\xCF\xD0\xE0\x11", 4,
> &uncertain);
> 
> I think this also ought to be unanimous, as this is not valid ASCII or UTF-8.
> res is indeed PO template, but this time uncertain is unexpectedly True:

I guess you'd have to trace through the code to see why...
Comment 4 Martin Pitt 2012-06-27 13:47:48 UTC
Created attachment 217400 [details] [review]
improve contenttype tests

> It's not uncertain because it's plain-text-y, and so it can't be powerpoint.

Fair enough. I adjusted the test cases accordingly. Now the only thing that fails is the g_assert (!uncertain); for the "\xCF... binary case, which looks wrong. Stepping through now..

So this patch replaces my previous one for improving the test cases.
Comment 5 Martin Pitt 2012-06-27 14:09:55 UTC
danw | pitti: looks good. thanks

Pushed, thanks for the review!
Comment 6 Martin Pitt 2012-06-27 14:10:22 UTC
Comment on attachment 217400 [details] [review]
improve contenttype tests

Sorry, wrong bug.
Comment 7 Martin Pitt 2012-06-27 14:35:34 UTC
Created attachment 217411 [details] [review]
Fix /contenttype/guess test

As per the previous comments, the actual logic is working as well as it can. This patch updates the test cases to what is really expected, so that the whole test suite now succeeds again.
Comment 8 Martin Pitt 2012-06-27 14:38:24 UTC
For the record: The four bytes from my previous patch (for the "binary junk" case) delivered:

947	      sniffed_mimetype = xdg_mime_get_mime_type_for_data (data, data_size, &sniffed_prio);
(gdb) p sniffed_mimetype 
$2 = 0x7ffff7601db0 "application/octet-stream"

/usr/share/mime/magic does not actually have a magic for powerpoint, so that can't help us, and octet-stream is the best it can do. So I propose we just remove the "uncertain" assertion for this case; right now it is rightfully uncertain (but we do check that the returned value is powerpoint), and if some day we grow a magic for powerpoint it should be "certain" again.
Comment 9 Dan Winship 2012-06-28 12:43:42 UTC
Comment on attachment 217411 [details] [review]
Fix /contenttype/guess test

looks good
Comment 10 Martin Pitt 2012-06-28 13:58:03 UTC
Thanks for the review! Pushed.