After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 597545 - The content sniffing does not work properly
The content sniffing does not work properly
Status: RESOLVED FIXED
Product: libsoup
Classification: Core
Component: Misc
2.28.x
Other Linux
: Normal normal
: ---
Assigned To: libsoup-maint@gnome.bugs
libsoup-maint@gnome.bugs
Depends on:
Blocks:
 
 
Reported: 2009-10-06 12:37 UTC by Alejandro G. Castro
Modified: 2009-10-07 19:37 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Proposed patch to fix the problem (632 bytes, patch)
2009-10-06 12:37 UTC, Alejandro G. Castro
none Details | Review
New proposal of the patch (1.71 KB, patch)
2009-10-06 12:59 UTC, Alejandro G. Castro
needs-work Details | Review
Patch with the line in Makefile.am (2.05 KB, patch)
2009-10-07 16:33 UTC, Alejandro G. Castro
accepted-commit_now Details | Review

Description Alejandro G. Castro 2009-10-06 12:37:59 UTC
Created attachment 144880 [details] [review]
Proposed patch to fix the problem

We are using the content sniffing in webkit and it does not work properly for
the unknown cases. We have a unit test that detected the problem: testwebframe.
We have checked the problem is the condition traversing the buffer to find out
the type. The patch that I'm attaching fixes the problem in our case.
Comment 1 Alejandro G. Castro 2009-10-06 12:59:12 UTC
Created attachment 144884 [details] [review]
New proposal of the patch

I've added a test for this case.
Comment 2 Gustavo Noronha (kov) 2009-10-07 08:30:48 UTC
Looks fine to me, I'd say go ahead and commit.
Comment 3 Dan Winship 2009-10-07 14:33:42 UTC
Comment on attachment 144884 [details] [review]
New proposal of the patch

OK, I haven't bothered to figure out if the change is correct, but if kov says its good, then good.

You get bonus points for providing a new test case, but unfortunately the test case passes even without the patch to soup-content-sniffer.c, so it's not actually testing that the fix works. :) So can you fix that?

Also, need to add resources/test_noinfo.html to SNIFFING_FILES in tests/Makefile.am
Comment 4 Alejandro G. Castro 2009-10-07 16:18:25 UTC
(In reply to comment #3)
> (From update of attachment 144884 [details] [review])
> OK, I haven't bothered to figure out if the change is correct, but if kov says
> its good, then good.
> 
> You get bonus points for providing a new test case, but unfortunately the test
> case passes even without the patch to soup-content-sniffer.c, so it's not
> actually testing that the fix works. :) So can you fix that?
> 

It does not for me :-):

test_sniffing("/unknown/test_noinfo.html", "text/html")
  sniffing failed! expected text/html, got text/plain

Basically just with <html> tag and not meta information it should not work with the current release, not sure why you get a different result. I'll check it again.

> Also, need to add resources/test_noinfo.html to SNIFFING_FILES in
> tests/Makefile.am

Ok, I'm going to add this and upload a new patch.

Thanks for the feedback from both of you :).
Comment 5 Alejandro G. Castro 2009-10-07 16:33:59 UTC
Created attachment 144972 [details] [review]
Patch with the line in Makefile.am

I did the test again and check the code, the test does not work here if I do not add the patch, I'm not sure why it works for you, could you check the values of index_pattern in that case and paste them here?
Comment 6 Dan Winship 2009-10-07 17:38:18 UTC
Comment on attachment 144972 [details] [review]
Patch with the line in Makefile.am

OK, so the failure isn't happening for me because the mask fields in the types table are ending up with lots of 0 bytes after them for me (or at least the " <HTML" row is), so while sniff_unknown() goes too far, it doesn't matter, because it's checking (resource[index_stream] & 0 == 0)...

However, even if I hack things up so that the table doesn't match, sniff_unknown() *still* returns "text/html", because sniff_gio() comes up with that based on the URI. Not sure why it wouldn't do that for you. (xdgmime problem?)

However, the patch is correct. We may need to work on the test program a little more later though...
Comment 7 Alejandro G. Castro 2009-10-07 18:48:13 UTC
(In reply to comment #6)
> (From update of attachment 144972 [details] [review])
> OK, so the failure isn't happening for me because the mask fields in the types
> table are ending up with lots of 0 bytes after them for me (or at least the "
> <HTML" row is), so while sniff_unknown() goes too far, it doesn't matter,
> because it's checking (resource[index_stream] & 0 == 0)...
> 

Oh, for me it is something like this:

(gdb) p type_row->pattern[5]
$12 = 76 'L'
(gdb) p type_row->pattern[6]
$11 = 0 '\0'
(gdb) p type_row->pattern[8]
$5 = 60 '<'
(gdb) p type_row->pattern[9]
$6 = 72 'H'
(gdb) p type_row->pattern[10]
$7 = 69 'E'

And it stops here:
4: type_row->pattern[index_pattern] = 69 'E'
3: resource[index_stream] = 101 'e'
(gdb) 
294						if ((type_row->mask[index_pattern] & resource[index_stream]) != type_row->pattern[index_pattern]) {
4: type_row->pattern[index_pattern] = 69 'E'
3: resource[index_stream] = 101 'e'
(gdb) n
295							skip_row = TRUE;

> However, even if I hack things up so that the table doesn't match,
> sniff_unknown() *still* returns "text/html", because sniff_gio() comes up with
> that based on the URI. Not sure why it wouldn't do that for you. (xdgmime
> problem?)
> 

I found the problem in my environment, I was using jhbuild and I did not build the shared-mime-info, so gio did not have any mime types information. That is the reason it could not detect the correct content with the sniff_gio. So the only way to test this is working well is disabling gio mime types :-), I'm not sure if it makes sense to add the test for this case.
Comment 8 Alejandro G. Castro 2009-10-07 19:37:22 UTC
After talking to Dan in the IRC I've committed the patch without the example.

Thanks for the help.