GNOME Bugzilla – Bug 597545
The content sniffing does not work properly
Last modified: 2009-10-07 19:37:22 UTC
Created attachment 144880 [details] [review] Proposed patch to fix the problem We are using the content sniffing in webkit and it does not work properly for the unknown cases. We have a unit test that detected the problem: testwebframe. We have checked the problem is the condition traversing the buffer to find out the type. The patch that I'm attaching fixes the problem in our case.
Created attachment 144884 [details] [review] New proposal of the patch I've added a test for this case.
Looks fine to me, I'd say go ahead and commit.
Comment on attachment 144884 [details] [review] New proposal of the patch OK, I haven't bothered to figure out if the change is correct, but if kov says its good, then good. You get bonus points for providing a new test case, but unfortunately the test case passes even without the patch to soup-content-sniffer.c, so it's not actually testing that the fix works. :) So can you fix that? Also, need to add resources/test_noinfo.html to SNIFFING_FILES in tests/Makefile.am
(In reply to comment #3) > (From update of attachment 144884 [details] [review]) > OK, I haven't bothered to figure out if the change is correct, but if kov says > its good, then good. > > You get bonus points for providing a new test case, but unfortunately the test > case passes even without the patch to soup-content-sniffer.c, so it's not > actually testing that the fix works. :) So can you fix that? > It does not for me :-): test_sniffing("/unknown/test_noinfo.html", "text/html") sniffing failed! expected text/html, got text/plain Basically just with <html> tag and not meta information it should not work with the current release, not sure why you get a different result. I'll check it again. > Also, need to add resources/test_noinfo.html to SNIFFING_FILES in > tests/Makefile.am Ok, I'm going to add this and upload a new patch. Thanks for the feedback from both of you :).
Created attachment 144972 [details] [review] Patch with the line in Makefile.am I did the test again and check the code, the test does not work here if I do not add the patch, I'm not sure why it works for you, could you check the values of index_pattern in that case and paste them here?
Comment on attachment 144972 [details] [review] Patch with the line in Makefile.am OK, so the failure isn't happening for me because the mask fields in the types table are ending up with lots of 0 bytes after them for me (or at least the " <HTML" row is), so while sniff_unknown() goes too far, it doesn't matter, because it's checking (resource[index_stream] & 0 == 0)... However, even if I hack things up so that the table doesn't match, sniff_unknown() *still* returns "text/html", because sniff_gio() comes up with that based on the URI. Not sure why it wouldn't do that for you. (xdgmime problem?) However, the patch is correct. We may need to work on the test program a little more later though...
(In reply to comment #6) > (From update of attachment 144972 [details] [review]) > OK, so the failure isn't happening for me because the mask fields in the types > table are ending up with lots of 0 bytes after them for me (or at least the " > <HTML" row is), so while sniff_unknown() goes too far, it doesn't matter, > because it's checking (resource[index_stream] & 0 == 0)... > Oh, for me it is something like this: (gdb) p type_row->pattern[5] $12 = 76 'L' (gdb) p type_row->pattern[6] $11 = 0 '\0' (gdb) p type_row->pattern[8] $5 = 60 '<' (gdb) p type_row->pattern[9] $6 = 72 'H' (gdb) p type_row->pattern[10] $7 = 69 'E' And it stops here: 4: type_row->pattern[index_pattern] = 69 'E' 3: resource[index_stream] = 101 'e' (gdb) 294 if ((type_row->mask[index_pattern] & resource[index_stream]) != type_row->pattern[index_pattern]) { 4: type_row->pattern[index_pattern] = 69 'E' 3: resource[index_stream] = 101 'e' (gdb) n 295 skip_row = TRUE; > However, even if I hack things up so that the table doesn't match, > sniff_unknown() *still* returns "text/html", because sniff_gio() comes up with > that based on the URI. Not sure why it wouldn't do that for you. (xdgmime > problem?) > I found the problem in my environment, I was using jhbuild and I did not build the shared-mime-info, so gio did not have any mime types information. That is the reason it could not detect the correct content with the sniff_gio. So the only way to test this is working well is disabling gio mime types :-), I'm not sure if it makes sense to add the test for this case.
After talking to Dan in the IRC I've committed the patch without the example. Thanks for the help.