Bug 733317 - tracker-extract: remove application/vnd.ms-* catchall from msoffice
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: General
Version: unspecified
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: tracker-general
Depends on:
Blocks:
Reported: 2014-07-17 13:15 UTC by Giovanni Campagna
Modified: 2014-10-16 13:10 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
tracker-extract: remove application/vnd.ms-* catchall from msoffice (1.01 KB, patch)
2014-07-17 13:15 UTC, Giovanni Campagna
needs-work
tracker-extract: recognize application/vnd.ms-asf for the gstreamer plugin (1.96 KB, patch)
2014-07-17 13:27 UTC, Giovanni Campagna
committed
tracker-extract: remove application/vnd.ms-* catchall from msoffice (1.19 KB, patch)
2014-08-01 09:31 UTC, Giovanni Campagna
committed
tracker-extract: add application/msword back to msoffice rules (1.60 KB, patch)
2014-10-16 08:23 UTC, Martin Kampas
accepted-commit_now

Description Giovanni Campagna 2014-07-17 13:15:33 UTC
Otherwise we match on application/vnd.ms-asf (the .asf video container
format), which is not an OLE2 file and which msoffice cannot handle.
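A minimal Python sketch of why the catchall is too broad. It assumes the rule's MIME entries are matched as shell-style globs; Python's `fnmatch.fnmatchcase` stands in for whatever matcher tracker-extract actually uses, and `matches_catchall` is a hypothetical helper:

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the msoffice rule's catchall entry.
CATCHALL = "application/vnd.ms-*"


def matches_catchall(mime_type: str) -> bool:
    """Return True if the glob-style catchall would claim this MIME type."""
    return fnmatchcase(mime_type, CATCHALL)


for mime in ("application/vnd.ms-excel",
             "application/vnd.ms-powerpoint",
             "application/vnd.ms-asf",
             "application/msword"):
    print(f"{mime}: {matches_catchall(mime)}")
```

The first three types all match, including the ASF video container. application/msword does not match this glob at all, because it lacks the vnd.ms- prefix.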
Comment 1 Giovanni Campagna 2014-07-17 13:15:35 UTC
Created attachment 280979
tracker-extract: remove application/vnd.ms-* catchall from msoffice
Comment 2 Giovanni Campagna 2014-07-17 13:27:03 UTC
Created attachment 280995
tracker-extract: recognize application/vnd.ms-asf for the gstreamer plugin

application/vnd.ms-asf is the new standard name for video/x-ms-asf.
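To see how the two patches interact, here is a hedged Python model of rule dispatch. It assumes that rules are tried in a fixed order and that the first rule whose glob matches wins. The module names, the ordering, and the abbreviated pattern lists are illustrative and are not taken from tracker-extract's actual rule files or matching code:

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

Rule = Tuple[str, List[str]]  # (extractor module name, MIME glob patterns)

# Hypothetical, abbreviated rule tables, tried in order; first match wins.
RULES_BEFORE: List[Rule] = [
    ("msoffice", ["application/vnd.ms-*"]),
    ("gstreamer", ["video/*", "audio/*"]),
]
RULES_AFTER: List[Rule] = [
    ("msoffice", ["application/vnd.ms-excel", "application/vnd.ms-powerpoint"]),
    ("gstreamer", ["video/*", "audio/*", "application/vnd.ms-asf"]),
]


def pick_module(mime_type: str, rules: List[Rule]) -> Optional[str]:
    """Return the first module whose patterns match mime_type, or None."""
    for module, patterns in rules:
        if any(fnmatchcase(mime_type, p) for p in patterns):
            return module
    return None


print(pick_module("application/vnd.ms-asf", RULES_BEFORE))  # msoffice
print(pick_module("application/vnd.ms-asf", RULES_AFTER))   # gstreamer
print(pick_module("video/x-ms-asf", RULES_AFTER))           # gstreamer
```

Listing both the new and the legacy name keeps ASF files with the media extractor whichever name MIME detection reports.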
Comment 3 Martyn Russell 2014-07-22 10:04:57 UTC
Comment on attachment 280979
tracker-extract: remove application/vnd.ms-* catchall from msoffice

I'm not really sure this is the right approach. These are the types we would miss if we applied this patch:

$ grep -i application/vnd.ms- /usr/share/mime/types|uniq |sort|grep -v application/msword|grep -v application/vnd.ms-powerpoint | grep -v 

application/vnd.ms-excel
application/vnd.ms-access
application/vnd.ms-cab-compressed
application/vnd.ms-htmlhelp
application/vnd.ms-publisher
application/vnd.ms-tnef
application/vnd.ms-word
application/vnd.ms-word.document.macroenabled.12
application/vnd.ms-word.document.macroEnabled.12
application/vnd.ms-word.template.macroenabled.12
application/vnd.ms-word.template.macroEnabled.12
application/vnd.ms-works
application/vnd.ms-wpl

I can see a few in there that we definitely should not be missing. Actually, it looks like we have got the msword type completely wrong in our existing rule file too.

If we remove the ms-* catchall, we should add all the types above that we would otherwise miss.
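The check above amounts to a set difference: every type the catchall matched, minus the types an explicit list would keep. A minimal sketch, assuming glob matching and a list of known types such as the one in /usr/share/mime/types (one type per line); `missed_by_explicit_list` is a hypothetical helper:

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def missed_by_explicit_list(known_types: Iterable[str], catchall: str,
                            explicit: Iterable[str]) -> List[str]:
    """Return the known types the catchall matched but the explicit list omits.

    The result is de-duplicated and sorted, like `sort | uniq` in a shell
    pipeline (`uniq` only drops adjacent duplicates, so it goes after `sort`).
    """
    explicit_set = set(explicit)
    return sorted({t for t in known_types
                   if fnmatchcase(t, catchall) and t not in explicit_set})


known = ["application/vnd.ms-excel", "application/vnd.ms-asf",
         "application/vnd.ms-word", "application/vnd.ms-excel",
         "application/pdf"]
print(missed_by_explicit_list(known, "application/vnd.ms-*",
                              ["application/vnd.ms-excel"]))
# ['application/vnd.ms-asf', 'application/vnd.ms-word']
```

On a real system, `known_types` could be read with `open("/usr/share/mime/types").read().split()`.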
Comment 4 Martyn Russell 2014-07-22 10:05:53 UTC
Comment on attachment 280995
tracker-extract: recognize application/vnd.ms-asf for the gstreamer plugin

Thanks for this patch!
Comment 5 Giovanni Campagna 2014-08-01 09:31:10 UTC
Created attachment 282242
tracker-extract: remove application/vnd.ms-* catchall from msoffice

Otherwise we match on application/vnd.ms-asf (the .asf video container
format), which is not an OLE2 file and which msoffice cannot handle.

I did not add all of them, only those that are in OLE2 format and that
libgsf can recognize.
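The OLE2 criterion is visible in the first bytes of a file. A minimal sketch that checks only the 8-byte Compound File signature (libgsf does much more and parses the whole container); `looks_like_ole2` is a hypothetical helper:

```python
# First 8 bytes of an OLE2 / Compound File Binary document (.doc, .xls, .ppt).
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# First 16 bytes of an ASF file: the ASF Header Object GUID
# 75B22630-668E-11CF-A6D9-00AA0062CE6C as stored on disk.
ASF_HEADER_GUID = bytes.fromhex("3026B2758E66CF11A6D900AA0062CE6C")


def looks_like_ole2(header: bytes) -> bool:
    """Cheap pre-check: does the data start with the OLE2 signature?"""
    return header[:8] == OLE2_SIGNATURE


print(looks_like_ole2(OLE2_SIGNATURE + bytes(504)))  # True
print(looks_like_ole2(ASF_HEADER_GUID))              # False
```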
Comment 6 Martyn Russell 2014-08-07 10:40:12 UTC
Comment on attachment 282242
tracker-extract: remove application/vnd.ms-* catchall from msoffice

Thanks for the patch revision. Comments:

1. I noticed that vnd.ms-word is in there twice; please remove one of them.

2. I wonder whether vnd.ms-htmlhelp makes sense to include.

Please update the patch for #1 and use your discretion on #2 and then commit. Thanks!
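Review item 1 is easy to catch mechanically. A minimal sketch, assuming the rule's MIME type value is a ';'-separated string list (the GKeyFile list convention); `duplicate_entries` is a hypothetical helper:

```python
from collections import Counter
from typing import List


def duplicate_entries(mime_types_value: str) -> List[str]:
    """Return entries that appear more than once in a ';'-separated list."""
    entries = [e.strip() for e in mime_types_value.split(";") if e.strip()]
    counts = Counter(entries)
    return sorted(e for e, n in counts.items() if n > 1)


value = ("application/vnd.ms-word;application/vnd.ms-excel;"
         "application/vnd.ms-word;")
print(duplicate_entries(value))  # ['application/vnd.ms-word']
```

The trailing ';' yields an empty entry, which is skipped rather than counted.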
Comment 7 Martyn Russell 2014-08-20 15:31:50 UTC
Giovanni, do you need help committing this patch?
Comment 8 Giovanni Campagna 2014-08-21 11:45:23 UTC
Sorry, I was on vacation last week and then I overlooked this
in my TODO list.

Attachment 280995 pushed as e0a8085 - tracker-extract: recognize application/vnd.ms-asf for the gstreamer plugin
Attachment 282242 pushed as cf04b2d - tracker-extract: remove application/vnd.ms-* catchall from msoffice
Comment 9 Martin Kampas 2014-10-16 08:23:36 UTC
Created attachment 288652
tracker-extract: add application/msword back to msoffice rules

With the application/msword type excluded, (some?) MS Word documents are no longer indexed: application/msword is the MIME type that libmagic/GIO returns for them.

Identified by functional-tests/400-extractor.py, case office-doc, on Nemomobile.
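The functional test caught this regression end to end. A cheaper unit-level guard is to assert that every MIME type that detection is expected to report (here application/msword, which libmagic/GIO returns for some MS Word documents) is matched by the rule. A minimal sketch, assuming glob matching; the pattern lists are abbreviated and hypothetical, as is the `uncovered` helper:

```python
from fnmatch import fnmatchcase
from typing import List


def uncovered(expected_types: List[str], patterns: List[str]) -> List[str]:
    """Return the expected MIME types that no pattern in the rule matches."""
    return [t for t in expected_types
            if not any(fnmatchcase(t, p) for p in patterns)]


# Hypothetical, abbreviated msoffice pattern lists.
without_msword = ["application/vnd.ms-excel", "application/vnd.ms-powerpoint"]
with_msword = without_msword + ["application/msword"]

# MIME types that detection is expected to report for office test files.
expected = ["application/msword", "application/vnd.ms-excel"]

print(uncovered(expected, without_msword))  # ['application/msword']
print(uncovered(expected, with_msword))     # []
```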
Comment 10 Martyn Russell 2014-10-16 13:10:42 UTC
Comment on attachment 288652
tracker-extract: add application/msword back to msoffice rules

Nice catch, thanks for the patch. It would be nice if we could integrate the functional tests into distcheck...