GNOME Bugzilla – Bug 439655
Keywords are not tags
Last modified: 2010-05-17 13:32:33 UTC
Tracker automatically adds tags to office documents and PDFs for every keyword stored in these files. This behaviour is annoying, as an example shows. Document keywords often contain names. I wouldn't mind too much if a document were tagged 'Kuhn', but it is really silly if it is also tagged with this philosopher's very common first name, 'Thomas'. It is even worse when names have a Von-part (to use BibTeX terminology). This clutters the tag list with words like 'Van' (translates to 'of') and 'De' ('the').

Populating the tag database with document keywords is also annoying for a second reason. Some of my keyword-bearing documents come from the Internet, which means that some of the tags in my Tracker DB are outside my control. If tags are intended to replace directories for organising files, this is clearly undesirable behaviour.

Keywords are a great feature. They make searching for documents easier. They are not, however, tags.
I agree. Keywords (special metadata extracted from files) should be used to set the relevance of items when showing search results and should not (directly) appear to users. For example, let's assume you are searching for a document with the word "financial" and the only results are two ODF files: the first is a spreadsheet containing "financial" as a keyword, the second contains it only in its body text. The first document should rank as more relevant than the second. Tags, instead, should be fully user-controllable (edit, apply, remove...) and could provide similar relevance behaviour, but they should be orthogonal to keywords. I hope this distinction makes it into Xesam.
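The ranking idea in the previous comment can be sketched as follows. This is a minimal illustration only, not Tracker's or Xesam's actual ranking: the `Document` type, the `score`/`search` functions and the weights (keyword hit = 2, body-text hit = 1) are hypothetical choices made for the example. Extracted keywords raise relevance, while user tags live in a separate field that the keyword extractor never writes to.

```python
from dataclasses import dataclass, field

# Hypothetical weights: the thread only says keyword hits should rank higher
# than body-text hits, not by how much.
KEYWORD_WEIGHT = 2.0
CONTENT_WEIGHT = 1.0


@dataclass
class Document:
    name: str
    keywords: set[str]  # extracted from file metadata; used for ranking, not shown as tags
    content: str
    # User-controlled tags, kept separate from extracted keywords (not used in scoring here).
    user_tags: set[str] = field(default_factory=set)


def score(doc: Document, term: str) -> float:
    """Case-insensitive relevance: keyword hits outweigh body-text hits."""
    term = term.lower()
    s = 0.0
    if term in (k.lower() for k in doc.keywords):
        s += KEYWORD_WEIGHT
    if term in doc.content.lower():
        s += CONTENT_WEIGHT
    return s


def search(docs: list[Document], term: str) -> list[str]:
    """Return names of matching documents, most relevant first."""
    scored = [(score(d, term), d.name) for d in docs]
    scored.sort(key=lambda hit: -hit[0])  # stable sort keeps input order on ties
    return [name for s, name in scored if s > 0]


docs = [
    Document("report.odt", set(), "the financial outlook for next year"),
    Document("budget.ods", {"Financial"}, "Q3 totals"),
]
print(search(docs, "financial"))  # ['budget.ods', 'report.odt']
```

With this scheme the spreadsheet whose metadata carries "financial" ranks above the text document that only mentions the word, and no tag is ever created from a keyword.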
Luca: These things are taken into account in the upcoming Xesam ontology, so no worries. Whether the indexers will respect the ontology and do sane hit ranking is another matter :-)
That's right: the keywords are not stored against userKeywords, so this is not a bug.
Moving "Indexer" component bugs to "General" since "Indexer" refers to the old 0.6 architecture