GNOME Bugzilla – Bug 615857
add xml extraction
Last modified: 2021-05-26 22:24:39 UTC
Please add extraction for xml files.
I'm desperate for this. It will bring me one step closer to throwing away the venerable gnome-search-tool.
We accept patches ;)

The HTML extractor pretty much has all the code and boilerplate you would need in place. The problem is, *how* do you extract XML data? I mean the elements can be unique, so how do you deal with those?
(In reply to comment #2)
> We accept patches ;)
>
> The HTML extractor pretty much has all the code and boiler plate you would need
> in place. The problem is, *how* do you extract XML data? I mean the elements
> can be unique, so how do you deal with those?

That is something we should really take care of. I can imagine lots of situations where you would want a specific extractor for a specific kind of XML file. This could be done by allowing more than one extractor for a given mime-type. Something like:

* tracker-extract-xml-type1.c
* tracker-extract-xml-type2.c
* tracker-extract-xml-type3.c
* tracker-extract-xml-default.c

All extractors would be registered for exactly the same mime-type (application/xml). When an XML file is requested for extraction, we would:

* Try the type1 extractor
* If the type1 extractor doesn't like the XML, try the type2 extractor
* If the type2 extractor doesn't like the XML, try the type3 extractor
* If the type3 extractor doesn't like the XML, try the default extractor

The order in which the non-default, specific extractors are tried wouldn't matter, as long as each extractor reports when it can't process the given XML (maybe by looking for specific XML tags that are mandatory in the XML schema it supports). The default XML extractor, always tried last, would just make a best-effort attempt to extract the contents (the text inside the tags) into nie:plainTextContent.

Actually, this could also be applied to the text extractor: we could enable additional specific extractors to run before the default one, if and only if those extractors report when they cannot process a file because it's not what they expect.
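The fallback chain proposed above could be sketched roughly as follows. This is only an illustration in Python, not Tracker's actual C extractor API: the `NotSupported` exception, the function names, the `<recipe>` and `<playlist>` schemas, and the "tracks" key are all hypothetical stand-ins for the "type1"/"type2" extractors. Only the nie:plainTextContent target for the default extractor comes from the comment itself.

```python
import xml.etree.ElementTree as ET


class NotSupported(Exception):
    """Raised by a specific extractor when the XML is not in its schema."""


def extract_type1(root):
    # Hypothetical schema: a <recipe> root with a mandatory <title> child.
    if root.tag != "recipe" or root.find("title") is None:
        raise NotSupported("not a recipe document")
    return {"nie:title": root.findtext("title").strip()}


def extract_type2(root):
    # Hypothetical schema: a <playlist> root with at least one <track> child.
    tracks = root.findall("track")
    if root.tag != "playlist" or not tracks:
        raise NotSupported("not a playlist document")
    return {"tracks": [(t.text or "").strip() for t in tracks]}


def extract_default(root):
    # Best effort: collect all text inside the tags and normalise whitespace.
    text = " ".join(" ".join(root.itertext()).split())
    return {"nie:plainTextContent": text}


# Order among the specific extractors does not matter, provided each one
# reports NotSupported for documents it does not recognise.
SPECIFIC_EXTRACTORS = [extract_type1, extract_type2]


def extract_xml(data):
    """Try each specific extractor in turn, then fall back to the default."""
    root = ET.fromstring(data)
    for extractor in SPECIFIC_EXTRACTORS:
        try:
            return extractor(root)
        except NotSupported:
            continue
    return extract_default(root)
```

With this sketch, a document matching one of the hypothetical schemas is handled by its specific extractor, and anything else falls through to the default one, which returns only the concatenated text. The design relies entirely on each specific extractor rejecting input it does not understand; an extractor that silently accepts everything would make the order significant again.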
GNOME is going to shut down bugzilla.gnome.org in favor of gitlab.gnome.org. As part of that, we are mass-closing older open tickets in bugzilla.gnome.org which have not seen updates for a longer time (resources are unfortunately quite limited so not every ticket can get handled). If you can still reproduce the situation described in this ticket in a recent and supported software version, then please follow https://wiki.gnome.org/GettingInTouch/BugReportingGuidelines and create a new enhancement request ticket at https://gitlab.gnome.org/GNOME/tracker/-/issues/ Thank you for your understanding and your help.