GNOME Bugzilla – Bug 746401
epub: handle multiple dc:identifier tags
Last modified: 2015-04-09 14:51:38 UTC
I received an epub whose .opf metadata contains:

...
<dc:identifier id="uuid_id" opf:scheme="uuid">urn:uuid:</dc:identifier>
<dc:language>de</dc:language>
<dc:identifier opf:scheme="UUID">urn:uuid:dff6037b-450e-4912-9433-5e6ca7937669</dc:identifier>
...

Having multiple UUID identifiers looks like breakage in the file (amusingly, it is the first, broken UUID URN that is referenced in the TOC file). Each identifier translates directly to a nie:identifier being added to the SPARQL query, which causes SPARQL warnings because we exceed the property's cardinality.

Looking at the code, AFAICS the same can happen if a file has both UUID and ISBN identifiers, as both are translated to nie:identifier.

I'm attaching a patch that adds only the first identifier found as nie:identifier and ignores the rest, which makes the insertion succeed in these cases. The patch should be considered after the hard code freeze is lifted.
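To make the problem easier to reproduce outside the extractor, here is a minimal Python sketch that lists every dc:identifier in an OPF fragment. It is not the tracker-extract-epub code. The <package>/<metadata> wrapper around the three reported elements and the helper name collect_identifiers are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"

# The three inner elements come from the report; the surrounding
# <package>/<metadata> wrapper is an assumed minimal container.
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="uuid_id" opf:scheme="uuid">urn:uuid:</dc:identifier>
    <dc:language>de</dc:language>
    <dc:identifier opf:scheme="UUID">urn:uuid:dff6037b-450e-4912-9433-5e6ca7937669</dc:identifier>
  </metadata>
</package>
"""


def collect_identifiers(opf_text):
    """Return (scheme, value) pairs for every dc:identifier, in document order."""
    root = ET.fromstring(opf_text)
    found = []
    for element in root.iter("{%s}identifier" % DC_NS):
        # opf:scheme is a namespaced attribute; compare it case-insensitively.
        scheme = element.get("{%s}scheme" % OPF_NS, "").lower()
        found.append((scheme, (element.text or "").strip()))
    return found


identifiers = collect_identifiers(SAMPLE_OPF)
for scheme, value in identifiers:
    print(scheme, value)
```

Both entries carry the "uuid" scheme once the case is normalised, so a naive one-to-one mapping would emit two nie:identifier values for the same resource.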
Created attachment 299707
tracker-extract-epub: Ensure we only have one nie:identifier

This property has maxCardinality=1; however, we may be adding multiple values there, either in both UUID and ISBN forms, or as multiple UUIDs in faulty epubs. ISBN should probably be its own rdf:Property; in the meantime, stick to the first nie:identifier found and ignore the rest.
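The "first identifier wins" policy described in the patch summary can be sketched in Python as follows. This is an illustration of the idea only, not the committed C change. The function name build_identifier_statements, the has_identifier flag, and the example subject URN are all hypothetical.

```python
def build_identifier_statements(subject, identifiers):
    """Return at most one nie:identifier statement for `subject`.

    `identifiers` is an iterable of identifier strings in the order they
    were found in the metadata. Only the first one is kept, so the
    maxCardinality=1 constraint on nie:identifier is never exceeded.
    """
    statements = []
    has_identifier = False  # hypothetical flag: set once a value is emitted
    for value in identifiers:
        if has_identifier:
            continue  # ignore every identifier after the first
        # Escape backslashes and double quotes for the string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        statements.append('%s nie:identifier "%s" .' % (subject, escaped))
        has_identifier = True
    return statements


statements = build_identifier_statements(
    "<urn:example:book>",  # hypothetical subject
    ["urn:uuid:", "urn:uuid:dff6037b-450e-4912-9433-5e6ca7937669"],
)
print("\n".join(statements))
```

With the identifiers from the reported file, this keeps the first (truncated) UUID URN and drops the complete one, which matches the trade-off of taking whatever comes first.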
Review of attachment 299707:

Looks right to me. I presume the first ID is the best to use of all of them?
(In reply to Martyn Russell from comment #2)
> Review of attachment 299707:
>
> Looks right to me. I presume the first ID is the best to use of all of them?

We just don't know; files could be broken in any number of ways. Although this property has nrl:maxCardinality 1, it is not an nrl:InverseFunctionalProperty, so we don't have to care about its uniqueness.

The one piece of info I consider interesting is the ISBN, but we don't have ontology for it. There has been a bug open against Nepomuk about this for quite a while [1], and it is still unresolved... Nepomuk doesn't seem to see much activity at all nowadays :(

Anyway, I'm pushing to master/1.2.

[1] http://dev.nepomuk.semanticdesktop.org/ticket/551
Attachment 299707 pushed as 5c70907 - tracker-extract-epub: Ensure we only have one nie:identifier