Bug 766781 - Add a DjVu view
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: Extractor
Version: git master
Hardware: Other Linux
Importance: Normal enhancement
Target Milestone: ---
Assigned To: tracker-extractor
tracker-extractor
Depends on:
Blocks:
 
 
Reported: 2016-05-22 20:45 UTC by Jeremy Bicha
Modified: 2016-09-08 23:29 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
books-Really-show-DjVu-files.patch (2.24 KB, patch)
2016-05-26 20:06 UTC, Jeremy Bicha
rejected
tracker-extract: Consider DjVu multipage docs as Ebooks (2.24 KB, patch)
2016-05-30 17:02 UTC, Bastien Nocera
committed

Description Jeremy Bicha 2016-05-22 20:45:53 UTC
This bug is an extension of bug 745327 which enabled Books to recognize .djvu files but not to actually view them.

Quoting the bug reporter:

"Another file format that might be interesting to support. It's mostly used in academia, and supported by Evince:"
https://en.wikipedia.org/wiki/DjVu
Comment 1 Bastien Nocera 2016-05-26 13:07:42 UTC
(In reply to Jeremy Bicha from comment #0)
> This bug is an extension of bug 745327 which enabled Books to recognize
> .djvu files but not to actually view them.

Did you actually test a build with that fix? Because evince already supports that file type, so it should just work.
Comment 2 Jeremy Bicha 2016-05-26 19:45:28 UTC

(In reply to Bastien Nocera from comment #1)
> Did you actually test a build with that fix? Because evince already supports
> that file type, so it should just work.

I am using Evince 3.20.0-3 on Ubuntu 16.10 (development).

Yes, but it didn't work.

I see now that Books would open DjVu files...except that bug 745327 didn't go far enough.

The Internet Archive has many DjVu files of public domain books. Here's one:

https://archive.org/download/shavingmadeeasyw0020th/shavingmadeeasyw0020th.djvu

(For the info page, see https://archive.org/details/shavingmadeeasyw0020th )


$ mimetype shavingmadeeasyw0020th.djvu 
shavingmadeeasyw0020th.djvu: image/vnd.djvu

$ tracker info shavingmadeeasyw0020th.djvu 
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Media'
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Visual'
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Image'
  'rdf:type' = 'http://www.tracker-project.org/temp/nmm#Photo'
  'nie:mimeType' = 'image/vnd.djvu'

(only displaying select lines from tracker info, above)


====
Books therefore needs to show files whose MIME type is image/vnd.djvu and whose type is nfo:Image.

We may also want to show every djvu mimetype that Evince does:
https://git.gnome.org/browse/evince/tree/configure.ac#n766
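
For anyone reproducing the check above, here is a minimal sketch of how the relevant lines of `tracker info` output could be picked apart programmatically. The helper names (parseTrackerInfo, rdfClassNames) are hypothetical and not part of Tracker, and the sketch assumes the output keeps the 'key' = 'value' shape shown in this comment.

```javascript
// Hypothetical helper (not part of Tracker): parse lines such as
//   'rdf:type' = 'http://...nfo#Image'
// into a Map from property name to the list of values seen for it.
function parseTrackerInfo(text) {
    const props = new Map();
    const linePattern = /^\s*'([^']+)'\s*=\s*'([^']*)'\s*$/;
    for (const line of text.split('\n')) {
        const match = linePattern.exec(line);
        if (!match)
            continue; // skip shell prompts, blank lines, anything unexpected
        const [, key, value] = match;
        if (!props.has(key))
            props.set(key, []);
        props.get(key).push(value);
    }
    return props;
}

// Return the short class name (the part after '#') of every rdf:type value.
function rdfClassNames(props) {
    return (props.get('rdf:type') || [])
        .map(uri => uri.slice(uri.lastIndexOf('#') + 1));
}

// Sample input copied from the output quoted in this comment.
const sample = [
    "  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Image'",
    "  'rdf:type' = 'http://www.tracker-project.org/temp/nmm#Photo'",
    "  'nie:mimeType' = 'image/vnd.djvu'",
].join('\n');

const info = parseTrackerInfo(sample);
console.log(rdfClassNames(info));         // short class names in the sample
console.log(info.get('nie:mimeType')[0]); // MIME type recorded by tracker
```

Comparing the class names and the MIME type side by side makes it easier to see whether a file was indexed as an image or as a document.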
Comment 3 Jeremy Bicha 2016-05-26 20:06:50 UTC
Created attachment 328598
books-Really-show-DjVu-files.patch

I don't think "Images" for SearchTypeStock is very descriptive. Do you have a better idea for that?

I wasn't able to find an example of a file my computer recognized as image/vnd.djvu+multipage although there is a
/usr/share/mime/image/vnd.djvu+multipage.xml on my computer.
Comment 4 Bastien Nocera 2016-05-26 23:50:29 UTC
Review of attachment 328598:

::: src/search.js
@@ +220,3 @@
                                         name: _("e-Books"),
                                         filter: '(nie:mimeType(?urn) IN (\"application/epub+zip\", \"application/x-mobipocket-ebook\", \"application/x-fictionbook+xml\", \"application/x-zip-compressed-fb2\", \"image/vnd.djvu+multipage\"))',
                                         where: '?urn rdf:type nfo:EBook .' }));

Here is where the multi-page DjVu should be tagged. If they're not tagged correctly for you, check that your tracker has DjVu support, and poke at tracker to reindex it if necessary.

@@ +223,3 @@
+          this.addItem(new SearchType({ id: SearchTypeStock.IMAGES,
+                                        name: _("DjVu e-Books"),
+                                        filter: '(nie:mimeType(?urn) IN (\"image/vnd.djvu\", \"image/vnd.djvu+multipage\", \"application/x-ext-djv\", \"application/x-ext-djvu\"))',

We don't want this to be a separate type (it's an ebook), and we don't want to handle the single-page ones; they're images.

image/vnd.djvu+multipage is the only one we'd add support for, not image/vnd.djvu.

> \"application/x-ext-djv\", \"application/x-ext-djvu\"))',

Those are probably aliases, which should be in shared-mime-info, not here; we only want canonical mime-types.
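
To illustrate the review points above, here is a minimal sketch of building the e-Books filter from a list of canonical MIME types, so that aliases and single-page image/vnd.djvu are left out. buildMimeTypeFilter and EBOOK_MIME_TYPES are hypothetical names, not code from gnome-documents; only the MIME types and the shape of the filter string come from the patch context quoted in this review.

```javascript
// Hypothetical helper, not part of gnome-documents: build a
// "nie:mimeType(?urn) IN (...)" filter from canonical MIME types.
function buildMimeTypeFilter(mimeTypes) {
    const quoted = mimeTypes.map(type => {
        // Refuse characters that would break out of the quoted literal.
        if (/["\\]/.test(type))
            throw new Error('unexpected character in MIME type: ' + type);
        return '"' + type + '"';
    });
    return '(nie:mimeType(?urn) IN (' + quoted.join(', ') + '))';
}

// Canonical types only: no application/x-ext-djv* aliases, and no plain
// image/vnd.djvu, because single-page DjVu files are treated as images.
const EBOOK_MIME_TYPES = [
    'application/epub+zip',
    'application/x-mobipocket-ebook',
    'application/x-fictionbook+xml',
    'application/x-zip-compressed-fb2',
    'image/vnd.djvu+multipage',
];

console.log(buildMimeTypeFilter(EBOOK_MIME_TYPES));
```

Keeping the types in one list means multi-page DjVu stays part of the existing e-Books entry rather than becoming a separate search type.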
Comment 5 Jeremy Bicha 2016-05-27 04:52:06 UTC
(In reply to Bastien Nocera from comment #4)
> Here is where the multi-page DjVu should be tagged. If they're not tagged
> correctly for you, check that your tracker has DjVu support, and poke at
> tracker to reindex it if necessary.

This feels like a bit of a trick question. Your tracker DjVu patch was only committed 2 weeks ago and isn't in a stable release yet.

https://git.gnome.org/browse/tracker/commit/?id=ddb79e379

But after applying that patch to tracker 1.8.0 and making sure
/usr/share/tracker/extract-rules/10-djvu.rule exists, I still am unable to find a DjVu file that tracker or mimetype recognizes as anything other than image/vnd.djvu. And I do have shared-mime-info 1.6 installed (see comment 3).

Could you supply a DjVu file that you have tested that works?

> We don't want this to be a separate type (it's an ebook) and we don't want
> to handle the single page ones, they're images.

How common is that really? (Can you supply an example file for that too?) Because...for instance, there are plenty of 1-page PDFs that aren't just an image. And if so, then shouldn't single-page DjVu files show up in GNOME Documents instead?
Comment 6 Bastien Nocera 2016-05-27 11:10:44 UTC
(In reply to Jeremy Bicha from comment #5)
> (In reply to Bastien Nocera from comment #4)
> > Here is where the multi-page DjVu should be tagged. If they're not tagged
> > correctly for you, check that your tracker has DjVu support, and poke at
> > tracker to reindex it if necessary.
> 
> This feels like a bit of a trick question. Your tracker DjVu patch was only
> committed 2 weeks ago and isn't in a stable release yet.
> 
> https://git.gnome.org/browse/tracker/commit/?id=ddb79e379

I can't really legislate for reviews taking too long.

> But after applying that patch to tracker 1.8.0 and making sure
> /usr/share/tracker/extract-rules/10-djvu.rule exists, I still am unable to
> find a DjVu file that tracker or mimetype recognizes as anything other than
> image/vnd.djvu. And I do have shared-mime-info 1.6 installed (see comment 3).
> 
> Could you supply a DjVu file that you have tested that works?

This is a multi-page file:
http://djvu.org/docs/2001_compression_overview.djvu

It's in shared-mime-info as a multi-page example.

> > We don't want this to be a separate type (it's an ebook) and we don't want
> > to handle the single page ones, they're images.
> 
> How common is that really? (Can you supply an example file for that too?)
> Because...for instance, there are plenty of 1-page PDFs that aren't just an
> image. And if so, then shouldn't single-page DjVu files show up in GNOME
> Documents instead?

DjVu is an image format. DjVu multi-page is a book scanning format.

Just like JPEG is an image format, but CBZ is a book format. Or TIFF vs. TIFF multi-page.
Comment 7 Jeremy Bicha 2016-05-27 15:03:21 UTC
(In reply to Bastien Nocera from comment #6)
> This is a multi-page file:
> http://djvu.org/docs/2001_compression_overview.djvu

Yes, I found that file.

I am really trying hard to figure out why this isn't working the way you describe. I tried in Fedora 24 and manually installed /usr/share/tracker/extract-rules/10-djvu.rule and /usr/share/tracker/extract-rules/10-ebooks.rule.

tracker info still shows the file as image/vnd.djvu with similar output to what I posted in comment #2.

I feel like you're suggesting my bug is invalid because WORKSFORME, but I can't find a distro where this code actually does work.

> Just like JPEG is an image format, but CBZ is book format. Or TIFF vs. TIFF
> multi-page.

And there are single-page TIFF documents produced by scanners for instance. I think it's a problem if GNOME Documents would (arbitrarily) display 2-page TIFFs but not 1-page TIFFs. (I haven't tested yet to see if this is happening.)
Comment 8 Bastien Nocera 2016-05-27 15:44:24 UTC
(In reply to Jeremy Bicha from comment #7)
> (In reply to Bastien Nocera from comment #6)
> > This is a multi-page file:
> > http://djvu.org/docs/2001_compression_overview.djvu
> 
> Yes, I found that file.
> 
> I am really trying hard to figure out why this isn't working the way you
> describe. I tried in Fedora 24 and manually installed
> /usr/share/tracker/extract-rules/10-djvu.rule and
> /usr/share/tracker/extract-rules/10-ebooks.rule
> 
> tracker info still shows the file as image/vnd.djvu with similar output to
> what I posted in comment #2.
> 
> I feel like you're suggesting my bug is invalid because WORKSFORME, but I
> can't find a distro where this code actually does work.

No, I'm saying that your premise is wrong if you're trying to get single page DjVu images to show up in gnome-books.

First, check whether the file gets recognised as a multi-page DjVu file through magic/mime. Then check whether tracker is tagging it properly. Then, and only then, would it be a problem in gnome-books if it didn't show up there.

> > Just like JPEG is an image format, but CBZ is book format. Or TIFF vs. TIFF
> > multi-page.
> 
> And there are single-page TIFF documents produced by scanners for instance.
> I think it's a problem if GNOME Documents would (arbitrarily) display 2-page
> TIFFs but not 1-page TIFFs. (I haven't tested yet to see if this is
> happening.)

We're supposed to show multi-page TIFFs as documents, though I don't remember whether that's implemented (there's no magic for it, so we expect tracker to tag them for us, see image/x-tiff-multipage in shared-mime-info).
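
The order of checks described in this comment can be sketched as a small decision function. This is only an illustration: diagnoseDjvu and its parameter names are hypothetical, and the expected values (image/vnd.djvu+multipage and the EBook class) are assumptions taken from this thread rather than from any of the projects' code.

```javascript
// Assumption from this thread: a multi-page DjVu file should be detected
// with this MIME type and tagged by tracker as an EBook.
const MULTIPAGE_DJVU = 'image/vnd.djvu+multipage';

// Hypothetical helper: return the first layer worth investigating.
//   detectedMimeType: what magic/mime detection (e.g. `mimetype`) reports
//   trackerMimeType:  the nie:mimeType stored by tracker
//   trackerClasses:   short rdf:type class names stored by tracker
function diagnoseDjvu({ detectedMimeType, trackerMimeType, trackerClasses }) {
    // Step 1: is the file recognised as multi-page DjVu at all?
    if (detectedMimeType !== MULTIPAGE_DJVU)
        return 'shared-mime-info';
    // Step 2: did tracker index and tag it as expected?
    if (trackerMimeType !== MULTIPAGE_DJVU || !trackerClasses.includes('EBook'))
        return 'tracker';
    // Step 3: only now would a missing file point at gnome-books.
    return 'gnome-books';
}

console.log(diagnoseDjvu({
    detectedMimeType: 'image/vnd.djvu',
    trackerMimeType: 'image/vnd.djvu',
    trackerClasses: ['Image', 'Photo'],
}));
```

With the values reported earlier in the thread (plain image/vnd.djvu everywhere), the sketch points at MIME detection before tracker or gnome-books.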
Comment 9 Jeremy Bicha 2016-05-28 01:27:54 UTC
We're making progress. For those following along, this is what we need so far:

https://cgit.freedesktop.org/xdg/shared-mime-info/commit/?id=e0f51b2ccda6f403f2b3abc4045d930feda765fc

https://git.gnome.org/browse/tracker/commit/?id=ddb79e379c1a8a031b6d7de693a69f0eab7a99f7

https://git.gnome.org/browse/tracker/commit/?id=cf7f5df75e5f705cd2662a17342bc1439f3449c3

What's expected to happen (as I understand it)
---------------
1. GNOME Books shows .djvu multi-page files in tracker-indexed directories.
2. GNOME Documents does not show .djvu multi-page files (whether it shows single-page files is a different issue).

What currently happens
----------------
0. (I believe I had to reset my tracker index for it to change the metadata. If so, then that's another issue.)
1. GNOME Books still does not show .djvu files.
2. GNOME Documents now shows multi-page .djvu files.

What's happening
----------------
$ tracker info 2001_compression_overview.djvu 
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/01/19/nie#DataObject'
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/01/19/nie#InformationElement'
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#Document'
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject'
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#TextDocument'
  'rdf:type' = 'http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#PaginatedTextDocument'
  'nie:mimeType' = 'image/vnd.djvu+multipage'

This is courtesy of the first tracker commit posted above.

Let's look at https://git.gnome.org/browse/gnome-documents/tree/src/search.js#n217

Since the .djvu is a PaginatedTextDocument and not an EBook, it falls through line 218 but is picked up at line 233.

I'm guessing you want to correct this in Tracker?
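
A simplified model of the fall-through described above, for readers who do not have search.js open. This is not the actual gnome-documents code: the RULES table, the shownIn function, and the first-match-wins ordering are assumptions made to mirror the "falls through ... but is picked up" behaviour.

```javascript
// Hypothetical, simplified model: each rule maps an RDF class to the
// application whose search type matches it. Order matters.
const RULES = [
    { rdfClass: 'EBook', shownIn: 'gnome-books' },
    { rdfClass: 'PaginatedTextDocument', shownIn: 'gnome-documents' },
];

// Return the application of the first rule whose class the file carries,
// or null when no rule matches.
function shownIn(rdfClasses) {
    const rule = RULES.find(r => rdfClasses.includes(r.rdfClass));
    return rule ? rule.shownIn : null;
}

// Short class names from the `tracker info` output in this comment.
const multipageDjvu = [
    'DataObject', 'InformationElement', 'Document',
    'FileDataObject', 'TextDocument', 'PaginatedTextDocument',
];

console.log(shownIn(multipageDjvu));
console.log(shownIn(multipageDjvu.concat('EBook')));
```

Under this model, the file only moves from Documents to Books once tracker also tags it as an EBook.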
Comment 10 Bastien Nocera 2016-05-30 17:02:23 UTC
Created attachment 328757
tracker-extract: Consider DjVu multipage docs as Ebooks

ddb79e3 and cf7f5df tried to categorise DjVu files as paginated
documents, which would have slotted them next to PDF files in
gnome-documents.

But we'd like them to be near EPubs in gnome-books. As the EBook RDF
type is not too strict in its definition, tagging the DjVu multi-page
files as Ebooks is the easy way to fix it.
Comment 11 Bastien Nocera 2016-05-30 17:04:08 UTC
Attachment 328757 pushed as 08f86bb - tracker-extract: Consider DjVu multipage docs as Ebooks
Comment 12 Jeremy Bicha 2016-09-08 23:29:44 UTC
I confirm that .djvu files are recognized and open properly in Ubuntu GNOME 16.10 Beta with gnome-books 3.21.91, tracker 1.9 and shared-mime-info 0.7.

Unfortunately, for existing files to be recognized, it is necessary to run this command in the directory containing them (it should automatically handle subdirectories too):

tracker reset -f .