GNOME Bugzilla – Bug 303415
Text search for djvu
Last modified: 2006-02-19 20:30:21 UTC
DjVu documents can have textual data stored with each page along with the page image. Evince should be able to search the text of a DjVu document, as it does with PDFs.
Right, but the current djvulibre public API doesn't allow such advanced operations, so this probably won't be done in the near future.
But don't the Qt-based djview and the command-line apps djvused and djvutxt do that? I thought they also utilize the same djvulibre API.
They use private parts of the API, while the public, installed part is much less usable. We should probably either keep copy-pasted headers in our tree or ask the djvulibre developers to expose more functions.
Created attachment 59715
Adds search and copy support for djvu

This patch adds search and text copy support for DjVu files. It requires a current libdjvu (see configure.ac). Because I didn't really understand the interface for text selection, there is no support yet for showing the marked text, but it still works: just select an imaginary rectangle, and all text within that range is selected and copied (it works like the PDF selection). Any hints on how to implement this interface would be appreciated :-). Outline support could also be implemented with the new library version, but I don't have any DjVu files with outlines.

Example for testing: http://craphound.com/down/Cory_Doctorow_-_Down_and_Out_in_the_Magic_Kingdom.djvu
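For readers following the thread, the rectangle-based selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the `Word` and `Rect` types and the `word_intersects` and `select_text` functions are hypothetical names, the sketch assumes the page text has already been flattened into a list of words with bounding boxes, and it does not call djvulibre at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical word record: the text of one word plus its bounding box
 * in page coordinates (x1 <= x2, y1 <= y2). Not a djvulibre type. */
typedef struct {
    const char *text;
    int x1, y1, x2, y2;
} Word;

/* Hypothetical selection rectangle in the same coordinate system. */
typedef struct {
    int x1, y1, x2, y2;
} Rect;

/* Returns nonzero if the word's box overlaps the rectangle at all. */
int word_intersects(const Word *w, const Rect *r)
{
    return w->x1 <= r->x2 && w->x2 >= r->x1 &&
           w->y1 <= r->y2 && w->y2 >= r->y1;
}

/* Copies every word that overlaps `sel` into `out`, separated by single
 * spaces, in the order the words appear in `words`. Stops early if the
 * next word would not fit. Returns the number of words copied. */
size_t select_text(const Word *words, size_t n, const Rect *sel,
                   char *out, size_t out_size)
{
    size_t used = 0, count = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        if (!word_intersects(&words[i], sel))
            continue;

        size_t len = strlen(words[i].text);
        size_t sep = count > 0 ? 1 : 0;

        /* Need room for separator, word, and terminating NUL. */
        if (used + sep + len + 1 > out_size)
            break;

        if (sep)
            out[used++] = ' ';
        memcpy(out + used, words[i].text, len);
        used += len;
        out[used] = '\0';
        count++;
    }
    return count;
}
```

A caller would pass the words of the current page, the rectangle dragged by the user, and a buffer to receive the text that goes to the clipboard. Highlighting the selection would additionally need the boxes of the matched words, which this sketch does not return.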
Wonderful, thanks a lot, Michael. I've committed the patch. A few minor issues are left, but I think we will fix them later. Separate bugs should probably be created for them:
1. I would prefer more GObject-oriented code (it is probably possible to make DjvuPageText a subclass of GObject, and so on).
2. There is a minor offset in the highlighting. I don't know whether it's an OCR bug or our bug, but we have a similar problem with PDF, so it is probably a rendering bug.
3. There is a problem in the miniexp header, described in http://sourceforge.net/tracker/index.php?func=detail&aid=1434756&group_id=32953&atid=406583. I want to see it fixed, but that's Leon's task.
And about selection: yes, the current interface is a bit poppler-oriented, so it will be hard to implement text rendering with DjVu. We should probably rethink this interface; let's discuss that in another bug as well.