GNOME Bugzilla – Bug 703108
Implement the get_text interface for djvu backend
Last modified: 2013-06-29 09:34:49 UTC
The djvu backend can implement the text_get_text and text_get_text_mapping methods of the get_text interface. The get_text_layout interface depends on having bounding boxes around the individual characters in words, which the djvu backend cannot provide at the moment. I have a patch that implements get_text_mapping and get_text and will post it shortly. The patch depends on: https://bugzilla.gnome.org/show_bug.cgi?id=448739
Created attachment 247805 [details] [review] Implementation of the get_text interface for djvu
Is there any guarantee that the returned text is valid UTF-8?
I just reread the djvulibre API and implementation, and it looks to me like the text is guaranteed to be in UTF-8.
Created attachment 247943 [details] [review] V2. The function djvu_text_prepare_search is somewhat badly named if it is to be used in the get_text function as well. The v2 patch renames it to djvu_text_page_index_text.
Review of attachment 247943 [details] [review]:

Split in two patches and pushed to git master, thanks!

::: backend/djvu/djvu-document.c
@@ +692,3 @@
+	djvu_text_page_index_text (tpage, TRUE);
+	text = g_strdup (tpage->text);
+	djvu_text_page_free (tpage);

Since we are going to free the page here, we can steal the text instead of duplicating it.
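
For illustration, the "steal instead of duplicate" suggestion in the review could look roughly like the sketch below. It is not the code that was pushed. It uses plain C with a hypothetical TextPage struct and hypothetical helpers (text_page_new, text_page_free, text_page_steal_text, get_page_text) standing in for the real djvu backend types, and it assumes that the page's free function releases the text only if the pointer is still set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the backend's text page; only the field
 * relevant to this sketch is modelled. */
typedef struct {
    char *text;
} TextPage;

/* Create a page that owns a heap-allocated copy of `s`. */
static TextPage *text_page_new(const char *s)
{
    TextPage *page = malloc(sizeof *page);
    if (page == NULL)
        return NULL;

    size_t n = strlen(s) + 1;
    page->text = malloc(n);
    if (page->text == NULL) {
        free(page);
        return NULL;
    }
    memcpy(page->text, s, n);
    return page;
}

/* Free the page and whatever text it still owns.
 * free(NULL) is a no-op, so a stolen (NULL) text pointer is fine. */
static void text_page_free(TextPage *page)
{
    if (page == NULL)
        return;
    free(page->text);
    free(page);
}

/* Transfer ownership of the text to the caller and clear the field,
 * so that text_page_free() does not release it a second time. */
static char *text_page_steal_text(TextPage *page)
{
    assert(page != NULL);
    char *text = page->text;
    page->text = NULL;
    return text;
}

/* Shape of the reviewed call site: build the page, take its text
 * without copying, then free the page. The caller frees the result. */
static char *get_page_text(const char *source)
{
    TextPage *page = text_page_new(source);
    if (page == NULL)
        return NULL;

    char *text = text_page_steal_text(page);
    text_page_free(page);
    return text;
}
```

Compared with duplicating the string, this saves one allocation and one copy per page. The only requirement is that the field is set to NULL before the page is freed. In GLib code the same ownership transfer can be written with g_steal_pointer, which is available from GLib 2.44 onward.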