Bug 654832 - Add ev_job_load_new_with_data() to load files from data in memory
Status: RESOLVED OBSOLETE
Product: evince
Classification: Core
Component: general
Version: git master
Hardware/OS: Other Linux
Priority: Normal
Severity: normal
Target Milestone: ---
Assigned To: Evince Maintainers
QA Contact: Evince Maintainers
Depends on:
Blocks: 644728
 
 
Reported: 2011-07-18 10:23 UTC by Murray Cumming
Modified: 2018-05-22 14:19 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
  0001-Add-ev_job_load_new_with_data-to-load-files-from-dat.patch (20.64 KB, patch), 2011-07-18 10:23 UTC, Murray Cumming, status: none
  0001-backends-Add-comments-about-missing-load_data-implem.patch (3.40 KB, patch), 2011-07-19 07:26 UTC, Murray Cumming, status: none
  Add loading documents from memory, for now only pdf backend. (20.20 KB, patch), 2012-04-26 14:30 UTC, Krzesimir Nowak, status: rejected
  Rest of backends have no loading from memory for now. (3.43 KB, patch), 2012-04-26 14:32 UTC, Krzesimir Nowak, status: rejected
  Adds compression and decompression functions for documents in memory. (12.16 KB, patch), 2012-04-27 11:23 UTC, Krzesimir Nowak, status: reviewed
  Adds some newly added functions to documentation. (1.73 KB, patch), 2012-04-27 11:23 UTC, Krzesimir Nowak, status: none
  Adds loading DJVU documents from memory. (4.56 KB, patch), 2012-04-27 13:24 UTC, Krzesimir Nowak, status: needs-work
  Adds loading DJVU documents from memory, v2. (6.07 KB, patch), 2012-06-06 13:42 UTC, Krzesimir Nowak, status: needs-work
  Adds loading DJVU documents from memory, v3. (5.62 KB, patch), 2012-06-11 10:12 UTC, Krzesimir Nowak, status: reviewed

Description Murray Cumming 2011-07-18 10:23:47 UTC
Created attachment 192170 [details] [review]
0001-Add-ev_job_load_new_with_data-to-load-files-from-dat.patch

This works for me, though it's only implemented for the PDF backend so far.

From the commit message:

    Add ev_job_load_new_with_data() to load files from data in memory.
    
    * libview/ev-jobs.[h|c]: Add data and data_length struct fields.
    Add ev_job_load_new_with_data() to set them.
    ev_job_load_run(): Call the new ev_document_load_from_data() function instead
    of ev_document_load(), if data is set.
    * libdocument/ev-document-factory.[h|c]:
    Add ev_document_factory_get_document_from_data(), using the same structure as
    the existing ev_document_factory_get_document(uri). Note that this does not
    yet support compressed data.
    * libdocument/ev-document.[h|c]: Add a load_data() vfunc.
    Add ev_document_load_from_data(), calling that vfunc, like the existing
    ev_document_load().
    * backend/pdf/ev-poppler.cc: Add pdf_document_load_data(), implementing
    the EvDocument load_data vfunc.
Comment 1 Murray Cumming 2011-07-19 07:26:05 UTC
Created attachment 192226 [details] [review]
0001-backends-Add-comments-about-missing-load_data-implem.patch

I quickly investigated the other backends, but it looks like the poppler backend is the only one that can easily support loading from data. This patch adds comments saying why the others can't do it, so you might like to apply it just to have the information there.

I'm not sure about the comics backend - I don't see yet where the data is actually loaded.

I guess it would be nice to have a way to discover if evince can load a particular mime type as data, rather than having to catch an error asynchronously from EvJob's signal.
Comment 2 Murray Cumming 2011-11-04 21:04:48 UTC
Could I get some response, please?
Comment 3 Carlos Garcia Campos 2011-11-05 10:18:00 UTC
Review of attachment 192170 [details] [review]:

Sorry for the late reply, and thank you for the patch!

::: backend/pdf/ev-poppler.cc
@@ +293,3 @@
+	// See https://bugs.freedesktop.org/show_bug.cgi?id=39322
+	pdf_document->document =
+		poppler_document_new_from_data (const_cast<char*>(data), length, pdf_document->password, &poppler_error);

Even though this file is C++, I prefer to use C-style casts.

::: libdocument/ev-document-factory.c
@@ +153,3 @@
+	*compression = EV_COMPRESSION_NONE;
+
+	content_type = g_content_type_guess (NULL, (const guchar*)data, length, 

Would it be possible to create the load job with guchar and use it everywhere to avoid these casts?

@@ +285,3 @@
+
+static void
+free_uncompressed_data (gchar *data)

We don't need this; free_uncompressed_uri is needed because it also removes the temp file, but in this case we can just use g_free directly as the GDestroyNotify function.
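
Concretely, the attachment call then becomes just the following (the variable and key name follow the later review comments and are illustrative only):

g_object_set_data_full (G_OBJECT (document),
                        "data-uncompressed",
                        data_unc,
                        (GDestroyNotify) g_free);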

@@ +395,3 @@
+				return document;
+				
+				/* else fall through to slow mime code detection. */

This comment doesn't make sense when loading from data, I guess.

::: libdocument/ev-document.c
@@ +214,3 @@
 
+static void
+ev_document_load_cache (EvDocument  *document,

I prefer something like ev_document_setup_cache or init_cache

@@ +217,3 @@
+		  const char  *uri,
+		  const char  *data,
+		  gsize        data_length)

data and data_length are not used at all in this function. I think we could remove both uri and data parameters, and set the uri only when loading from uri before calling this, and set up the synctex parser only in the case of uri too, after calling this method.

@@ +344,3 @@
 		 * going to the backends since it requires locks
 		 */
+	        ev_document_load_cache (document, uri, NULL, 0);

So, here this would be:

priv->uri = g_strdup (uri);
ev_document_init_cache(document);
if (_ev_document_support_synctex (document))
.....

@@ +380,1 @@
+	retval = klass->load_data (document, data, length, &err);

klass->load_data can be NULL for backends not supporting it, so we should check it here and maybe even fill an error saying loading from data is not supported by that document.
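
A minimal sketch of such a guard (EV_DOCUMENT_ERROR_UNSUPPORTED is not an existing value at this point; comment 4 below adds it to EvDocumentError):

if (!klass->load_data) {
        g_set_error_literal (error,
                             EV_DOCUMENT_ERROR,
                             EV_DOCUMENT_ERROR_UNSUPPORTED, /* proposed new error value */
                             "Loading from data is not supported by this backend");
        return FALSE;
}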
Comment 4 Krzesimir Nowak 2012-04-26 14:23:25 UTC
(In reply to comment #3)
> Review of attachment 192170 [details] [review]:
> 
> Sorry for the late reply, and thank you for the patch!
> 

Taking over here. I adapted both patches so that they apply to current master.

> ::: backend/pdf/ev-poppler.cc
> @@ +293,3 @@
> +    // See https://bugs.freedesktop.org/show_bug.cgi?id=39322
> +    pdf_document->document =
> +        poppler_document_new_from_data (const_cast<char*>(data), length,
> pdf_document->password, &poppler_error);
> 
> Even though this file is C++ I prefer to use C style casts.

Done.

> 
> ::: libdocument/ev-document-factory.c
> @@ +153,3 @@
> +    *compression = EV_COMPRESSION_NONE;
> +
> +    content_type = g_content_type_guess (NULL, (const guchar*)data, length, 
> 
> Would it be possible to create the load job with guchar and use it everywhere
> to avoid these casts?

Casts are unavoidable. We might want guchar*, but there are APIs that take char* (poppler_document_new_from_data).

> @@ +285,3 @@
> +
> +static void
> +free_uncompressed_data (gchar *data)
> 
> We don't need this, free_uncompressed_uri is needed because it also removes the
> temp file, but in this case we can just use g_free directly as GDestroyNotify
> function

Removed.

> @@ +395,3 @@
> +                return document;
> +                
> +                /* else fall through to slow mime code detection. */
> 
> This comment doesn't make sense when loading from data, I guess.

Removed.

> ::: libdocument/ev-document.c
> @@ +214,3 @@
> 
> +static void
> +ev_document_load_cache (EvDocument  *document,
> 
> I prefer something like ev_document_setup_cache or init_cache
> 
> @@ +217,3 @@
> +          const char  *uri,
> +          const char  *data,
> +          gsize        data_length)
> 
> data and data_length are not used at all in this function. I think we could
> remove both uri and data parameters, and set the uri only when loading from uri
> before calling this, and set up the synctex parser only in the case of uri too, after
> calling this method.
> 
> @@ +344,3 @@
>           * going to the backends since it requires locks
>           */
> +            ev_document_load_cache (document, uri, NULL, 0);
> 
> So, here this would be:
> 
> priv->uri = g_strdup (uri);
> ev_document_init_cache(document);
> if (_ev_document_support_synctex (document))
> .....

Done.

> @@ +380,1 @@
> +    retval = klass->load_data (document, data, length, &err);
> 
> klass->load_data can be NULL for backends not supporting it, so we should check
> it here and maybe even fill an error saying loading from data is not supported
> by that document.

Right. I added EV_DOCUMENT_ERROR_UNSUPPORTED to EvDocumentError for it. I will attach the patches in the next comments.
Comment 5 Krzesimir Nowak 2012-04-26 14:30:47 UTC
Created attachment 212892 [details] [review]
Add loading documents from memory, for now only pdf backend.
Comment 6 Krzesimir Nowak 2012-04-26 14:32:42 UTC
Created attachment 212893 [details] [review]
Rest of backends have no loading from memory for now.

I wonder if this is the correct way of doing it. I guess that instead of setting it to NULL, we should set it to some short default function that just sets an error and returns FALSE. That way, we do not have to worry about this vfunc being NULL.
Comment 7 José Aliste 2012-04-26 14:37:28 UTC
It seems you forgot to attach the updated first patch?
Comment 8 José Aliste 2012-04-26 15:14:36 UTC
I see, you actually obsoleted your own patch; fixed now. I don't see the problem with setting the vfunc to NULL, as long as the only place where it is used has the proper != NULL guard.
Comment 9 Krzesimir Nowak 2012-04-27 11:22:32 UTC
(In reply to comment #8)
> I see, you actually obsoleted your own patch, fixed now.

Thanks.

> I don't see the
> problem with setting the vfunc to NULL, as long as the only place where it is
> used has the proper != NULL guard

Ok, such a guard already exists.

Also, I am going to attach two more patches. The first implements compression and decompression of documents in memory, and the other just adds some new functions to the documentation (section files).
Comment 10 Krzesimir Nowak 2012-04-27 11:23:24 UTC
Created attachment 212949 [details] [review]
Adds compression and decompression functions for documents in memory.
Comment 11 Krzesimir Nowak 2012-04-27 11:23:58 UTC
Created attachment 212950 [details] [review]
Adds some newly added functions to documentation.
Comment 12 Krzesimir Nowak 2012-04-27 13:24:56 UTC
Created attachment 212963 [details] [review]
Adds loading DJVU documents from memory.

And yet another patch, but highly experimental. Murray, I guess that you will want to use loading from memory in your project. If so, it would be great if you could check if this code works for you.
Comment 13 Carlos Garcia Campos 2012-05-02 12:05:06 UTC
Review of attachment 212892 [details] [review]:

The patch looks great; there are just a few issues that I've fixed before pushing it. Thanks!

::: libdocument/ev-document-factory.c
@@ +154,3 @@
+
+	content_type = g_content_type_guess (NULL, data, length,
+		&content_type_uncertain);

g_content_type_guess accepts NULL for result_uncertain and you are not using content_type_uncertain, so we can just pass NULL.

@@ +161,3 @@
+                                     _("Unknown MIME Type"));
+
+		return NULL;

You are leaking content_type here
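
Putting this and the previous remark together, a hypothetical helper showing both points (not the patch code, just a sketch):

static gchar *
guess_mime_type_from_data (const guchar *data, gsize length, GError **error)
{
        gchar *content_type;
        gchar *mime_type;

        /* result_uncertain is not used, so NULL can be passed directly */
        content_type = g_content_type_guess (NULL, data, length, NULL);
        mime_type = g_content_type_get_mime_type (content_type);
        if (mime_type == NULL) {
                g_set_error_literal (error,
                                     EV_DOCUMENT_ERROR,
                                     EV_DOCUMENT_ERROR_INVALID,
                                     "Unknown MIME Type");
                g_free (content_type);   /* avoid the leak noted above */
                return NULL;
        }

        g_free (content_type);

        return mime_type;
}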

@@ +323,3 @@
+ * %EV_DOCUMENT_ERROR_ENCRYPTED.
+ *
+ * Returns: a new #EvDocument, or %NULL.

This is missing the (transfer full) annotation.
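
That is, the gtk-doc "Returns:" line should carry the annotation, roughly:

 * Returns: (transfer full): a new #EvDocument, or %NULL.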

@@ +326,3 @@
+ */
+EvDocument *
+ev_document_factory_get_document_from_data (const guchar *data, gsize length, GError **error)

We can split this method, moving most of the code into a ev_document_factory_load_data() for consistency with ev_document_factory_load_uri().

@@ +358,3 @@
+	*/
+	data_unc = NULL;
+	data_unc_length = 0;

These are already initialized.

@@ +380,3 @@
+		&err);
+
+	if (result == FALSE) {

We can use an early return here

::: libdocument/ev-document.c
@@ +353,3 @@
+ *
+ * On failure, %FALSE is returned and @error is filled in.
+ * If the document is encrypted, EV_DEFINE_ERROR_ENCRYPTED is returned.

This should be EV_DOCUMENT_ENCRYPTED and with % prefix

@@ +354,3 @@
+ * On failure, %FALSE is returned and @error is filled in.
+ * If the document is encrypted, EV_DEFINE_ERROR_ENCRYPTED is returned.
+ * If the backend cannot load the specific document, EV_DOCUMENT_ERROR_INVALID

%EV_DOCUMENT_ERROR_INVALID

@@ +397,3 @@
+		EvDocumentPrivate *priv = document->priv;
+
+		priv->uri = NULL;

Instance structures are initialized to 0 when allocated by glib, so uri is already NULL.

::: libview/ev-jobs.c
@@ +990,3 @@
+		} else {
+			const guchar *uncompressed_data = g_object_get_data (G_OBJECT (job->document),
+									    "data-uncompressed");

This doesn't match what you set in ev-document-factory; it was data_unc-uncompessed.

@@ +992,3 @@
+									    "data-uncompressed");
+			gsize uncompressed_data_length = GPOINTER_TO_SIZE (g_object_get_data (G_OBJECT (job->document),
+											      "data-length-uncompressed"));

Ditto.
Comment 14 Carlos Garcia Campos 2012-05-02 12:06:10 UTC
Review of attachment 212893 [details] [review]:

There's no need to explicitly set to NULL the vfunc on all backends.
Comment 15 Carlos Garcia Campos 2012-05-02 12:18:02 UTC
Review of attachment 212949 [details] [review]:

I think all compression commands can read input from stdin and send the output to stdout, so in the case of data, we could avoid using temp files.
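
As an illustration of the no-temp-file idea, a sketch only: it uses GSubprocess, which only appeared in GLib 2.40 (after this discussion), so at the time g_spawn_async_with_pipes would have been the equivalent route.

#include <gio/gio.h>

/* Pipe gzip-compressed in-memory data through "gzip -cd" and collect the
 * decompressed output, with no temporary files involved. */
static GBytes *
decompress_data_via_pipe (const guchar *data, gsize length, GError **error)
{
        GSubprocess *proc;
        GBytes *input;
        GBytes *output = NULL;

        proc = g_subprocess_new (G_SUBPROCESS_FLAGS_STDIN_PIPE |
                                 G_SUBPROCESS_FLAGS_STDOUT_PIPE,
                                 error, "gzip", "-cd", NULL);
        if (!proc)
                return NULL;

        input = g_bytes_new (data, length);
        if (!g_subprocess_communicate (proc, input, NULL, &output, NULL, error))
                g_clear_pointer (&output, g_bytes_unref);

        g_bytes_unref (input);
        g_object_unref (proc);

        return output;
}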
Comment 16 Christian Persch 2012-05-09 17:50:38 UTC
I discussed this with Carlos on IRC and we agreed that it would be much better not to limit this to in-memory data, but instead to expose a method to read from a GInputStream (which you can then use with a GMemoryInputStream to just feed it in-memory data). That will also simplify the decompression setup, since we can just use a GConverterInputStream (although only the gz converter is in stock glib; there are patches / standalone files around for bzip2 and xz which we could either slurp into evince or convince the glib devs to adopt).
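
The stream approach layers together with plain GLib/GIO types; a minimal sketch, assuming the caller owns the in-memory buffer and knows whether it is gzip-compressed:

#include <gio/gio.h>

/* Wrap in-memory data in a GMemoryInputStream and, if needed, stack a
 * GConverterInputStream with a GZlibDecompressor on top, so the backend's
 * stream loader never has to care where the bytes come from. */
static GInputStream *
wrap_data_as_stream (const guchar *data, gsize length, gboolean is_gzip)
{
        GInputStream *stream;

        stream = g_memory_input_stream_new_from_data (data, length, NULL);

        if (is_gzip) {
                GZlibDecompressor *decompressor;
                GInputStream      *converter_stream;

                decompressor = g_zlib_decompressor_new (G_ZLIB_COMPRESSOR_FORMAT_GZIP);
                converter_stream = g_converter_input_stream_new (stream,
                                                                  G_CONVERTER (decompressor));
                g_object_unref (decompressor);
                g_object_unref (stream);

                return converter_stream;
        }

        return stream;
}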
Comment 17 Christian Persch 2012-06-04 11:30:28 UTC
Comment on attachment 212892 [details] [review]
Add loading documents from memory, for now only pdf backend.

This was backed out and redone differently. We now support loading EvDocument from GInputStream and GFile. So for data, just wrap it into a GMemoryInputStream.
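
For callers with in-memory data, the usage then boils down to something like the following (the factory entry point name and flag are assumptions on my part; substitute whatever stream-based loader the new API actually exposes):

GInputStream *stream;
EvDocument   *document;
GError       *error = NULL;

/* The stream takes ownership of the malloc'ed data and frees it with g_free. */
stream = g_memory_input_stream_new_from_data (data, length, g_free);
document = ev_document_factory_get_document_for_stream (stream, NULL /* guess MIME type */,
                                                         EV_DOCUMENT_LOAD_FLAG_NONE,
                                                         NULL, &error);
g_object_unref (stream);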
Comment 18 Christian Persch 2012-06-04 11:31:10 UTC
Comment on attachment 212963 [details] [review]
Adds loading DJVU documents from memory.

This needs to be adapted to the new backend functions load_stream and load_gfile.
Comment 19 Krzesimir Nowak 2012-06-06 13:42:57 UTC
Created attachment 215750 [details] [review]
Adds loading DJVU documents from memory, v2.

The loader using GFile and the one using uri could somehow use GCancellable together with ddjvu_job_t. For now they don't.
Comment 20 Carlos Garcia Campos 2012-06-10 08:33:31 UTC
Review of attachment 215750 [details] [review]:

Thanks for the patch. I have some comments.

::: backend/djvu/djvu-document.c
@@ +257,3 @@
+djvu_document_load_stream (EvDocument          *document,
+			   GInputStream        *stream,
+			   EvDocumentLoadFlags  flags G_GNUC_UNUSED,

Why the G_GNUC_UNUSED? Does it produce a compile warning without it?

@@ +271,3 @@
+
+	if (!doc) {
+		djvu_handle_events (djvu_document, TRUE, &djvu_error);

Is this possible? What error can happen if we have just created the document without giving a filename to decode?

@@ +297,3 @@
+	while ((msg = ddjvu_message_peek (ctx))) {
+		switch (msg->m_any.tag) {
+			case DDJVU_NEWSTREAM: {

I think this while + switch could be simplified by using 

djvu_wait_for_message (djvu_document, DDJVU_NEWSTREAM, &djvu_error);

The first message should be a newstream, and the first stream always has streamid = 0. After reading the stream you can then do 

djvu_wait_for_message (djvu_document, DDJVU_DOCINFO, &djvu_error);

That would ignore all other streams (that we don't support yet) and wait for docinfo message.
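
For reference, the simplified flow suggested here would read as a sketch like this (the read/feed step is elided; djvu_wait_for_message and the ddjvu_* calls are the ones already used by the backend and ddjvuapi.h):

djvu_wait_for_message (djvu_document, DDJVU_NEWSTREAM, &djvu_error);
/* ... read the whole GInputStream and feed each chunk with ddjvu_stream_write () ... */
ddjvu_stream_close (djvu_document->d_document, 0, 0);
djvu_wait_for_message (djvu_document, DDJVU_DOCINFO, &djvu_error);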

@@ +302,3 @@
+				/* first stream */
+				if (streamid == 0) {
+#define CHUNKSIZE 1024 * 1024

why 1024 * 1024?

@@ +304,3 @@
+#define CHUNKSIZE 1024 * 1024
+					GByteArray* array = g_byte_array_new ();
+					guint8* buffer = g_malloc (CHUNKSIZE);

Would it be possible to alloc this on the stack instead?

@@ +307,3 @@
+					GError* read_error = NULL;
+
+					while (1) {

I think it would be clearer to use a do while

do {
....
} while (read_bytes != 0);

since you are returning in case of error.

@@ +315,3 @@
+
+						if (read_bytes < 0) {
+							if (read_error) {

I think g_input_stream_read always fills the error in case of error.

@@ +332,3 @@
+							return FALSE;
+						} else if (read_bytes) {
+							g_byte_array_append (array,

Why are you caching the data instead of just calling ddjvu_stream_write() for every chunk read?

@@ +339,3 @@
+						}
+					}
+					ddjvu_stream_write (doc,

Do you really need to write all the data in one step?
Comment 21 Krzesimir Nowak 2012-06-11 10:11:11 UTC
(In reply to comment #20)
> Review of attachment 215750 [details] [review]:
> 
> Thanks for the patch. I have some comments.
> 
> ::: backend/djvu/djvu-document.c
> @@ +257,3 @@
> +djvu_document_load_stream (EvDocument          *document,
> +               GInputStream        *stream,
> +               EvDocumentLoadFlags  flags G_GNUC_UNUSED,
> 
> Why the G_GNUC_UNUSED, does it produce a compile warning without that?

I just have a habit of using it in C sources. I guess that some warnings are issued, depending on the warning verbosity.

> @@ +271,3 @@
> +
> +    if (!doc) {
> +        djvu_handle_events (djvu_document, TRUE, &djvu_error);
> 
> Is this possible? what error can happen if we have just created the document
> without giving a filename to decode?

I am just following one of the general conventions in ddjvuapi.h:
- all functions returning a pointer might return a null pointer.

> @@ +297,3 @@
> +    while ((msg = ddjvu_message_peek (ctx))) {
> +        switch (msg->m_any.tag) {
> +            case DDJVU_NEWSTREAM: {
> 
> I think this while + switch could be simplified by using 
> 
> djvu_wait_for_message (djvu_document, DDJVU_NEWSTREAM, &djvu_error);
> 
> First message should be a newstream, and first stream is always streamid = 0.
> After reading the stream then you can do 
> 
> djvu_wait_for_message (djvu_document, DDJVU_DOCINFO, &djvu_error);
> 
> That would ignore all other streams (that we don't support yet) and wait for
> docinfo message.

Hm, but shouldn't we call ddjvu_stream_close() for every stream, be it supported or not? I mean - newstream messages probably have to be handled as well. Also, the structure of this loop is just a copy of what was proposed in ddjvuapi.h.

> 
> @@ +302,3 @@
> +                /* first stream */
> +                if (streamid == 0) {
> +#define CHUNKSIZE 1024 * 1024
> 
> why 1024 * 1024?
> 

Frankly speaking, no idea. I was wondering what chunk size I should use.

> @@ +304,3 @@
> +#define CHUNKSIZE 1024 * 1024
> +                    GByteArray* array = g_byte_array_new ();
> +                    guint8* buffer = g_malloc (CHUNKSIZE);
> 
> Would it be possible to alloc this in the stack instead?

I allocated it on the stack at first, when I chose the chunk size to be just 1024. But I preferred to avoid allocating one MiB on the stack; it is a rather scarce resource.

> 
> @@ +307,3 @@
> +                    GError* read_error = NULL;
> +
> +                    while (1) {
> 
> I think it would be clearer to use a do while
> 
> do {
> ....
> } while (read_bytes != 0);
> 
> since you are returning in case of error.
> 

Ok.

> @@ +315,3 @@
> +
> +                        if (read_bytes < 0) {
> +                            if (read_error) {
> 
> I think g_input_stream_read always fills the error in case of error.

Ok.

> @@ +332,3 @@
> +                            return FALSE;
> +                        } else if (read_bytes) {
> +                            g_byte_array_append (array,
> 
> Why are you caching the data instead of just calling ddjvu_stream_write() for
> every chunk read?

I must have confused something here about calling write once per stream. Sorry. Now I call it for every chunk, so I removed the GByteArray and used a chunk size of 1024 for the buffer, which is now allocated on the stack.

> @@ +339,3 @@
> +                        }
> +                    }
> +                    ddjvu_stream_write (doc,
> 
> Do you really need to write all the data in one step?
Comment 22 Krzesimir Nowak 2012-06-11 10:12:17 UTC
Created attachment 216092 [details] [review]
Adds loading DJVU documents from memory, v3.

New patch, without the while/switch loop simplification for now.
Comment 23 Carlos Garcia Campos 2012-06-17 07:35:54 UTC
(In reply to comment #21)
> > @@ +271,3 @@
> > +
> > +    if (!doc) {
> > +        djvu_handle_events (djvu_document, TRUE, &djvu_error);
> > 
> > Is this possible? what error can happen if we have just created the document
> > without giving a filename to decode?
> 
> I am just following one of general conventions in ddjvuapi.h:
> - all functions returning a pointer might return a null pointer.

Sure, what I meant was whether it's possible to receive an error message at this point. I don't think so, so you can just fill an error without waiting for an error message like we do in djvu_document_load().

> > @@ +297,3 @@
> > +    while ((msg = ddjvu_message_peek (ctx))) {
> > +        switch (msg->m_any.tag) {
> > +            case DDJVU_NEWSTREAM: {
> > 
> > I think this while + switch could be simplified by using 
> > 
> > djvu_wait_for_message (djvu_document, DDJVU_NEWSTREAM, &djvu_error);
> > 
> > First message should be a newstream, and first stream is always streamid = 0.
> > After reading the stream then you can do 
> > 
> > djvu_wait_for_message (djvu_document, DDJVU_DOCINFO, &djvu_error);
> > 
> > That would ignore all other streams (that we don't support yet) and wait for
> > docinfo message.
> 
> Hm, but shouldn't we call ddjvu_stream_close() for every stream, be it
> supported or not? I mean - newstream messages probably have to be handled as
> well.

Right.

> Also, the structure of this loop is just a copy of what was proposed in
> ddjvuapi.h.

Yes, but placed in the middle of this function it makes the function a bit complex, in my opinion. I'm not sure it's possible to receive a DOCINFO message at that point, but if it is, we could move the switch to a new function, something like this:

gboolean djvu_read_streams (DjvuDocument *djvu_document, GInputStream *stream, GError **error);

If something fails or the DOCINFO message hasn't been received, the function returns FALSE and fills the error. That would simplify the caller to something like:

if (!djvu_read_streams (djvu_document, DDJVU_DOCINFO, &djvu_error)) {
        g_set_error_literal (error,
                             EV_DOCUMENT_ERROR,
                             EV_DOCUMENT_ERROR_INVALID,
                             djvu_error->message);
        g_error_free (djvu_error);
        ddjvu_document_release (djvu_document->d_document);
        djvu_document->d_document = NULL;

        return FALSE;
}
Comment 24 Carlos Garcia Campos 2012-06-17 07:39:10 UTC
Review of attachment 216092 [details] [review]:

::: backend/djvu/djvu-document.c
@@ +305,3 @@
+
+					do {
+#define CHUNKSIZE 1024

No need to define it here and undefine it later; just define it globally. I would call it something like READ_BUFFER_SIZE.
Comment 25 Christian Persch 2012-07-06 18:30:38 UTC
g_file_load_contents() uses a block size of 8192, so we should probably copy that, if only for consistency.
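
Tying the last few comments together, a minimal sketch of the per-chunk read/write loop with the suggested buffer name and size (the helper name is hypothetical and error handling is abbreviated):

#define READ_BUFFER_SIZE 8192

static gboolean
djvu_feed_stream (ddjvu_document_t *doc, GInputStream *stream, GError **error)
{
        guchar buffer[READ_BUFFER_SIZE];
        gssize read_bytes;

        do {
                read_bytes = g_input_stream_read (stream, buffer,
                                                  READ_BUFFER_SIZE, NULL, error);
                if (read_bytes < 0)
                        return FALSE;   /* GIO has filled *error */
                if (read_bytes > 0)
                        ddjvu_stream_write (doc, 0, (const char *) buffer, read_bytes);
        } while (read_bytes > 0);

        /* streamid 0 is the first (and only supported) stream; stop = 0 */
        ddjvu_stream_close (doc, 0, 0);

        return TRUE;
}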
Comment 26 GNOME Infrastructure Team 2018-05-22 14:19:16 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/evince/issues/233.