After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 782854 - TNEF support
TNEF support
Status: RESOLVED OBSOLETE
Product: geary
Classification: Other
Component: attachments
master
Other Linux
: Normal enhancement
: ---
Assigned To: Geary Maintainers
Geary Maintainers
Depends on: 714883
Blocks:
 
 
Reported: 2017-05-20 05:24 UTC by Oliver
Modified: 2018-08-15 10:35 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
patch implementing basic TNEF support (11.28 KB, patch)
2017-05-20 05:24 UTC, Oliver
needs-work Details | Review
WIP tnef support via libytnef (5.25 KB, patch)
2018-06-02 10:12 UTC, Oliver
none Details | Review
WIP tnef support via libytnef v2 (5.48 KB, patch)
2018-06-09 09:28 UTC, Oliver
reviewed Details | Review

Description Oliver 2017-05-20 05:24:02 UTC
Created attachment 352200 [details] [review]
patch implementing basic TNEF support

TNEF is a Microsoft message encapsulation format[1]. Outlook and Exchange Server might bundle up all attachments and send them inside a TNEF file named winmail.dat instead of using MIME.

The Gmail webinterface automatically reads these files and presents any packaged attachments just as it does regular attachments. Thunderbird does not support these files, but there is a 16-year old bug about it[2] and a plugin that enables support[3]

Recently I started getting a lot of these messages, and the manual extraction process was frustrating enough that I implemented a small patch to Geary to display them directly. It essentially moves the body of the main foreach loop of do_save_attachments_db into a separate function, calling it once for regular attachments and if the "parent" attachment is a TNEF file, parsing the file and calling it once for each embedded attachment. Theoretically there can be extra useful stuff we could extract from the TNEF, but the attachments are for me the most critical.

This is a quick and dirty patch, I don't really speak Vala or know the Geary code conventions so I'd be happy for someone to adopt it, or if not provide guidance about bringing it up to quality for inclusion.

[1] https://en.wikipedia.org/wiki/Transport_Neutral_Encapsulation_Format
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=77811
[3] https://addons.mozilla.org/en-us/thunderbird/addon/lookout/
Comment 1 Michael Gratton 2018-04-13 00:26:04 UTC
Review of attachment 352200 [details] [review]:

Hi there, thanks for the patch!

I'm not too inclined to include support for proprietary content-types, for many of the same reason web browsers avoid supporting them. Also like browsers, I think this might be reasonable to support as a plugin however.

Of course, Geary does not actually have a plugin system at the moment, but Bug 714883 covers adding one. Handling additional content types like this would be a good use case for the design of the plugin system, and I'd be happy to work with you on getting that implemented, if that was something you'd be willing to help work on.
Comment 2 Oliver 2018-04-13 17:53:31 UTC
Hi Mike,

While I might agree with you in principle about proprietary formats, it's not all that bad. It's thoroughly documented by Microsoft[1] and there are several open source library implementations out there. Also I have a pragmatic streak ;)

One of the things I like about Geary is that it's relatively lightweight and uncomplicated to use. Plugin ecosystems tend to be a bit of a wild west and I'm worried about the potential impact of that on Geary.

If you are 100% committed to this path, then I'll roll with it and help implement where I can.

[1] https://msdn.microsoft.com/en-us/library/cc425498(v=exchg.80).aspx
Comment 3 Michael Gratton 2018-05-20 04:26:49 UTC
Yeah, fair enough - both Postel's Law and the general principal of being nice to the user are both good arguments for doing it. So let's get this in!

I'd like to use an existing decoder implementation rather than write our own however, since it would be good for us to not have to maintain and work-around format quirks from all the different mailers that send TNEF, ourselves. Evolution uses https://github.com/Yeraze/ytnef, so that would be probably be a good place to start, but it might be good to see if any other exists that is based on GLib and hopefully with an async implementation?

Also, I recently re-organised the code that manages how folders are saved, so the patch would need to be updated a bit. There's now a Geary.RFC822.Part object that manages decoding parts and Geary.ImapDB.Attachment now handles saving them. Maybe the Part class might be a good place to check for TNEF, and decode, then saved in Attachment?
Comment 4 Oliver 2018-05-25 14:58:25 UTC
I've checked out the new code from master and it looks to me like inside get_attachments_recursively, there could be a test for content-type == "application/vnd.ms-tnef" which would parse and loop through the attachments, creating a Part from each one (like what is happening for GMime.MessagePart a few lines above) and calling attachments.add() on it. That way all consumers of this attachments object would transparently see the "real" attachments.

I think ytnef is the only sensible option. The only other option already available in my distro's repositories is ktnef which wants to drag in the whole of kde. Regarding an async api, doesn't the use of write_to_stream in the GMime.MessagePart block anyway mean this is synchronous?

Unfortunately, the latest ytnef release fails to parse several tnef files I tried. Strangely (because there don't seem to be any significant extra commits), compiling it from git produced somewhat better results, but I still had to remove this check[1] to get one to extract properly. Anyway I can chase that upstream, I still think it's the best option.

Coming to the actual implementation, I could use some help. I know C and glib  well, but unfortunately have little patience for learning Vala, and especially this vapi binding stuff is driving me crazy.

I wrote a gist[2] of what I think is needed from ytnef in C. The question is, how to bind all the necessary parts? There are struct definitions, function calls, macros, defines, and linked-list traversal :/


[1] https://github.com/Yeraze/ytnef/blob/master/lib/ytnef.c#L510
[2] https://gist.github.com/ohwgiles/be14e9e9d010d4435a93389e618e2ee9
Comment 5 Oliver 2018-06-02 10:12:08 UTC
Created attachment 372517 [details] [review]
WIP tnef support via libytnef

Fortune favours the doggedly persistent or something. The attached patch "works" in the sense that attachments in a tnef file are visible and downloadable in the received message window with their correct filenames. However the file content seems corrupted in an inconsistent way. Adding the following:

GMime.Stream fs = new GMime.StreamFile.for_path("/tmp/" + gmime_part.get_filename(), "wb");
gmime_part.write_to_stream(fs);

right after attachments.add(new Part(gmime_part)) however *does* output the correct contents exactly (plus a MIME header).

I'm hoping to get some feedback on this patch, especially the vapi bindings, before digging further into this. I'm sure there's much to improve...
Comment 6 Michael Gratton 2018-06-05 03:14:10 UTC
Review of attachment 372517 [details] [review]:

Hey Oliver,

Thanks for looking into the library situation. I wonder if there's some flags/hacks needed to get those examples you have working? Did you hear back from the ytnef maintainers about it?

Hooking this into Message.get_attachments_recursively is definitely the right way to do it, and I'm pretty happy with this patch in general. I haven't looked at the ytnef VAPI binding but assume its probably right if its compiling. ;)

Did you have a look at the corrupted attachment files that get saved to ~/.local share/geary/... in a hex editor or similar? Was there any sign of say the header showing up in the attachment files on disk as well? If you take a look at how saving works in Part.write_to_stream, its behaviour depends on the MIME content type, which it picks up from the GMime.Part. This is also important when opening attachments from the app, since it will look at that (then maybe fall back to sniffing, but that kind of sucks). I don't know if TNEF supplies MIME content types for its attachments, but if so (or of there's some other source of type info, like a file name extension) then it would be good to set that on the dummy GMime.Part you're constructing. The Part constructor also picks up some of other bits of metadata from the GMime part, so it would also be good to get those filled in as well - the content disposition in particular (since that's where the filename is specified and attachment/inline distinction), but also any human-readable description as well.

The other thing to check is the encoding specified on the GMime.DataWrapper object. I assume the data being passed in is binary (i.e. no longer has any transport encoding applied to it) since that's the job ytnef is supposed to be doing, so you might need to set its encoding to GMmime.ContentEncoding.BINARY?

One last possibility would be to avoid the overhead of having to do the Ytnef.Attachment -> GMime.Part -> RFC822.Part dance altogether, and just create the RFC822.Part directly from the Ytnef.Attachment. Since RFC822.Part completely encapsulates the GMime object, you could (for example) make that an abstract class with two internal derived classes: RFC822.GMimePart and RFC822.YtnefPart, with different implementations to extract the metadata and content from the two different sources.
Comment 7 Oliver 2018-06-05 05:46:08 UTC
Mike,

Thanks for reviewing, this is exactly the kind of feedback I was hoping for. I'll dig into it some more.

Regarding the decoding issues I had, unfortunately Debian (and presumably Ubuntu as well) applied a patch to address CVE-2017-9058 which broke parsing of a standard tnef example file. More details on the github ticket[1], but I've been in touch with Yeraze and submitted fixes for this and several other issues which have been merged already.

I didn't find a reliable field in TNEF for the content type. There is a field for the file extension, but I suspect sniffing is probably the safest way anyway.

[1] https://github.com/Yeraze/ytnef/issues/45
Comment 8 Oliver 2018-06-09 09:28:56 UTC
Created attachment 372619 [details] [review]
WIP tnef support via libytnef v2

The updated patch now works completely, the corruption issue I saw was due to write_to_stream incorrectly interpreting the content as text and trying to decode it to utf8. Thanks for the pointer.

I experimented with turning RFC822.Part into an abstract class with MimePart and TnefPart internal child classes, but it seemed to me to just create a mess and indicate that Tnef is somehow on the same "level" as Mime. I've cleaned up the first approach, would you consider this acceptable?

By the way, is this still the right place for this discussion or should I create a PR on GNOME gitlab?
Comment 9 Michael Gratton 2018-06-11 09:57:12 UTC
Review of attachment 372619 [details] [review]:

This looks pretty good. A few points I'd make:

- The minor style nit noted below
- It would be really useful to get a unit test added for Message.get_attachments() with an example of an TNEF attachment. Have a look at loading an example message body as a GResource in test/engine/rfc822-message-test.vala.
- I'm about to rip out the CMake build config (maybe tonight) now that all issues with the Meson build have been fixed, so adding ytnef to the meson build would be great.
- To polish this off, add the new ytnef dependency for both Fedora and Ubuntu to INSTALL, the package to Depends and Build-Depends in debian/control and add a module for it to the flatpak manifest org.gnome.Geary.json.

It would be great if you could submit this as a MR in gitlab now we're there. If you rebase this off master as of today, you'll also want to add the same ytnef dependenciest to the new .gitlab-ci.yml file as added to INSTALL, to ensure the Gitlab CI tests pass. Doing will ensure that the dependencies for Ubuntu/Fedora/deb and Flatpak are all sorted out correctly, since it will do a test build for all four build fine and ensure that your patch builds pretty much everywhere.

::: src/engine/rfc822/rfc822-message.vala
@@ +870,3 @@
+                    ByteArray tnef_data = stream.get_byte_array();
+                    Ytnef.TNEFStruct tn;
+                    if(Ytnef.ParseMemory(tnef_data.data, out tn) == 0) {

Minor style nit: there should be a space between the "if" and the lparen here and for the if- and for- expressions below.
Comment 10 Oliver 2018-06-15 12:37:43 UTC
I think I've addressed all the above. I've opened a MR, shall we move the discussion there?

https://gitlab.gnome.org/GNOME/geary/merge_requests/10
Comment 11 Michael Gratton 2018-06-15 15:57:51 UTC
Sounds like a plan, I'll keep this open for the moment however until it lands.
Comment 12 Michael Gratton 2018-08-15 10:34:56 UTC
Resolving in favour of the Gitlab issue.