After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 786539 - Ensure mp3 extractor doesn't step out of mmap'ped memory
Ensure mp3 extractor doesn't step out of mmap'ped memory
Status: RESOLVED FIXED
Product: tracker
Classification: Core
Component: Extractor
unspecified
Other Linux
: Normal normal
: ---
Assigned To: tracker-extractor
tracker-extractor
Depends on:
Blocks:
 
 
Reported: 2017-08-20 12:20 UTC by Carlos Garnacho
Modified: 2017-08-22 16:56 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
extract/mp3: Check for experimental flag right after fetching flags (2.51 KB, patch)
2017-08-20 12:32 UTC, Carlos Garnacho
committed Details | Review
extract/mp3: Make clearer checks on offsets read from file (3.04 KB, patch)
2017-08-20 12:32 UTC, Carlos Garnacho
committed Details | Review
extract/mp3: Assert that the reader position is within mmap boundaries (1.72 KB, patch)
2017-08-20 12:32 UTC, Carlos Garnacho
committed Details | Review

Description Carlos Garnacho 2017-08-20 12:20:38 UTC
Coverity taints the big endian->host format translations done for ID3, which ends up in warnings because those sometimes contain offsets within the file (CID #162187).

It seems to me all calculations already prevent any jumps out of mmap'ped memory boundaries (NB: This may happen on legit cases if tags exceed the 5MB limit we read, this is feasible with eg. files containing multiple album art tags). However, I'm attaching some patches that straighten out the checks a bit and make others to happen explicitly.
Comment 1 Carlos Garnacho 2017-08-20 12:32:04 UTC
Created attachment 358013 [details] [review]
extract/mp3: Check for experimental flag right after fetching flags

No need to do further operations here if the experimental flag is
enabled.
Comment 2 Carlos Garnacho 2017-08-20 12:32:10 UTC
Created attachment 358014 [details] [review]
extract/mp3: Make clearer checks on offsets read from file

Put these on one side of the operand without added values.

Spotted by Coverity (CID #162187) (Tentative fix)
Comment 3 Carlos Garnacho 2017-08-20 12:32:16 UTC
Created attachment 358015 [details] [review]
extract/mp3: Assert that the reader position is within mmap boundaries

The 'pos' variable is always at the start of a frame, which means pos
can't surpass the mmap size minus the frame size. This just happens
implicitly, and is a programming error if we let that ever happen, so
let's make it really sure this isn't possible.
Comment 4 Philip Withnall 2017-08-22 09:16:45 UTC
Review of attachment 358013 [details] [review]:

Looks good.

::: src/tracker-extract/tracker-extract-mp3.c
@@ +1838,3 @@
+	/* We don't handle experimental cases */
+	if (experimental) {
+		g_message ("[v24] Experimental MP3s are not extracted, doing nothing");

ooi, shouldn’t this be g_debug() rather than g_message()?

(That would be a change for a separate patch, if at all.)
Comment 5 Philip Withnall 2017-08-22 09:25:27 UTC
I think this code would benefit from merging parse_id3v23() and parse_id3v24(). From comparing their implementations, the only differences are:
 • The sub-calls to get_id3v*_tags().
 • 5 lines of code around ‘We found a IDv2 footer’ in the v24 implementation.
 • A missing `continue` after “[v23] Content size was 0, moving to next frame” in the v23 implementation (I suspect this is a bug?).

Similarly, get_id3v*_tags() look almost identical, apart from a missing (offset >= csize) check in the get_id3v23_tags() implementation (ID3V24_COMM case).
Comment 6 Philip Withnall 2017-08-22 09:31:33 UTC
Review of attachment 358014 [details] [review]:

This looks good to me. The only non-obvious change is for (pos + frame_size + csize > size) ⇒ (csize > size - frame_size - pos), where we want (frame_size + pos <= size) otherwise we underflow. That’s guaranteed, however, by the fact that:
 • (pos + frame_size <= tsize + header_size) from the start of the loop
 • (tsize + header_size <= size) from before entering the loop
 • Hence (pos + frame_size <= size)
Comment 7 Philip Withnall 2017-08-22 09:35:20 UTC
(In reply to Philip Withnall from comment #5)
> I think this code would benefit from merging parse_id3v23() and
> parse_id3v24(). From comparing their implementations, the only differences
> are:
>  • The sub-calls to get_id3v*_tags().
>  • 5 lines of code around ‘We found a IDv2 footer’ in the v24 implementation.
>  • A missing `continue` after “[v23] Content size was 0, moving to next
> frame” in the v23 implementation (I suspect this is a bug?).
> 
> Similarly, get_id3v*_tags() look almost identical, apart from a missing
> (offset >= csize) check in the get_id3v23_tags() implementation (ID3V24_COMM
> case).

Same also for parse_id3v20(), actually.
Comment 8 Philip Withnall 2017-08-22 09:35:42 UTC
Review of attachment 358015 [details] [review]:

++
Comment 9 Carlos Garnacho 2017-08-22 16:45:26 UTC
Thanks Philip for the extra eyes :).

(In reply to Philip Withnall from comment #4)
> ::: src/tracker-extract/tracker-extract-mp3.c
> @@ +1838,3 @@
> +	/* We don't handle experimental cases */
> +	if (experimental) {
> +		g_message ("[v24] Experimental MP3s are not extracted, doing nothing");
> 
> ooi, shouldn’t this be g_debug() rather than g_message()?

Tracker executables have a special log handler that also manage visibility of g_message(). Would be nice to reduce/conflate logging levels but not too high priority :).

(In reply to Philip Withnall from comment #5)
> I think this code would benefit from merging parse_id3v23() and
> parse_id3v24(). From comparing their implementations, the only differences
> are:
>  • The sub-calls to get_id3v*_tags().
>  • 5 lines of code around ‘We found a IDv2 footer’ in the v24 implementation.
>  • A missing `continue` after “[v23] Content size was 0, moving to next
> frame” in the v23 implementation (I suspect this is a bug?).

Oops, it is.

> 
> Similarly, get_id3v*_tags() look almost identical, apart from a missing
> (offset >= csize) check in the get_id3v23_tags() implementation (ID3V24_COMM
> case).

It would indeed be nice to put all that together... IIRC there's also frame/header incompatibilities, at least between ID3v20 and 2.3/2.4, but still easy enough to merge in a single codepath.

(In reply to Philip Withnall from comment #6)
> Review of attachment 358014 [details] [review] [review]:
> 
> This looks good to me. The only non-obvious change is for (pos + frame_size
> + csize > size) ⇒ (csize > size - frame_size - pos), where we want
> (frame_size + pos <= size) otherwise we underflow. That’s guaranteed,
> however, by the fact that:
>  • (pos + frame_size <= tsize + header_size) from the start of the loop
>  • (tsize + header_size <= size) from before entering the loop
>  • Hence (pos + frame_size <= size)

Exactly :). The last patch making that an assert() should clear up all doubts to the casual reader.
Comment 10 Carlos Garnacho 2017-08-22 16:47:33 UTC
Attachment 358013 [details] pushed as d7643db - extract/mp3: Check for experimental flag right after fetching flags
Attachment 358014 [details] pushed as be8b267 - extract/mp3: Make clearer checks on offsets read from file
Attachment 358015 [details] pushed as 2f3d04c - extract/mp3: Assert that the reader position is within mmap boundaries
Comment 11 Philip Withnall 2017-08-22 16:56:05 UTC
(In reply to Carlos Garnacho from comment #9)
> Thanks Philip for the extra eyes :).