GNOME Bugzilla – Bug 614479
[mpegtspacketizer] Use CRC to check if tables are duplicate
Last modified: 2013-06-23 07:07:09 UTC
Created attachment 157595 [details] [review] Added CRC comparison The current code just uses table id, subtable extension and version number to check if the section has been seen before. However, this comparison is not sufficient, causing actually new tables being dismissed. The attached patch retrieves the CRC and compares those as well. The patch is based on a discussion I had with Zaheer Merali.
commit d7ab75abb8dca1f1b09509a98ce5edb441c5e3fd Author: Sebastian Pölsterl <sebp@k-d-w.org> Date: Fri Apr 2 19:52:25 2010 +0200 mpegtspacketizer: Additionally use the CRC to check if tables are duplicates The current code just uses table id, subtable extension and version number to check if the section has been seen before. However, this comparison is not sufficient, causing actually new tables being dismissed. Fixes bug #614479.
So actually it seems to be a bit more subtle than that. That fix did avoid "losing" data ... but a side effect is re-posting duplicated data. I have a stream here capture from Numericable (french cable ISP, 400+ channels) with a *lot* of NIT updates (over 4500 in 30seconds) while we do indeed need to check for both version AND CRC ... it seems to cycle. Here's a snippet of logs for a changed section (same PID, same subtable) 4707 version number:6 (previous6) crc 0xd3a293fa (previous:0xe71e8911) 4707 version number:6 (previous6) crc 0x49fc0a8e (previous:0xd3a293fa) 4707 version number:6 (previous6) crc 0x73de41ba (previous:0x49fc0a8e) 4707 version number:6 (previous6) crc 0x509e86da (previous:0x73de41ba) 4707 version number:6 (previous6) crc 0x11291ee (previous:0x509e86da) 4707 version number:6 (previous6) crc 0x7779998f (previous:0x11291ee) 4707 version number:6 (previous6) crc 0x73c9b6da (previous:0x7779998f) 4707 version number:6 (previous6) crc 0xe71e8911 (previous:0x73c9b6da) 4707 version number:6 (previous6) crc 0xd3a293fa (previous:0xe71e8911) 4707 version number:6 (previous6) crc 0x49fc0a8e (previous:0xd3a293fa) 4707 version number:6 (previous6) crc 0x73de41ba (previous:0x49fc0a8e) 4707 version number:6 (previous6) crc 0x509e86da (previous:0x73de41ba) 4707 version number:6 (previous6) crc 0x11291ee (previous:0x509e86da) Notice how it cycles back to a CRC of 0xd3a293fa with the same version number. So it seems that, when the version number doesn't change, we should accumulate all past CRC and check against all past CRC for that version number. If we already saw that CRC *AND* version_number, then we can skip parsing it. Else we parse it (since we never saw it). If only the version number changes, then we obviously parse it.
Created attachment 247280 [details] [review] mpegtspacketizer: Better detect already seen section In some cases (NIT on highly-populated DVB-C operator for example), there will be more than one section emitted for the same subtable and version number. In order not to lose those updates for the same version number, we checked against the CRC of the previous section we parsed. The problem is that, while it made sure we didn't lose any information, it also meant that if the same section came back (same version, same CRC) later on we would re-process it, re-parse it and re-emit it. This version improves on that by keeping a list of previously observed CRC for identical PID/subtable/version-number and will only process sections if they really were never seen in the past (as opposed to just before). On a 30s clip, this brings down the number of NIT section parsing from 4541 down to 663.
After more investigation, we are actually ignoring the section_number and last_section_number of various SI. For a same PID/table_id/sub_table, and due to the limitation that sections can't be bigger than 1024bytes (or 4096 for EIT), if you want to carry more information than that you need to split it up in several "sub-sections" (not to be confused with sub-tables). I guess that could be used as an improvement on the above patch to know how many sub-sections are expected, and ensure the CRC has not changed for that specific section_number ... or not and just satisfy ourselves with a list of CRC :) I think I remembered there being some sections that did not have those fields though (but can't find it yet :( )
As a reminder, the section_syntax_indicator 1bit flag is what specifies whether the section_number/last_section_number (and some other fields) are present or not. Wondering if we shouldn't handle this properly. It would also avoid having misc GList usage (creation, iteration, ...).
Created attachment 247334 [details] [review] mpegtspacketizer: Better detect already seen section In some cases (NIT on highly-populated DVB-C operator for example), there will be more than one section emitted for the same subtable and version number. In order not to lose those updates for the same version number, we checked against the CRC of the previous section we parsed. The problem is that, while it made sure we didn't lose any information, it also meant that if the same section came back (same version, same CRC) later on we would re-process it, re-parse it and re-emit it. This version improves on that by keeping a list of previously observed CRC for identical PID/subtable/version-number and will only process sections if they really were never seen in the past (as opposed to just before). On a 30s clip, this brings down the number of NIT section parsing from 4541 down to 663.
commit 0592bcc3c90125f8a9a68ee8e8940176b9c8cc97 Author: Edward Hervey <edward@collabora.com> Date: Wed Jun 19 20:39:54 2013 +0200 mpegtspacketizer: Better detect already seen section In some cases (NIT on highly-populated DVB-C operator for example), there will be more than one section emitted for the same subtable and version number. In order not to lose those updates for the same version number, we checked against the CRC of the previous section we parsed. The problem is that, while it made sure we didn't lose any information, it also meant that if the same section came back (same version, same CRC) later on we would re-process it, re-parse it and re-emit it. This version improves on that by keeping a list of previously observed CRC for identical PID/subtable/version-number and will only process sections if they really were never seen in the past (as opposed to just before). On a 30s clip, this brings down the number of NIT section parsing from 4541 down to 663. https://bugzilla.gnome.org/show_bug.cgi?id=614479