After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 693056 - subparse: re-enable sami support
subparse: re-enable sami support
Status: RESOLVED FIXED
Product: GStreamer
Classification: Platform
Component: gst-plugins-base
1.0.5
Other Linux
: Normal major
: 1.0.9
Assigned To: GStreamer Maintainers
GStreamer Maintainers
Depends on:
Blocks:
 
 
Reported: 2013-02-02 15:12 UTC by Changwoo Ryu
Modified: 2013-07-16 18:03 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Fix SAMI subtitle parsing build (1.90 KB, patch)
2013-02-16 08:58 UTC, Changwoo Ryu
rejected Details | Review
Allow building SAMI parsing code in 1.0 (5.51 KB, patch)
2013-05-15 11:00 UTC, Arnaud Vrac
rejected Details | Review
reimplement samiparse without libxml dependency (27.43 KB, patch)
2013-05-25 08:33 UTC, Young-Ho Cha
needs-work Details | Review
reimplement samiparse without libxml dependency (32.79 KB, patch)
2013-05-28 09:33 UTC, Young-Ho Cha
committed Details | Review
sami parser testcases (3.44 KB, patch)
2013-05-28 09:33 UTC, Young-Ho Cha
committed Details | Review

Description Changwoo Ryu 2013-02-02 15:12:28 UTC
The temporary commits 25fb0dd and b338e803 unconditionally disable sami support in subparse plugin in 1.x releases.

Please make it correctly detect libxml html support.
Comment 1 Changwoo Ryu 2013-02-16 08:58:39 UTC
Created attachment 236354 [details] [review]
Fix SAMI subtitle parsing build

It (again) checks libxml HTML feature and disable SAMI parsing conditionally.
Comment 2 Changwoo Ryu 2013-02-16 09:09:02 UTC
Additional note:

If anyone wants to do clean-up of splitting subparse's external dependent part, do it when it's ready. Breaking feature first is not desirable.
Comment 3 Arnaud Vrac 2013-05-15 11:00:05 UTC
Created attachment 244297 [details] [review]
Allow building SAMI parsing code in 1.0

Slightly cleaner version I had in my repo, that does not define GST_DISABLE_XML.
Comment 4 Young-Ho Cha 2013-05-25 08:33:56 UTC
Created attachment 245286 [details] [review]
reimplement samiparse without libxml dependency

Hello. I reimplement sami parser that use own html parser.
So, It is not require to use libxml anymore.

I tested with gstreamer 1.0, but it should work in 1.1 too.
Comment 5 Changwoo Ryu 2013-05-25 15:11:00 UTC
So if subparse doesn't depend on libxml for SAMI support, Is the libxml detection code still required in configure.in?
Comment 6 Nicolas Dufresne (ndufresne) 2013-05-25 15:44:07 UTC
Review of attachment 245286 [details] [review]:

I just given it a first look, looks promising.

::: gst/subparse/samiparse.c
@@ +460,3 @@
+
+static gchar *
+unescape_string (const gchar * text)

Have you looked at GMarkupParser in GLib ?
Comment 7 Tim-Philipp Müller 2013-05-25 15:53:32 UTC
GMarkupParser has the limitation that the input must be UTF-8, it will error out if it's not (which is unfortunate, because I bet making it work with any-odd 8-bit encoding would be a minor change, but it is what it is). Though one could convert the input to UTF-8 if needed of course, using any charset markers which may be there at the beginning.
Comment 8 Young-Ho Cha 2013-05-25 16:12:16 UTC
(In reply to comment #6)
> Review of attachment 245286 [details] [review]:
> 
> Have you looked at GMarkupParser in GLib ?

GMarkupParser cannot parse sami format.
Because sami format is based HTML, not xml.

And there are many ambiguous cases when parse sami file. so it need many workarounds.

After upload patch, I found some bugs.
I'll upload new patch in next week.

Thanks.
Comment 9 Tim-Philipp Müller 2013-05-25 16:41:36 UTC
Comment on attachment 245286 [details] [review]
reimplement samiparse without libxml dependency

Marking as needs-work as per comment #8.

Thanks for working on this!

It would be nice to have a simple xml/html parser in libgstpbutils or so, would come in handy elsewhere too (e.g. for smooth streaming plugin in -bad).

Out of curiosity, what is the main problem with HTML vs. XML here? That closing tags don't get automatically inserted when missing? Anything else?
Comment 10 Young-Ho Cha 2013-05-26 15:09:52 UTC
(In reply to comment #9)
> (From update of attachment 245286 [details] [review])
> Marking as needs-work as per comment #8.
> 
> Thanks for working on this!
> 
> It would be nice to have a simple xml/html parser in libgstpbutils or so, would
> come in handy elsewhere too (e.g. for smooth streaming plugin in -bad).
> 
> Out of curiosity, what is the main problem with HTML vs. XML here? That closing
> tags don't get automatically inserted when missing? Anything else?

1. html tag is not case-sensitive
2. attribute value may not be quoted
Comment 11 Young-Ho Cha 2013-05-28 08:01:19 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Out of curiosity, what is the main problem with HTML vs. XML here? That closing
> > tags don't get automatically inserted when missing? Anything else?
> 
> 1. html tag is not case-sensitive
> 2. attribute value may not be quoted

3. more html entities. please see https://git.gnome.org/browse/libxml2/tree/HTMLparser.c?id=v2.9.1#n1628
Comment 12 Young-Ho Cha 2013-05-28 09:33:04 UTC
Created attachment 245440 [details] [review]
reimplement samiparse without libxml dependency
Comment 13 Young-Ho Cha 2013-05-28 09:33:52 UTC
Created attachment 245441 [details] [review]
sami parser testcases

add more sami parser testcases
Comment 14 Tim-Philipp Müller 2013-07-16 18:03:24 UTC
Thanks for the patches!

Pushed to git master now, along with a memory leak fix, and removal of the now-obsolete libxml checks in configure.ac:

I'll cherry-pick it into the 1.0 branch as well, shouldn't affect anything else after all.


commit 79337c1ad2f23871dd80503555dac7a77930c9aa
Author: Tim-Philipp Müller <tim.muller@collabora.co.uk>
Date:   Tue Jul 16 18:42:19 2013 +0100

    configure: remove obsolete libxml checks
    
    https://bugzilla.gnome.org/show_bug.cgi?id=693056

commit 3eea81e4a2b32bf0188fc31b3d60322494f23c32
Author: Tim-Philipp Müller <tim.muller@collabora.co.uk>
Date:   Tue Jul 16 18:30:50 2013 +0100

    subparse: don't leak parse context for sami and qttext
    
    In gst_sub_parse_dispose() parser_type will be UNKNOWN,
    so these deinit calls were never executed. And we should
    clean up the parser state in the downwards state change
    anyway.

commit 6cea51c9623e6df6da3b3a9918ca250cd3d81ae1
Author: Young-Ho Cha <ganadist@gmail.com>
Date:   Tue May 28 16:56:28 2013 +0900

    tests: update sami parser testcases
    
    Remove libxml dependency for sami parser
    and add more testcases.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=693056

commit f597efe24b010bf50ea7f7ebcae92fb293df3167
Author: Young-Ho Cha <ganadist@gmail.com>
Date:   Sat May 25 17:10:14 2013 +0900

    subparse: remove libxml dependency for sami parser and re-enable sami parser
    
    To celebrate 2013.gnome.asia, updated sami parser for gstreamer 1.x. :D
    
    Remove conditional block for check libxml usage and
    implement a simple html markup parser for the sami
    parser.
    
    https://bugzilla.gnome.org/show_bug.cgi?id=693056