GNOME Bugzilla – Bug 693056
subparse: re-enable sami support
Last modified: 2013-07-16 18:03:24 UTC
The temporary commits 25fb0dd and b338e803 unconditionally disable sami support in subparse plugin in 1.x releases. Please make it correctly detect libxml html support.
Created attachment 236354 [details] [review] Fix SAMI subtitle parsing build It (again) checks libxml HTML feature and disable SAMI parsing conditionally.
Additional note: If anyone wants to do clean-up of splitting subparse's external dependent part, do it when it's ready. Breaking feature first is not desirable.
Created attachment 244297 [details] [review] Allow building SAMI parsing code in 1.0 Slightly cleaner version I had in my repo, that does not define GST_DISABLE_XML.
Created attachment 245286 [details] [review] reimplement samiparse without libxml dependency Hello. I reimplement sami parser that use own html parser. So, It is not require to use libxml anymore. I tested with gstreamer 1.0, but it should work in 1.1 too.
So if subparse doesn't depend on libxml for SAMI support, Is the libxml detection code still required in configure.in?
Review of attachment 245286 [details] [review]: I just given it a first look, looks promising. ::: gst/subparse/samiparse.c @@ +460,3 @@ + +static gchar * +unescape_string (const gchar * text) Have you looked at GMarkupParser in GLib ?
GMarkupParser has the limitation that the input must be UTF-8, it will error out if it's not (which is unfortunate, because I bet making it work with any-odd 8-bit encoding would be a minor change, but it is what it is). Though one could convert the input to UTF-8 if needed of course, using any charset markers which may be there at the beginning.
(In reply to comment #6) > Review of attachment 245286 [details] [review]: > > Have you looked at GMarkupParser in GLib ? GMarkupParser cannot parse sami format. Because sami format is based HTML, not xml. And there are many ambiguous cases when parse sami file. so it need many workarounds. After upload patch, I found some bugs. I'll upload new patch in next week. Thanks.
Comment on attachment 245286 [details] [review] reimplement samiparse without libxml dependency Marking as needs-work as per comment #8. Thanks for working on this! It would be nice to have a simple xml/html parser in libgstpbutils or so, would come in handy elsewhere too (e.g. for smooth streaming plugin in -bad). Out of curiosity, what is the main problem with HTML vs. XML here? That closing tags don't get automatically inserted when missing? Anything else?
(In reply to comment #9) > (From update of attachment 245286 [details] [review]) > Marking as needs-work as per comment #8. > > Thanks for working on this! > > It would be nice to have a simple xml/html parser in libgstpbutils or so, would > come in handy elsewhere too (e.g. for smooth streaming plugin in -bad). > > Out of curiosity, what is the main problem with HTML vs. XML here? That closing > tags don't get automatically inserted when missing? Anything else? 1. html tag is not case-sensitive 2. attribute value may not be quoted
(In reply to comment #10) > (In reply to comment #9) > > Out of curiosity, what is the main problem with HTML vs. XML here? That closing > > tags don't get automatically inserted when missing? Anything else? > > 1. html tag is not case-sensitive > 2. attribute value may not be quoted 3. more html entities. please see https://git.gnome.org/browse/libxml2/tree/HTMLparser.c?id=v2.9.1#n1628
Created attachment 245440 [details] [review] reimplement samiparse without libxml dependency
Created attachment 245441 [details] [review] sami parser testcases add more sami parser testcases
Thanks for the patches! Pushed to git master now, along with a memory leak fix, and removal of the now-obsolete libxml checks in configure.ac: I'll cherry-pick it into the 1.0 branch as well, shouldn't affect anything else after all. commit 79337c1ad2f23871dd80503555dac7a77930c9aa Author: Tim-Philipp Müller <tim.muller@collabora.co.uk> Date: Tue Jul 16 18:42:19 2013 +0100 configure: remove obsolete libxml checks https://bugzilla.gnome.org/show_bug.cgi?id=693056 commit 3eea81e4a2b32bf0188fc31b3d60322494f23c32 Author: Tim-Philipp Müller <tim.muller@collabora.co.uk> Date: Tue Jul 16 18:30:50 2013 +0100 subparse: don't leak parse context for sami and qttext In gst_sub_parse_dispose() parser_type will be UNKNOWN, so these deinit calls were never executed. And we should clean up the parser state in the downwards state change anyway. commit 6cea51c9623e6df6da3b3a9918ca250cd3d81ae1 Author: Young-Ho Cha <ganadist@gmail.com> Date: Tue May 28 16:56:28 2013 +0900 tests: update sami parser testcases Remove libxml dependency for sami parser and add more testcases. https://bugzilla.gnome.org/show_bug.cgi?id=693056 commit f597efe24b010bf50ea7f7ebcae92fb293df3167 Author: Young-Ho Cha <ganadist@gmail.com> Date: Sat May 25 17:10:14 2013 +0900 subparse: remove libxml dependency for sami parser and re-enable sami parser To celebrate 2013.gnome.asia, updated sami parser for gstreamer 1.x. :D Remove conditional block for check libxml usage and implement a simple html markup parser for the sami parser. https://bugzilla.gnome.org/show_bug.cgi?id=693056