GNOME Bugzilla – Bug 654530
Improve CPU usage of DIDL-Lite parsing
Last modified: 2021-05-17 17:04:33 UTC
DIDL-Lite parsing seems to be unnecessarily heavyweight:

> I can't point at a particular issue. All I can say is that parsing
> DIDL-Lite takes a considerable amount of CPU cycles. It might help to
> use a SAX based parser instead of building the DOM, but that would be a
> major rewrite of gupnp-av and I am not sure if it's worth the effort.
Looking at the code, the problem may lie in the implementation of GUPnPDIDLLiteObject and its subclasses: the GUPnPDIDLLiteObject methods used to retrieve object values may be what is slow, rather than the parsing code itself. GUPnPDIDLLiteObjects do not store a separate member variable for each of their values. Instead, they keep this information in an in-memory XML tree. Each time the user reads a property from a GUPnPDIDLLiteObject, a linear search of the XML node's children or attributes must be performed, resulting in many strcmp() calls. If the item has many properties, this is going to be slow, particularly if you read them all, which I guess would be a typical use case. The penalty is heaviest when a given property does not exist, because the whole list must be scanned before giving up, which again is probably quite typical. It would probably be more efficient to extract all the values when the object is parsed and store them as member variables in the object. The parser itself would be slightly slower, but any function called to retrieve item values would execute in constant time, so the "object-available" callback would be much faster. Setting values would be quicker as well. This analysis is based on reading the code; I have no empirical evidence to back up my claims. Also, as Jens mentions, this would be a big change.
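To make the lookup-cost argument concrete, here is a minimal C sketch. It does not use gupnp-av or libxml2: Node is a hypothetical, simplified stand-in for an XML element's child list, and the CachedObject field names are illustrative assumptions. lookup_linear mirrors the per-read linear scan described above, while cache_object extracts the wanted fields in a single pass so that later reads are plain member accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an XML element's children:
 * a singly linked list of (name, value) pairs. This is NOT the
 * actual gupnp-av or libxml2 data structure. */
typedef struct Node {
    const char *name;
    const char *value;
    struct Node *next;
} Node;

/* Counts the strcmp() calls made by lookup_linear(), for illustration. */
static size_t strcmp_calls = 0;

/* DOM-style lookup: walk the children until the name matches.
 * Cost is O(number of children); a missing property scans them all. */
static const char *lookup_linear(const Node *first, const char *name)
{
    assert(name != NULL);
    for (const Node *n = first; n != NULL; n = n->next) {
        strcmp_calls++;
        if (strcmp(n->name, name) == 0)
            return n->value;
    }
    return NULL;
}

/* Alternative: extract the wanted fields once, at parse time, into
 * member variables. Field names are hypothetical examples. */
typedef struct {
    const char *title;   /* dc:title */
    const char *creator; /* dc:creator */
    const char *album;   /* upnp:album */
} CachedObject;

/* One pass over the children; every later read is O(1).
 * The cached pointers borrow from the node list, so the list must
 * outlive the CachedObject. */
static CachedObject cache_object(const Node *first)
{
    CachedObject obj = { NULL, NULL, NULL };
    for (const Node *n = first; n != NULL; n = n->next) {
        if (strcmp(n->name, "dc:title") == 0)
            obj.title = n->value;
        else if (strcmp(n->name, "dc:creator") == 0)
            obj.creator = n->value;
        else if (strcmp(n->name, "upnp:album") == 0)
            obj.album = n->value;
    }
    return obj;
}
```

In this sketch CachedObject only borrows pointers into the node list, so it adds a few pointer-sized fields per object; an implementation that freed the XML tree would have to copy the strings instead, which is the memory concern raised later in this thread.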
Oh. I thought I had added my benchmark to this bug. An awful lot of time (~70% IIRC) really is spent in the XML parsing. I'll try to dig up the sample code I used and attach it here.
Mark, thanks so much for your analysis. When I first wrote this code, I must admit that I was focused on keeping the memory footprint low while still using DOM. Unfortunately, there are many string properties involved, and keeping them all in memory would most probably increase our memory footprint significantly. But as Jens pointed out, retrieval of properties isn't the biggest performance culprit here.
I was thinking that property retrieval would slow down the parser if you were to register an "object-available" callback that retrieves lots of properties from the objects it is passed. Since such callbacks are run before gupnp_didl_lite_parser_parse_didl completes, I thought that they might be the bottleneck. But it seems from your and Jens' comments that this is not the case.
That would indeed be an additional penalty on top.
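The point about callbacks adding to parse time can be illustrated with a small C sketch. All names here (ObjectAvailableFn, parse_all, count_reads) are hypothetical and this is not the gupnp-av signal machinery; it only shows that when a handler is invoked synchronously for each object, whatever work it does, such as reading many properties, is spent inside the parse call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, loosely modeled on an "object-available"
 * handler; not the real gupnp-av signal signature. */
typedef void (*ObjectAvailableFn)(size_t object_index, void *user_data);

/* Hypothetical parser loop: emits one callback per parsed object. */
static size_t parse_all(size_t n_objects, ObjectAvailableFn cb, void *user_data)
{
    size_t parsed = 0;
    for (size_t i = 0; i < n_objects; i++) {
        /* ... the object would be built here ... */
        parsed++;
        /* The handler runs to completion before the next object is
         * parsed, so its cost adds directly to this function's run time. */
        if (cb != NULL)
            cb(i, user_data);
    }
    return parsed;
}

/* Example handler standing in for one that reads several properties
 * per object; it just tallies 10 simulated property reads. */
static void count_reads(size_t object_index, void *user_data)
{
    size_t *reads = user_data;
    assert(reads != NULL);
    (void)object_index;
    *reads += 10;
}
```

With this structure, the total time of a parse call is roughly the parsing cost plus the number of objects times the per-object handler cost, which is why slow property accessors would compound the XML parsing cost rather than replace it.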
Created attachment 219771: Callgrind output
Created attachment 219772: Tool to split large DIDL. Used to split a large DIDL response, e.g. from a search(upnp:class derivedFrom "object.item"), into small files.
Created attachment 219773: Tool that parses single DIDL snippets. Takes the DIDL snippet files and runs gupnp_didl_lite_parser_parse on them.
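The attached splitting tool is not reproduced here. As a rough, hypothetical sketch of the idea only, the C function below pulls the n-th item element out of a DIDL-Lite string using plain text search and wraps it in a DIDL-Lite root element so that each snippet can be parsed on its own. find_item_start and extract_item are illustrative names, not part of gupnp-av. The sketch assumes items are never self-closing, that no "</item>" appears inside CDATA or comments, and that only the default, dc and upnp namespaces are needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find the next "<item" start tag at or after p, or NULL if none.
 * The tag name must be followed by whitespace or '>' so that a
 * longer element name beginning with "item" is not matched. */
static const char *find_item_start(const char *p)
{
    while ((p = strstr(p, "<item")) != NULL) {
        char next = p[5];
        if (next == ' ' || next == '\t' || next == '\n' || next == '\r' ||
            next == '>')
            return p;
        p += 5;
    }
    return NULL;
}

/* Copy the n-th (0-based) <item>...</item> element of a DIDL-Lite
 * document into out, wrapped in a DIDL-Lite root element.
 * Returns 0 on success, -1 if the item is missing or out is too small.
 * Naive text search, not an XML parser. */
static int extract_item(const char *didl, size_t n, char *out, size_t out_size)
{
    static const char open_root[] =
        "<DIDL-Lite xmlns=\"urn:schemas-upnp-org:metadata-1-0/DIDL-Lite/\" "
        "xmlns:dc=\"http://purl.org/dc/elements/1.1/\" "
        "xmlns:upnp=\"urn:schemas-upnp-org:metadata-1-0/upnp/\">";
    static const char close_root[] = "</DIDL-Lite>";
    const char *start = didl;
    const char *end = NULL;

    assert(didl != NULL && out != NULL);
    for (size_t i = 0; ; i++) {
        start = find_item_start(start);
        if (start == NULL)
            return -1;
        end = strstr(start, "</item>");
        if (end == NULL)
            return -1;
        end += strlen("</item>");
        if (i == n)
            break;
        start = end; /* continue searching after this item */
    }

    int written = snprintf(out, out_size, "%s%.*s%s", open_root,
                           (int)(end - start), start, close_root);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return 0;
}
```

Each snippet produced this way could then be written to its own file and fed to the parser individually, which is the workflow the two attached tools describe.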
-- GitLab Migration Automatic Message -- This bug has been migrated to GNOME's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gupnp-av/-/issues/1.