After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.
Bug 642086 - MAJOR performance regression with polygon drawing.
MAJOR performance regression with polygon drawing.
Status: RESOLVED FIXED
Product: libchamplain
Classification: Core
Component: polygons
0.9.x
Other Linux
: Normal normal
: ---
Assigned To: libchamplain-maint
libchamplain-maint
Depends on:
Blocks:
 
 
Reported: 2011-02-11 05:26 UTC by Robert Bruce Park
Modified: 2011-04-03 15:44 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Python demo benchmarking polygon node addition (619 bytes, text/x-python)
2011-04-03 15:44 UTC, Jiri Techet
Details

Description Robert Bruce Park 2011-02-11 05:26:38 UTC
Alright, so I did some profiling like I said I would in this mailing list post:

http://mail.gnome.org/archives/libchamplain-list/2011-February/msg00007.html

Unfortunately I've discovered that Python's cProfile can't actually see into C code, so there's a big black box of slow called 'invoke'. Here's some pstats output anyway:

         3588000 function calls (3575175 primitive calls) in 1317.030 CPU seconds

   Ordered by: cumulative time, internal time
   List reduced from 512 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 1317.030 1317.030 <string>:1(<module>)
        1    0.000    0.000 1317.030 1317.030 {execfile}
        1    0.001    0.001 1317.030 1317.030 gottengeography:3(<module>)
        1    0.000    0.000 1316.751 1316.751 gg/app.py:837(main)
        1    0.000    0.000 1316.750 1316.750 gg/app.py:444(open_files)
        1    0.000    0.000 1315.519 1315.519 gg/app.py:482(load_gpx_from_file)
        1    0.000    0.000 1315.079 1315.079 gg/files.py:158(__init__)
        1    0.861    0.861 1315.076 1315.076 {built-in method ParseFile}
   117868    0.098    0.000 1313.278    0.011 /builddir/build/BUILD/Python-2.7/Modules/pyexpat.c:618(EndElement)
   117868    1.282    0.000 1313.180    0.011 gg/files.py:228(element_end)
137406/131836    0.275    0.000 1309.675    0.010 /opt/gnome/lib64/python2.7/site-packages/gi/types.py:43(function)
137420/131849 1309.352    0.010 1309.420    0.010 {method 'invoke' of 'gi.CallableInfo' objects}
    39287    1.240    0.000 1147.829    0.029 gg/app.py:646(append_point)
     5225    0.388    0.000  162.799    0.031 gg/app.py:624(gpx_pulse)
        1    0.001    0.001    0.828    0.828 gg/app.py:821(redraw_interface)
   393557    0.198    0.000    0.560    0.000 /builddir/build/BUILD/Python-2.7/Modules/pyexpat.c:441(CharacterData)
    39289    0.439    0.000    0.439    0.000 {map}
        1    0.291    0.291    0.437    0.437 gg/utils.py:161(lookup_geoname)
        1    0.000    0.000    0.403    0.403 gg/app.py:460(load_img_from_file)
        1    0.000    0.000    0.403    0.403 gg/files.py:44(read)

So, this is indicating that "append_point" (mine, not yours) is the slowest part of the GPX load process, taking up 1147 out of 1315 possible seconds. Here's that method in it's entirety:

    def append_point(self, latitude, longitude):
        marker = Champlain.Marker()
        marker.set_position(latitude, longitude)
        self.polygons[-1].add_marker(marker)
        return marker

So yeah, something pretty serious is broken here. I'd like to point out that 1300s is nearly TWENTY TWO MINUTES, where previously this exact same file could be loaded in 5.79 *seconds*.
Comment 1 Jiri Techet 2011-02-11 12:57:23 UTC
I think I know what's going on here: every time you insert a marker to the polygon or change its coordinates (you do both), the polygon is redrawn completely. This is what you normally want - inserting or or changing position of a point should be reflected by the layer. However, when just adding many points this is too inefficient - in your code the whole polygon is redrawn twice every time you add a new point. To speed things up, first hide the the polygon, add your points and then display the polygon again:


champlain_marker_layer_set_path_visible (layer, FALSE);

//add all the points of the polygon here

champlain_marker_layer_set_path_visible (layer, TRUE);


Please let me know if it helped.
Comment 2 Robert Bruce Park 2011-02-11 15:16:17 UTC
That does speed things up significantly, but it's still 4x slower than before. Perhaps you could make it so that it only redraws the segment of the path that actually changed? Seems like that would be a sensible optimization. That must have been how it was behaving before. It used to be that you could actually watch the polygon draw while it was loading.

Also, I'm setting the position of the marker before adding it to the layer, so the layer shouldn't be redrawing twice because of that. I could see that happening if I was adding first and then setting the position.
Comment 3 Robert Bruce Park 2011-02-11 15:23:35 UTC
I should clarify that with path's set to invisible, it still takes *six*minutes* to load my file that ought to be loading in 5.79 seconds. The "4x slower" figure was from a smaller test file that obviously didn't tell the whole story.
Comment 4 Jiri Techet 2011-02-11 20:35:55 UTC
Got it.

The slow method is clutter_container_add_actor() of the ClutterGroup class. I had a look into the implementation and it sorts the list of actors it contains by depth every time a new actor is added - this leads to quadratic complexity and bad performance. In libchamplain all the actors have the same depth so we don't need the sorting at all. Previously the nodes of the polygon weren't actors so this problem didn't exist (but it existed for ordinary markers).

If you are in a hacky mood, you can try to get sources of clutter, go to clutter-group.c and in clutter_group_real_sort_depth_order() uncomment the line:

priv->children = g_list_sort (priv->children, sort_by_depth);

Rebuild clutter, sudo ldconfig, rebuild libchamplain. In my case it really fixed the problem.

Since I know where the problem is and that it's not so hard to fix (one way is to have our own modified ClutterGroup or, better, if it's possible to override the single sorting function of ClutterGroup in our own group class) I'll concentrate now on the tile fading problem and leave this issue after the 0.9.0 release which I want to have out as soon as possible.
Comment 5 Jiri Techet 2011-02-11 20:40:38 UTC
(In reply to comment #2)
> That does speed things up significantly, but it's still 4x slower than before.
> Perhaps you could make it so that it only redraws the segment of the path that
> actually changed? Seems like that would be a sensible optimization. That must
> have been how it was behaving before. It used to be that you could actually
> watch the polygon draw while it was loading.

The implementation has always been like this, I haven't changed anything in this respect. What you suggest makes sense, it's just a bit more complicated because you should also erase the old line segments when you move one node and then redraw the new line segments. The only thing is that nobody has implemented it yet.

> 
> Also, I'm setting the position of the marker before adding it to the layer, so
> the layer shouldn't be redrawing twice because of that. I could see that
> happening if I was adding first and then setting the position.

Yes, you're right.
Comment 6 Robert Bruce Park 2011-02-12 01:35:05 UTC
Alright, I commented out that line, recompiled as you said, and now the smaller of my two demo files loads in 0.19s with path rendering off, and still is 2.5s with path drawing on. So it's actually 2x slower with rendering off, and no change with rendering on. Maybe I did something wrong? How can I check that the installed library has the change?

    $ git diff
    diff --git a/clutter/clutter-group.c b/clutter/clutter-group.c
    index d6c9897..edf3a7f 100644
    --- a/clutter/clutter-group.c
    +++ b/clutter/clutter-group.c
    @@ -253,7 +253,7 @@ clutter_group_real_sort_depth_order (ClutterContainer     *container)
     {
       ClutterGroupPrivate *priv = CLUTTER_GROUP (container)->priv;
     
    -  priv->children = g_list_sort (priv->children, sort_by_depth);
    +  //priv->children = g_list_sort (priv->children, sort_by_depth);
     
       clutter_actor_queue_redraw (CLUTTER_ACTOR (container));
     }

Your message said to uncomment this line, but I think you made a typo because it was not commented when I found it, and logically it seems like commenting this line makes more sense. So I commented it, then ran 'jhbuild buildone clutter', then 'sudo ldconfig', then 'jhbuild buildone libchamplain', all commands indicating a successful compile/install. 

Also, the larger demo file now crashes approx 30s into loading. So I can't even time the larger test file. But the small test file performs the same or worse depending on the drawing. The message I get says:

drmRadeonCmdBuffer: -22. Kernel failed to parse or rejected command stream. See dmesg for more info.

and in dmesg:

[66365.964386] [drm:radeon_cs_parser_init] *ERROR* cs IB too big: 16432
[66365.964389] [drm:radeon_cs_ioctl] *ERROR* Failed to initialize parser !
[66760.343675] [drm:radeon_cs_parser_init] *ERROR* cs IB too big: 16402
[66760.343678] [drm:radeon_cs_ioctl] *ERROR* Failed to initialize parser !

So, whatever's going on there, I don't understand it, but your suggestion didn't speed anything up, only made it crash.

Anyway, I understand your desire to get a 0.9 shipped ASAP, definitely work on that. It's incredibly important to the success of my app that 0.10 ships with Fedora 15 and Ubuntu 11.04, so definitely get a release out the door, but this performance hit has basically destroyed my application.
Comment 7 Jiri Techet 2011-02-12 11:16:49 UTC
Please test with the latest version from master - I made our own version of ClutterGroup without sorting and use it for inserting markers. This speeds up things significantly on my machine. Please let me know the performance looks in comparison with 0.8 - if it's still significantly worse, then I'll create a separate layer for paths and will do things in a slightly different way. I was also wondering what the number of points you use for your test is.
Comment 8 Robert Bruce Park 2011-02-13 02:42:59 UTC
With path rendering enabled, my small demo file is down from 2.5s to 0.88s. An improvement, but still a far cry from the 0.02s of before. The big one is down from 1300s to 758.12s. Not even close to the 6s that libchamplain was once capable of.

I'm not sure what else could be slowing you down, but I would really appreciate the return of append_point. 

My demo file contains 39,287 track points. Each track point is represented by one ChamplainMarker, but this particular file has two track segments (so two ChamplainMarkerLayers). The distribution between layers is something like 80/20.



One thing I'd like to point out is that in Gtk, no drawing occurs to the screen at all until a method finishes executing, unless you use some magic like this within the method:

        while Gtk.events_pending():
            Gtk.main_iteration()

This is a feature; it allows you to change the visibility/sensitivity of various widgets and have those changes applied to the screen atomically, so the user never sees widgets in an inconsistent state. The downside is that if you have a long-running method, the whole app looks hung, thus the above magic to force the app to remain interactive even while something is processing.

My GPX parser uses exactly the above magic, but only does it once every 0.2 seconds for performance reasons. So, I create and add a few thousand ChamplainMarkers in between each call to Gtk.main_iteration(), and libchamplain is doing a full polygon redraw for every single point added, even though the drawing isn't visible to the user because I haven't told Gtk to update the display yet).

I suspect that the old API took this kind of thing into account, even though I was adding points directly to a visible polygon, libchamplain wasn't actually doing any "drawing" until Gtk.main_iteration() iterated. 

Jiri, you mentioned that appending the Markers to the MarkerLayer with path rendering enabled meant that it was completely redrawing the entire polygon for each point appended. Considering that nothing that you "draw" is visible until I call Gtk.main_iteration _anyway_, is there some way that you can defer all the drawing to not happen at all until Gtk.main_iteration() is called?

I'm just having a hard time grasping what could have slowed it down *so* much, and why it was so fast before. Surely it wasn't redrawing the entire polygon for every single point appended. Perhaps there was some subtlety to the old code such that it knew to only draw when the drawing was actually going to make it to the screen. 

Thanks for this speedup anyways, I'll take it, but there's still a ways to go.
Comment 9 Robert Bruce Park 2011-02-14 17:16:51 UTC
I've tested this further today and it turns out that if I wait until after the polygons are finished loading in order to display them, it actually only takes 21s to load the 4.4MB file. I'm impressed! I hadn't even thought to test that before because I insist that it be possible to watch the paths draw as they load.

This means that almost 100% of the slowdown is just due to polygon redrawing. If you could make the drawing code defer to the Gtk mainloop, then the performance will be within 3x of what it was before the API breakage (that is, 21s is three times slower than 6s, instead of being 216x slower at 1300s).

I betcha that if you brought back append_point, it would get us down into the neighborhood of 10-15 seconds ;-)
Comment 10 Robert Bruce Park 2011-02-14 17:32:26 UTC
Oh, and here's the profiling data with path rendering turned off:

$ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) 
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pstats
>>> p = pstats.Stats('profile')
>>> p.strip_dirs().sort_stats('cum').print_stats(20)
Mon Feb 14 10:19:53 2011    profile

         3279136 function calls (3276681 primitive calls) in 25.233 CPU seconds

   Ordered by: cumulative time
   List reduced from 510 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   25.233   25.233 <string>:1(<module>)
        1    0.000    0.000   25.233   25.233 {execfile}
        1    0.008    0.008   25.233   25.233 gottengeography:3(<module>)
        1    0.000    0.000   24.972   24.972 app.py:842(main)
        1    0.000    0.000   24.972   24.972 app.py:451(open_files)
        1    0.000    0.000   23.733   23.733 app.py:490(load_gpx_from_file)
        1    0.000    0.000   22.082   22.082 files.py:158(__init__)
        1    0.587    0.587   22.079   22.079 {built-in method ParseFile}
   117868    0.065    0.000   20.743    0.000 pyexpat.c:618(EndElement)
80207/79825    0.102    0.000   20.714    0.000 types.py:43(function)
   117868    0.491    0.000   20.678    0.000 files.py:223(element_end)
80221/79838   20.601    0.000   20.615    0.000 {method 'invoke' of 'gi.CallableInfo' objects}
    39287    0.826    0.000   16.618    0.000 utils.py:132(append_point)
       94    0.006    0.000    2.851    0.030 app.py:643(gpx_pulse)
        1    0.001    0.001    0.832    0.832 app.py:827(redraw_interface)
   393557    0.161    0.000    0.447    0.000 pyexpat.c:441(CharacterData)
        1    0.292    0.292    0.441    0.441 utils.py:180(lookup_geoname)
        1    0.000    0.000    0.406    0.406 app.py:468(load_img_from_file)
        1    0.000    0.000    0.406    0.406 files.py:44(read)
        1    0.405    0.405    0.405    0.405 metadata.py:89(read)

What this is saying is that the app ran for 25s, 22s of which were spent in the GPX parser, 16s of that being spent in my python append_point method!

This is a clear indication that if you brought back champlain_polygon_append_point, it would quadruple the speed of my application. 

Pretty please? I would be really happy if it just took lat,lon and then returned a ChamplainMarker. In the documentation you could just say it's a faster way to add invisible markers when your intention is to draw a polygon, and that if you want ChamplainLabels or ChamplainPoints instead, you should create them yourself and use add_marker instead.
Comment 11 Jiri Techet 2011-02-14 23:54:44 UTC
Hi Robert, please see the email I sent to the mailing list just now. The performance problem was not caused by calling two python functions instead of one - it was called by slow ClutterActor creation. Please test and let me know about the result - if there really is a performance problem because of calling two python function, I can create an extra call for you in the interface. But I seriously doubt it will make a measurable difference.
Comment 12 Robert Bruce Park 2011-02-15 01:11:32 UTC
Python is funny. I once shaved 0.2s off the loading by getting rid of a dictionary instantiation for each point (before, I was storing the points as a dictionary that contained a ChamplainPoint as one value, and then the elevation data as a second value. When I learned that nothing would stop me from simply adding an elevation attribute onto ChamplainPoint, I did that and got rid of the dictionary, and it sped things up). So I definitely think you are underestimating how slow python actually is ;-)

Anyways, like I said on the mailing list, we're now down to 9.3s with path rendering turned off. Fairly impressive, this is now a mere 60% slowdown, instead of 200%. Just gotta do something to speed it up while path rendering is enabled ;-)
Comment 13 Robert Bruce Park 2011-02-17 03:44:27 UTC
Hey Jiri, I'm closing this one because you're awesome.
Comment 14 Jiri Techet 2011-02-17 22:55:44 UTC
Reopening. I'd still like to make some experiments to see if bulk addition of points improves performance.
Comment 15 Robert Bruce Park 2011-02-18 02:30:04 UTC
Ok. Here's an updated profile (with rendering turned on) just for kicks:

 $ python
Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) 
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pstats
>>> p = pstats.Stats('profile')
>>> p.sort_stats('cum').print_stats(20)
Thu Feb 17 19:21:40 2011    profile

         3356826 function calls (3354642 primitive calls) in 13.375 CPU seconds

   Ordered by: cumulative time
   List reduced from 513 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   13.375   13.375 <string>:1(<module>)
        1    0.000    0.000   13.375   13.375 {execfile}
        1    0.008    0.008   13.375   13.375 ./gottengeography:3(<module>)
        1    0.000    0.000   13.117   13.117 ./gg/app.py:816(main)
        1    0.000    0.000   13.116   13.116 ./gg/app.py:435(open_files)
        1    0.000    0.000   11.860   11.860 ./gg/app.py:474(load_gpx_from_file)
        1    0.000    0.000   11.393   11.393 ./gg/files.py:158(__init__)
        1    0.568    0.568   11.390   11.390 {built-in method ParseFile}
   117868    0.065    0.000   10.078    0.000 /builddir/build/BUILD/Python-2.7/Modules/pyexpat.c:618(EndElement)
   117868    0.468    0.000   10.012    0.000 ./gg/files.py:225(element_end)
79841/79613    9.190    0.000    9.201    0.000 {method 'invoke' of 'gi.CallableInfo' objects}
40540/40313    0.055    0.000    8.350    0.000 /opt/gnome/lib64/python2.7/site-packages/gi/types.py:43(function)
    39287    0.359    0.000    7.538    0.000 ./gg/utils.py:129(append_point)
       50    0.003    0.000    1.302    0.026 ./gg/app.py:617(gpx_pulse)
    39301    0.106    0.000    1.044    0.000 /opt/gnome/lib64/python2.7/site-packages/gi/types.py:64(constructor)
        1    0.001    0.001    0.826    0.826 ./gg/app.py:801(redraw_interface)
   393557    0.161    0.000    0.450    0.000 /builddir/build/BUILD/Python-2.7/Modules/pyexpat.c:441(CharacterData)
        1    0.290    0.290    0.435    0.435 ./gg/utils.py:176(lookup_geoname)
        1    0.000    0.000    0.403    0.403 ./gg/app.py:452(load_img_from_file)
        1    0.000    0.000    0.403    0.403 ./gg/files.py:44(read)


So, 9.2 out of 13.3 seconds is spent in C code somewhere. Also, 7.5s spent in my python append_point method. I betcha if you brought back the C version of that it would slash my load times nearly in half ;-)
Comment 16 Jiri Techet 2011-04-03 15:43:18 UTC
Don't worry about further benchmarking - I did it myself and found that coordinates were added into the internal list of PathLayer in a stupid way. I've fixed this. With the attached demo I can add about 100000 points in about 7s which I find acceptable. I've also tested what happens when I introduce the new methods for node addition but there's no performance improvement so I won't introduce them to libchamplain.
Comment 17 Jiri Techet 2011-04-03 15:44:40 UTC
Created attachment 185036 [details]
Python demo benchmarking polygon node addition