Bug 784220 – [review] add NMDedupMultiIndex for platform and NMIP4Config [th/dedup-multi-bgo784220]

After an evaluation, GNOME has moved from Bugzilla to GitLab. Learn more about GitLab.
No new issues can be reported in GNOME Bugzilla anymore.
To report an issue in a GNOME project, go to GNOME GitLab.
Do not go to GNOME Gitlab for: Bluefish, Doxygen, GnuCash, GStreamer, java-gnome, LDTP, NetworkManager, Tomboy.

Bug 784220 - [review] add NMDedupMultiIndex for platform and NMIP4Config [th/dedup-multi-bgo784220]


Summary:	[review] add NMDedupMultiIndex for platform and NMIP4Config [th/dedup-multi-b...


Status:	RESOLVED FIXED

Product:	NetworkManager
Classification:	Platform
Component:	general
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	NetworkManager maintainer(s)
QA Contact:	NetworkManager maintainer(s)

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2017-06-26 15:19 UTC by Thomas Haller
Modified:	2017-07-07 11:33 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Thomas Haller 2017-06-26 15:19:48 UTC

change the implementation of the lookup index for platform, by introducing NMDedupMultiIndex.

The major change is that this new index preserves the order of the tracked objects. We don't make use of it yet, but will require it for fixing caching or routes.

Also, it gives way to treat NMPObject instances as immutable, ref-counted and shared. Later, NMIP4Config, NMIP6Config, NMRouteManager, NMDefaultRouteManager and NMPlatform should all share their NMPObject instances. This hopefully allows us to save some memory. The goal is 600k routes (which amounts to ~120MB when each route consumes 200 bytes).

Also, change NMIP4Config to use NMDedupMultiIndex for tracking IPv4 routes. Previously, subtrack/intersect/merge would all be O(n^2), now they are O(n) as adding one routes is O(1) thanks to the hash table.


More to come later:
  - drop fake platform (it's just too cumbersome to maintain)
  - expose NMPLookup as public API from NMPlatform. We don't want to ever copy 
    objects. Just give the user the NMDedupMultiHeadEntry instance, and it can
    iterate of the cache themself.
  - drop NMMultiIndex. It's unused now
  - rework NMIP4Config, NMIP6Config to use NMDedupMultiIndex for routes and 
    addresses.
  - maybe rework NMRouteManager to use it too. But while fixing caching of 
    routes, maybe route manger will go away entirely.

Comment 1 Thomas Haller 2017-06-27 10:20:09 UTC

did some testing with IPv4 routes...


running

  NMTST_DEBUG=log-level=info ./src/platform/tests/monitor

- memory consumption without any routes:
  - old and new roughly 8036K (with considerable variation)
    
 - incrementally add 100k routes while monitor is running:
  - previously:
    31828K, 3.27sec
  - new:
    73684K, 3.45sec
- starting monitor with 500k routes
  - previously:
    147M, 3.61sec
  - new:
    322M, 1.88sec

so, the average memory consumption per IPv4 route is
  - previously:
    243 to 291 bytes
  - new
    674 bytes
    

So, runtime seems good in both cases. Memory consumption of the new implementation is worse, but I think acceptable.


Note that the new implementation *gains* functionality: preserving the order of routes. Supporting that to the previous implementation would have a significant impact too (either runtime or memory).

"monitor" is not a realistic test for NetworkManager. In this case, no NMPObject could be shared. On the other hand, I hope we could extend NMIP4Config and NMRouteManager to reuse the same objects.
Also, nobody was actually accessing the platform cache. The previous implementation is bound to always copy the data, while the new one can expose an iterable NMDedupMultiHeadEntry.

Comment 2 Thomas Haller 2017-07-03 14:35:48 UTC

another test... 500.000 IPv4 routes and running

NMTST_DEBUG=log-level=info perf stat ./src/platform/tests/monitor --no-persist



master:

       3683.835595      task-clock:u (msec)       #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            63,202      page-faults:u             #    0.017 M/sec
    10,984,259,345      cycles:u                  #    2.982 GHz
    23,049,992,065      instructions:u            #    2.10  insn per cycle
     5,330,484,183      branches:u                # 1446.993 M/sec
        19,453,699      branch-misses:u           #    0.36% of all branches

       3.684238575 seconds time elapsed



th/dedup-multi-bgo784220:

       2837.793523      task-clock:u (msec)       #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
           114,894      page-faults:u             #    0.040 M/sec
     7,975,703,835      cycles:u                  #    2.811 GHz
    12,212,923,768      instructions:u            #    1.53  insn per cycle
     2,742,011,799      branches:u                #  966.248 M/sec
        25,600,413      branch-misses:u           #    0.93% of all branches

       2.838242202 seconds time elapsed


so, the new code is faster, but consumes more then double the memory (318M vs 147M) and has (a bit less then) double the page-faults.

Comment 3 Thomas Haller 2017-07-04 09:35:56 UTC

ok, one way to reduce the number of memory is to drop maintaining indexes that can be evaluated differently. Commit "platform: reduce number of route indexes" does that, and it changes memory use of:

  $ NMTST_DEBUG=log-level=info time -v ./src/platform/tests/monitor --no-persist

from

  master:                   154944k   ~300 bytes/route
  th/dedup-multi-bgo784220: 292608k   ~582 bytes/route
  before:                   350577k   ~701 bytes/route

that works well, because all those 500.000 routes are non-default, IPv4 routes, so the patch effectively gets rid of two indexes out of 6.


and the runtime is also better:

       2658.260145      task-clock:u (msec)       #    1.000 CPUs utilized          
                 0      context-switches:u        #    0.000 K/sec                  
                 0      cpu-migrations:u          #    0.000 K/sec                  
           100,317      page-faults:u             #    0.038 M/sec                  
     7,363,068,474      cycles:u                  #    2.770 GHz                    
     9,771,469,867      instructions:u            #    1.33  insn per cycle         
     2,185,055,704      branches:u                #  821.987 M/sec                  
        17,150,058      branch-misses:u           #    0.78% of all branches        

       2.659357458 seconds time elapsed

Comment 4 Thomas Haller 2017-07-04 10:02:08 UTC

ok, dropped another index:

 > platform: drop separate index for visible objects

Now we are at 263548k (~539 bytes/route)

Comment 5 Beniamino Galvani 2017-07-05 13:42:28 UTC

Pushed a commit to fix link enumeration.

I haven't reviewed every single commit of the branch, but since we are
early in the development cycle and the results look promising, I don't
mind if we merge the branch as is and fix later any issue that we find.

Comment 6 Thomas Haller 2017-07-07 11:33:03 UTC

merged to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=22cd73d3aa89b8db3f5ecf32232cd09ca329ed93