GNOME Bugzilla – Bug 784220
[review] add NMDedupMultiIndex for platform and NMIP4Config [th/dedup-multi-bgo784220]
Last modified: 2017-07-07 11:33:03 UTC
change the implementation of the lookup index for platform, by introducing NMDedupMultiIndex. The major change is that this new index preserves the order of the tracked objects. We don't make use of it yet, but will require it for fixing caching or routes. Also, it gives way to treat NMPObject instances as immutable, ref-counted and shared. Later, NMIP4Config, NMIP6Config, NMRouteManager, NMDefaultRouteManager and NMPlatform should all share their NMPObject instances. This hopefully allows us to save some memory. The goal is 600k routes (which amounts to ~120MB when each route consumes 200 bytes). Also, change NMIP4Config to use NMDedupMultiIndex for tracking IPv4 routes. Previously, subtrack/intersect/merge would all be O(n^2), now they are O(n) as adding one routes is O(1) thanks to the hash table. More to come later: - drop fake platform (it's just too cumbersome to maintain) - expose NMPLookup as public API from NMPlatform. We don't want to ever copy objects. Just give the user the NMDedupMultiHeadEntry instance, and it can iterate of the cache themself. - drop NMMultiIndex. It's unused now - rework NMIP4Config, NMIP6Config to use NMDedupMultiIndex for routes and addresses. - maybe rework NMRouteManager to use it too. But while fixing caching of routes, maybe route manger will go away entirely.
did some testing with IPv4 routes... running NMTST_DEBUG=log-level=info ./src/platform/tests/monitor - memory consumption without any routes: - old and new roughly 8036K (with considerable variation) - incrementally add 100k routes while monitor is running: - previously: 31828K, 3.27sec - new: 73684K, 3.45sec - starting monitor with 500k routes - previously: 147M, 3.61sec - new: 322M, 1.88sec so, the average memory consumption per IPv4 route is - previously: 243 to 291 bytes - new 674 bytes So, runtime seems good in both cases. Memory consumption of the new implementation is worse, but I think acceptable. Note that the new implementation *gains* functionality: preserving the order of routes. Supporting that to the previous implementation would have a significant impact too (either runtime or memory). "monitor" is not a realistic test for NetworkManager. In this case, no NMPObject could be shared. On the other hand, I hope we could extend NMIP4Config and NMRouteManager to reuse the same objects. Also, nobody was actually accessing the platform cache. The previous implementation is bound to always copy the data, while the new one can expose an iterable NMDedupMultiHeadEntry.
another test... 500.000 IPv4 routes and running NMTST_DEBUG=log-level=info perf stat ./src/platform/tests/monitor --no-persist master: 3683.835595 task-clock:u (msec) # 1.000 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 63,202 page-faults:u # 0.017 M/sec 10,984,259,345 cycles:u # 2.982 GHz 23,049,992,065 instructions:u # 2.10 insn per cycle 5,330,484,183 branches:u # 1446.993 M/sec 19,453,699 branch-misses:u # 0.36% of all branches 3.684238575 seconds time elapsed th/dedup-multi-bgo784220: 2837.793523 task-clock:u (msec) # 1.000 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 114,894 page-faults:u # 0.040 M/sec 7,975,703,835 cycles:u # 2.811 GHz 12,212,923,768 instructions:u # 1.53 insn per cycle 2,742,011,799 branches:u # 966.248 M/sec 25,600,413 branch-misses:u # 0.93% of all branches 2.838242202 seconds time elapsed so, the new code is faster, but consumes more then double the memory (318M vs 147M) and has (a bit less then) double the page-faults.
ok, one way to reduce the number of memory is to drop maintaining indexes that can be evaluated differently. Commit "platform: reduce number of route indexes" does that, and it changes memory use of: $ NMTST_DEBUG=log-level=info time -v ./src/platform/tests/monitor --no-persist from master: 154944k ~300 bytes/route th/dedup-multi-bgo784220: 292608k ~582 bytes/route before: 350577k ~701 bytes/route that works well, because all those 500.000 routes are non-default, IPv4 routes, so the patch effectively gets rid of two indexes out of 6. and the runtime is also better: 2658.260145 task-clock:u (msec) # 1.000 CPUs utilized 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 100,317 page-faults:u # 0.038 M/sec 7,363,068,474 cycles:u # 2.770 GHz 9,771,469,867 instructions:u # 1.33 insn per cycle 2,185,055,704 branches:u # 821.987 M/sec 17,150,058 branch-misses:u # 0.78% of all branches 2.659357458 seconds time elapsed
ok, dropped another index: > platform: drop separate index for visible objects Now we are at 263548k (~539 bytes/route)
Pushed a commit to fix link enumeration. I haven't reviewed every single commit of the branch, but since we are early in the development cycle and the results look promising, I don't mind if we merge the branch as is and fix later any issue that we find.
merged to master: https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=22cd73d3aa89b8db3f5ecf32232cd09ca329ed93