GNOME Bugzilla – Bug 169345
Use a mmap'able cache for fontconfig
Last modified: 2006-05-07 20:54:34 UTC
On the startup of every program, fontconfig allocates about 100 kb of data. This data should be read from a memory mappable cache on disk. This way, the data is shared between processes. Also, it allows more expensive techniques to optimize the data storage.

A solution to this bounty will:
- Create a program that generates an on-disk cache.
- Modify fontconfig to read from this file, allowing a fallback path with the existing code.

This bug is part of the Integrated Collaborative Desktop Bounty Hunt. For more information on prizes, contest rules, and other bounty tasks, visit: http://www.gnome.org/bounties/

If you would like to start working on this bounty, please create a bugzilla account and append your intention to work on this bounty to this bug. If multiple people declare their intentions to work on a task, we encourage you to join forces and work together.

Please do not close this bug. The contest organizers will mark this bug as FIXED when the prize is claimed.
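For readers unfamiliar with the technique, the read path the bounty asks for might look roughly like the sketch below. It assumes a POSIX system; the helper name `cache_map` and the "return NULL so the caller falls back" convention are illustrative choices, not anything taken from fontconfig.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a cache file read-only and shared, so that
 * every process mapping the same file shares the same physical pages.
 * Returns NULL on any failure; the caller is then expected to fall back
 * to building its data structures the existing way. */
static const void *cache_map(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

A caller would try `cache_map()` first and run the existing font-scanning code only when it returns NULL.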
See http://www.gnome.org/bounties/Memory.html#169345
I will claim this bounty, and post updates on my progress here.
I am cooking up a patch which replaces FcPattern* pointers with FcPatternIndex values that index into an array stored in fcpat.c. This will allow FcPatterns, and references to them, to be mmaped.
A comment: one could mmap the data structures to disk as is, complete with pointers. But that is a fragile approach; I've had other software (PolyML, in fact) fail on me after a kernel upgrade, because the address it tried to load the mmap image at was no longer valid. I'm working on an approach which converts linked pointer structures to array-indexed structures.
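As an illustration of the difference, here is a minimal sketch of a linked structure that uses array indices instead of pointers. The names `Elt`, `EltIndex` and `ELT_NONE` are made up for this example and are not fontconfig types.

```c
#include <assert.h>
#include <stddef.h>

typedef int EltIndex;              /* hypothetical index type */
#define ELT_NONE (-1)              /* plays the role of a NULL link */

typedef struct {
    int      value;
    EltIndex next;                 /* index into the same array, not an address */
} Elt;

/* Walk the list starting at `head`. Because links are indices relative to
 * `base`, the array gives the same result wherever it is loaded in memory. */
static int list_sum(const Elt *base, EltIndex head)
{
    assert(base != NULL);
    int sum = 0;
    for (EltIndex i = head; i != ELT_NONE; i = base[i].next)
        sum += base[i].value;
    return sum;
}
```

Copying the array to a different address (which is what a fresh mmap effectively does) leaves every link valid, which is the property a pointer-based image lacks.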
Yeah, doing the pointers method could get very ugly. I like the array indexed idea better.
My patch (against fontconfig CVS HEAD) currently indexes the FcPatterns and FcPatternElts. That's fairly straightforward, although it does involve an API change -- we now return FcPatternIndexes instead of FcPatterns -- and some slight perf hit, in that fc-list runs on my system in 0.118s instead of 0.110s on average. But I also need to index the FcValue thingies.
Argh, can we please avoid API changes? A metric ton of stuff uses fontconfig, we don't want to break the API just for this. If you're returning indexes can't you just return the address of the indexed array element?
It's hard for me to imagine how to safely return the address of an indexed array element, because we need to be able to realloc the array (which might, of course, grow). So I can return pointers, but they might become invalid in the future, which seems like very poor API design indeed. That's not quite the right way to do things, is it? I'm open to other suggestions.
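The realloc concern can be made concrete with a small sketch. The `Pat` and `PatTable` types and both functions are hypothetical stand-ins, not the code from the patch.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int id; } Pat;                 /* stand-in for a pattern */
typedef struct { Pat *items; size_t len, cap; } PatTable;

/* Append an element and return its index, or -1 if allocation fails.
 * Growing the table may move it, so element addresses handed out earlier
 * can become stale; indices stay meaningful across the move. */
static long pat_table_add(PatTable *t, int id)
{
    assert(t != NULL);
    if (t->len == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 4;
        Pat *n = realloc(t->items, ncap * sizeof *n);
        if (n == NULL)
            return -1;
        t->items = n;
        t->cap = ncap;
    }
    t->items[t->len].id = id;
    return (long)t->len++;
}

/* Resolve an index to an address that is valid only until the next add. */
static Pat *pat_table_get(PatTable *t, long idx)
{
    assert(t != NULL);
    return (idx >= 0 && (size_t)idx < t->len) ? &t->items[idx] : NULL;
}
```

An index obtained before the table grows still resolves correctly afterwards, whereas a `Pat *` saved before the growth may not.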
By the way, the pointers are opaque anyhow. So it's a change of type, but that's all: instead of saying, FcPattern * fp = foo; you'd just instead say FcPatternIndex fp = foo; I agree that this sucks, but I really can't see any other way to enable mmaping.
So it breaks API but not ABI? There may be a way to patch it up then. Why might the array be realloced? I'm missing something I guess; I thought the point was to avoid mallocing the array entirely.
You can store an int in a pointer, so just do a cast. I assume the pointer is opaque anyways, so that doesn't make a difference. Changing a pointer to an int probably breaks ABI on 64-bit boxes. Also, I don't get the realloc thing either. This should always avoid allocations...
We can allocate all the FcPatterns we need in the initial configuration, but developers will subsequently use FcPatternCreate() to get more FcPatterns. In fact, these dynamically-allocated FcPatterns won't be mmapable anyway, because they are altered during the course of the program's execution. And if we have pointers to any type of FcPatterns, then we can't mmap them safely. Probably I'll use positive indices for the FcPatterns existing in the program's initial configuration and negative indices for user FcPatterns. The user FcPatterns will need to be realloced at times; the initial FcPatterns won't need to be realloced. It ought to be safe to cast the int to a pointer, it's just kind of icky. But it's probably better than breaking ABI compatibility on 64-bit boxes. The pointer is definitely opaque.
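A rough sketch of the signed-index scheme and the int-in-a-pointer cast follows. Everything here is an assumption made for illustration: the names, the choice to reserve index 0 as "no pattern", and the reliance on integer-to-pointer round trips, which ISO C leaves implementation-defined even though they behave as expected on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int weight; } Pat;     /* stand-in for a pattern */
typedef struct OpaquePat OpaquePat;     /* handle type; never dereferenced */

/* Positive indices select the fixed (mmapable) table, negative indices
 * select the growable user table, and 0 means "none" (an assumption of
 * this sketch). Out-of-range indices yield NULL. */
static Pat *pat_resolve(Pat *fixed, size_t nfixed,
                        Pat *user, size_t nuser, intptr_t idx)
{
    assert(fixed != NULL || nfixed == 0);
    assert(user != NULL || nuser == 0);
    if (idx > 0) {
        size_t slot = (size_t)idx - 1;
        return slot < nfixed ? &fixed[slot] : NULL;
    }
    if (idx < 0) {
        size_t slot = (size_t)(-(idx + 1));   /* -1 -> 0, -2 -> 1, ... */
        return slot < nuser ? &user[slot] : NULL;
    }
    return NULL;
}

/* Smuggle the index through an opaque pointer so the public type stays a
 * pointer; intptr_t is pointer-sized, so nothing is truncated on 64-bit. */
static OpaquePat *handle_from_index(intptr_t idx) { return (OpaquePat *)idx; }
static intptr_t index_from_handle(OpaquePat *h) { return (intptr_t)h; }
```

Only the library would ever call `pat_resolve()`; callers would see just the opaque handle, which is why the cast keeps the existing pointer-typed API intact.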
Are these things that need to be malloced used by "normal" code (like a gtk+ hello world)?
Normal code does not explicitly call the fontconfig code (so the broken ABIs wouldn't affect normal code, only library code, in the general case). However, normal code depends on the fontconfig code somehow. I don't know how the normal code reaches the fontconfig code. It still shouldn't be a tragedy if there are one or two things that need to be malloced and there are 50 things that are allocated by fontconfig on startup (and hence mmapable). If it turns out that it's always the same thing that programs tend to malloc, too, we can cache that, but let's worry about that later.
By the way, I collected some stats for the fc-list sample program. I imagine that programs use, on average, fewer fonts than fc-list.

FcPatterns used after FcInit(): 114

After the main loop of fc-list, we still use the same number of FcPatterns. Unfortunately, the FcPatterns and FcPatternElts themselves currently only account for 3k of memory storage. Shared strings account for another 10k. These are mmapable, but I think that FcValues are going to be a bigger win (since they actually store all the pattern data).
Note that https://bugs.freedesktop.org/show_bug.cgi?id=2659 contains a patch which reduces the number of new strings being created in the pattern code drastically.
Patrick, have you had any luck mmaping the rest of the data?
I've been travelling a lot lately, but I'll work on this some more in the near future.
I am interested in having this issue solved too; I will investigate it this week.
Created attachment 47301
Patch which makes most pointers optional inside fontconfig

This patch includes code which will let fontconfig use array indices for FcPattern, FcPatternElt and FcValueList. This is almost all of the infrastructure needed for making fontconfig use mmap(); the remaining work is fairly straightforward. I believe this patch is reasonably stable.
I saw a previous iteration of Patrick Lam's patch on the fontconfig mailing list: http://lists.freedesktop.org/archives/fontconfig/2005-March/001253.html The discussion that followed was interesting too... Any recent news from Keith Packard et al. concerning this?

I personally do not like the approach taken very much, though (no flame intended: given the size of the patch, I realize Patrick's efforts were not negligible). I just do not see why we have to break the API at all; portable, safe methods to mmap dynamically allocated structures exist and are used on a regular basis. Usually, one first needs to correctly specify a format for the mmap'ed pages, then perform some extra work during the first load after system boot to remap pointers cleanly. It is the logical equivalent of using indexed arrays during the saving and loading phases, but none of the subsequent code has to be modified (which implies that the overall recoding effort becomes significantly smaller).
Sylvain: Your approach would work too; it didn't occur to me at the time (and Google didn't remind me). I have had bad experiences, though, with software that mmaps to a fixed address. Do you have a reference to some example which implements pointer remapping? Does it work when different computers share the same fontconfig mmap file across the network? In my approach (and any I can imagine) I would need different mmap files for different architectures, but at least machines using the same architecture ought to be able to share mmap files.
> I have had bad experiences, though, with software that mmaps to a fixed
> address.

MAP_FIXED mmaps are not foolproof, I agree, as they can very well generate a MAP_FAILED error, and this would need to be handled. On modern platforms and systems, though, the usual trick of mapping the first instance freely and then placing the following ones at the same fixed virtual address eliminates most mapping rejections.

> Do you have a reference to some example which implements pointer remapping?

Mmh... I have plenty of closed-source code that does this, but the only open-source project I can think of right now is emacs; given the complexity of the beast, I am not sure it helps much. :-)

> Does it work when different computers share the same fontconfig mmap file
> across the network?

Yes, provided you do a little more work, involving a local copy (that can very well just be a shm_open map). Of course, the per-architecture limitation holds.

I have plenty of time this Thursday and Friday to write proof-of-concept code; I will of course post a link here.
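The pointer remapping idea discussed in this comment can be sketched in a few lines. This is an illustration under stated assumptions, not code from emacs or from any patch in this bug: the `Node` type and `relocate_nodes()` are hypothetical, the image is assumed to record the base address it was laid out for, and the arithmetic relies on the flat address space of mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Node {
    int          value;
    struct Node *next;          /* ordinary pointer, used as-is after fixup */
} Node;

/* `nodes` holds an image whose internal pointers were valid when the
 * array lived at address `old_base`. Rewrite each non-NULL pointer so it
 * refers to the same element at the array's current address. */
static void relocate_nodes(Node *nodes, size_t count, uintptr_t old_base)
{
    assert(nodes != NULL || count == 0);
    uintptr_t new_base = (uintptr_t)nodes;
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].next != NULL) {
            uintptr_t offset = (uintptr_t)nodes[i].next - old_base;
            nodes[i].next = (Node *)(new_base + offset);
        }
    }
}
```

After this one pass the rest of the code can follow `next` as usual, which is the advantage claimed over index-based structures. The fixup writes to the pages, though, so a mapping shared between processes would have to land at the same address in each of them.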
I am also planning on working on this. I will be taking a different approach than Sylvain and Patrick.
Sylvain, your approach is better. I've started modifying my patch to use your approach, and I should have something to show for it in the near future. In any case, pretty soon I ought to have a hybrid patch which has elements of both arrays and pointers, and then I'll look into backing out the other arrays too.
Created attachment 47477
Initial patch which mmaps fontconfig data structures

This is not done yet, but it does run fc-list and fc-match correctly. I've incorporated Sylvain's technique into some of the pointers, and will continue to do so after I get all the data in and out. Run fc-mmap to get a mmapping file called 'fc' (in the current directory); the other fontconfig clients will use this file instead of opening the fonts.
Created attachment 47492
This patch maps all fundamental fontconfig data structures.

There are still a couple of things I'd like to improve in this patch, but the attached version works for me. Comments and testing welcome. The only feature that I think needs to be added is a way to specify where the mmapping file should go (on the command line? in the config file?) and I need to figure out if I'm handling user fonts (vs. system fonts) properly.
Created attachment 47516
Patch as sent to the fontconfig mailing list.

This patch modifies fontconfig's CVS HEAD to use an mmap()able file containing internal fontconfig data structures. The new binary fc-mmap creates this map file (in /tmp, for now[1]) and fontconfig clients use this map file instead of loading font data themselves. On my system, my modified version of fontconfig mmaps 147786 bytes into memory (instead of creating around 100k of dynamic data structures). I believe that this patch is stable; it produces no differences in behaviour as compared to a stock fontconfig under my tests.

The public API change made by the patch is a pair of functions, FcMmapForce() and FcMmapSave(). FcMmapForce(true) forces fontconfig to ignore any mmaping data which might exist, and FcMmapSave() is used in fc-mmap to write the data structures to disk. Unlike an earlier version of the patch, no other API modifications are necessary.

The structure of the patch is as follows.

1. Loading: I include a new file, src/fcmmap.c, which contains the bulk of the logic for reading and writing the mmap file; some logic lives in src/fclang.c (to serialize a data structure private to that file). During fontconfig's initialization, it first calls FcMmapLoadObjects immediately after opening the configuration file, but before processing it, to load string constants into memory. Later on, fontconfig calls FcMmapLoadFonts to mmap the rest of the mmapped data into memory, instead of loading the data from the font files.

2. Saving: I use a three-phase algorithm to save the data to disk. In the first phase (initiated by FcMmapSave's call to FcFontSetSerialize(config)), I move all font information reachable from currentConfig.fonts to a set of static arrays (with a first relocation from the dynamic addresses to the static arrays). Next, I write the data to the candidate mmap file. The final phase is the fixup; I use mmap() to read the data back into memory, recording the addresses given to me by mmap() to, again, relocate internal pointers as appropriate. Once fixup is complete, the mmap file is ready to be loaded with no further transformations.

Thanks to Sylvain Fourmanoit for hints on how to improve an earlier version of this patch. I would like this patch to be considered for inclusion in the fontconfig CVS.

pat

[1] I would appreciate any suggestions on where the map file should be created. It is difficult to put the map file's location in the configuration file, because the map file is needed before fontconfig loads the configuration information.
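To make the first saving phase easier to picture, here is a minimal sketch of moving a heap-allocated linked structure into one contiguous array while relocating its internal pointers. It is not the code from the patch; `Node` and `flatten_list()` are hypothetical, and a real serializer would handle several structure types rather than one list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Node {
    int          value;
    struct Node *next;
} Node;

/* Copy the list starting at `head` into out[0..cap) in traversal order
 * and rewire each `next` to point inside `out`. Returns the number of
 * nodes copied, or SIZE_MAX if `out` is too small. */
static size_t flatten_list(const Node *head, Node *out, size_t cap)
{
    assert(out != NULL || cap == 0);
    size_t n = 0;
    for (const Node *p = head; p != NULL; p = p->next) {
        if (n == cap)
            return SIZE_MAX;
        out[n].value = p->value;
        out[n].next = NULL;
        if (n > 0)
            out[n - 1].next = &out[n];   /* relocation: old address -> array slot */
        n++;
    }
    return n;
}
```

Once the data sits in one array, writing it to a file is a single contiguous write, and the later fixup only has to adjust pointers by the difference between the array's old address and the address mmap() returns.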
Um, oops. The attached patch has some problems due to a last-minute change. Please refer to http://plam.csail.mit.edu/~plam/tmp/fontconfig-mmap-050609.diff.
Thanks Patrick! I currently get some random segfaults using your latest patch against CVS HEAD; I will try to isolate what goes wrong and post my findings here. Looking at your code, I still have a few concerns about dynamic data duplication in the mmap'ed file (are FcCharSet instances being copied multiple times, or did I miss something?), as well as the whole saving procedure using a lot of dynamic memory (approx. three times the memory for a simple load)... I have already spent a few hours reimplementing the same idea with a single-pass saving algorithm; I should post it here before Sunday. Of course, I will respect the API semantics you chose, and probably just borrow your fc-mmap program. Sylvain.
I was getting a few segfaults with the version attached here (they were caused by trying to strlen(getenv("foo")), which can fail); the version on my webpage should not fail that way. Let me know if you still get segfaults, preferably with backtraces. I'm looking into FcCharSets; there are no explicit provisions for not duplicating them. It shouldn't be hard to do that, but the question is whether it's really a win or not. I'm not especially concerned about the memory usage of fc-mmap, since it only gets run once and then terminates.
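For context on that crash: getenv() returns NULL when the variable is not set, and passing NULL to strlen() is undefined behaviour. A guarded version might look like the sketch below; the helper name is made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of an environment variable's value, or 0 when it is unset.
 * The NULL check is what strlen(getenv(name)) was missing. */
static size_t env_length(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL ? strlen(value) : 0;
}
```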
> It shouldn't be hard to do that, but the question is whether
> it's really a win or not.

On my home system (completely unmodified regarding fontconfig: out-of-the-box settings used), the debug printout reports that FcCharSets take about 40% of the dynamic memory used, and manual address checking taught me that half of them are used more than once... I do not know if it is representative, though.

> I'm not especially concerned about the memory usage of fc-mmap, since it only
> gets run once and then terminates.

Well, I am a bit more concerned... I administer a (too small) server farm with about 200 concurrent users, about 30 of whom on average are graphic artists who would use _huge_ font caches (>30MB of RAM). I had to put ugly hacks in place to give access to fontconfig-lookalike stubs; shared information is the only clean way I can think of to really provide the service in this case. If I could, I would prefer not to have the users use >90 MB of virtual memory to generate a mmap'ed version, even if it is just an occasional operation. Of course, your patch would already be a huge step towards making fontconfig usable for me (if I can get it to work: I currently experience non-systematic segfaults on fc-list and fc-match).
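One way to avoid writing the same charset several times is to intern each one before serializing and store only its index. The sketch below assumes a simplified fixed-size 256-bit set and a small linear-search pool; real FcCharSets are variable-sized, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SETS 64

typedef struct { unsigned char bits[32]; } CharSet;   /* simplified 256-bit set */
typedef struct { CharSet sets[MAX_SETS]; size_t count; } CharSetPool;

/* Return the index of a stored set equal to *cs, adding it first if no
 * equal set exists yet. Returns -1 when the pool is full. */
static int pool_intern(CharSetPool *pool, const CharSet *cs)
{
    assert(pool != NULL && cs != NULL);
    for (size_t i = 0; i < pool->count; i++)
        if (memcmp(pool->sets[i].bits, cs->bits, sizeof cs->bits) == 0)
            return (int)i;
    if (pool->count == MAX_SETS)
        return -1;
    pool->sets[pool->count] = *cs;
    return (int)pool->count++;
}
```

Fonts with identical coverage then share one stored copy, at the cost of a comparison pass when the cache is generated.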
Created attachment 47519
More efficient fc_mmap behaviour

Sylvain, this version actually frees things during mmap creation; it should be helpful for you.
New version at http://plam.csail.mit.edu/%7Eplam/tmp/fontconfig-mmap-050610.diff
Thanks Patrick. This now works great here too; I have little incentive to keep working on this, since what you have pulled off so far fits my needs perfectly. I wish you the best of luck with inclusion in fontconfig; having seen your last answer to Mr. Packard, I hope this solution will be chosen instead of the former.
Thanks! Let me know if you have any remaining issues.
Is this bug lying in limbo or what? Does it need to be submitted to the FDO bugzilla? Does Keith need a kick? ;)
Oh, never mind. I went digging and found it's being merged into fontconfig CVS right now, which is great to see :) Sorry for the noise.
Where did you find that it is being merged into fontconfig CVS? It is still not in fontconfig CVS.
It is on the branch fc-2_4_branch in the fontconfig CVS, and still being developed (I've changed approaches several times).
Fontconfig development release 2.3.90 (and soon, 2.3.91) has an mmapable cache. http://www.fontconfig.org/wiki/Devel
Closing this since the fix hit CVS long ago.