Bug 169345 - Use a mmap'able cache for fontconfig
Status: RESOLVED FIXED
Product: bounties
Classification: Infrastructure
Component: Misc
Version: unspecified
Hardware: Other Linux
Importance: Normal normal
Target Milestone: ---
Assigned To: Bounty Bug List
QA Contact: Bounty Bug List
Depends on:
Blocks:
 
 
Reported: 2005-03-05 23:46 UTC by Ben Maurer
Modified: 2006-05-07 20:54 UTC
See Also:
GNOME target: ---
GNOME version: ---


Attachments
Patch which makes most pointers optional inside fontconfig (115.96 KB, patch)
2005-06-06 06:16 UTC, Patrick Lam
Initial patch which mmaps fontconfig data structures (145.32 KB, patch)
2005-06-08 21:35 UTC, Patrick Lam
This patch maps all fundamental fontconfig data structures. (153.54 KB, patch)
2005-06-09 06:01 UTC, Patrick Lam
Patch as sent to the fontconfig mailing list. (62.89 KB, patch)
2005-06-10 02:48 UTC, Patrick Lam
More efficient fc_mmap behaviour (70.25 KB, patch)
2005-06-10 05:46 UTC, Patrick Lam

Description Ben Maurer 2005-03-05 23:46:58 UTC
On the startup of every program, fontconfig allocates about 100 KB of data.
This data should be read from a memory mappable cache on disk. This way,
the data is shared between processes. Also, it allows more expensive
techniques to optimize the data storage.

A solution to this bounty will:

	- Create a program that generates an on-disk cache.
	- Modify fontconfig to read from this file, allowing
	  a fallback path with the existing code.

This bug is part of the Integrated Collaborative Desktop Bounty Hunt.  For
more information on prizes, contest rules, and other bounty tasks, visit:
 
http://www.gnome.org/bounties/
 
If you would like to start working on this bounty, please create a bugzilla
account and append your intention to work on this bounty to this bug.  If
multiple people declare their intentions to work on a task, we encourage
you to join forces and work together.
 
Please do not close this bug.  The contest organizers will mark this bug as
FIXED when the prize is claimed.
Comment 1 Ben Maurer 2005-03-06 02:35:09 UTC
See http://www.gnome.org/bounties/Memory.html#169345
Comment 2 Matthew Whitworth 2005-03-06 09:32:05 UTC
I will claim this bounty, and post updates on my progress here.
Comment 3 Patrick Lam 2005-03-09 02:42:40 UTC
I am cooking up a patch which replaces FcPattern*'s by FcPatternIndexes into an
array stored in fcpat.c.  This will allow FcPatterns and references thereto to
be mmaped.
Comment 4 Patrick Lam 2005-03-09 18:19:42 UTC
A comment: one could mmap the data structures to disk as is, complete with
pointers.  But that is a fragile approach; I've had other software (PolyML, in
fact) fail on me after a kernel upgrade, because the address it tried to load
the mmap image at was no longer valid.  I'm working on an approach which
converts linked pointer structures to array-indexed structures.
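
To illustrate the array-indexed idea from this comment, here is a minimal C
sketch. The names (Elt, ELT_NONE, elt_sum) are hypothetical and are not
fontconfig's types; the only point is that links stored as indices stay valid
wherever the array ends up in memory, whereas raw pointers do not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration; not fontconfig's real data structures. */
#define ELT_NONE (-1)   /* index value that plays the role of NULL */

typedef struct {
    int value;          /* payload */
    int next;           /* index of the next element, or ELT_NONE */
} Elt;

/* Walk the list stored in `pool`, starting at index `head`, and sum the
 * payloads.  No pointer stored inside the pool is ever followed, so the
 * pool can be copied or mapped at any address without fixing anything up. */
int elt_sum(const Elt *pool, int head)
{
    assert(pool != NULL);
    int total = 0;
    for (int i = head; i != ELT_NONE; i = pool[i].next)
        total += pool[i].value;
    return total;
}
```

The cost is one extra addition per link followed (base plus index), which fits
the small slowdown reported later in this thread for the indexed version.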
Comment 5 Ben Maurer 2005-03-09 18:36:10 UTC
Yeah, doing the pointers method could get very ugly. I like the array indexed
idea better.
Comment 6 Patrick Lam 2005-03-10 22:36:22 UTC
My patch (against fontconfig CVS HEAD) currently indexes the FcPatterns and
FcPatternElts.  That's fairly straightforward, although it does involve an API
change -- we now return FcPatternIndexes instead of FcPatterns -- and some
slight perf hit, in that fc-list runs on my system in 0.118s instead of 0.110s
on average.  But I also need to index the FcValue thingies.
Comment 7 Mike Hearn 2005-03-10 22:48:42 UTC
Argh, can we please avoid API changes? A metric ton of stuff uses fontconfig, we
don't want to break the API just for this. If you're returning indexes can't you
just return the address of the indexed array element?
Comment 8 Patrick Lam 2005-03-11 06:13:22 UTC
It's hard for me to imagine how to safely return the address of an indexed array
element, because we need to be able to realloc the array (which might, of
course, grow).  So I can return pointers, but they might become invalid in
the future, which seems like very poor API design indeed.  That's not quite the
right way to do things, is it?  I'm open to other suggestions.
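
The realloc concern can be shown with a small growable pool. This is a sketch
with hypothetical names (Pool, pool_push), not code from the patch: an index
handed out by pool_push stays usable after the array grows, while a pointer
into p->items taken before the growth may dangle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable pool of ints; start from a zeroed Pool. */
typedef struct {
    int *items;
    size_t len;
    size_t cap;
} Pool;

/* Append `value` and return its index, or (size_t)-1 if allocation fails.
 * realloc() may move the whole array, so callers should keep the returned
 * index rather than the address of the element. */
size_t pool_push(Pool *p, int value)
{
    assert(p != NULL);
    if (p->len == p->cap) {
        size_t new_cap = p->cap ? p->cap * 2 : 4;
        int *grown = realloc(p->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return (size_t)-1;
        p->items = grown;
        p->cap = new_cap;
    }
    p->items[p->len] = value;
    return p->len++;
}
```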
Comment 9 Patrick Lam 2005-03-11 06:17:26 UTC
By the way, the pointers are opaque anyhow.  So it's a change of type, but
that's all: instead of saying,

FcPattern * fp = foo;

you'd just instead say

FcPatternIndex fp = foo;

I agree that this sucks, but I really can't see any other way to enable mmaping.
Comment 10 Mike Hearn 2005-03-11 09:28:21 UTC
So it breaks API but not ABI? There may be a way to patch it up then.

Why might the array be reallocd? I'm missing something I guess, I thought the
point was to avoid mallocing the array entirely
Comment 11 Ben Maurer 2005-03-11 15:45:54 UTC
You can store an int in a pointer, so just do a cast. I assume the pointer is 
opaque anyways, so that doesn't make a difference.

Changing a pointer to an int probably breaks abi on 64 bit boxes.

Also, I don't get the realloc thing either. This should always avoid 
allocations...
Comment 12 Patrick Lam 2005-03-11 16:36:33 UTC
We can allocate all the FcPattern we need in the initial configuration, but
developers will subsequently use FcPatternCreate() to get more FcPatterns.  In
fact, these dynamically-allocated FcPatterns won't be mmapable anyway, because
they are altered during the course of the program's execution.  And if we have
pointers to any type of FcPatterns, then we can't mmap them safely.

Probably I'll use positive indices for the FcPatterns existing in the program's
initial configuration and negative indices for user FcPatterns.  The user
FcPatterns will need to be realloced at times; the initial FcPatterns won't need
to be realloced.

It ought to be safe to cast the int to a pointer, it's just kind of icky.  But
it's probably better than breaking abi compatibility on 64-bit boxes.  The
pointer is definitely opaque.
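
A sketch of the "index hidden in an opaque pointer" scheme discussed in the
last few comments, assuming intptr_t values survive a round trip through a
pointer (true on mainstream platforms, but implementation-defined in C). All
names here are hypothetical; Pattern merely stands in for an opaque type such
as FcPattern. Index 0 is reserved so that a zero handle can still mean "no
pattern".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Pattern Pattern;   /* opaque: callers never dereference it */

/* Pack a nonzero index into the opaque handle. */
Pattern *handle_from_index(intptr_t index)
{
    assert(index != 0);
    return (Pattern *)index;
}

/* Recover the index that was packed into the handle. */
intptr_t index_from_handle(const Pattern *handle)
{
    return (intptr_t)handle;
}

/* Positive index: slot in the preloaded (mmapable) table.
 * Negative index: slot in the growable per-process table. */
int handle_is_preloaded(const Pattern *handle)
{
    return index_from_handle(handle) > 0;
}

/* Zero-based slot within whichever table the handle belongs to. */
size_t handle_slot(const Pattern *handle)
{
    intptr_t index = index_from_handle(handle);
    return (size_t)(index > 0 ? index - 1 : -(index + 1));
}
```

Because the handle keeps the size of a pointer, the ABI stays the same on
64-bit machines, which is the property Ben asks for above.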
Comment 13 Ben Maurer 2005-03-11 16:44:39 UTC
Are these things that need to be malloced used by "normal" code (like a gtk+ 
hello world)?
Comment 14 Patrick Lam 2005-03-11 17:15:49 UTC
Normal code does not explicitly call the fontconfig code (so the broken ABIs
wouldn't affect normal code, only library code, in the general case).  However,
normal code depends on the fontconfig code somehow.  I don't know how the normal
code reaches the fontconfig code.

It still shouldn't be a tragedy if there are one or two things that need to be
malloced and there are 50 things that are allocated by fontconfig on startup
(and hence mmapable).  If it turns out that it's always the same thing that
programs tend to malloc, too, we can cache that, but let's worry about that later.
Comment 15 Patrick Lam 2005-03-12 04:37:51 UTC
By the way, I collected some stats for the fc-list sample program.  I imagine
that programs use, on average, fewer fonts than fc-list.

FcPatterns used after FcInit(): 114

After the main loop of fc-list, we still use the same number of FcPatterns.

Unfortunately, the FcPattern and FcPatternElts themselves currently only account
for 3k of memory storage.  Shared strings account for another 10k.  These are
mmapable, but I think that FcValues are going to be a bigger win (since they
actually store all the pattern data).
Comment 16 Ross Burton 2005-03-30 08:06:52 UTC
Note that https://bugs.freedesktop.org/show_bug.cgi?id=2659 contains a patch
which reduces the number of new strings being created in the pattern code
drastically.
Comment 17 Ben Maurer 2005-04-03 16:43:21 UTC
Patrick, have you had any luck mmaping the rest of the data?
Comment 18 Patrick Lam 2005-06-01 19:49:43 UTC
I've been travelling a lot lately, but I'll work on this some more in the near
future.
Comment 19 Sylvain Fourmanoit 2005-06-05 22:13:00 UTC
I am interested in having this issue solved too; I will investigate it this week.
Comment 20 Patrick Lam 2005-06-06 06:16:47 UTC
Created attachment 47301 [details] [review]
Patch which makes most pointers optional inside fontconfig

This patch includes code which will let fontconfig use array indices for
FcPattern, FcPatternElt and FcValueList.  This is almost all of the
infrastructure needed for making fontconfig use mmap(); the remaining work is
fairly straightforward.  I believe this patch is reasonably stable.
Comment 21 Sylvain Fourmanoit 2005-06-06 07:40:27 UTC
I saw a previous iteration of Patrick Lam's patch on the fontconfig mailing list:

http://lists.freedesktop.org/archives/fontconfig/2005-March/001253.html

The discussion that followed was interesting too... Any recent news from Keith
Packard et al. concerning this?

I personally do not like the approach taken very much though (no flame intended:
given the size of the patch, I realize Patrick's efforts were not negligible).
I just do not see why we have to break the API at all; portable, safe methods to
mmap dynamically allocated structures exist and are used on a regular basis:
usually, one first needs to correctly specify a format for the mmap'ed pages,
then perform some extra work during the first load after system boot to remap
pointers cleanly. It is the logical equivalent of using indexed arrays during
the saving and loading phases, but none of the subsequent code has to be
modified (which implies that the overall recoding effort becomes significantly
smaller).
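
One way to read the pointer-remapping idea is sketched below in C. It assumes
the saved image records the base address it was laid out for, and that
pointers convert to and from uintptr_t in the obvious way. Node and relocate
are hypothetical names; this is not Sylvain's or fontconfig's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Node {
    int value;
    struct Node *next;   /* absolute pointer, valid only for one base address */
} Node;

/* Rebase every non-NULL `next` pointer in nodes[0..count) from the address
 * the image was saved for (old_base) to the address where it now lives
 * (new_base).  After this pass the ordinary pointer-chasing code can run
 * unchanged on the relocated image. */
void relocate(Node *nodes, size_t count, uintptr_t old_base, uintptr_t new_base)
{
    assert(nodes != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (nodes[i].next != NULL) {
            uintptr_t p = (uintptr_t)nodes[i].next;
            nodes[i].next = (Node *)(p - old_base + new_base);
        }
    }
}
```

Note that a fixup pass like this writes to the pages it touches, so for the
pages to stay shared between processes the fixup has to happen once, into a
shared mapping at an agreed address, rather than privately in every process.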
Comment 22 Patrick Lam 2005-06-06 13:33:40 UTC
Sylvain: Your approach would work too; it didn't occur to me at the time (and
Google didn't remind me).  I have had bad experiences, though, with software
that mmaps to a fixed address.  

Do you have a reference to some example which implements pointer remapping? 
Does it work when different computers share the same fontconfig mmap file across
the network?  In my approach (and any I can imagine) I would need different mmap
files for different architectures, but at least machines using the same
architecture ought to be able to share mmap files.
Comment 23 Sylvain Fourmanoit 2005-06-06 17:50:59 UTC
> I have had bad experiences, though, with software that mmaps to a fixed
> address.

MAP_FIXED mmaps are not foolproof, I agree, as they can very well generate a
MAP_FAILED error, and this would need to be handled. On modern platforms and
systems though, the usual trick of mapping the first instance freely and then
placing the following ones at the same fixed virtual address eliminates most
mapping rejections.

> Do you have a reference to some example which implements pointer remapping? 

Mmh... I have plenty of closed-source code that does this, but the only open-source
project I can think of right now is emacs; given the complexity of the beast, I
am not sure it helps much. :-)

> Does it work when different computers share the same fontconfig mmap file 
> across the network?

Yes, provided you do a little more work, involving a local copy (that can very
well just be a shm_open map). Of course, the per-architecture limitation holds.

I have plenty of time this Thursday and Friday to write proof-of-concept code;
I will of course post a link here.
Comment 24 John McCutchan 2005-06-06 21:16:53 UTC
I am also planning on working on this. I will be taking a different approach
than Sylvain and Patrick.
Comment 25 Patrick Lam 2005-06-07 07:15:40 UTC
Sylvain, your approach is better.  I've started modifying my patch to use your
approach, and I should have something to show for it in the near future.  In any
case, pretty soon I ought to have a hybrid patch which has elements of both
arrays and pointers, and then I'll look into backing out the other arrays too.
Comment 26 Patrick Lam 2005-06-08 21:35:27 UTC
Created attachment 47477 [details] [review]
Initial patch which mmaps fontconfig data structures

This is not done yet, but it does run fc-list and fc-match correctly.  I've
incorporated Sylvain's technique into some of the pointers, and will continue
to do so after I get all the data in and out.  Run fc-mmap to get a mmapping
file called 'fc' (in the current directory); the other fontconfig clients will
use this file instead of opening the fonts.
Comment 27 Patrick Lam 2005-06-09 06:01:22 UTC
Created attachment 47492 [details] [review]
This patch maps all fundamental fontconfig data structures.

There are still a couple of things I'd like to improve in this patch, but the
attached version works for me.  Comments and testing welcome.

The only feature that I think needs to be added is a way to specify where the
mmapping file should go (on the command line?  in the config file?) and I need
to figure out if I'm handling user fonts (vs. system fonts) properly.
Comment 28 Patrick Lam 2005-06-10 02:48:12 UTC
Created attachment 47516 [details] [review]
Patch as sent to the fontconfig mailing list.

This patch modifies fontconfig's CVS HEAD to use an mmap()able file
containing internal fontconfig data structures.  The new binary
fc-mmap creates this map file (in /tmp, for now[1]) and fontconfig
clients use this map file instead of loading font data themselves.

On my system, my modified version of fontconfig mmaps 147786 bytes
into memory (instead of creating around 100k of dynamic data
structures).  I believe that this patch is stable; it produces no
differences in behaviour as compared to a stock fontconfig under
my tests.

The public API change in the patch is a pair of functions,
FcMmapForce() and FcMmapSave().  FcMmapForce(true) forces fontconfig
to ignore any mmaping data which might exist, and FcMmapSave() is used
in fc-mmap to write the data structures to disk.  Unlike an earlier version
of the patch, no other API modifications are necessary.

The structure of the patch is as follows.

1. Loading: I include a new file, src/fcmmap.c, which contains the
bulk of the logic for reading and writing the mmap file; some logic
lives in src/fclang.c (to serialize a data structure private to that
file).  During fontconfig's initialization, it first calls
FcMmapLoadObjects immediately after opening the configuration file,
but before processing it, to load string constants into memory.  Later
on, fontconfig calls FcMmapLoadFonts to mmap the rest of the mmapped
data into memory, instead of loading the data from the font files.

2. Saving: I use a three-phase algorithm to save the data to disk.  In
the first phase (initiated by FcMmapSave's call to
FcFontSetSerialize(config)), I move all font information reachable
from currentConfig.fonts to a set of static arrays (with a first
relocation from the dynamic addresses to the static arrays).  Next, I
write the data to the candidate mmap file.  The final phase is the
fixup; I use mmap() to read the data back into memory, recording the
addresses given to me by mmap() to, again, relocate internal pointers
as appropriate.  Once fixup is complete, the mmap file is ready to be
loaded with no further transformations.
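
The save-then-map round trip can be sketched with plain POSIX calls. This is a
simplified illustration, not the patch: it stores index-linked records, so the
fixup phase described above is unnecessary, and the names (Rec, cache_save,
cache_map) are hypothetical. cache_map returns NULL on any problem, so a caller
can fall back to building the data the slow way, as the bounty description
asks.

```c
#define _POSIX_C_SOURCE 200809L   /* expose POSIX declarations under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    int value;
    int next;   /* index of the next record, -1 for none */
} Rec;

/* Write n records to `path`.  Returns 0 on success, -1 on error. */
int cache_save(const char *path, const Rec *recs, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(recs, sizeof *recs, n, f);
    int close_failed = (fclose(f) != 0);
    return (written == n && !close_failed) ? 0 : -1;
}

/* Map `path` read-only and store the record count in *n_out.
 * Returns NULL on any error.  Release with munmap(ptr, n * sizeof(Rec)). */
const Rec *cache_map(const char *path, size_t *n_out)
{
    assert(n_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        (size_t)st.st_size % sizeof(Rec) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *n_out = (size_t)st.st_size / sizeof(Rec);
    return p;
}
```

Every process that maps the same file read-only shares the same physical
pages, which is where the memory saving in this bounty comes from.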

Thanks to Sylvain Fourmanoit for hints on how to improve an earlier
version of this patch.

I would like this patch to be considered for inclusion in the fontconfig
CVS.
pat

[1] I would appreciate any suggestions on where the map file should be
created.  It is difficult to put the map file's location in the
configuration file, because the map file is needed before fontconfig
loads the configuration information.
Comment 29 Patrick Lam 2005-06-10 03:18:43 UTC
Um, oops.  The attached patch has some problems due to a last-minute change. 
Please refer to http://plam.csail.mit.edu/~plam/tmp/fontconfig-mmap-050609.diff.
Comment 30 Sylvain Fourmanoit 2005-06-10 03:29:28 UTC
Thanks Patrick!

I currently get some random segfaults using your latest patch against CVS head; I
will try to isolate what goes wrong and post my findings here.

Looking at your code, I still have a few concerns about dynamic data duplication
in the mmap'ed file (are FcCharSet instances being copied multiple times, or did
I miss something?), as well as the whole saving procedure using a lot of dynamic
memory (approx. three times the memory of a simple load)... I already spent a
few hours reimplementing the same idea with a single-pass saving algorithm; I
should post it here before Sunday. Of course, I will respect the API semantics
you chose, and probably just borrow your fc-mmap program.

Sylvain.
Comment 31 Patrick Lam 2005-06-10 03:43:26 UTC
I was getting a few segfaults with the version attached here (they were caused
by trying to strlen(getenv("foo")), which can fail); the version on my webpage
should not fail that way.  Let me know if you still get segfaults, preferably
with backtraces.

I'm looking into FcCharSets; there are no explicit provisions for not
duplicating them.  It shouldn't be hard to do that, but the question is whether
it's really a win or not.

I'm not especially concerned about the memory usage of fc-mmap, since it only
gets run once and then terminates.
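
For reference, the strlen(getenv(...)) failure mentioned above comes from
getenv() returning NULL when the variable is unset. A minimal guard looks like
this; env_length is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the environment variable's value, or 0 when it is unset.
 * getenv() returns NULL for an unset variable, and passing NULL to
 * strlen() is undefined behaviour (typically a segfault). */
size_t env_length(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL ? strlen(value) : 0;
}
```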
Comment 32 Sylvain Fourmanoit 2005-06-10 04:24:59 UTC
> It shouldn't be hard to do that, but the question is whether
> it's really a win or not.

On my home system (completely unmodified regarding fontconfig: out-of-the-box
settings used), the debug printout reports that FcCharSets take up
40% of the dynamic memory used, and manual address checking taught me that half
of them are used more than once... I do not know if it is representative, though.

> I'm not especially concerned about the memory usage of fc-mmap, since it only
> gets run once and then terminates.

Well, I am a bit more concerned... I administer a (too small) server farm with
about 200 concurrent users, 30 on average being graphic artists who would
use _huge_ font caches (>30MB of RAM). I had to put ugly hacks in place to
give access to fontconfig-lookalike stubs. Shared information is the only clean
way I can think of to really provide the service in this case. If I could, I
would prefer not having the users use >90 MB of virtual memory to generate a
mmap'ed version, even if it is just an occasional operation.

Of course, your patch would already be a huge step towards making fontconfig
usable for me (if I can get it to work: I currently experience intermittent
segfaults on fc-list and fc-match).
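
On the question of whether identical FcCharSets should be stored only once,
the usual technique is to intern by content while serializing. The sketch
below uses a fixed-size table and a linear scan purely for illustration; the
names (InternTable, intern) are hypothetical and this is not how fontconfig
does it. A real implementation would more likely hash the contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOBS 64

/* Table of distinct byte blobs seen so far; start from a zeroed table.
 * The table borrows the caller's pointers and does not copy the data. */
typedef struct {
    const void *data[MAX_BLOBS];
    size_t size[MAX_BLOBS];
    int count;
} InternTable;

/* Return the index of a stored blob whose bytes equal (data, size),
 * adding it first if it is new.  Returns -1 if the table is full. */
int intern(InternTable *t, const void *data, size_t size)
{
    assert(t != NULL && data != NULL);
    for (int i = 0; i < t->count; i++) {
        if (t->size[i] == size && memcmp(t->data[i], data, size) == 0)
            return i;
    }
    if (t->count == MAX_BLOBS)
        return -1;
    t->data[t->count] = data;
    t->size[t->count] = size;
    return t->count++;
}
```

When writing the cache, each distinct blob would be emitted once and every
reference to it would store the returned index, so a charset shared by several
fonts costs its bytes only once in the mapped file.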
Comment 33 Patrick Lam 2005-06-10 05:46:02 UTC
Created attachment 47519 [details] [review]
More efficient fc_mmap behaviour

Sylvain, this version actually frees things during mmap creation; it should be
helpful for you.
Comment 34 Patrick Lam 2005-06-10 13:15:10 UTC
New version at

http://plam.csail.mit.edu/%7Eplam/tmp/fontconfig-mmap-050610.diff
Comment 35 Sylvain Fourmanoit 2005-06-10 18:13:07 UTC
Thanks Patrick. This now works great here too; I now have little incentive
to keep working on this, since what you have pulled off so far fits my
needs perfectly. I wish you the best of luck for inclusion in fontconfig; seeing
your last answer to Mr. Packard, I hope this solution will be chosen instead of
the former.
Comment 36 Patrick Lam 2005-06-10 18:19:20 UTC
Thanks!  Let me know if you have any remaining issues.
Comment 37 Mike Hearn 2005-08-01 14:32:43 UTC
Is this bug lying in limbo or what? Does it need to be submitted to the FDO
bugzilla? Does Keith need a kick? ;)
Comment 38 Mike Hearn 2005-08-01 14:35:42 UTC
Oh, never mind. I went digging and found it's being merged into fontconfig CVS
right now, which is great to see :) Sorry for the noise.
Comment 39 Mike FABIAN 2005-08-24 13:17:51 UTC
Where did you find that it is being merged into fontconfig CVS?
It is still not in fontconfig CVS.
Comment 40 Patrick Lam 2005-08-24 14:15:37 UTC
It is on the branch fc-2_4_branch in the fontconfig CVS, and still being
developed (I've changed approaches several times).
Comment 41 Patrick Lam 2005-09-27 17:54:44 UTC
Fontconfig development release 2.3.90 (and soon, 2.3.91) has an mmapable cache.

http://www.fontconfig.org/wiki/Devel
Comment 42 Kjartan Maraas 2006-05-07 20:54:34 UTC
Closing this since the fix hit CVS long ago.