introduction

MoinMoin needs to cache a lot of data to perform better. This includes:

requirements

A good caching framework for moin should be able to:

tracking dependencies

If, for example, the wiki source of a page is modified, everything derived from that source (cached html for example) must be discarded.

Therefore, I propose that whenever something is stored in the cache, everything it depends on is recorded along with it. But also see below.

cache timeouts

The month calendar plugin, for example, should note that the cache for a page containing it must be refreshed no later than the next midnight, because a new day must then be displayed. A timeout should always be given as an absolute date/time, not as an offset, to ease implementation. If a timeout is set multiple times, the nearest one wins (i.e. a new value only replaces the old one if it is sooner).
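The timeout rule can be sketched in a few lines. This is only an illustration: next_midnight and merge_timeout are hypothetical helper names, not part of the proposed API, and the sketch assumes that "most recent" means the timeout closest to now, i.e. the earliest one.

```python
from datetime import datetime, timedelta


def next_midnight(now):
    """Return the first midnight strictly after `now` as an absolute datetime."""
    tomorrow = now.date() + timedelta(days=1)
    return datetime.combine(tomorrow, datetime.min.time())


def merge_timeout(old, new):
    """Keep the earlier of two absolute timeouts; None means 'no timeout yet'."""
    if old is None:
        return new
    return min(old, new)
```

Because both values are absolute, merging is a plain comparison and does not need to know when either timeout was set.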

implied dependencies

Since MoinMoin has a lot of dependencies on the page source code, it would be ideal if the cache used a multi-dimensional space rather than a 1D space, so that it would automatically invalidate the sub-dimensions for a page if the page source was changed. Calling cache.invalidate(pagename) would invalidate everything that has the same pagename, for example something stored by a call to cache.get( (pagename,'text/html'), fun) (note the pair being used!).
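One possible reading of the multi-dimensional idea is prefix matching on tuple keys. The following is a minimal sketch under that assumption; as_key, in_subspace and the free function invalidate operating on a plain dict are hypothetical names chosen for illustration.

```python
def as_key(item):
    """Normalise a string to a 1-tuple so strings and tuples share one key space."""
    return (item,) if isinstance(item, str) else tuple(item)


def in_subspace(prefix, key):
    """True if `key` starts with the coordinates given in `prefix`."""
    return key[:len(prefix)] == prefix


def invalidate(store, item):
    """Delete every entry of the dict `store` lying in the sub-space of `item`."""
    prefix = as_key(item)
    doomed = [k for k in store if in_subspace(prefix, k)]
    for k in doomed:
        del store[k]
    return doomed
```

With this scheme the HTML for a page never has to declare a dependency on the page itself; sharing the first coordinate is enough.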

two-level cache discussion

The CGI version of MoinMoin will not benefit from an in-memory cache, so it has to write data out to the disk at all times.
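A disk-backed get for the CGI case might look roughly like the sketch below. It is an assumption-laden illustration: the DiskCache class, the use of pickle for serialisation, and the SHA-1-of-repr file naming are all choices made here, not something the proposal prescribes.

```python
import hashlib
import os
import pickle


class DiskCache:
    """Hypothetical cache that keeps every item in its own file."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, item):
        # Derive a filesystem-safe name from the item (string or tuple).
        digest = hashlib.sha1(repr(item).encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".pickle")

    def get(self, item, creationfun):
        path = self._path(item)
        try:
            with open(path, "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            pass
        value = creationfun()
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(value, f)
        os.replace(tmp, path)  # rename so readers never see a partial file
        return value
```

Since each CGI request is a fresh process, a second DiskCache pointed at the same directory sees what the first one stored.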

It is conceivable that the persistent versions of MoinMoin would benefit from caching some frequently used data in memory, but the algorithm for that would need to be able to limit the cache size. I still need to find an algorithm for implementing an LRU cache of fixed total size whose items are not fixed-size.
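One common way to bound an LRU cache by total size rather than by item count is to track a size per entry and evict from the least recently used end until the budget is met. The sketch below assumes a caller-supplied sizeof function (defaulting to len, which suits strings and bytes); SizedLRU and its method names are hypothetical.

```python
from collections import OrderedDict


class SizedLRU:
    """LRU cache limited by the summed size of its values."""

    def __init__(self, max_bytes, sizeof=len):
        self.max_bytes = max_bytes
        self.sizeof = sizeof
        self.used = 0
        self._items = OrderedDict()  # key -> (value, size), oldest first

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value):
        size = self.sizeof(value)
        if key in self._items:
            self.used -= self._items.pop(key)[1]
        if size > self.max_bytes:
            return False  # can never fit, so it is not cached
        self._items[key] = (value, size)
        self.used += size
        while self.used > self.max_bytes:
            _, (_, old_size) = self._items.popitem(last=False)
            self.used -= old_size
        return True
```

A single large item may push out several small ones; whether that is desirable for MoinMoin is a policy question this sketch leaves open.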

proposed API

The cache should be able to store any Python object that is immutable in the application logic.

    class cache:
        def get(self, item, creationfun): ...
        def invalidate(self, item): ...
        def set_timeout(self, item, timeout): ...
        def add_dependency(self, item, on): ...

Throughout, item is either a string or a tuple. If it is a tuple, then the tuple values are regarded as coordinates in a multi-dimensional space. When invalidating a certain tuple, all sub-dimensions of that tuple are invalidated as well.

get

You call get whenever you need some result that should be cached. If the cache has nothing stored for the requested item, it calls creationfun and stores the result. creationfun is called without arguments; in most cases it is probably going to be a lambda defined inline in the call to get. You should never call creationfun or an equivalent yourself, as that would bypass the cache (unless, of course, you need to bypass the cache for some reason).

invalidate

Invalidates an entry in the cache, along with everything that depends on the given item. invalidate could possibly be implemented as set_timeout(self, item, 0).

set_timeout

Sets the timeout for the given item. This call must be valid as a nested call from creationfun inside get. Whenever an item times out, everything that depends on it must also time out.
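The nested-call requirement can be met by keeping timeouts separate from values, so that a timeout may be recorded before the value itself exists. The following sketch assumes exactly that; TimeoutCache and the injectable clock parameter are hypothetical, it keeps the earliest timeout when several are set, and it leaves out propagation to dependent items.

```python
from datetime import datetime


class TimeoutCache:
    """Sketch of get/set_timeout where set_timeout may run inside creationfun."""

    def __init__(self, clock=datetime.now):
        self._clock = clock      # injectable so expiry can be tested
        self._values = {}
        self._timeouts = {}      # item -> absolute expiry datetime

    def set_timeout(self, item, timeout):
        old = self._timeouts.get(item)
        if old is None or timeout < old:
            self._timeouts[item] = timeout

    def get(self, item, creationfun):
        expiry = self._timeouts.get(item)
        if expiry is not None and self._clock() >= expiry:
            # Expired: forget both the value and its stale timeout.
            self._values.pop(item, None)
            self._timeouts.pop(item, None)
        if item not in self._values:
            self._values[item] = creationfun()
        return self._values[item]
```

A plugin rendering a calendar could then call set_timeout for the page from within the function passed to get, and the timeout would apply to the value that call is about to store.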

add_dependency

Adds a dependency to a cached item. on can be either a string (in which case it is treated as a name key) or a tuple (dim1, dim2, ...).

examples for multi-dimensional caching

Let's say something was stored as:

{{{
cache.get( ('FrontPage', 'text/html'), make_frontpage_html)
cache.get( ('FrontPage', 'linklist'), make_frontpage_linklist)
}}}

Now calling cache.invalidate('FrontPage') (which is just a shortcut for cache.invalidate(('FrontPage',))) would invalidate the 3 items:

It would also invalidate an item ('IncludeFrontPage') if that item was linked to the frontpage with, for example, the following call:

cache.add_dependency('IncludeFrontPage', 'FrontPage')

It would also invalidate that item if it was linked via

cache.add_dependency('IncludeFrontPage', ('FrontPage', 'anything'))

even if the tuple ('FrontPage', 'anything') doesn't really exist in the cache.
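The behaviour described in these examples can be sketched as a small in-memory class. This is one possible interpretation, not a definitive implementation: DependencyCache, as_key, covers and the __contains__ helper are hypothetical, and the sketch assumes that an explicit dependency is triggered whenever its target lies inside the invalidated sub-space.

```python
def as_key(item):
    """Treat a string as a 1-tuple so 'FrontPage' means ('FrontPage',)."""
    return (item,) if isinstance(item, str) else tuple(item)


def covers(prefix, key):
    """True if `key` lies in the sub-space spanned by `prefix`."""
    return key[:len(prefix)] == prefix


class DependencyCache:
    def __init__(self):
        self._values = {}
        self._deps = {}  # dependent key -> set of keys it depends on

    def __contains__(self, item):
        return as_key(item) in self._values

    def get(self, item, creationfun):
        key = as_key(item)
        if key not in self._values:
            self._values[key] = creationfun()
        return self._values[key]

    def add_dependency(self, item, on):
        self._deps.setdefault(as_key(item), set()).add(as_key(on))

    def invalidate(self, item):
        pending, seen = [as_key(item)], set()
        while pending:
            prefix = pending.pop()
            if prefix in seen:
                continue  # guards against dependency cycles
            seen.add(prefix)
            for key in [k for k in self._values if covers(prefix, k)]:
                del self._values[key]
            for dependent, targets in self._deps.items():
                if any(covers(prefix, t) for t in targets):
                    pending.append(dependent)
```

The dependency target does not need to exist as a cached item, which matches the ('FrontPage', 'anything') case. What should happen in the opposite direction (an item depending on the whole page when only one sub-item is invalidated) is not specified in the text, and this sketch does nothing in that case.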


See also FIFOCache.


CategoryFeatureRequest

MoinMoin: CacheFramework (last edited 2007-10-29 19:11:46 by localhost)