While playing with (py)inotify and seeing that watching data/ with lots of files and directories might be a bad idea, the following stuff came to my mind:
Whenever we make substantial changes to the wiki, we add an entry to data/edit-log:
- creating new pages, new attachments
- editing pages (== creating new revisions)
- deleting pages, deleting attachments
- renaming pages (we don't support renaming of attachments)
At many places in moin, we have to care about coherency of caches (because in multi-process moin, stuff changes on disk behind your back) and check if the cache is still up-to-date by usually doing some stat call on the original version. Or we simply don't cache stuff just because of that.
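The stat-based check described above could look roughly like the following. This is only a sketch, not moin's actual code: the class name StatCheckedCache and the loader callback are made up for illustration, and comparing (mtime, size) is an assumed freshness test.

```python
import os


class StatCheckedCache:
    """Cache values derived from files; revalidate every hit with os.stat().

    Hypothetical helper for illustration, not part of MoinMoin.
    """

    def __init__(self):
        # path -> ((mtime_ns, size), cached value)
        self._entries = {}

    def get(self, path, loader):
        """Return the cached value for path, reloading if the file changed."""
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == stamp:
            return entry[1]  # file looks unchanged since we cached it
        value = loader(path)  # another process changed it: reload
        self._entries[path] = (stamp, value)
        return value
```

Every lookup still costs one stat call per original file, which is the overhead a single check of edit-log would avoid.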
We do a page.exists() call when rendering page links. If the page existence state were cached, we would also need to invalidate that information when the page changes on disk. If edit-log did not change, the page didn't change either.
We also cache ACLs for filtering the page list for some viewer.
In future, we want to cache metadata (mimetype, etc.) of items.
If we remember lastpos and mtime of edit-log, we could maybe even selectively invalidate parts of the cache (the log shows what we have to invalidate).
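A rough sketch of the lastpos/mtime idea follows. It assumes a simplified, tab-separated, append-only log with the page name in a fixed column (pagename_field is an assumed parameter, and the real edit-log stores page names in quoted form). EditLogWatcher and invalidate are hypothetical names, not existing moin code.

```python
import os


class EditLogWatcher:
    """Remember (mtime, lastpos) of an append-only log and report new entries."""

    def __init__(self, path, pagename_field=3):
        self.path = path
        self.pagename_field = pagename_field  # assumed column of the page name
        self.mtime = None
        self.lastpos = 0

    def changed_pages(self):
        """Return the set of page names logged since the previous call."""
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return set()
        if st.st_mtime_ns == self.mtime and st.st_size == self.lastpos:
            return set()  # nothing was appended
        if st.st_size < self.lastpos:
            self.lastpos = 0  # log was truncated or rotated: read it again
        with open(self.path, "rb") as f:
            f.seek(self.lastpos)
            data = f.read()
        # Only consume complete lines; a half-written line is read next time.
        end = data.rfind(b"\n") + 1
        pages = set()
        for line in data[:end].splitlines():
            fields = line.decode("utf-8", errors="replace").split("\t")
            if len(fields) > self.pagename_field:
                pages.add(fields[self.pagename_field])
        self.lastpos += end
        self.mtime = st.st_mtime_ns
        return pages


def invalidate(cache, watcher):
    """Drop only the cache entries whose pages appear in new log lines."""
    for pagename in watcher.changed_pages():
        cache.pop(pagename, None)
```

A persistent server process would call invalidate() once at the start of each request, so the common case costs a single stat of edit-log.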
There are some risks in just watching mtime of data/edit-log:
- in the short time between the update of page-local metadata (and content) and the update of the global edit-log, the cache is outdated
- when doing critical stuff (e.g. creating a new revision of an item), we MUST lock item and read metadata (current rev) from disk
- we have to be sure that global edit-log is correct and complete
See also MoinCaching.
I suggest having a kind of CacheManager object that collects all data that is currently stored in various attributes of the request object. At first this could simply be used to make caching uses easier to find in the code; later a specific cache API could be introduced. -- AlexanderSchremmer 2006-08-14 14:15:52
Sample usage:
request.cache.acl_cache[foo] ...
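To make the suggestion a bit more concrete, here is a minimal sketch of such an object. Only the acl_cache attribute comes from the sample usage above; exists_cache, pagelist, invalidate_page, and the stand-in Request class are assumptions for illustration.

```python
class CacheManager:
    """One place for the caches now spread over request/cfg attributes."""

    def __init__(self):
        self.acl_cache = {}     # page name -> parsed ACL
        self.exists_cache = {}  # page name -> bool (hypothetical)
        self.pagelist = None    # cached page name list, None if unknown

    def invalidate_page(self, pagename):
        """Forget everything cached about one page."""
        self.acl_cache.pop(pagename, None)
        self.exists_cache.pop(pagename, None)
        self.pagelist = None  # create/delete/rename changes the page list

    def clear(self):
        """Forget everything, e.g. when edit-log cannot be trusted."""
        self.acl_cache.clear()
        self.exists_cache.clear()
        self.pagelist = None


class Request:
    """Stand-in for the real request object, for this example only."""

    def __init__(self):
        self.cache = CacheManager()
```

Having all caches behind one attribute also gives a single place to hook in invalidation later.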
What caches do we have
Memory
- page name list, request.pages, see Page.getPageList
- page acls, request.cfg._acl_cache, see Page.getACL
- page._current_rev/_exists/_pagefile, see Page
- user name/id mapping, request.cfg._name2id, see user.getUserId (+ disk)
- configs, see multiconfig._config_cache + _farmconfig_mtime
- multiconfig._url_re_cache
- interwiki map in request.cfg._interwiki_{list,ts,mtime}, see wikiutil.load_wikimap
- extension/parser mapping, request.cfg._EXT_TO_PARSER[_DEFAULT], see wikiutil.getParserForExtension
- known action, request.cfg.known_actions, see request.getKnownActions
- iconsByFile in ThemeBase class
Disk
MoinMoin.caching module handles this:
- page bytecode - text_html see page.makeCache
- pagelinks - see page.getPageLinks
- surgeprotect data
- i18n meta / lang data, see i18n/__init__
- spellchecker dict, see action/SpellCheck
- chart/statistics data
- antispam, see security/antispam
The edit-conflict tag file (that speeds up RecentChanges)
Locking
Disk cache
- Reading: lockless
- Writing: Should be possible to do lockless.
Each file is immutable; new content is written to a temporary file and moved to the real location with rename() or MoveFileEx().
MoveFileEx() does not work unless the file was opened with a special "share delete" mode.
I have implemented the non-Windows part of the code. LazyReadLock and LazyWriteLock simply do nothing on non-Windows platforms. Those lazy locks are currently used by caching.py (and the caching.py cache update code was rewritten to use a temp file + rename). That code is running on http://test.wikiwikiweb.de/.
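The temp file + rename update can be sketched like this. It is not the actual caching.py code: write_cache_file is a hypothetical name, and it uses os.replace() (Python 3.3+) in place of a bare rename(). On Windows the replace may still fail while a reader holds the target open without share-delete, which is the problem noted above.

```python
import os
import tempfile


def write_cache_file(path, data):
    """Replace the file at path with data (bytes) without locking readers."""
    directory = os.path.dirname(path) or "."
    # The temp file must be in the same directory (same filesystem),
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the content is on disk first
        os.replace(tmp, path)  # readers see the old or the new file, never a mix
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Readers that opened the old file before the rename keep reading the old, complete content, which is why reading can stay lockless.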
In 1.6 a new ItemCache class is introduced that (for persistent servers) keeps a persistent in-memory-only cache of item (page) related information (like acls, exists, etc.) and watches edit-log to invalidate cache entries on change.