Caching in MoinMoin

As of Moin 1.5, caching works like this:

Moin caches pages as byte-compiled Python code. The cache entries for each page are located in data/pages/PageName/cache/, which contains the following files:

text_html
    The byte-compiled Python code of the page.
pagelinks
    The links from this page to other pages, used to find the BackLinks of those pages.
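The idea behind text_html can be sketched with the standard library: compile the Python code that renders a page once, store the code object, and later load and execute it instead of parsing the page again. This is a minimal sketch, assuming the cached code simply appends HTML to an output list; write_code_cache, run_code_cache and the output list are hypothetical names, not Moin's actual API, and marshal is used here because it is the standard way to serialize code objects.

```python
import marshal
import os
import tempfile

def write_code_cache(path, source):
    """Compile page-rendering source once and store the code object."""
    code = compile(source, "<page>", "exec")
    with open(path, "wb") as f:
        f.write(marshal.dumps(code))

def run_code_cache(path, namespace):
    """Load the cached code object and execute it in the given namespace."""
    with open(path, "rb") as f:
        code = marshal.loads(f.read())
    exec(code, namespace)
    return namespace

# Hypothetical "rendered page": Python code that emits the page's HTML.
source = "output.append('<p>Hello, wiki!</p>')\n"
cache_dir = tempfile.mkdtemp()
path = os.path.join(cache_dir, "text_html")
write_code_cache(path, source)

ns = run_code_cache(path, {"output": []})
print("".join(ns["output"]))  # <p>Hello, wiki!</p>
```

Note that marshal output is only guaranteed to be readable by the same Python version, so such a cache has to be discarded when the interpreter changes.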

The DeleteCacheAction deletes text_html only (unconfirmed).


/!\ The following is probably badly outdated! A new CacheFramework is under discussion.

Some operations in 1.3 are very slow compared with 1.2. It seems that we can't fix them in 1.3 unless we add some new caches.

Slow operations

These affect operation on wikis with a large number of pages:

Pages cache

The cache is a tuple (date, pages), where pages maps each page name to the data below.

Data to save for each page:

Key       | Type                       | Value                | Comment
exists    | bool                       | True or False        |
revision  | positive integer           | page revision number | used for updating page data
acl       | acl object or None         | page acl object      | effective acl: the acl of the last available revision on storage
pagelinks | dict (set for Python 2.3+) | links to other pages | used for backlinks search
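A minimal sketch of the per-page data above and of a cached backlinks search over it, assuming the pages dict is keyed by page name and that the date field holds the edit log timestamp the cache was built from. make_page_entry and backlinks are hypothetical helpers; a set is used for pagelinks, as the table suggests for Python 2.3+.

```python
import time

def make_page_entry(revision, acl=None, links=()):
    """One page's cached metadata, using the keys from the table above."""
    return {
        "exists": True,
        "revision": revision,
        "acl": acl,                  # None for pages without their own acl
        "pagelinks": set(links),     # pages this page links to
    }

def backlinks(pages, target):
    """Cached backlinks search: existing pages whose pagelinks include target."""
    return sorted(name for name, meta in pages.items()
                  if meta["exists"] and target in meta["pagelinks"])

pages = {
    "FrontPage": make_page_entry(3, links=["HelpContents", "RecentChanges"]),
    "HelpContents": make_page_entry(7, links=["FrontPage"]),
    "RecentChanges": make_page_entry(1, links=["FrontPage", "HelpContents"]),
}
# The (date, pages) tuple; time.time() stands in for the edit log date.
cache = (time.time(), pages)

date, pages = cache
print(backlinks(pages, "FrontPage"))  # ['HelpContents', 'RecentChanges']
```

The search touches only the in-memory dict, so it avoids the per-page disk access that a linkto: search over getPageList needs.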

Cache timing

The timings compare using the cache described above against calling getPageList and checking readability, which accesses the disk for each page to get the current revision.

Server Type | 1.3 linkto: search | cached backlinks search | Improvement
standalone  | 0.77s/0.825s       | 0.002s/0.050s           | X385/X16
cgi         | 0.64s/1.04s        | 0.016s/0.349s           | X40/X3

TitleIndex

Server Type | 1.3        | cached      | Improvement
standalone  | 0.47s/2.0s | 0.27s/1.13s | about X2
cgi         | -          | -           | -

OrphanedPages

Server Type | 1.3   | cached | Improvement
standalone  | 1.53s | 0.22s  | about X7
cgi         | -     | -      | -

WantedPages

Server Type | 1.3   | cached | Improvement
standalone  | 1.18s | 0.31s  | about X4
cgi         | -     | -      | -
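OrphanedPages and WantedPages can both be answered from the cached pagelinks without opening any page file. A minimal sketch, assuming "orphaned" means an existing page that no other existing page links to, and "wanted" means a linked page that does not exist; Moin's own macros may apply extra rules (e.g. ACL checks), and the sample entries carry only the two keys these queries need.

```python
def orphaned_pages(pages):
    """Existing pages that no other existing page links to."""
    linked = set()
    for name, meta in pages.items():
        if meta["exists"]:
            linked |= meta["pagelinks"] - {name}   # ignore self-links
    return sorted(name for name, meta in pages.items()
                  if meta["exists"] and name not in linked)

def wanted_pages(pages):
    """Missing pages that are linked to, mapped to the pages linking them."""
    wanted = {}
    for name, meta in pages.items():
        if not meta["exists"]:
            continue
        for target in meta["pagelinks"]:
            if target not in pages or not pages[target]["exists"]:
                wanted.setdefault(target, set()).add(name)
    return {target: sorted(sources) for target, sources in wanted.items()}

pages = {
    "FrontPage": {"exists": True, "pagelinks": {"HelpContents", "TodoList"}},
    "HelpContents": {"exists": True, "pagelinks": {"FrontPage"}},
    "OldNotes": {"exists": True, "pagelinks": {"TodoList"}},
    "DeletedPage": {"exists": False, "pagelinks": set()},
}
print(orphaned_pages(pages))  # ['OldNotes']
print(wanted_pages(pages))    # {'TodoList': ['FrontPage', 'OldNotes']}
```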

LikePages

Server Type | 1.3   | cached | Improvement
standalone  | 0.20s | 0.11s  | about X2
cgi         | -     | -      | -

NoSuchPage

Visiting a non-existing page, which runs both the EditTemplates and LikePages macros.

Server Type | 1.3   | cached | Improvement
standalone  | 0.25s | 0.12s  | about X2
cgi         | -     | -      | -

Title search for MoinMoin

Server Type | 1.3         | cached      | Improvement
standalone  | 0.11s/0.17s | 0.02s/0.08s | about X5/X2
cgi         | -           | -           | -

Text search for "parser help mail", returning 19 results.

Server Type | 1.3         | cached      | Improvement
standalone  | 1.05s/1.15s | 1.05s/1.10s | None (expected)
cgi         | -           | -           | -

Text search with modifiers, "title:help parser", returning 11 results.

Server Type | 1.3         | cached      | Improvement
standalone  | 0.32s/0.37s | 0.16s/0.23s | about X2/X1.5
cgi         | -           | -           | -

Dual cache

The cache can be kept both in memory and on disk as a pickle, using a CacheEntry. Long-running processes will use the memory copy; CGI will load the disk copy for each request that requires access to the cached data.
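A minimal sketch of the dual cache, using the pickle module directly rather than Moin's CacheEntry class, whose exact interface is not shown here. DualCache is a hypothetical name: a long-running server keeps one instance and reuses its memory copy, while a CGI request starts with a fresh instance, so its first load() reads the pickle from disk.

```python
import os
import pickle
import tempfile

class DualCache:
    """Keep cached data in memory and mirror it to a pickle file on disk."""

    def __init__(self, path):
        self.path = path
        self.data = None          # in-memory copy, None until loaded

    def save(self, data):
        # Write to a temporary file, then rename it over the old pickle,
        # so readers never see a half-written file.
        tmp = self.path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, self.path)
        self.data = data

    def load(self):
        # Use the memory copy when we have one; otherwise read the disk copy.
        if self.data is None:
            with open(self.path, "rb") as f:
                self.data = pickle.load(f)
        return self.data

path = os.path.join(tempfile.mkdtemp(), "pagecache.pickle")
server = DualCache(path)
server.save({"FrontPage": {"revision": 3}})

cgi_request = DualCache(path)     # a new process starts with no memory copy
print(cgi_request.load())         # {'FrontPage': {'revision': 3}}
```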

Updating the cache

The cache will be updated on each edit operation, so clients do not have to check whether an update is needed; they only read the updated cache when needed.

Locking

To share the cache between multiple processes, we need locking.

Read cache

The cache is read at most once per request, and only if the saved editlog date differs from the current editlog date. CGI will load the cache for every request; long-running processes only after a page has changed.
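The read rule can be sketched as a staleness check against the edit log's modification time. This is a minimal sketch, assuming the "editlog date" is the edit log file's mtime; EditlogCheckedCache and its build callback (standing in for reading the cache from disk) are hypothetical. In CGI every request creates a new instance, so the first get() always loads, which matches the behaviour described above.

```python
import os
import tempfile
import time

class EditlogCheckedCache:
    """Reload cached data only when the edit log has changed."""

    def __init__(self, editlog_path, build):
        self.editlog_path = editlog_path
        self.build = build     # hypothetical: reads the cache from disk
        self.date = None       # edit log date the cached data belongs to
        self.data = None
        self.loads = 0         # counts reloads, for illustration

    def get(self):
        current = os.path.getmtime(self.editlog_path)
        if current != self.date:
            self.data = self.build()
            self.date = current
            self.loads += 1
        return self.data

editlog = os.path.join(tempfile.mkdtemp(), "edit-log")
open(editlog, "w").close()

cache = EditlogCheckedCache(editlog, build=lambda: {"FrontPage": 3})
cache.get()
cache.get()                     # edit log unchanged: no reload
print(cache.loads)              # 1

later = time.time() + 10
os.utime(editlog, (later, later))   # simulate an edit
cache.get()
print(cache.loads)              # 2
```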

Update cache

Done for each edit operation (save, rename, etc.), or when the cache mtime is older than the editlog mtime.

When the write lock is active, other processes must wait until it is released or expires.
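One common way to get a write lock that other processes wait on, and that expires if its owner crashes, is a lock file created with O_EXCL. This is a minimal sketch of that technique, not Moin's own locking code; acquire_lock, release_lock, update_cache and the expire/timeout values are hypothetical. Breaking a stale lock this way is racy if two waiters do it at the same moment, which a production version would need to handle.

```python
import os
import tempfile
import time

class LockTimeout(Exception):
    pass

def acquire_lock(lock_path, expire=30.0, timeout=10.0, poll=0.05):
    """Create lock_path atomically; break it if older than `expire` seconds."""
    deadline = time.time() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            try:
                age = time.time() - os.path.getmtime(lock_path)
            except FileNotFoundError:
                continue        # released meanwhile; try again
            if age > expire:
                # Stale lock left by a crashed writer: treat it as expired.
                try:
                    os.remove(lock_path)
                except FileNotFoundError:
                    pass
                continue
            if time.time() > deadline:
                raise LockTimeout(lock_path)
            time.sleep(poll)

def release_lock(lock_path):
    os.remove(lock_path)

def update_cache(lock_path, update):
    """Run update() while holding the write lock."""
    acquire_lock(lock_path)
    try:
        update()
    finally:
        release_lock(lock_path)

lock = os.path.join(tempfile.mkdtemp(), "pagecache.lock")
pages = {}
update_cache(lock, lambda: pages.update(FrontPage=4))
print(pages, os.path.exists(lock))   # {'FrontPage': 4} False
```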

ACL caching

Currently each page has a rather big acl object, because we copy into each page the acl_rights_before, the page's own acl (or acl_rights_default), and finally acl_rights_after.

New system:

  1. If a page has an acl, the acl object is cached.
  2. Pages without an acl cache None.
  3. acl_before, acl_default and acl_after will be live objects kept in the wiki config.

security.Permissions in the test wiki uses this code to check ACLs:

    # may should return True, False or None for no match;
    # "what" stands for the right being checked, e.g. may.read(pagename)

    # Check acl_rights_before first
    allowed = acl_before.may.what(pagename)
    if allowed is not None:
        return allowed

    # Check the page acl, or the default acl if the page has none
    pageACL = cache[pagename]['acl'] or acl_default
    if pageACL is not None:
        allowed = pageACL.may.what(pagename)
        if allowed is not None:
            return allowed

    # Check acl_rights_after last
    allowed = acl_after.may.what(pagename)
    if allowed is not None:
        return allowed
    return False
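A self-contained, runnable version of the same before / page-or-default / after order, for experimenting outside the wiki. SimpleACL is a hypothetical stand-in for Moin's acl objects (first matching entry decides, unlisted rights are denied, None when no entry matches), and the config dict stands in for the wiki config; the right is passed explicitly instead of the may.what(...) attribute style.

```python
class SimpleACL:
    """Hypothetical acl: ordered (name, {right: bool}) entries."""

    def __init__(self, entries):
        self.entries = entries

    def may(self, user, what):
        # The first entry matching the user decides; None if none matches.
        for name, rights in self.entries:
            if name in (user, "All"):
                return rights.get(what, False)
        return None

def may(cache, config, pagename, user, what):
    # acl_rights_before is checked first and can override everything
    allowed = config["acl_before"].may(user, what)
    if allowed is not None:
        return allowed
    # The page's own cached acl, or the default acl if the page has none
    page_acl = cache[pagename]["acl"] or config["acl_default"]
    if page_acl is not None:
        allowed = page_acl.may(user, what)
        if allowed is not None:
            return allowed
    # acl_rights_after is checked last
    allowed = config["acl_after"].may(user, what)
    if allowed is not None:
        return allowed
    return False

config = {
    "acl_before": SimpleACL([("WikiAdmin", {"read": True, "admin": True})]),
    "acl_default": SimpleACL([("All", {"read": True})]),
    "acl_after": SimpleACL([]),
}
cache = {
    "FrontPage": {"acl": None},   # no page acl: acl_default applies
    "Secret": {"acl": SimpleACL([("Alice", {"read": True})])},
}
print(may(cache, config, "FrontPage", "Bob", "read"))   # True
print(may(cache, config, "Secret", "Bob", "read"))      # False
```

Because pages without an acl store None, the per-page cache stays small and the shared before/default/after objects live only once in the config.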

Cache test

Here is a cache test that times the page cache for wikis of various sizes: time_cache.py

The test uses a typical acl for 20% of the pages and None for the rest, assuming the acl caching described above.

Here are the results on a fast desktop machine (G5 2x2G):
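time_cache.py itself is an attachment and is not reproduced here. The following is only a hypothetical benchmark in the same spirit, so the setup is easy to reproduce: it builds a page cache where about 20% of pages carry a typical acl, then times building, pickling and unpickling it. The function names, acl contents and page fields are assumptions, and its timings are not comparable with the results that follow.

```python
import pickle
import random
import time

def make_acl():
    # A fresh, typical acl for each page so pickle cannot share one object
    return [("WikiAdmin", {"read": 1, "write": 1, "delete": 1,
                           "revert": 1, "admin": 1}),
            ("All", {"read": 1, "write": 0, "delete": 0,
                     "revert": 0, "admin": 0})]

def build_pages(n, acl_ratio=0.2, seed=0):
    """Page cache where about acl_ratio of the pages have an acl."""
    rng = random.Random(seed)
    return {"Page%d" % i: {"exists": True, "revision": 1,
                           "acl": make_acl() if rng.random() < acl_ratio else None,
                           "pagelinks": set()}
            for i in range(n)}

def time_cache(n):
    """Time building the cache, pickling it, and loading it back."""
    start = time.perf_counter()
    pages = build_pages(n)
    build = time.perf_counter() - start

    data = pickle.dumps(pages, pickle.HIGHEST_PROTOCOL)
    start = time.perf_counter()
    loaded = pickle.loads(data)
    load = time.perf_counter() - start
    return {"pages": len(loaded), "bytes": len(data),
            "build": build, "load": load}

for n in (1000, 5000, 10000):
    print(time_cache(n))
```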

Aluminum:~/Desktop/acl nir$ python2.4 time_cache.py 1000
Test cache for 1000 pages:
    Create meta cache: 0.08533788
    Get meta from cache: 0.00650287
    Edit acl cache: 0.02248096
Aluminum:~/Desktop/acl nir$ python2.4 time_cache.py 5000
Test cache for 5000 pages:
    Create meta cache: 0.47687006
    Get meta from cache: 0.04340291
    Edit acl cache: 0.15440583
Aluminum:~/Desktop/acl nir$ python2.4 time_cache.py 10000
Test cache for 10000 pages:
    Create meta cache: 6.34870219
    Get meta from cache: 0.11874986
    Edit acl cache: 0.31563902
Aluminum:~/Desktop/acl nir$ python2.4 time_cache.py 20000
Test cache for 20000 pages:
    Create meta cache: 14.71968412
    Get meta from cache: 0.17779613
    Edit acl cache: 0.83818293

And here are results from an old laptop (1999 PowerBook G3, 350 MHz, 256 MB RAM, GNU/Linux Debian 3.1, kernel 2.6.9):

freeknowledge:/mhzwiki$ python time_cache.py 100
Test cache for 100 pages:
    Create meta cache: 0.13144612
    Get meta from cache: 0.00554109
    Edit acl cache: 0.02425313

freeknowledge:/mhwiki$ python time_cache.py 1000
Test cache for 1000 pages:
    Create meta cache: 1.49055505
    Get meta from cache: 0.06276393
    Edit acl cache: 0.21379018

freeknowledge:/mhzwiki$ python time_cache.py 5000
Test cache for 5000 pages:
    Create meta cache: 59.65417218
    Get meta from cache: 0.44876194
    Edit acl cache: 1.36100388

freeknowledge:/mhzwiki$ python time_cache.py 10000
Test cache for 10000 pages:
    Create meta cache: 133.82670307
    Get meta from cache: 0.78346896
    Edit acl cache: 2.57339716

freeknowledge:/mhzwiki$ python time_cache.py 20000
Test cache for 20000 pages:
    Create meta cache: 402.18088794
    Get meta from cache: 2.14711905
    Edit acl cache: 6.06097007

Pentium 200 MHz:

>python time_cache.py 1000
Test cache for 1000 pages:
    Create meta cache: 1.38895202
    Get meta from cache: 0.07669497
    Edit acl cache: 0.19887996
>python time_cache.py 5000
Test cache for 5000 pages:
    Create meta cache: 122.96757412
    Get meta from cache: 0.64906096
    Edit acl cache: 1.09516811

Encoding acl rights in a more compact way

ACL rights are kept in a dict:

{'read': 1, 'write': 1, 'delete': 1, 'revert': 0, 'admin': 0}

It is very fast to get a right from the dict, but pickling it means encoding a dict with up to 10 objects (5 keys and 5 values) for each entry.
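The cost difference is easy to measure with the standard pickle and timeit modules. This sketch compares a cache holding one rights dict per entry against one holding a single integer per entry; the packed value and its bit layout are assumptions for illustration, and no timings are given because they depend on the machine.

```python
import pickle
import timeit

# The rights dict from above, and the same five rights packed into one
# integer (one bit per right; the exact bit layout is an assumption here).
rights = {'read': 1, 'write': 1, 'delete': 1, 'revert': 0, 'admin': 0}
packed = 0b00111

# Simulate a cache holding one rights value per acl entry.
dicts = [dict(rights) for _ in range(10000)]
ints = [packed] * 10000

for label, value in (("dict", dicts), ("int", ints)):
    data = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
    seconds = timeit.timeit(lambda: pickle.loads(data), number=10)
    print("%-4s %8d bytes  %.4fs for 10 loads" % (label, len(data), seconds))
```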

We can encode the rights in a more compact way, using bits:

    # Each constant must be a distinct bit.
    # _NA: the entry does not mention this right (keep looking)
    # _ON: the right is granted; neither bit set means denied
    READ_NA = 0x1
    READ_ON = 0x2
    WRITE_NA = 0x4
    WRITE_ON = 0x8
    DELETE_NA = 0x10
    DELETE_ON = 0x20
    REVERT_NA = 0x40
    REVERT_ON = 0x80
    ADMIN_NA = 0x100
    ADMIN_ON = 0x200

    ACL = [
        # (entry, rights bits)
        # +WikiAdmin:read,write,delete,revert,admin
        ('WikiAdmin', READ_ON | WRITE_ON | DELETE_ON | REVERT_ON | ADMIN_ON),
        # Example of a modifier
        # +EditorsGroup:read,write,delete,revert
        ('EditorsGroup', READ_ON | WRITE_ON | DELETE_ON | REVERT_ON | ADMIN_NA),
        ('All', READ_ON),
        ]

Now each acl rights dict is a single integer!

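Checking rights can then be done with something like this (not yet tested against the current code):

    def may(self, name, what):
        # Map each right name to its (not applicable, granted) bits
        rights = {'read': (READ_NA, READ_ON), 'write': (WRITE_NA, WRITE_ON),
                  'delete': (DELETE_NA, DELETE_ON),
                  'revert': (REVERT_NA, REVERT_ON),
                  'admin': (ADMIN_NA, ADMIN_ON)}
        right_na, right_on = rights[what]

        for entry, aclrights in self.entries:
            # self.matches() is a hypothetical helper (user/group matching)
            if not self.matches(entry, name):
                continue
            if right_na & aclrights:
                # Entry does not mention this right; try the next one
                continue
            # First matching entry decides: granted only if the ON bit is set
            return bool(right_on & aclrights)
        return None  # no entry matched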
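A self-contained version of the bit-encoded acl that can be run as is. It uses 1 << n so each constant is guaranteed to be a distinct bit, and a hypothetical BitACL class with a simple groups dict in place of Moin's real user and group matching; the _NA/_ON semantics (not mentioned / granted, neither means denied) follow the may() logic above.

```python
# Two bits per right: _NA means "this entry does not mention the right,
# keep looking"; _ON means "granted". Neither bit set means "denied".
READ_NA, READ_ON = 1 << 0, 1 << 1
WRITE_NA, WRITE_ON = 1 << 2, 1 << 3
DELETE_NA, DELETE_ON = 1 << 4, 1 << 5
REVERT_NA, REVERT_ON = 1 << 6, 1 << 7
ADMIN_NA, ADMIN_ON = 1 << 8, 1 << 9

RIGHTS = {
    "read": (READ_NA, READ_ON),
    "write": (WRITE_NA, WRITE_ON),
    "delete": (DELETE_NA, DELETE_ON),
    "revert": (REVERT_NA, REVERT_ON),
    "admin": (ADMIN_NA, ADMIN_ON),
}

class BitACL:
    def __init__(self, entries, groups=None):
        self.entries = entries          # list of (name, rights bits)
        self.groups = groups or {}      # hypothetical: group -> set of users

    def matches(self, entry, user):
        return (entry == "All" or entry == user
                or user in self.groups.get(entry, ()))

    def may(self, user, what):
        right_na, right_on = RIGHTS[what]
        for entry, aclrights in self.entries:
            if not self.matches(entry, user):
                continue
            if right_na & aclrights:
                continue                 # not mentioned here; next entry
            return bool(right_on & aclrights)
        return None                      # no entry decided

acl = BitACL(
    [
        ("WikiAdmin", READ_ON | WRITE_ON | DELETE_ON | REVERT_ON | ADMIN_ON),
        ("EditorsGroup", READ_ON | WRITE_ON | DELETE_ON | REVERT_ON | ADMIN_NA),
        ("All", READ_ON),
    ],
    groups={"EditorsGroup": {"Alice"}},
)
print(acl.may("Alice", "write"))  # True: granted by EditorsGroup
print(acl.may("Alice", "admin"))  # False: EditorsGroup skips it, All denies
```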

And these are the results of this optimization. "Get meta from cache" (loading from disk) is roughly twice as fast:

Aluminum:~/Desktop/acl nir$ python2.4 time_cache.py 1000
Test cache for 1000 pages:
    Create meta cache: 0.07634401
    Get meta from cache: 0.00382185
    Edit acl cache: 0.01236987
Aluminum:~/Desktop/acl nir$ python2.4 time_cache.py 5000
Test cache for 5000 pages:
    Create meta cache: 0.46235013
    Get meta from cache: 0.01934600
    Edit acl cache: 0.08320403
Aluminum:~/Desktop/acl nir$ python2.4 time_cache.py 10000
Test cache for 10000 pages:
    Create meta cache: 6.22164893
    Get meta from cache: 0.04717183
    Edit acl cache: 0.20548987
Aluminum:~/Desktop/acl nir$ python2.4 time_cache.py 20000
Test cache for 20000 pages:
    Create meta cache: 14.33688688
    Get meta from cache: 0.09433103
    Edit acl cache: 0.44597101

Here is the modified test code: time_cache_2.py

MoinMoin: MoinCaching (last edited 2007-10-29 19:20:56 by localhost)