What is WASP?
Early versions of MoinMoin used a lot of CPU power because they parsed every page on every view. WASP is part of MoinMoin.
WASP "compiles" wiki pages into Python and caches the compiled byte code. Because of this, every page needs to be parsed only once (per edit), while dynamic content (like wiki links, search macros, ...) is still calculated at view time. WASP uses a special formatter that produces a mix of Python code and page content instead of the normal page content.
Because there are a lot of other time-consuming steps (especially loading all the Python modules), the benefit of WASP is not very visible when using CGI (it may save 30-50% of the Python-visible time). If MoinMoin is run in a persistent environment (mod_python, FastCGI, Twisted, standalone), this startup time is no longer needed and parsing would be the only expensive operation per view.
WASP != WASP !
Calling this patch WASP was originally just a joke by JürgenHermann (see below). This WASP is a patch for MoinMoin that has nothing to do with this software: http://www.execulink.com/~robin1/wasp/readme.html (even though it has similar functionality)
The saga begins
Discussion from MoinMoinTodo/Release 1.1
Another note here, we've just recently started to get seriously hammered from usage, the load on the wiki server is fairly constantly above 5. Serving pages as static HTML when possible would probably help a lot, though anything which lightens the load in comparison to 0.9 would be much appreciated. -- AdamShand
I already invested some thought into this, but I won't make any quick hacks here; this has to be a coordinated effort on the parser & formatter side, adding a caching component in between, and some templating stirred in. The current rough idea is to have the parser build an internal data structure (a page split into pieces), cache that into a shelve, and normally format from the shelve; in addition, a 2nd cache could be put into/after the formatter, which tries to replace as much of that structure as possible by HTML fragments, leaving only the macro calls unreplaced (in the extreme, the page written to a *.py file, with a lot of prints and some macro calls sprinkled in). And we could call it WASP, Wiki Application Server Pages.
WASP is a great idea. It took me some months to understand this, but here is how it could work: build a Python formatter. For static text it returns "print 'text'"; for dynamic content it returns code to execute macros or processors or other stuff. Of course, consecutive static text is put together. The python formatter would get another formatter as a parameter, so it is not limited to HTML. To make this work, the parser has to limit its functionality to parsing only. Macros and processors must be handled by the formatter, so that it can decide not to call them and return the calling code instead. The python formatter could use a list of macros and formatters that generate static output. In addition, extension macros could define an "I'm static" flag. The python code returned by the formatter can be compiled into byte code. If this byte code is loaded, no parsing of any file is needed to show the page. -- If I think about it properly, this is exactly what you described above, but without the caching components, which are not needed because we can construct the python code dynamically -- FlorianFesti 2003-01-23 04:45:25
Implementation log
Now I store the code in an extra list and only insert <<<>>> marks into the page, which are replaced afterwards. Double quotes are replaced by \" and the pieces of the HTML source are surrounded by request.write(""" and """).
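The log does not preserve the generated source, so here is a hedged sketch of what the cached page code might look like; FakeRequest and macro_search_results are made-up stand-ins for MoinMoin's request object and a dynamic macro. Static HTML is wrapped in request.write("""...""") calls, and the dynamic pieces are spliced in where the <<<>>> marks were:

```python
# Hedged sketch of the generated page code; the real MoinMoin output
# differs in detail. Static HTML goes into request.write("""...""")
# calls; the dynamic macro call replaces a former <<<>>> mark.
generated = r'''
request.write("""<h1>FrontPage</h1>
<p>Static text with an escaped \" double quote.</p>
""")
request.write(macro_search_results(request))
request.write("""<p>More static text.</p>
""")
'''

# The generated source is compiled once and the byte code is cached:
code = compile(generated, "<wasp-page>", "exec")

# Stand-ins for MoinMoin's request object and a dynamic macro:
class FakeRequest:
    def __init__(self):
        self.out = []
    def write(self, text):
        self.out.append(text)

def macro_search_results(request):
    return "<ul><li>dynamic result</li></ul>"

# Executing the cached byte code renders the page without re-parsing it:
request = FakeRequest()
exec(code, {"request": request,
            "macro_search_results": macro_search_results})
page = "".join(request.out)
```

The static parts cost nothing but a write at view time; only the macro call is executed per request.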
The cached pages are really fast (not measured in detail yet). The cached python byte code is slightly bigger than the HTML source of the page content (without header and footer, which are both not cached), which in turn is much bigger than the wiki markup (the default MoinMoin FrontPage: source/HTML/WASP = 1317/3232/3400 bytes). First benchmarks are promising: on pages without expensive macros it saves 25-50% of the time measured by request.clock, depending on the page size. It seems that the other large part of the time is used for loading python modules.
TW tried WASP with his Twisted branch. The results are great: pages without expensive macros are rendered 20 times faster (large pages even 30 times). Twisted itself accelerates small pages by a factor of 10 and has nearly no effect (<<50%) on large pages.
In other words: WASP saves ~50%. Twisted saves ~50%. Together they save ~95%!
Problems:
some macros access attributes of the parser that are only present after format() has begun (e.g. TableOfContents uses parser.lines)
this has no negative effect for static macros (such as TableOfContents)
there should be a list of resources a macro may use and can rely on
- the parser could generate such data on demand using a __getattr__() method
How to decide what is static
see Dependencies below
There should be a mechanism so that every macro and parser can decide itself whether it is static or not.
- Add a variable (Dependencies) to the extension files with a list of dependencies; possible items:
- "page" - the content of the page (do not use if the macro only depends on its own parameters)
- "pages" - the content of all pages in the wiki
- "namespace" - the names of all existing pages in the wiki (wikinames)
- "time" - changes with every request; the object is always dynamic
- "language" - depends on the language of the user
The python formatter gets a list of things to treat as static (e.g. ["page", "namespace"]) via the constructor. For every macro and processor, the formatter checks whether it depends only on items from this list; if so, the macro/processor is treated as static, otherwise as dynamic. Normal markup is always static; the implementation of wiki links will test the static items list.
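The static/dynamic decision described above amounts to a subset test. A minimal sketch, assuming each plugin exposes its Dependencies list as strings (the names below are illustrative, not MoinMoin's real API):

```python
# The list passed to the formatter's constructor: facts the caching
# strategy promises are fixed for the lifetime of the cached page.
STATIC_ITEMS = ["page", "namespace"]

def is_static(macro_dependencies, static_items=STATIC_ITEMS):
    """A macro is static if everything it depends on is in static_items."""
    return all(dep in static_items for dep in macro_dependencies)

# A macro depending only on the page content can be rendered once and
# its output cached:
print(is_static(["page"]))          # True

# One depending on the request time must stay dynamic, so the formatter
# emits calling code instead of output:
print(is_static(["time", "page"]))  # False
```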
Dependencies should be declared as precisely as possible. This is not needed for WASP itself, but it could be used as a basis for per-macro caching. Such caching could be made macro-independent by using the macros' dependencies.
Not implemented:
The formatter maintains a list of the dependencies that were actually used, to allow a caching algorithm to unify several caches. This allows caching a page as page.all instead of page.en, page.de, page.fr if there was no language-dependent item on the page.
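The unification idea can be sketched in a few lines; cache_name and its naming scheme are assumptions for illustration, matching the page.en/page.all example above:

```python
def cache_name(page, used_dependencies, language):
    """Pick a cache entry name based on the dependencies actually used.

    If nothing on the page depended on the user's language, a single
    cache entry (page.all) can serve all languages; otherwise one entry
    per language is needed (page.en, page.de, ...).
    """
    if "language" in used_dependencies:
        return "%s.%s" % (page, language)
    return "%s.all" % page

print(cache_name("FrontPage", ["page"], "en"))        # FrontPage.all
print(cache_name("FrontPage", ["language"], "de"))    # FrontPage.de
```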
Ideas
Dynamic Parser Callback
WASP and the python formatter could be interesting for other parsers, too. The formatter could offer a method which gets a callback method of the parser as a parameter. The default formatter would just call this parser method, but the python formatter would only output code for calling it. With this feature, every parser could be used in combination with the python formatter.
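A minimal sketch of this callback idea, with hypothetical class and method names (not MoinMoin's real API): the default formatter executes the parser's callback immediately, while the python formatter emits source code that performs the same call at view time.

```python
class Formatter:
    """Default formatter: execute the parser callback immediately."""
    def parser_callback(self, callback, *args):
        return callback(*args)

class TextPythonFormatter(Formatter):
    """Python formatter: emit code that runs the callback at view time.

    The emitted line assumes the cached page code runs with `parser`
    and `request` in scope, as in the WASP implementation log above.
    """
    def parser_callback(self, callback, *args):
        return "request.write(parser.%s(*%r))\n" % (callback.__name__, args)

# A hypothetical parser callback:
def render(text):
    return "<p>%s</p>" % text

print(Formatter().parser_callback(render, "hi"))
print(TextPythonFormatter().parser_callback(render, "hi"))
```

With this indirection, a parser never needs to know whether it is rendering a page now or being "compiled" into a cached page.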
Dependencies
Currently the text_python formatter only distinguishes between dynamic and static content. But the decision whether an item is static or dynamic depends on the caching strategy. Neither the parser nor the formatter nor the macros should have to know much about this strategy; at the moment it is hard-coded in at least one of these three objects. To solve this problem, every item (link, macro, text, wikiname) could know not only whether it is dynamic or static, but which facts it depends on. Possible things to depend on could be page content, existing pages (wiki links), existing interwiki pages, formatter, other pages, ... A list of the dependencies would be passed to the formatter with every call (using a sensible default value). The text_python formatter would get a list of things that can be treated as static. With this, the text_python formatter could generate everything from a list of all formatter calls to a completely rendered page.
The Dependencies list that each plugin now has is suboptimal. For one, it consists of strings (bad: no one knows what values are possible), and secondly it is static. A plugin might create content that sometimes depends on one thing and at other times depends on something else. It would be nice if a new caching framework could take this into account.
I propose to return the Dependencies as a list of classes in MoinMoin.CacheValidity that each support the single method isValid(object). Then, a time class could be written similar to this:
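The example the proposal refers to is not preserved here, so the following is a hedged reconstruction. MoinMoin.CacheValidity, isValid(), and the CacheEntry stand-in are the proposed (not existing) names from the paragraph above:

```python
import time

class CacheEntry:
    """Hypothetical cache entry carrying its creation timestamp."""
    def __init__(self, created):
        self.created = created

class Validity:
    """Proposed base class for MoinMoin.CacheValidity checks."""
    def isValid(self, cache_entry):
        raise NotImplementedError

class Time(Validity):
    """An entry is valid for max_age seconds after its creation."""
    def __init__(self, max_age):
        self.max_age = max_age

    def isValid(self, cache_entry):
        return time.time() - cache_entry.created < self.max_age

fresh = CacheEntry(time.time())
stale = CacheEntry(time.time() - 3600)
print(Time(60).isValid(fresh))   # True
print(Time(60).isValid(stale))   # False
```

A page's cache would then be considered valid only while every validity object in its Dependencies list returns True, and a plugin could pick different validity objects per invocation instead of declaring one static string list.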