MoinMoin performance tuning tips
If you have a big and/or busy wiki, here are some tips for tuning it. Please bear in mind that this page is for sysadmins of high-load wiki servers and assumes some knowledge about how your specific web server/platform works. It is not (and probably never will be) a step-by-step howto.
Server environment
Hardware
- fast CPU(s)
- lots of RAM helps, as it is usually used for disk cache (and for in-process caching of data)
- fast and reliable local HDDs or SSDs, preferably in a RAID setup
Software
Moin uses the filesystem for page storage and accesses it often, so make sure your filesystem is fast:
- avoid putting data_dir on a slow network filesystem (and never use NFS for moin: NFS has known issues with file append mode that lead to data corruption)
- use a decent (fast, but also reliable) file system, e.g.:
- ext3 on Linux, mount it with nodiratime,noatime to avoid unnecessary disk accesses. Use dir_index.
- ext4 on Linux
- UFS + softupdates on FreeBSD
- ZFS on Solaris / OpenSolaris, mount it with atime=off to avoid unnecessary disk accesses
- NTFS on Windows (FAT file system is a very bad idea)
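To check whether a Linux filesystem is already mounted without access-time updates, the mount table can be inspected. The following is a minimal sketch that parses text in the /proc/mounts format; the function names and the /srv/wiki mount point used in the usage note are illustrative assumptions, not part of moin.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options for mount_point, or None if not found.

    mounts_text is expected in the /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # If a mount point is listed more than once, the last entry wins.
            found = set(fields[3].split(","))
    return found


def updates_atime(options):
    """Return True if the given options do not include noatime."""
    return "noatime" not in options
```

On a Linux host you could pass it the contents of /proc/mounts, e.g. `mount_options(open("/proc/mounts").read(), "/srv/wiki")`, and remount with noatime if `updates_atime()` returns True for the filesystem holding data_dir.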
Web server software/configuration
- use WSGI (e.g. mod_wsgi for Apache2)
- avoid slow standard CGI
- configure your web server to serve /robots.txt and /favicon.ico from the appropriate files, so they do not hit the wiki engine
- if you run the standalone server behind Apache or other web server, configure the web server to serve all static files
- read logs and:
- exclude crawlers of search engines you don't need by robots.txt
- if crawlers of wanted search engines trigger actions and don't get 403, add them to moin's ua_spiders lists
- use your web server configuration or moin's hosts_deny for IPs abusing your resources
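As a starting point for reading logs, it can help to count requests per client address and per user agent. The sketch below assumes an access log in the common "combined" format (client address first, user agent as the last double-quoted field); the function names are made up for this example, and your log format may differ.

```python
from collections import Counter


def top_clients(log_lines, limit=10):
    """Count requests per client address (first field of each line)."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(limit)


def top_user_agents(log_lines, limit=10):
    """Count requests per user agent.

    Assumes the user agent is the last double-quoted field on the line;
    lines that do not end with a closing quote are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip().rsplit('"', 2)
        if len(parts) == 3 and parts[2] == "":
            counts[parts[1]] += 1
    return counts.most_common(limit)
```

Addresses that dominate the first list are candidates for hosts_deny or a web server block; user agents that dominate the second are candidates for robots.txt or ua_spiders.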
Use a recent MoinMoin version
moin 1.9.8 has some performance relevant improvements (for details, see docs/CHANGES):
- cfg.log_timing = True emits INFO level log output about timings, so you can see what costs you a lot of cpu time / load
- cfg.backlink_method: avoid expensive backlinks searches triggered by bots/crawlers
- cfg.log_events_format = 0 means: do not create event-log entries (saves disk space, disk I/O)
- AbandonedPages macro, RSS feed: reduced load caused by bots
- better caching and lookup optimizations for userprofile data (lookups, page subscriptions, saving of pages)
- fixed: does not create empty pagedirs (with empty edit-log). To clean up all the trash pagedirs, use moin ... maint cleanpage.
- tuned editlog.news() - this is used internally to detect changes in the wiki
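The two simple settings from the list above could look roughly like this in a wiki configuration. This is only a sketch: in a real wikiconfig.py the attributes belong in your existing Config class, and the standalone class name used here is a placeholder so the snippet runs on its own.

```python
# Sketch of tuning-related settings for moin >= 1.9.8.
# In a real wikiconfig.py, put these attributes into your existing
# Config class instead of this placeholder class.

class TuningSettings(object):  # hypothetical name, not part of MoinMoin
    # Emit INFO level log output about timings, so you can see what
    # costs a lot of CPU time / load.
    log_timing = True

    # 0 = do not create event-log entries (saves disk space and disk I/O).
    # The statistics pages read data/event-log, so presumably they will
    # no longer receive new data with this setting.
    log_events_format = 0
```

Whether to disable the event-log depends on whether you rely on the statistics pages; log_timing is mainly useful while you are investigating load.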
Wiki software/configuration
- using a recent Python interpreter often helps with better performance, better memory management and fewer bugs (recent moin versions work with py 2.6/2.7)
- use a recent MoinMoin release, for the same reasons
- use antispam, surge protection, textchas for public wikis
- Saving pages is slow? If reverse DNS lookups are broken for your network, either fix DNS or set log_reverse_dns_lookups = False.
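To find out whether reverse DNS is the reason for slow saves, you can time a lookup from the wiki server for a typical client address. A minimal sketch using the standard library follows; the function name and the injectable resolver parameter are choices made for this example.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (elapsed_seconds, hostname or None) for one reverse DNS lookup."""
    start = time.time()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        hostname = resolver(ip)[0]
    except (socket.herror, socket.gaierror, socket.timeout):
        hostname = None
    return time.time() - start, hostname
```

If such a lookup regularly takes seconds, or fails only after a long delay, fixing DNS or disabling the lookups in moin is likely to help.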
Regular maintenance
Do the following every few months or once a year (depending on how busy and exposed your wiki is):
- run moin ... maint cleanpage to get rid of anything it classifies as "trash". Please note that it outputs a shell script, which you should read before running it.
- run moin ... maint cleancache
The statistics features (EventStats, PageHits) read data/event-log. That file grows over time, and a big event-log slows the statistics down. If you are not interested in the stats from 2 years ago, consider rotating that log for performance reasons. You can even truncate event-log to 0 bytes if you don't mind your statistics starting from scratch.
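Rotating the event-log can be as simple as renaming it and starting a new, empty file. The sketch below assumes it runs as the user that owns the wiki data (so the new file gets suitable ownership) and preferably while the wiki is stopped or idle; the function name and the timestamp suffix are choices made for this example.

```python
import os
import time


def rotate_event_log(data_dir, now=None):
    """Rename data_dir/event-log to event-log.YYYYMMDD-HHMMSS and create
    a new, empty event-log.

    Returns the path of the archived file, or None if there was nothing
    to rotate (file missing or already empty).
    """
    current = os.path.join(data_dir, "event-log")
    if not os.path.exists(current) or os.path.getsize(current) == 0:
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    archived = current + "." + stamp
    if os.path.exists(archived):
        raise RuntimeError("refusing to overwrite %s" % archived)
    os.rename(current, archived)
    # Recreate an empty event-log so the file exists again right away.
    open(current, "a").close()
    return archived
```

The archived file can then be compressed or deleted, depending on whether you still want the old statistics data.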
Educate your users
Some users need help with their client configuration and usage, so tell them ...
- not to use mirroring tools on the wiki (give them a static html dump made with moin export dump)
- not to request the RSS feed too often, but in an appropriate interval (like 5..60 minutes depending on how busy your wiki is)
- not to use link prefetching accelerator software
- to prefer title search (title search is much less cpu and disk intensive than full text search and often does the job)
Some hints if the above is not enough
Most expensive operations are related to the number of pages in the wiki. If you intend to have a big wiki, consider running a farm of multiple smaller wikis.
Some stuff in moin takes lots of CPU cycles and disk I/O:
- FullSearch macro (for some uncritical stuff, maybe use FullSearchCached and manually refresh when needed)
- Some users like to put that on their home page to search for places where they contributed or where they are mentioned. If lots of users do this and a bot grabs all pages, this puts a big load on the wiki server.
- linkto search (triggered by backlink search, Categories etc.): moin >= 1.9.8 does not render the linkto-search link for anonymous users/bots, which avoids this usually unnecessary load. This can be configured to behave differently, including the pre-1.9.8 behaviour.
- you can use Xapian based indexed search to speed up searching
- SystemInfo page
- PageSize page
- PageHits, EventStats/UserAgents and EventStats/HitCounts pages
- EditedSystemPages page
- TitleIndex (but you likely want to keep this open even for search engine bots, so they can better find all your pages)
- all installed translations of the above mentioned pages (only install translations which your users really need)
- rss_rc action (RSS feed) for single pages (moin 1.9.8 contains a speedup for this)
For some of those costly pages, consider putting #acl Known:read,write All: on them, so that bots and unregistered users cannot trigger them.
If you want to analyse what is causing the load on your wiki, use the built-in log_timing feature (see docs/CHANGES, moin >= 1.9.8).