This is just a collection of ideas after reading about this method. That project uses a single, persistent server process that either serves requests directly or runs behind Apache. It even has a way to start the persistent server via some autostart.cgi executed by Apache.
CGI (multi-process) == problem
CGI is a problem, but also a reality. Some systems out there offer nothing more than CGI.
The special problems of plain CGI are these:
- slow startup (the bigger and more powerful moin gets, the worse this will be)
- no possibility to keep/cache data between requests except on slow disk
The problems with enhanced CGI (as in fastcgi or mod_python) and with all multi-process setups are these:
- data changes behind your back, so you have to watch all the time
  - data changes on disk, so watching is slow
- watching makes code more complicated
- you can keep data in RAM between requests, but that isn't too useful, because there can be cache-coherency problems due to data changing on disk. So this mainly speeds up code module imports.
the hard way
The hard way of solving that would be to drop all CGI stuff and do some single-process-only approach (see URL at top of the page). We already have such servers (twisted-based and standalone). They can run behind apache.
Completely dropping CGI support is currently not an option, as it would make many of our non-root users really unhappy, because they may not be able to set up any other configuration.
But what would we win (if we did)? Don't panic, this is just theory!
If we used just a single process, it would mean:
- we can keep lots of data structures in RAM and just have to coordinate thread access to them (fast)
  - a page dict with page name, state, metadata, content, backlinks, ...
  - a page name TRIE for gaga2-like stuff
- this opens the door to features and scalability not possible today
- nothing changes on disk that we have to watch, easier code, more time for other features
- we would use threads to cope with long-running functions (running single-threaded is currently not an option because of those)
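As a concrete example of such a persistent in-RAM structure, here is a minimal sketch of a thread-safe page name trie supporting prefix completion. All names here are hypothetical; a real implementation would need to coordinate more operations, but the point is that only thread access needs coordinating, not disk state:

```python
import threading

class PageNameTrie:
    """In-RAM page name trie with a lock coordinating thread access --
    the kind of structure a single-process moin could keep between
    requests. (Hypothetical sketch; not an actual moin API.)"""

    def __init__(self):
        self._root = {}          # char -> child dict; "" marks a page end
        self._lock = threading.Lock()

    def add(self, name):
        with self._lock:
            node = self._root
            for ch in name:
                node = node.setdefault(ch, {})
            node[""] = True      # terminal marker

    def complete(self, prefix):
        """Return all known page names starting with prefix, sorted."""
        with self._lock:
            node = self._root
            for ch in prefix:
                node = node.get(ch)
                if node is None:
                    return []
            results = []
            stack = [(node, prefix)]
            while stack:
                n, acc = stack.pop()
                for ch, child in n.items():
                    if ch == "":
                        results.append(acc)
                    else:
                        stack.append((child, acc + ch))
            return sorted(results)
```

Because the trie lives in one process, it is built once and then answers lookups from RAM; with multiple CGI processes it would have to be rebuilt (or re-validated against disk) on every request.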
the soft way
So what could the soft ways be? Maybe we could think about favouring long-running setups while keeping CGI possible.
Global Moin Lock (GML)
If moin were designed to run single-process only (and not to watch what happens behind its back on disk), we could still run it as CGI, but we would have to use a GML to keep multiple processes from running concurrently (or at least from running the critical sections of the code concurrently).
That would slow down CGI for wikis with many concurrent users (but CGI isn't well suited for those anyway), while it would still work fine for single users or low-traffic wikis.
We could do interpreter loading and module importing before taking the lock, and result output after releasing it, so at least those uncritical code parts would run concurrently (for plain CGI). Since that work takes quite some time and runs concurrently, the impact on normal page views wouldn't be dramatic.
The only real problems would be features taking lots of time in the locked code parts, like:
- search (essential, but can be sped up by using an index and maybe split across multiple requests by not searching everything at once)
- eventstats (could be switched off)
- other long-running future features (could be switched off)
The GML could be realized by some storage.open(EXCLUSIVE) (and ...close()), so stuff could even run unlocked until storage access is necessary and continue to run unlocked after the close() call.
Using Apache multithreaded?
Apache2 internally supports multi-threaded operation (for static content).
Does this also work somehow for fastcgi or mod_python, so there would be only one python/moin process, but multiple threads?
Comments
I don't think this approach to the problem(s) helps much. There are several problems worth discussing, but not all of them are related to CGI. I think it would make more sense to identify the individual problems, evaluate how much impact they really have, and then look at possible solutions. I'd guess dropping CGI doesn't help in most cases anyway... So I suggest refactoring this page into the individual technical issues rather than keeping it grouped by possible (wrong) solutions. -- FlorianFesti 2005-08-22 10:55:16
I strongly second Fabi's opinion here. -- AlexanderSchremmer 2005-08-22 11:18:09
It is not only about existing problems (and how to solve them somehow), but about a general design question:
- do we limit what moin can do, because of multi-process problems (or developer time, since some things are hard and time-intensive to implement correctly), in favour of best-possible (but still slow) CGI operation, or
- do we limit how multi-process operation works, to get rid of the problems that only multi-process concurrency has, in favour of easier and more correct code and of features not possible with multiple processes? This would make the slow code somewhat slower, but the fast code potentially much faster, especially for big wikis.