First of all, some questions:
Has anybody out there started to implement higher availability for MoinMoin?
I have some ideas on this topic, because in my company MoinMoin is just starting to become mission-critical.
But is this feature really important? Who else needs it?
- Does any of you need it in your organisation?
- How do you implement high availability?
- Do you implement it at all?
Basic Scenario
My favourite scenario (after two days of thinking) is this:
There is one MasterServer, where all editing happens.
And we will have a HotStandby-MasterServer, which stays inactive and becomes active when someone makes it so, because the MasterServer has crashed.
- Synchronisation between the two servers could be done by some rsync scripts.
More precisely: synchronisation between Master and HotStandbyMaster is unidirectional, but the "HSM" has the same configuration as the active Master (like a Cisco CSS).
There can be multiple SlaveServers, maybe two or three, synchronised with the Master by rsync, but without any editing function.
ReadOnly should be realized by reducing the allowed ACLs to "read".
The EditButton is a link to the MasterServer with normal ACLs.
Slaves could be accessed by DNS-Round-Robin or by a dedicated LoadBalancer-Architecture.
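A minimal sketch of the slave side of this idea in wikiconfig.py, assuming moin's standard ACL configuration (the sitename is just an illustrative example):
{{{
# wikiconfig.py on a slave -- illustrative sketch only (Python 2, as moin uses)
from MoinMoin.multiconfig import DefaultConfig

class Config(DefaultConfig):
    sitename = u'Company Wiki (read-only slave)'
    # Everything is readable, nothing is writable on a slave;
    # the master keeps its normal ACLs and handles all edits.
    acl_rights_default = u'All:read'
}}}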
Implementation-Ideas
After some days of installing two UML instances, I'm ready to implement the idea described above. But there are some details I dislike:
- Synchronisation with external tools like rsync would be inelegant. A periodic (cron-driven) run of rsync produces overhead and has to walk over all files, because rsync cannot know what has happened inside the application. The better idea must be to code this into moin's main procedures.
- So I'm looking for the code that creates the links. Just remember:
- If someone is surfing on a slave system, the link under the edit button should redirect them to the master server.
- Saving a page on the master server should cause the upload of the new page from the master to the slave servers (see the sketch after this list).
- The edit log must be adjusted on the slaves so that the "RecentChanges" page gets updated.
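One way the master-to-slave push could look, purely as a hedged sketch: wrap PageEditor.saveText so every successful save is also sent to the slaves via moin's XML-RPC interface (the slave URLs are invented, and the putPage call should be checked against your moin version, since moin restricts it by default):
{{{
# Hedged sketch, not moin's actual replication mechanism (it has none yet).
import xmlrpclib  # Python 2, matching moin's code base
from MoinMoin.PageEditor import PageEditor

SLAVES = ['http://wiki-s1.example.com/?action=xmlrpc2',   # invented hosts
          'http://wiki-s2.example.com/?action=xmlrpc2']

_orig_saveText = PageEditor.saveText

def saveText_and_replicate(self, newtext, rev, **kw):
    result = _orig_saveText(self, newtext, rev, **kw)
    for url in SLAVES:
        try:
            # putPage comes from the WikiRPC interface moin exposes
            xmlrpclib.ServerProxy(url).putPage(self.page_name, newtext)
        except Exception:
            pass  # a real implementation would queue and retry
    return result

PageEditor.saveText = saveText_and_replicate
}}}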
Now, my idea is this:
- I'll change the URL (the hostname) of the edit button. But at the moment I cannot see where I should code this.
Who could show (or tell) me which moin files I have to start with?
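Whatever the right place in moin turns out to be, the rewrite itself is small; a sketch (the master hostname is an invented example):
{{{
# Replace the host part of an edit URL with the master server's hostname.
from urlparse import urlsplit, urlunsplit  # Python 2

MASTER_HOST = 'wiki-master.example.com'  # invented example

def to_master(url):
    scheme, netloc, path, query, frag = urlsplit(url)
    return urlunsplit((scheme or 'http', MASTER_HOST, path, query, frag))
}}}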
Some other aspects:
- To have a consistent codebase for all instances (the same moin code on every server), we would need some new variables in the main config or the instance config. Something like the following:
{{{
servertype=(standalone,master,backup,slave)
# backup means a hot-standby master with the same rights and configuration,
# but it will not be addressed by the slaves because of its different hostname.
# Backup could heartbeat the master and take over when the master dies.
master=(hostname-m)
slaves=(hostname-s1,hostname-s2,hostname-sn)
backup=(hostname-b)  # optional, for highest availability
}}}
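In moin's config these could become ordinary Python attributes; purely as a sketch (none of these options exist in moin, the names are illustrative):
{{{
# Hypothetical options -- moin has no such settings today.
from MoinMoin.multiconfig import DefaultConfig

class Config(DefaultConfig):
    servertype = 'slave'    # one of: 'standalone', 'master', 'backup', 'slave'
    master = 'hostname-m'
    slaves = ['hostname-s1', 'hostname-s2', 'hostname-sn']
    backup = 'hostname-b'   # optional, for highest availability
}}}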
I think that once we have implemented this, we will have a healthy margin (a lead) over other engines like MediaWiki or TWiki. And I think this could boost the acceptance of MoinMoin in bigger companies.
(I don't know whether everybody likes companies using free software for free, but I think it must be good for the world if big companies improve their own efficiency and their intelligence, their brainpower.)
I think the best solution would be to put all the wikis behind a load balancer that can route requests according to the URL, such as Pound. Then filter all URLs that have action=edit, and all requests using POST, to the master wiki. All other wikis will get only GET and other read-only actions.
Maybe (after moin is fixed so that no GET will change data) you can even route only POST to the master wiki. One can edit a page on any wiki and preview it, but the final save will go automatically and transparently to the master wiki. All the wikis will have the same URL.
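The routing rule itself is tiny; a sketch in Python, independent of the concrete load balancer (hostnames are invented examples):
{{{
# Edits and POSTs go to the master, everything else round-robins over
# the read-only slaves. Hostnames are invented.
import itertools

SLAVES = itertools.cycle(['wiki-s1.example.com', 'wiki-s2.example.com'])

def pick_backend(method, url):
    if method == 'POST' or 'action=edit' in url:
        return 'wiki-master.example.com'
    return SLAVES.next()  # Python 2 iterator protocol
}}}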
If you go with this solution, then any wiki can act as master, and any other wiki can act as master standby. When a wiki saves its data (edit/delete/rename/attach), it saves the data in its own copy and updates all the other wikis in the group.
When the master wiki fails, the load balancer will route requests to another wiki, and that one becomes master automatically. When it saves data, it will update all the other wikis.
To implement this, you have to use a common list of all wikis in the group, so each wiki can update all the others. You can put this list on a shared file system, or on a local web server (a sketch follows).
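Reading that group list could be as simple as this (both locations are invented examples):
{{{
# Read the group membership from a shared file, falling back to a
# small HTTP resource. Path and URL are invented.
import urllib  # Python 2

def wiki_group():
    try:
        f = open('/mnt/shared/wiki-group.txt')
    except IOError:
        f = urllib.urlopen('http://config.example.com/wiki-group.txt')
    return [line.strip() for line in f if line.strip()]
}}}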
Note that some of the actions that change wiki data currently use GET!
If you want to go with the wiki solution, it will be much more work. You will have to catch all the actions that change data and hack all the links to those actions.
- Hi Nir:
A load balancer would work, I think. But how to synchronise the instances in real time? Because we are working without databases (MySQL has master-slave replication, Oracle's Real Application Clusters has it anyway), I mean we have to implement this in moin. Yes, you're right, I have to learn to implement this by myself... and I'm working on it...
On this page I'm seeking hints about the structure of the code, where I could replace the name of the wiki (which is typically the same host) with the name of the master server. Or someone has another idea (like JürgenHermann) with a central file store and a locking mechanism for pages across multiple instances of "master wikis" without any slave.
But I think a master-slave architecture is easy in comparison with other solutions (see DNS, MySQL, some CMSes etc.).
- You want to look at CXFS and co.
- What do the others think?
I don't know
To synchronize the wikis in real time, you will have to add code to the parts that change data, so that instead of one page save, you save the same file to multiple wikis. This can be done with WikiRPC, or with direct access to the file system of the other wikis (sketched below).
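A sketch of the direct file-system variant, assuming each slave's data directory is mounted locally (all paths are invented); note that the edit log would need the same treatment, as mentioned above:
{{{
# Copy a page's directory from the master onto every slave after a save.
import os, shutil

SLAVE_DATA_DIRS = ['/mnt/slave1/wiki/data', '/mnt/slave2/wiki/data']  # invented

def replicate_page_dir(master_data_dir, pagedir_name):
    src = os.path.join(master_data_dir, 'pages', pagedir_name)
    for data_dir in SLAVE_DATA_DIRS:
        dst = os.path.join(data_dir, 'pages', pagedir_name)
        if os.path.exists(dst):
            shutil.rmtree(dst)   # naive: no locking, no conflict handling
        shutil.copytree(src, dst)
}}}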
Implementation-Steps
Today I hacked together the sync.sh script; it works with rsync and with two (virtual, UML) systems. The slave synchronises its files with the master periodically. And I extended the system messages (in i18n/de.py, because I'm working in Germany) on the two systems, to inform the surfers that they are working on a cluster infrastructure.
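The core of such a script is a single rsync pull; roughly this shape (host and paths invented, and not necessarily what sync.sh actually does):
{{{
# Periodic slave-side pull of the master's data directory.
import subprocess  # Python 2.4+

subprocess.call(['rsync', '-a', '--delete',
                 'wiki-master.example.com:/srv/wiki/data/',
                 '/srv/wiki/data/'])
}}}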
But that was not very elegant. Would shared storage be a solution?
- My instance works with hard locks. So if a page on the storage is locked, no one (on any running instance) should be able to edit it.
- I looked at heartbeat and drbd: in Debian they are only beta. What about other (free) solutions? GFS from Red Hat? OCFS from Oracle?
I'm really unsure what to do at this point...
Waiting for inspiration
Discussion
What do you mean?
The 2nd one sounds much more promising (it gives not only HA, but also scalability!) and is easier to implement, since the sync is uni-directional. If you configure the load balancer with a lower weight for the master, then it can sustain the edit load while the others bear the page-serving load. I assume here that serving the content is critical, not the ability to edit it. If the latter is critical too, you could combine both scenarios.
- At the moment only serving the content is critical. But editing is becoming more and more critical too.
For the 1st one: do not rsync the two masters bi-directionally, but let them store their data on an HA file store (separation of concerns); then you only have to make sure that exactly one is active and writes to the store at any time, which is easy (done in the LB).
You're right: look at the top.
- NFS is not the best option here; more native storage should be considered. Maybe something like CXFS.
Or drbd.