User documentation
Please read HelpOnSynchronisation.
Open Issues
Known main issues
- Do I need to tag delete operations?
- How to handle renames?
- How should we store tags? (Metadata support would be handy; currently this is done in Pickle files.)
Longterm ToDo
- Maybe refactor YYY into MoinLocalWiki.
- Add page locking, i.e. use the one in the new storage layer (see XXX).
- Do older tags of one wiki site have to be stored as well? Why don't we keep just one tag?
- Put author names into the comment field
- Transmit mimetypes (see XXX). Needs new storage system.
- Implement renamed pages.
- Cache the result of remote.get_pages locally to reduce the load.
Basic Requirements
- Both sides should be able to start the synchronisation.
- The process should be atomic (per page) and be safe even if there is a heavy editing load on both wikis.
- Interwiki monikers/names should be used to identify the other wiki.
- The user needs to authenticate him/herself on the remote wiki.
- Pages with unresolved conflicts are not synchronised.
- For remote conflicts, it has to be checked whether the particular page was changed after the conflict-introducing merge. If so, the page is not synchronised.
- Allow unidirectional merging.
- A log of the merge process should be appended to this page.
Cases
remote \ local | nonexisting | deleted | exists
---------------+-------------+---------+----------
nonexisting    | -           | -       | push page
deleted        | -           | -       | merge
exists         | pull page   | merge   | merge
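
For illustration, the decision table above can be written down directly as a mapping; a minimal Python sketch (the state names are my own labels, not MoinMoin identifiers):

ACTIONS = {
    # (remote_state, local_state): action
    ("nonexisting", "exists"):      "push page",
    ("deleted",     "exists"):      "merge",
    ("exists",      "nonexisting"): "pull page",
    ("exists",      "deleted"):     "merge",
    ("exists",      "exists"):      "merge",
}

def decide(remote_state, local_state):
    # Combinations not listed require no action ("-" in the table).
    return ACTIONS.get((remote_state, local_state), "do nothing")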
One page has new revisions
- Copy the latest revision to the other wiki, keeping its revision number, so the updated wiki ends up with some revision numbers missing in between.
Both pages have new revisions
- Merge revision x+n1 with revision x+n2 and save as new revision on both sides.
- If there was a conflict, it will look like a conflict you get while editing the wiki normally. The page is then said to be conflicted because it contains the conflict marker.
Deleted pages
Assuming that the page exists in wiki A and was deleted in wiki B: if there are no tags, we do a normal merge. If there were changes in wiki A since the last merge, the merge produces a conflict. Otherwise (no changes since the last merge), the page is deleted in wiki A as well. This needs static info that could be transferred with the page list.
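
A minimal sketch of that rule, assuming we know the page's current revision in wiki A and the revision recorded by the last sync tag (the helper name and arguments are hypothetical):

# Decide what to do when a page exists in wiki A but was deleted in wiki B.
# current_rev_a: current revision in wiki A; tag_rev_a: revision of wiki A
# recorded at the last merge, or None if the page was never synchronised.
def handle_remote_deletion(current_rev_a, tag_rev_a):
    if tag_rev_a is None:
        return "normal merge"            # no tags: treat like an ordinary merge
    if current_rev_a > tag_rev_a:
        return "merge with conflict"     # wiki A changed after the last merge
    return "delete page in wiki A"       # unchanged since the merge: propagate the delete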
Implementation
Based on XMLRPC.
Every page will have multiple tags by different wikis. Based on those, the synchronisation code can decide how to build the differences.
Tags
They are tuples associated with a page. See Tag in MoinMoin.wikisync for an overview.
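
To make this concrete, a sketch of how such tags could look when pickled per page. The exact fields of Tag in MoinMoin.wikisync may differ; the synctags file name matches the note about it further down this page:

import pickle

# Illustrative tag tuples: (remote interwiki name, remote rev, local rev).
# The real Tag objects in MoinMoin.wikisync carry more context; this only
# sketches the idea of per-page sync state stored next to the page.
tags = [
    ("RemoteWiki", 7, 42),
    ("OtherWiki", 3, 40),
]

with open("wiki/data/pages/SomePage/synctags", "wb") as f:
    pickle.dump(tags, f)

with open("wiki/data/pages/SomePage/synctags", "rb") as f:
    print(pickle.load(f))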
Prerequisites
RPC:getToken(username, password)
An RPC call which returns a token that can be used to authenticate at a remote wiki. This token might have a limited time span.
RPC:applyToken(token)
An RPC wrapper method that is used to transmit the token and call the original function. Needs to be used in a multicall batch request.
RPC:batchRequest
Was replaced by the standardised MultiCall method.
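
A minimal client-side sketch of these calls using Python's standard MultiCall support. The wiki URL and credentials are placeholders; getToken/applyToken are the method names described above:

import xmlrpclib  # Python 2, as used by MoinMoin 1.x

server = xmlrpclib.ServerProxy("http://remote.example.org/?action=xmlrpc2")

# Plain call to obtain an authentication token.
token = server.getToken("username", "password")

# Authenticated batch: applyToken must be the first call in the batch so
# that the following calls run with the transmitted credentials.
batch = xmlrpclib.MultiCall(server)
batch.applyToken(token)
batch.getAllPages()
results = tuple(batch())   # (applyToken result, page list)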
Authentication agent
In general, wikis should not trust other wikis, but only users they already know. This requires credentials to be sent along. MoinMoin needs a simple action that lets each user manage logins for other wikis.
- Need more details. What types of login actions are needed? Describe the different use cases:
  - Add/modify login
    - Fields: interwikiname, username, password
  - List logins
XMLRPC Interface
Docs moved to source code.
getDiff
Returns a computed diff of a page.
mergeDiff
mergeDiff(pagename, contents, localrev, deltaremoterev, lastremoterev, interwikiname, normalised_name) returns {status: "SUCCESS"/..., current: intRev}
Merges a diff on the remote machine and returns the number of the new revision. Additionally, this method tags the new revision.
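
For example, a call matching the signature above might look like this (all values are made up for illustration; server is an xmlrpclib ServerProxy as in the sketch further up):

merged_text = u"Merged page contents..."
result = server.mergeDiff("FrontPage",   # pagename
                          merged_text,   # merged contents
                          12,            # localrev: our new local revision
                          None,          # deltaremoterev
                          9,             # lastremoterev, from the getDiff call
                          "MyWiki",      # our interwiki name
                          "FrontPage")   # normalised_name
if result["status"] == "SUCCESS":
    new_remote_rev = result["current"]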
Synchronisation Steps
- Get the remote interwiki name and compare it (sanity check).
- Fetch a local page list.
- Fetch a remote page list.
- Filter these lists according to the options.
- Walk every list and do this for every page (see the sketch after this list):
  - Acquire a ReadLock (called UpdateLock in Nir's/Florian's code) locally - the page can be read, but not written.
  - Start a transaction.
  - Search for the local tags, store the result into tagRev.
  - RPC call getDiff(pageName, tagRev, None).
  - Merge both changes, store the result into contents.
  - RPC call mergeDiff(pagename, contents, currentLocalRev+1, lastRemoteRev, myInterwikiName), where lastRemoteRev was obtained from the getDiff call.
  - Upgrade to a WriteLock if needed while saving the page on the wiki.
  - Store the merged changes to disk, tag the page.
  - Release the lock.
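
A condensed sketch of this per-page loop. The locking and merge helpers are stand-ins for MoinMoin internals (their names are assumptions), and the whole contents are transmitted instead of a real diff to keep the example short:

def sync_page(local, remote, pagename, my_interwiki_name):
    # ReadLock (UpdateLock in Nir's/Florian's code): page readable, not writable.
    lock = local.read_lock(pagename)
    lock.acquire()
    try:
        tag_rev = local.newest_tag_revision(pagename)   # last synchronised local rev
        diff = remote.getDiff(pagename, tag_rev, None)
        last_remote_rev = diff["current"]

        # Three-way merge: common base, our head, their changes.
        contents = merge(base=local.revision(pagename, tag_rev),
                         mine=local.current_contents(pagename),
                         theirs=diff["diff"])

        result = remote.mergeDiff(pagename, contents,
                                  local.current_rev(pagename) + 1,
                                  None, last_remote_rev,
                                  my_interwiki_name, pagename)

        lock.upgrade_to_write_lock()     # upgrade only while saving locally
        local.save(pagename, contents)
        local.tag(pagename, remote_rev=result["current"])
    finally:
        lock.release()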
DVCS-like tracking
Distributed version control systems are known to handle synchronisation cases with many committers and repositories quite well. So do they go together well with wikis? In this section I want to discuss whether a DVCS like Mercurial could be used as a base component for building a synchronisation system.
Introduction
In a distributed version control system (DVCS), every node (i.e. local repository) that generates commits holds the revision data of commits done by other nodes as well. So it can do merging etc. locally without contacting other nodes. Furthermore, there is no need for a special server or a specific push/pull/sync strategy because all nodes have enough history to allow for complex synchronisation scenarios. By grouping all nodes into a graph based on which nodes merge with which other nodes, we can describe a particular setup of wikis.
Limitations
In the aforementioned system, it is not possible to reorder the graph of nodes without getting merge conflicts because the tags are not distributed across merges and every node just knows its own history and tags. So the calculation of the parent revision might fail if a node tries to merge directly with another node that only indirectly merged with the original node in the past.
Modelling using a DVCS
So the upcoming question is: how can we map the model of a DVCS to edits done in the wiki? It is preferable not to change the workflow too much. E.g. it is not easy to have the notion of multiple concurrent heads (of the revision DAG) in a wiki without many changes in the whole wiki system (even though it would make sense - to have e.g. a kind of "staging" system where anonymous users see the stable head of the wiki pages while skilled editors can modify the unstable head and merge - or copy, if the stable head is read-only - when it has gotten stable enough).
Compared to a DVCS, in a wiki there are no "transactions" that span multiple pages/items (except the rename of a page). So a commit would only contain one changed page. Furthermore, merging should be done on a page level in order to allow partial merges (in order to be able to, e.g., restrict merges to namespaces separated by different parent pages, as described in the aforementioned UI).
I am assuming that the reader is aware of the DVCS called Mercurial and will use the terms and model defined by it in the following thoughts. As said above, merges should be done on a per-page level. It might be possible to send a (pagename, [heads...]) for every page and allow for O(no_of_pages*no_of_heads*log(no_of_revs)) for pulling in that case.
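
To sketch what such a per-page exchange could look like (the DAG encoding is my own assumption for illustration, not Mercurial's actual API):

# Per-page revision DAG modelled as {rev_id: [parent_rev_ids]}.
def ancestors(dag, heads):
    """All revisions reachable from the given heads, including the heads."""
    seen, stack = set(), list(heads)
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(dag.get(rev, []))
    return seen

def outgoing(local_dags, remote_heads_by_page):
    """For every page, the revisions the remote cannot reach from its heads."""
    to_send = {}
    for pagename, remote_heads in remote_heads_by_page.items():
        dag = local_dags.get(pagename, {})
        known = ancestors(dag, [h for h in remote_heads if h in dag])
        to_send[pagename] = sorted(r for r in dag if r not in known)
    return to_send

# Example: the remote knows rev 2 of "FrontPage"; revs 3 and 4 must be sent.
dags = {"FrontPage": {1: [], 2: [1], 3: [2], 4: [3]}}
print(outgoing(dags, {"FrontPage": [2]}))   # {'FrontPage': [3, 4]}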
Mercurial operations to implement: annotate, pull, push, commit, merge/update, incoming, outgoing, parents, heads
The idea to follow this implementation strategy was dropped after talking to my mentor. So we will just have synchronisation of merge changesets for now.
Discussion
Should this mechanism be approached from the front-end or the back-end, so to speak? Thinking about it quickly, I think that if the storage engine refactoring goes ahead, there's an opportunity to use it to make the synchronisation easier, but at the same time you lose backward compatibility with the earlier versions. Just something to consider, I thought. - AA
Which advantages do you see in integrating this into the low-level layers? I do see a problem with metadata, because in this model a page currently just consists of revisions and data. I will see how we can cooperate in this area. -- AlexanderSchremmer 2006-05-10 15:33:41
I think I would prefer a serialisation/deserialisation interface to gather the metadata as a string. -- AlexanderSchremmer 2006-05-29 20:06:02
Using PKI (encryption + signatures) would also be a natural fit, to allow replicated wikis while maintaining accountability of changes. -- Pieter 2008-03-20 00:00:00
Moingo -- SteveYen
- Here's my quick and dirty hack on using Moin Moin Desktop Edition with git and github, which I've named "Moingo".
- Before, I had been using Moin Moin as a private wiki, checked into svn, but quickly ran into limitations when I needed to grow to work with a distributed, occasionally-connected team.
- Moingo puts git and Moin together in the fastest way that could possibly work, I think, and admittedly is doing it wrong...
  - dumb last-one-wins synchronization strategy
  - not leveraging the power of git
  - using too much storage and file copies
  - etc.
- ...but it's what I needed fast.
- User experience:
If you copy a wiki page via FTP/SCP from wiki1 to wiki2 that was already synchronised, take care to delete the file synctags in the folder of the page. Otherwise you may get an error during the synchronisation of wiki2, like "<Fault FROMREV_INVALID: 'Unknown from_rev.'>"
If you synchronise a wiki page that was renamed before, you may get an error that stops the synchronisation and is difficult to figure out. As a solution, delete the rename line at the top of the raw text of the wiki page.
-- RudolfReuter 2010-06-25 15:32
Moin Wiki Attachments Backup
With the interwiki sync procedure it is possible to back up (synchronise) to a backup wiki over the network while the wiki is online.
Unfortunately, up to MoinMoin version 1.9.3, attachments to a wiki page cannot be synchronised. For that job I have set up a little shell script with an rsync procedure.
You just have to take care that a current version (3.0.7) of rsync is installed in /usr/bin/rsync.
Using rsync to backup attachments
The Linux program rsync looks well suited for that task. I tried it with the GUI luckyBackup (ver. 0.4.4) under Ubuntu 10.10 on the client computer.
Only the missing attachments are copied to the backup wiki.
The backup wiki has access to the main wiki via the in-house network, so I could mount the home folder via cifs.
1. Enable the home folder of the main wiki for remote access (Ubuntu 10.04.1).
2. Mount the remote home folder to /media/rudi72home:
   sudo mount -t cifs -o username=rudi,password=<password> //rudiswiki/rudi72home /media/rudi72home
The following list shows the setup of luckyBackup:
Task Properties:
  Task name: rudi72_moin18
  Type: Backup the entire Source directory (by name)
  Source: /media/rudi72home/moin-1.8.8/wiki/data/pages
  Destination: /home/rudi/moin-1.8.8/wiki/data/
Advanced properties:
  Include: */attachments/*
  Set a mark on:
    Preserve ownership, times
    Preserve permissions
    Preserve symlinks
    Recurse into directories
    Skip newer destination files

This will give the following rsync command string via the "validate" button:

rsync -h --progress --stats -r -tgo -p -l --update --include=*/attachments/* --include=*/ --exclude=* --prune-empty-dirs /media/rudi72home/moin-1.8.8/wiki/data/pages /home/rudi/moin-1.8.8/wiki/data/
rsync use on Mac OS X 10.6.6
A problem will arise if the attachments folder does not exist on the backup side: the attachment files are then not copied at all.
The problem was caused by an outdated rsync version (2.6.9) shipped with Mac OS X. A newer version (3.0.7) works as expected.
In order to make the connection to the Linux Ubuntu 10.04.1 server easier, install the netatalk package there.
Because netatalk uses HOME DIRECTORY, with a space in the name, as the name for the home directory, I changed it to home_dir in /etc/netatalk/AppleVolumes.default, which makes file handling easier.
The rsync call with an established server connection will look like:
# rsync ver. 3.0.7 in /usr/bin/rsync (old ver. 2.6.9)
rsync -h --progress --stats -r -tgo -p -l --update --include=*/attachments/* --include=*/ --exclude=* --prune-empty-dirs /Volumes/home_dir/moin-1.8.8/wiki/data/pages /Volumes/hda8/INSTALL/Python/Moin/moin-1.8.8/wiki/data/
ssh use on Mac OS X
If you want to copy the attachments via the Internet, it is recommended to use SSH encryption for the data transfer. The original description is from http://www.jdmz.net/ssh/. Deviating from that, the Mac OS X ssh-keygen only allows a 1024-bit DSA key.
If the ssh connection works and you want to run the rsync procedure automatically via a script, you have to generate a public DSA key.
In order to make the procedure more secure than described here, please have a look at the original web site.
For a quick test of the working ssh connection, try the option -e ssh with interactive password input:
# rsync ver. 3.0.7 with ssh
rsync -h --progress --stats -r -tgo -p -l --update --include=*/attachments/* --include=*/ --exclude=* --prune-empty-dirs -e ssh rudi@192.168.17.72:/home/rudi/moin-1.8.8/wiki/data/pages /Volumes/hda8/INSTALL/Python/Moin/moin-1.8.8/wiki/data/
Next, generate a public key in order to be able to automate the rsync procedure.
When asked for a passphrase, just hit ENTER.
# First create the DSA keys
$ ssh-keygen -t dsa -b 1024 -f mac-rsync-key
Generating public/private dsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in mac-rsync-key.
Your public key has been saved in mac-rsync-key.pub.
The key fingerprint is:
d5:04:b4:03:0d:c6:0d:d5:d4:10:85:e9:da:f7:xx:xx rudiuser@rudi-users-iMac.local
The key's randomart image is:
+--[ DSA 1024]----+
| .=B++=B.        |
...
|              o. |
+-----------------+

# Copy the public key to the server:
$ scp mac-rsync-key.pub rudi@192.168.17.72:/home/rudi
remote password: <password>

# Login to the remote computer:
$ ssh rudi@192.168.17.72
remote password: <password>

# If the folder .ssh does not exist in the home directory, create it
$ sudo mkdir .ssh ; chmod 700 .ssh
$ sudo mv mac-rsync-key.pub .ssh/
$ cd .ssh/
# If the file "authorized_keys" does not exist, create it
$ sudo touch authorized_keys ; chmod 600 authorized_keys
# Append the content of "mac-rsync-key.pub"
$ sudo cat mac-rsync-key.pub >> authorized_keys

# Now on the host side you can start the rsync procedure without a password prompt:
# rsync ver. 3.0.7 with ssh
rsync -h --progress --stats -r -tgo -p -l --update --include=*/attachments/* --include=*/ --exclude=* --prune-empty-dirs -e "ssh -i mac-rsync-key" rudi@192.168.17.72:/home/rudi/moin-1.8.8/wiki/data/pages /Volumes/hda8/INSTALL/Python/Moin/moin-1.8.8/wiki/data/
-- RudolfReuter 2011-02-13 18:59:52
Have a look at ssh-copy-id.
See above.