See also: ParserMarket; look for the MediaWiki parser (ParserMarket/Media4Moin)
MediaWiki to MoinMoin Wiki converter
It is a PHP page that queries the current versions of MediaWiki pages from the database and stores them in a directory structure compatible with a MoinMoin wiki. Scroll to the end for a very alpha Python version.
To use it, download one of the attachments here, fill in your MediaWiki database login info, and then just run it.
ToDo
- checking if directories exist
ChangeLog
Branched 0.3
I had some problems with the 0.2 script but built on it for a better version. The script creates a folder called mediawiki_pages and puts the converted page folders inside it. My code is shoddy, but it should work better than 0.2.
The Code
Added Features
- Made the script more selective about what pages it converts (via select statement)
- Added Heading Conversion handling
- Added Numbered list handling
- Added "<br/>" to [[BR]] handling
- Fixed crash due to parsing of links
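The syntax conversions above boil down to line-by-line regex substitutions. A rough Python sketch of the idea (illustrative only, not the PHP code from the script; real MediaWiki markup has many more cases):

import re

BR_RE = re.compile(r'<br\s*/?>', re.IGNORECASE)

def convert_line(line):
    # "<br/>" (any capitalisation, with or without the slash) -> [[BR]]
    line = BR_RE.sub('[[BR]]', line)
    # MediaWiki numbered list items start with '#'; Moin uses indented '1.'
    if line.startswith('#'):
        depth = len(line) - len(line.lstrip('#'))
        line = ' ' * depth + '1. ' + line.lstrip('#').lstrip()
    return line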
0.2
0.1
- creates the necessary directories and writes a revision 00000001
Code v0.1
Quote directory name
import re

UNSAFE = re.compile(r'[^a-zA-Z0-9_]+')

def quote_wikiname(wikiname, charset='utf-8'):
    # Quote a page name into a filesystem-safe directory name
    # (Python 2; essentially MoinMoin's wikiutil.quoteWikinameFS).
    wikiname = wikiname.replace(u' ', u'_') # " " -> "_"
    filename = wikiname.encode(charset)

    quoted = []
    location = 0
    for needle in UNSAFE.finditer(filename):
        # append leading safe stuff
        quoted.append(filename[location:needle.start()])
        location = needle.end()
        # quote and append unsafe stuff
        quoted.append('(')
        for character in needle.group():
            quoted.append('%02x' % ord(character))
        quoted.append(')')

    # append rest of string
    quoted.append(filename[location:])
    return ''.join(quoted)
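For example, u'Main Page' comes out as 'Main_Page', and u'öüä' (encoded as UTF-8) comes out as '(c3b6c3bcc3a4)'. The PHP snippet below does the equivalent quoting for a single hard-coded test string.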
$pagename = "öüä";
$pagename = utf8_encode(str_replace(" ", "_", $pagename));
$quoted = array();
$in_parenthesis = 0;
for ($i = 0; $i < strlen($pagename); $i++) {
    $curchar = substr($pagename, $i, 1);
    if (ereg('[^a-zA-Z0-9_]', $curchar)) {
        if (!$in_parenthesis) $quoted[] = '(';
        $in_parenthesis = 1;
        $quoted[] = str_pad(dechex(ord($curchar)), 2, '0', STR_PAD_LEFT);
    } else {
        if ($in_parenthesis) $quoted[] = ')';
        $in_parenthesis = 0;
        $quoted[] = $curchar;
    }
}
if ($in_parenthesis) $quoted[] = ')'; // close a trailing hex group
$pagename_new = implode('', $quoted);
Possible Update
Code can be found at mw2moin.php.txt or the mirror.
After a bit of hacking I refactored the code and added the following features:
- Imports based on namespace (so talk pages don't blow away your regular pages, and it gets rid of that nasty array)
- Talk pages can be imported as Page/<defined name> (probably Talk)
- Added the parser header for use with the parser from ParserMarket
- Added import of images (I don't know how to handle the actual image pages yet; that should probably be done on the parser side)
- Added revision history with timestamp and a sorted sitewide edit-log
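For reference, the sitewide edit-log is a plain text file with one tab-separated line per saved revision. A rough sketch of emitting such a line follows; the field order is what I believe Moin 1.5 uses, so compare against an existing data/edit-log before relying on it, and the page name is assumed here to be quoted the same way as the page directory:

def editlog_line(mtime_secs, rev, quoted_pagename, addr, hostname, userid, comment=''):
    # timestamp is microseconds since the epoch; action is SAVE (SAVENEW for the first revision)
    fields = ['%d' % (mtime_secs * 1000000), '%08d' % rev, 'SAVE',
              quoted_pagename, addr, hostname, userid, '', comment]
    return '\t'.join(fields) + '\n'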
The only side effect I've observed is that on large sites where you import revision history and convert the syntax at the same time, you'll likely hit your PHP memory limit; the code could use some tidying up along those lines. I've also added some header comments to clarify just what is going on and to explain a bit about configuration. The only other thing I could think of to add was a routine to import users, but the version of MediaWiki I had hashed passwords with MD5, while MoinMoin stores them prefixed with {SHA}. I thought about a couple of options:
- Import users with new random passwords and force them to mail password and reset
- Make a patch for MoinMoin to detect the password hash scheme based on the {KEY} prefix, and possibly have a forced password upgrade option.
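A rough illustration of what the second option's detection could look like (a hypothetical helper, not code from any script on this page; real MediaWiki password hashes vary by version and may be salted):

import base64, hashlib

def check_password(stored, candidate):
    # Moin-style entry: '{SHA}' followed by base64 of the SHA-1 digest
    if stored.startswith('{SHA}'):
        digest = hashlib.sha1(candidate.encode('utf-8')).digest()
        return stored[5:] == base64.b64encode(digest).decode('ascii')
    # otherwise assume a legacy unsalted MD5 hex digest
    return stored == hashlib.md5(candidate.encode('utf-8')).hexdigest()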
My intent was to create a more featureful converter, hoping to inspire others to make more featureful parsers (if only we had table support and image thumbnails) and the like, to aid large sites in migration.
Another possible update
The versions above don't work with MediaWiki versions newer than 1.4 due to changes in the database layout (the 'cur' table is gone). Here is a version that seems to work with MediaWiki 1.11 and MoinMoin 1.5: mw_11_2_moin.php.txt
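Concretely, the old schema let the converter fetch current page text straight out of cur, while newer MediaWiki reaches it via page -> revision -> text. A sketch of the two queries, using column names from the stock schema (table prefixes and exact names may differ on your install):

# MediaWiki < 1.5: everything lives in the cur table
OLD_QUERY = """
    SELECT cur_namespace, cur_title, cur_text
    FROM cur
"""

# MediaWiki >= 1.5: the current revision is page_latest, its text sits in the text table
NEW_QUERY = """
    SELECT page_namespace, page_title, old_text
    FROM page
    JOIN revision ON rev_id = page_latest
    JOIN text ON old_id = rev_text_id
"""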
Python converter
What we really need is a converter from current mediawiki to current moin.
To be of any long term usefulness, the converter should have some specific properties:
- must be in Python (MoinMoin developers won't touch PHP or Perl code)
- clean and OO (no ugly quickhack)
- input: maybe the mediawiki xml export format
- conversion:
- input: best would be a parser for mw that creates a dom tree from mw markup
- output: convert dom tree to moin wiki format
- output: moin wiki xmlrpc
Before starting:
- moin has no namespaces, how do we handle that?
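For the XML-export input idea above, here is a minimal sketch built on the standard library (the export namespace URI changes with the dump schema version, so the one below is just an example; large dumps would want iterparse instead):

import xml.etree.ElementTree as ET

NS = {'mw': 'http://www.mediawiki.org/xml/export-0.3/'}

def iter_pages(dump_path):
    # yield (title, wikitext of the last revision) for each page in the dump
    root = ET.parse(dump_path).getroot()
    for page in root.findall('mw:page', NS):
        title = page.findtext('mw:title', namespaces=NS)
        revisions = page.findall('mw:revision', NS)
        text = revisions[-1].findtext('mw:text', namespaces=NS) if revisions else ''
        yield title, text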
Here's mine
I converted the latest PHP script above to Python, and removed some logic that was duplicated from Moin, replacing it with calls to Moin functions. It also supports multiple database backends in principle, although I only wrote a SQLite backend. That backend is all of 7 lines, so it should be trivial to write others. Please email them to me or publish them in a branch if you do. (Or if you make fixes/improvements. Or write docs.)
The calls into the Moin API are written for Moin 1.5, because that's the latest version where the MediaWiki parser works. If you want to migrate the parser to 1.8/1.9, I'd be happy to upgrade this converter too.
There's no documentation in the sources yet, not even a license file or a README. Just edit the config (ideally, copy defaultconfig.py to localconfig.py and edit that one) then invoke the script.
Nice to see some Python code. AFAIK, nobody is currently working on updating that parser. Whether it makes sense to update it for 1.9 depends a bit on the effort needed, as we are already working on moin 2.0, where everything is rather different (btw, we want a MediaWiki converter there!). -- ThomasWaldmann 2010-02-15 21:24:53
Another take
I wrote a quick and dirty MediaWiki-to-Moin converter too, this time one that supports PostgreSQL. This is pretty crucial, because MW's PostgreSQL backend actually names the tables differently than the MySQL backend does. This script's behaviour can be configured on a per-namespace basis; for example, you can tell it to dump Talk: pages to "%s/Discussion" (%s gets replaced by the page title) and MyFunnyCustomNamespace: pages to "My Funny Custom Namespace/%s". It can also use a custom mapping of MediaWiki users to Moin users. It preserves history, but makes no attempt at understanding the markup and just tags pages as "#format mediawiki". I originally intended it to use the MoinMoin API, but it actually just dumps stuff into a directory in storage format; you can just copy the directories over to data/pages. It attempts to produce a global edit log, but it's not sorted. Also, I'm not much of a Python guy, so this is pretty ugly. =) -- UrpoLankinen 2011-03-22 10:20:15
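To make the per-namespace configuration described above concrete, the mapping amounts to something like the following (the names and structure are a paraphrase of the description, not the script's actual configuration):

# MediaWiki namespace -> MoinMoin page name pattern; %s is replaced by the page title
NAMESPACE_MAP = {
    '': '%s',                                                  # main namespace
    'Talk': '%s/Discussion',
    'MyFunnyCustomNamespace': 'My Funny Custom Namespace/%s',
}

def target_pagename(namespace, title):
    pattern = NAMESPACE_MAP.get(namespace)
    if pattern is None:
        return None   # namespace not configured: skip the page
    return pattern % title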