Things to do, and the people doing them

Determine wiki structure from export XML

Who is working on this: BradleyDean, PaulBoddie

Given the XML structure in the Confluence exports, extract the site structure (including pages, attachments and history).
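
A minimal sketch of the page and history extraction follows. It assumes the Hibernate-style entities.xml layout of a Confluence space export, in which each stored object is an <object class="..."> element with an <id> child and <property name="..."> children, and in which historical page versions refer to the current page through an originalVersion property. The sample XML, the property names and the list_pages function are illustrative assumptions rather than code from convert.py, and attachments are not covered.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment in the layout assumed for a Confluence entities.xml export.
SAMPLE = """<hibernate-generic>
  <object class="Page" package="com.atlassian.confluence.pages">
    <id name="id">100</id>
    <property name="title"><![CDATA[Home]]></property>
    <property name="version">2</property>
  </object>
  <object class="Page" package="com.atlassian.confluence.pages">
    <id name="id">101</id>
    <property name="title"><![CDATA[Home]]></property>
    <property name="version">1</property>
    <property name="originalVersion" class="Page" package="com.atlassian.confluence.pages">
      <id name="id">100</id>
    </property>
  </object>
</hibernate-generic>"""


def list_pages(xml_text):
    """Group page versions under the id of the current version.

    Returns {current_id: [(version, title), ...]} with versions in ascending order.
    """
    root = ET.fromstring(xml_text)
    history = defaultdict(list)
    for obj in root.iter("object"):
        if obj.get("class") != "Page":
            continue
        page_id = obj.findtext("id")
        props = {prop.get("name"): prop for prop in obj.findall("property")}
        title = props["title"].text if "title" in props else None
        version = int(props["version"].text) if "version" in props else 0
        # Historical versions point at the current page via originalVersion;
        # the current version has no such property and is keyed by its own id.
        original = props.get("originalVersion")
        key = original.findtext("id") if original is not None else page_id
        history[key].append((version, title))
    return {key: sorted(versions) for key, versions in history.items()}


print(list_pages(SAMPLE))  # {'100': [(1, 'Home'), (2, 'Home')]}
```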

I've written some experimental code (convert.py) that exports page revisions and manifests from the XML dump, along with a module (parser.py) that performs some simple parsing of page text read from standard input. The idea is to combine the manifests and pass them to the package installer in order to import the wiki content into Moin, but only after the actual page revisions have been parsed and converted to Moin syntax. -- PaulBoddie 2012-04-01 22:45:46

I forgot to include the xmlread module, but I'll upload that later today. -- PaulBoddie 2012-04-02 08:09:41
The missing module is now available here. You can just copy xmlread.py into the ConfluenceConverter distribution and it should work. -- PaulBoddie 2012-04-02 16:08:15
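
Combining the per-page manifests could look something like the following sketch. It assumes that each manifest entry records a timestamp, a page name, an author and the name of a file holding the converted Moin text, and that the MoinMoin 1.x package installer accepts a script whose first line is MoinMoinPackage|1 followed by AddRevision commands. The entry layout and combine_manifests are hypothetical, and the script format should be checked against MoinMoin's packages module before relying on it.

```python
from typing import Iterable, List, Tuple

# Hypothetical manifest entry: (timestamp, page name, author, converted text file).
Entry = Tuple[str, str, str, str]


def combine_manifests(manifests: Iterable[List[Entry]]) -> str:
    """Merge per-page manifests into one package script, oldest revision first."""
    entries = [entry for manifest in manifests for entry in manifest]
    # "YYYY-MM-DD HH:MM:SS" timestamps sort correctly as strings; the sort is
    # stable, so entries with equal timestamps keep their manifest order.
    entries.sort(key=lambda entry: entry[0])
    lines = ["MoinMoinPackage|1"]  # assumed package script header
    for _timestamp, pagename, author, filename in entries:
        # Assumed command layout: AddRevision|file|page|author|comment.
        # Values containing "|" would need escaping, which is not done here.
        lines.append("|".join(
            ["AddRevision", filename, pagename, author, "Imported from Confluence"]))
    return "\n".join(lines) + "\n"


# Illustrative manifests for two pages.
home = [("2012-03-01 10:00:00", "Home", "alice", "0001.txt"),
        ("2012-03-05 09:00:00", "Home", "bob", "0003.txt")]
faq = [("2012-03-02 12:00:00", "FAQ", "alice", "0002.txt")]
print(combine_manifests([home, faq]), end="")
```

Sorting all entries globally by timestamp keeps each page's revisions in order while interleaving pages, which mirrors the edit history of the original site.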

What Confluence markup is being used?

Who is working on this: PaulBoddie

So that we know what work needs to be done, find out which subset of Confluence markup is being used in the Mailman wiki.

The current strategy is just to target basic markup and to try to identify the macros in use. With Confluence 3 markup, this involves searching for things resembling {...} (see the get_macros.py file). With Confluence 4 XHTML markup, the exercise is somewhat simpler, since it amounts to looking at which elements are used. -- PaulBoddie 2012-12-20 22:47:54
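
The two detection strategies might be sketched as follows. For Confluence 3 wiki markup, a regular expression picks out {name} and {name:params} tokens while skipping the inner braces of {{monospaced}} text; for Confluence 4 XHTML, the standard library's tolerant html.parser.HTMLParser counts element names, which keeps namespace-prefixed tags such as ac:structured-macro intact, and also tallies the ac:name attribute of macro elements. The regular expression and the function names are assumptions, not taken from get_macros.py, and the pattern will still match other brace-delimited text, such as literal braces in code samples.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed shape of a Confluence 3 macro token: {name} or {name:param=value|...}.
# The lookbehind skips the inner brace of {{monospaced}} text. Closing tokens
# such as {code} after {code:java} are counted as well.
MACRO = re.compile(r"(?<!\{)\{([A-Za-z][\w-]*)(?::[^{}]*)?\}")


def wiki_macros(text):
    """Count macro names in Confluence 3 wiki markup."""
    return Counter(match.group(1).lower() for match in MACRO.finditer(text))


class ElementCounter(HTMLParser):
    """Count element names, plus macro names given by an ac:name attribute."""

    def __init__(self):
        super().__init__()
        self.elements = Counter()
        self.macros = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # reach this method through the default handle_startendtag.
        self.elements[tag] += 1
        if tag in ("ac:macro", "ac:structured-macro"):
            name = dict(attrs).get("ac:name")
            if name:
                self.macros[name] += 1


def xhtml_usage(xhtml):
    """Return (element counts, macro counts) for Confluence 4 XHTML."""
    counter = ElementCounter()
    counter.feed(xhtml)
    counter.close()
    return counter.elements, counter.macros


print(wiki_macros("{info}Note{info} and {code:java}x{code} {{mono}}"))
print(xhtml_usage('<p>Hi</p><ac:structured-macro ac:name="info">'
                  '<ac:rich-text-body><p>Note</p></ac:rich-text-body>'
                  '</ac:structured-macro>'))
```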

Parse Confluence markup into a DOM/AST structure

Who is working on this: PaulBoddie

NOTE: The DOM/AST structure will need to be agreed upon between this step and the MoinMoin output step.

Given raw Confluence markup (just the page content, extracted from the XML structure), parse the data and store it in some sort of DOM/AST-style form.
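
As a starting point for agreeing on a tree, a deliberately small node type and a parser for two Confluence 3 constructs (hN. headings and blank-line-separated paragraphs) might look like this. The Node class and its kind values are hypothetical; the MoinMoin developers' note on this page suggests reusing moin2's DOM tree instead if a tree is used at all.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Confluence 3 heading lines look like "h2. Title".
HEADING = re.compile(r"^h([1-6])\.\s+(.*)$")


@dataclass
class Node:
    kind: str                     # "document", "heading" or "paragraph" (hypothetical)
    text: Optional[str] = None
    level: int = 0                # only meaningful for headings
    children: List["Node"] = field(default_factory=list)


def parse(markup: str) -> Node:
    """Build a tiny tree from Confluence 3 headings and paragraphs."""
    doc = Node("document")
    paragraph: List[str] = []

    def flush():
        # Close the pending paragraph, joining its lines with spaces.
        if paragraph:
            doc.children.append(Node("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in markup.splitlines():
        stripped = line.strip()
        match = HEADING.match(stripped)
        if match:
            flush()
            doc.children.append(Node("heading", match.group(2), int(match.group(1))))
        elif not stripped:
            flush()
        else:
            paragraph.append(stripped)
    flush()
    return doc


tree = parse("h1. Welcome\nThis is the\nMailman wiki.\n\nh2. Next")
for node in tree.children:
    print(node.kind, node.level, node.text)
```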

Currently, this step just writes Moin markup out while traversing both Confluence 3 and 4 markup. There may well be opportunities to consolidate some of the output formatting, but a tree representation isn't in use yet. -- PaulBoddie 2012-12-20 22:47:54
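
Writing Moin markup during traversal, with no intermediate tree, can be illustrated for a few Confluence 4 XHTML elements. This sketch handles only headings, paragraphs and strong/emphasised text, silently drops every other tag, and is not the converter's actual code; the MoinWriter class and to_moin function are hypothetical.

```python
from html.parser import HTMLParser

# Inline elements emit the same Moin markup at their start and end tags.
INLINE = {"strong": "'''", "b": "'''", "em": "''", "i": "''"}
HEADINGS = {"h%d" % n: n for n in range(1, 7)}


class MoinWriter(HTMLParser):
    """Emit Moin markup directly as XHTML events arrive."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.out.append("=" * HEADINGS[tag] + " ")
        elif tag in INLINE:
            self.out.append(INLINE[tag])

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self.out.append(" " + "=" * HEADINGS[tag] + "\n\n")
        elif tag in INLINE:
            self.out.append(INLINE[tag])
        elif tag == "p":
            self.out.append("\n\n")

    def handle_data(self, data):
        self.out.append(data)


def to_moin(xhtml):
    writer = MoinWriter()
    writer.feed(xhtml)
    writer.close()  # flushes any buffered text
    return "".join(writer.out).strip() + "\n"


print(to_moin("<h1>Welcome</h1><p>This is <strong>important</strong>.</p>"), end="")
```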

MoinMoin output from parsed data

Who is working on this: PaulBoddie

NOTE: The DOM/AST structure will need to be agreed upon between this step and the parsing step.

Given the parsed content, generate raw MoinMoin markup.

See above. If we can get away with just invoking common formatting functions, without needing to generate a tree, we'll stick with that approach. -- PaulBoddie 2012-12-20 22:47:54
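
Sharing formatting functions between the input paths might look like the following: the Confluence 3 line converter calls small helpers that an XHTML handler could equally call, so Moin syntax is produced in one place. The helper names and the simplified patterns are assumptions; only hN. headings, *strong* and _emphasis_ are covered, and list items that start with * are not handled.

```python
import re


# Shared Moin formatting helpers, intended to be called by both the
# Confluence 3 (wiki markup) and Confluence 4 (XHTML) input paths.
def moin_heading(level, text):
    marks = "=" * level
    return "%s %s %s" % (marks, text, marks)


def moin_strong(text):
    return "'''%s'''" % text


def moin_emphasis(text):
    return "''%s''" % text


# Simplified Confluence 3 patterns (assumptions, not a full grammar).
HEADING = re.compile(r"^h([1-6])\.\s+(.*)$")
STRONG = re.compile(r"\*([^*\n]+)\*")
# Word-boundary lookarounds leave identifiers such as snake_case_name alone.
EMPHASIS = re.compile(r"(?<!\w)_([^_\n]+)_(?!\w)")


def convert_wiki_line(line):
    """Convert one line of Confluence 3 markup using the shared helpers."""
    line = STRONG.sub(lambda m: moin_strong(m.group(1)), line)
    line = EMPHASIS.sub(lambda m: moin_emphasis(m.group(1)), line)
    match = HEADING.match(line)
    if match:
        return moin_heading(int(match.group(1)), match.group(2))
    return line


print(convert_wiki_line("h2. A *bold* title"))
print(convert_wiki_line("Some _emphasised_ text in snake_case_name"))
```

Keeping the helpers separate from either parser means a later switch to a tree (for example moin2's DOM) would only replace the traversal, not the formatting rules.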

Notes from MoinMoin devs

If you are going the DOM way (parsing content into a DOM tree and generating Moin markup from that tree), you should use the same DOM tree as moin2 does. There's already a moinwiki_out converter for that DOM tree, so you'd save half of the work.

MoinMoin: ConfluenceConverter/DevelopmentNotes/TaskList (last edited 2012-12-20 22:47:54 by PaulBoddie)