Supported markup

The DocBook formatter supports almost all MoinMoin markup. A few constructs cannot be supported, such as <hr>, <big> and <small>: they affect rendering but carry no semantic meaning and hence have no counterpart in DocBook.

Page/Article

When a page is rendered as DocBook, a DocBook article is generated. The article takes the name of the page as its title, so for it to read like a normal article the page name should probably not contain underscores.

Inline styling

Inline styling works as expected and the resulting rendering is close to the rendering of the HTML page. There are two exceptions: yelp doesn't support <emphasis role="strikethrough">, and if you nest two different renderings of emphasis (such as bold and italic) you will not get both (bold italic); instead the inner one overrules the outer.

Supported styles are italic, bold, monospace, source code, subscript, underline and superscript.

Supported in the formatter but not really in yelp: strike through and bold italics.

As mentioned, smaller and larger text are not supported.

Headings/Sections

Each heading in the wiki markup prompts the creation of a section element with the heading as its title. Subheadings are created as subsections, and there is no limit to the level of nesting. The main thing to remember about headings is consistency within a page: the page should not start with a level three heading and have a level one heading follow it.
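The nesting described above can be sketched as follows. This is only an illustration, not the formatter's actual code: the function name nest_sections and the (level, title) input format are assumptions made for the example, and heading levels are assumed to start at 1.

```python
import xml.etree.ElementTree as ET

def nest_sections(headings, article_title="Example"):
    """Build nested <section> elements from (level, title) pairs (hypothetical helper)."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = article_title
    # Stack of (level, element); the article itself acts as level 0.
    stack = [(0, article)]
    for level, title in headings:
        # Leave every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section")
        ET.SubElement(section, "title").text = title
        stack.append((level, section))
    return article

article = nest_sections([(1, "Intro"), (2, "Details"), (1, "Summary")])
print(ET.tostring(article, encoding="unicode"))
```

A page that starts with a level three heading followed by a level one heading would still produce a tree here, but the first section would sit at the top level, which is why consistent heading levels matter.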

Paragraphs

The wiki parser automatically creates paragraphs when pieces of text are separated by empty lines. In some cases it will gladly nest paragraphs or omit them; the DocBook formatter should handle these cases transparently. In general it is safer to put an empty line between text and block items such as tables than to rely on the wiki parser's paragraph logic. Line breaks within paragraphs are possible in HTML but not in DocBook, so the formatter replaces a line break by starting a new paragraph.

Lists of all sorts

The list handling should be quite robust, and no special considerations should be needed. Bulleted lists, numbered lists and even bulletless lists are all supported and nesting of different types of lists should work as well.

Glossary lists with glossary terms and definitions are converted from MoinMoin's (HTML) definition lists. This should also be quite robust, and even strange cases such as

Red::
 :: Color
 :: Wine

should give a semantically valid result (one entry with one term which has multiple definitions):

Red::

Color
Wine
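As a rough illustration of the "one entry, one term, several definitions" result, the sketch below turns such lines into a DocBook-style tree. The element names glosslist, glossentry, glossterm and glossdef are standard DocBook, but whether the formatter emits exactly these is an assumption, and the function glossary_from_lines and its simplified line parsing are hypothetical.

```python
import xml.etree.ElementTree as ET

def glossary_from_lines(lines):
    """Group 'Term::' and ':: definition' lines into glossary entries (sketch)."""
    glosslist = ET.Element("glosslist")
    entry = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("::"):
            # A definition with no preceding term is skipped in this sketch.
            if entry is not None:
                glossdef = ET.SubElement(entry, "glossdef")
                ET.SubElement(glossdef, "para").text = line[2:].strip()
        elif line.endswith("::"):
            # A new term opens a new entry.
            entry = ET.SubElement(glosslist, "glossentry")
            ET.SubElement(entry, "glossterm").text = line[:-2].strip()
    return glosslist

glossary = glossary_from_lines(["Red::", " :: Color", " :: Wine"])
print(ET.tostring(glossary, encoding="unicode"))
```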

All URLs in the page are converted to absolute URLs in the DocBook output. This applies to image sources as well as links. Images are embedded as an imageobject in a mediaobject container together with their alternative text (which is placed in a textobject container). Whether the image is an attachment or external should have no visible impact on the resulting DocBook. The same goes for all links, those inside the wiki as well as those that point to external sites.

The support is pretty solid: for example, an image used as a clickable link works, and the image's alternative text is displayed as the tooltip for the link.
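A minimal sketch of the image handling described above, using only the Python standard library. The helper name image_to_mediaobject, the example URLs and the use of a phrase element inside textobject are assumptions for illustration; the formatter's real code may differ.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

def image_to_mediaobject(src, alt, base_url):
    """Wrap an image in a mediaobject with an absolute URL (hypothetical helper)."""
    media = ET.Element("mediaobject")
    imageobject = ET.SubElement(media, "imageobject")
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    ET.SubElement(imageobject, "imagedata", fileref=urljoin(base_url, src))
    textobject = ET.SubElement(media, "textobject")
    ET.SubElement(textobject, "phrase").text = alt
    return media

base = "http://wiki.example.org/SomePage"  # made-up page URL
local = image_to_mediaobject("/images/logo.png", "Logo", base)
external = image_to_mediaobject("http://other.example.com/a.png", "Remote", base)
print(ET.tostring(local, encoding="unicode"))
```

Because both cases go through the same resolution step, an attachment and an external image end up with the same structure in the output.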

Code areas

A code area is converted to either a screen element or a programlisting element, depending on whether a programming language was defined in the wiki markup. Line numbering is supported natively by DocBook, so if line numbering is requested the relevant attribute will be set. The different parsed tokens are mapped to their DocBook counterparts, although the mapping is not necessarily very good. The yelp stylesheet doesn't seem to do any syntax coloring.
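The element choice can be illustrated with a short sketch. The function code_area and its parameters are hypothetical; linenumbering="numbered" is the standard DocBook attribute for line numbers, while setting a language attribute is an assumption about what the formatter does.

```python
import xml.etree.ElementTree as ET

def code_area(text, language=None, numbered=False):
    """Pick screen or programlisting for a code area (illustrative only)."""
    # A declared language means source code; otherwise treat it as screen output.
    element = ET.Element("programlisting" if language else "screen")
    if language:
        element.set("language", language)
    if numbered:
        element.set("linenumbering", "numbered")
    element.text = text
    return element

print(ET.tostring(code_area("ls -l"), encoding="unicode"))
print(ET.tostring(code_area("x = 1", "python", numbered=True), encoding="unicode"))
```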

Macros

Well-behaved simple macros should work directly. The formatter works around the strange behaviour of the Include macro, so Include should work in many cases. The FootNote macro is intercepted and yields a real native DocBook footnote.

Table support

Table support has been written from scratch and is robust. Cells that span multiple rows, multiple columns, or both are supported. Background colors cannot be supported, but all the different ways of marking cell dimensions are. Percentage widths are supported by the formatter, but not by yelp, which simply ignores them.
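Spanning cells are the reason a table's size cannot be read off its first row. The sketch below shows one way to work out the dimensions once the whole table is known; the function table_dimensions and the (colspan, rowspan) cell representation are assumptions for the example, not the formatter's data structures.

```python
def table_dimensions(rows):
    """Return (n_rows, n_cols) for rows of (colspan, rowspan) cells (sketch)."""
    occupied = set()  # (row, col) grid positions already covered by some cell
    n_rows = len(rows)
    n_cols = 0
    for r, row in enumerate(rows):
        c = 0
        for colspan, rowspan in row:
            # Skip positions covered by cells spanning down from earlier rows.
            while (r, c) in occupied:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied.add((r + dr, c + dc))
            c += colspan
            n_cols = max(n_cols, c)
            n_rows = max(n_rows, r + rowspan)
    return n_rows, n_cols

# First row: a cell spanning two rows, then a cell spanning two columns.
# Second row: two ordinary cells that fill the remaining positions.
print(table_dimensions([[(1, 2), (2, 1)], [(1, 1), (1, 1)]]))
```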

Entities and codepoints

Currently all named entities such as &rarr; and all numeric Unicode entities such as &#x9023; are converted to UTF-8 characters. This gives the right result, but editing the produced document requires a UTF-8 aware editor.

The code could be changed so that all entities are converted to unicode entities, which should work just as well, but not require utf-8 support on the other end.
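Both behaviours can be sketched with the Python standard library. The function names below are made up for the example, and this is not the formatter's own conversion code.

```python
import html

def entities_to_characters(text):
    """Replace named and numeric entities with the characters they stand for."""
    # Note: this also unescapes &amp; and &lt;, so the result must be escaped
    # again by whatever serializes it as XML.
    return html.unescape(text)

def to_numeric_references(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

converted = entities_to_characters("a &rarr; b")
print(converted.encode("utf-8"))         # the UTF-8 bytes written to the document
print(to_numeric_references(converted))  # ASCII-only alternative
```

The second form survives editors and tools without UTF-8 support, at the cost of being harder to read in the source.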

Admonitions

The five admonitions from DocBook are supported (tip, important, note, caution, warning) by using the syntax:

{{{#!wiki important
'''The title of the admonition'''

Some text
}}}
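On the DocBook side, each admonition is an element named after its type. The sketch below shows the kind of structure the example above could map to; the function admonition is hypothetical and the exact output of the formatter is not confirmed by this page.

```python
import xml.etree.ElementTree as ET

ADMONITIONS = ("tip", "important", "note", "caution", "warning")

def admonition(kind, title, body):
    """Build a DocBook admonition element with a title and one paragraph."""
    if kind not in ADMONITIONS:
        raise ValueError("not a DocBook admonition: %r" % kind)
    element = ET.Element(kind)
    ET.SubElement(element, "title").text = title
    ET.SubElement(element, "para").text = body
    return element

example = admonition("important", "The title of the admonition", "Some text")
print(ET.tostring(example, encoding="unicode"))
```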

Technical information

The mapping between the wiki and DocBook XML works as follows: as the wiki parser reads the markup, it calls methods of a "formatter" module. Methods such as "paragraph" and "heading" are called with a parameter set to one when the element starts and to zero when it ends. It is the responsibility of the formatter module to output suitable markup for each of these calls. Unfortunately the parser is quite tightly tied to HTML, and does not always remember to signal that an element should end.
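The calling convention can be illustrated with a toy formatter that returns markup strings directly. The class SketchFormatter and its simplified method signatures are assumptions for the example and not MoinMoin's actual formatter API.

```python
class SketchFormatter:
    """Toy formatter: each call returns the markup for one start or end event."""

    def paragraph(self, on):
        # on is 1 when the paragraph starts and 0 when it ends.
        return "<para>" if on else "</para>"

    def heading(self, on):
        return "<title>" if on else "</title>"

    def text(self, text):
        return text

fmt = SketchFormatter()
# The parser side: call the methods in document order and join the results.
output = fmt.paragraph(1) + fmt.text("Hello") + fmt.paragraph(0)
print(output)
```

If the parser forgets the closing call, a formatter of this kind simply produces unbalanced markup, which HTML tolerates far better than XML does.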

The DocBook formatter builds a tree structure in memory instead of outputting tags. This means that it does not output anything when a method is called, but simply updates its internal document tree. The exception is of course when endDocument is called, at which point it calls the XML tree's serialize method, which writes the whole tree into a string that the formatter then passes on.
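A minimal sketch of this tree-building approach, using xml.dom.minidom from the standard library rather than whatever XML library the formatter actually uses. The class TreeFormatter and its simplified signatures are hypothetical; only the method name endDocument is taken from the text above.

```python
from xml.dom import minidom

class TreeFormatter:
    """Toy formatter that updates an in-memory tree and serializes it at the end."""

    def __init__(self):
        self.doc = minidom.Document()
        self.cur = None  # node that new content is appended to

    def startDocument(self, title):
        article = self.doc.createElement("article")
        self.doc.appendChild(article)
        title_node = self.doc.createElement("title")
        title_node.appendChild(self.doc.createTextNode(title))
        article.appendChild(title_node)
        self.cur = article
        return ""  # nothing is emitted yet

    def paragraph(self, on):
        if on:
            para = self.doc.createElement("para")
            self.cur.appendChild(para)
            self.cur = para
        else:
            self.cur = self.cur.parentNode
        return ""

    def text(self, text):
        self.cur.appendChild(self.doc.createTextNode(text))
        return ""

    def endDocument(self):
        # Only now is the whole tree written out as a string.
        return self.doc.toxml()

fmt = TreeFormatter()
result = (fmt.startDocument("SomePage") + fmt.paragraph(1)
          + fmt.text("Hello") + fmt.paragraph(0) + fmt.endDocument())
print(result)
```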

This approach has several advantages and one drawback. One advantage is that it is easier to do complex things which require memory, such as figuring out the number of columns and rows of a complex table or removing empty nodes. Another is that the resulting document will always be well-formed XML, although it might not validate against the DTD. The drawback is that some macros assume that calling a formatter function returns the text to be output, and that they can therefore call the methods in the wrong order as long as they concatenate the return values in the right order. This will of course not yield the right result with the DocBook formatter.
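Removing empty nodes is one example of a cleanup that is simple on a tree and awkward on a stream of tags. The sketch below uses xml.etree.ElementTree; the function prune_empty and its definition of "empty" are assumptions, not the formatter's own rules.

```python
import xml.etree.ElementTree as ET

def prune_empty(element):
    """Recursively remove children with no text, children or attributes."""
    for child in list(element):
        prune_empty(child)
        has_text = bool((child.text or "").strip())
        has_tail = bool((child.tail or "").strip())
        # Keep the child if removing it would also drop text that follows it.
        if len(child) == 0 and not has_text and not has_tail and not child.attrib:
            element.remove(child)

root = ET.fromstring(
    "<article><para/><section><para> </para></section><para>Hi</para></article>"
)
prune_empty(root)
print(ET.tostring(root, encoding="unicode"))
```

Children are pruned before their parent is checked, so a section that contained only empty paragraphs is removed as well.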

MoinMoin: RenderAsDocBook (last edited 2008-03-13 10:34:40 by MikkoVirkkilä)