Project: Tree based output formatter
This project is part of GoogleSoc2008.
This project adds a new tree-based interface between the different parts of the output rendering. This tree can be modified in different ways during the rendering. It gets a distinct MIME type so that it fits into the conversions described below. All conversions done by this project operate on MIME types as identifiers. The type of the input can be anything, e.g. a page with MIME type text/x-moin1.7, an image with image/jpeg or raw data with application/octet-stream. Each converter may support several input/output MIME types.
The rendering of a wiki page is done in several steps:
Converter text/x-moin1.7 -> application/x-moin-document.
- If there are other types like python source included, it is converted on its own with the appropriate converter and embedded into the tree.
- Macro handling mangles the tree.
- The "Include" macro dumps the tree of another document into the current tree.
- The "TOC" macro creates the table of contents of the complete document and embeds it in the tree.
Converter application/x-moin-document -> application/x-xhtml-moin-page
The same can be used for image/* (and also application/octet-stream):
Converter image/* -> application/x-moin-document
- The generated tree includes an image element which embeds the image into the page but may also render further information like EXIF data.
- Macro handling (may not do anything useful with this tree, but it is generic)
Converter application/x-moin-document -> application/x-xhtml-moin-page
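The two pipelines above can be sketched as a converter registry keyed by (input type, output type). This is only a minimal illustration, not the project's actual API: the names register, find_converter and render, the element names page, p and image, and the use of xml.etree.ElementTree for the tree are all assumptions.

```python
import xml.etree.ElementTree as ET

TREE = "application/x-moin-document"
XHTML_PAGE = "application/x-xhtml-moin-page"

# Hypothetical registry: (input MIME type, output MIME type) -> converter.
_converters = {}

def register(input_type, output_type):
    def decorator(func):
        _converters[(input_type, output_type)] = func
        return func
    return decorator

def find_converter(input_type, output_type):
    """Look for an exact match first, then for a wildcard such as image/*."""
    major = input_type.split("/", 1)[0]
    for candidate in (input_type, major + "/*"):
        func = _converters.get((candidate, output_type))
        if func is not None:
            return func
    raise LookupError("no converter %s -> %s" % (input_type, output_type))

@register("text/x-moin1.7", TREE)
def wiki_to_tree(source):
    # Toy parser: every non-empty line becomes a paragraph.
    page = ET.Element("page")
    for line in source.splitlines():
        if line.strip():
            ET.SubElement(page, "p").text = line.strip()
    return page

@register("image/*", TREE)
def image_to_tree(href):
    # The tree only references the image; it does not contain its bytes.
    page = ET.Element("page")
    ET.SubElement(page, "image", href=href)
    return page

@register(TREE, XHTML_PAGE)
def tree_to_xhtml(page):
    # The output type uses div as its root element.
    div = ET.Element("div")
    for child in page:
        if child.tag == "image":
            ET.SubElement(div, "img", src=child.get("href"))
        else:
            ET.SubElement(div, "p").text = child.text
    return div

def render(data, input_type):
    tree = find_converter(input_type, TREE)(data)
    # Macro handling would modify `tree` at this point.
    return find_converter(TREE, XHTML_PAGE)(tree)
```

In this sketch, render("Hello", "text/x-moin1.7") and render("photo.jpg", "image/jpeg") both go through the same second converter; only the first step is selected by the input type.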
Types
text/x-moin1.7
Wiki source as used in MoinMoin 1.7.
application/x-moin-document
New intermediate tree format.
Like any spec, it should reuse appropriate standards. What comes to mind is DC (Dublin Core) for author and similar information (see [DCMI]). The text part may be a proper subset of OpenDocument (see [ODF]) or DocBook. It will also allow HTML in its own (real) namespace (see [XHTML1.1]).
Includes are done with XInclude and XPointer (see [XInclude] and [XPointer]). It may need a special XPointer function to support everything the current Include macro supports.
application/x-xhtml-moin-page
XHTML subset without html, head (and its contents) and body. Unlike application/xhtml+xml, it specifies div as the root element and can be embedded into the theme to generate the real output. This type is only used internally.
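To illustrate the XInclude idea mentioned for application/x-moin-document, here is a small sketch using xml.etree.ElementInclude from the Python standard library. The PAGES dictionary, the page_loader function and the element names are hypothetical. As far as I know this module does not implement XPointer, so it only covers whole-document includes, not the partial includes the current Include macro offers.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical page store: page name -> serialized tree.
PAGES = {
    "OtherPage": "<page><p>Included text</p></page>",
}

def page_loader(href, parse, encoding=None):
    """Resolve an include reference to the tree of another page."""
    if parse != "xml":
        raise ValueError("only tree includes are sketched here")
    return ET.fromstring(PAGES[href])

document = ET.fromstring(
    '<page xmlns:xi="http://www.w3.org/2001/XInclude">'
    "<p>Before</p>"
    '<xi:include href="OtherPage"/>'
    "</page>"
)

# Replaces every xi:include element in place with the loaded tree.
ElementInclude.include(document, loader=page_loader)
```

After the call, the xi:include element has been replaced by the root element of the included page, so later converters see one ordinary tree.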
Macro handling
The internal tree will use a slightly different macro definition than the wiki input. Some macros like BR, Include and TOC will be promoted to pseudo-macros and interpreted (_not_ expanded) by the wiki parser.
Macros need to know the context (block vs. inline) they are used in.
- BR
- It needs to be represented in the tree anyway because it is highly output dependent. HTML implements it as a br element, ODF as text:line-break.
- Include
- It needs to be handled specially because normal macro results should not be macro-expanded again. May use XInclude (see [XInclude]).
- TOC
- This is not yet decided, but many output formats support automatic TOC generation.
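The BR case can be sketched as follows: the line break stays in the tree as an element and each output converter decides how to spell it. The element name line-break and the function render_line_breaks are assumptions made for this example; the two namespace URIs are the standard XHTML and ODF text namespaces.

```python
import copy
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ODF_TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical name of the line-break element in the intermediate tree.
LINE_BREAK = "line-break"

TARGET_TAGS = {
    "html": "{%s}br" % XHTML,
    "odf": "{%s}line-break" % ODF_TEXT,
}

def render_line_breaks(tree, target):
    """Return a copy of `tree` with line breaks renamed for `target`."""
    result = copy.deepcopy(tree)
    for node in result.iter(LINE_BREAK):
        node.tag = TARGET_TAGS[target]
    return result

page = ET.fromstring("<page><p>one<line-break/>two</p></page>")
as_html = render_line_breaks(page, "html")
as_odf = render_line_breaks(page, "odf")
```

Because the break is a node and not pre-rendered output, the same tree can feed both targets.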
Plugin compatibility
The modifications affect three types of MoinMoin plugins: parser, macro and formatter. Parser and macro plugins which only use the public formatter API should work using a special implementation of this API which produces a tree instead of complete output; plugins which directly generate output or even use request.write will not work. Compatibility support for formatter plugins will not be provided.
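A rough sketch of such a compatibility implementation, assuming that the tree is an ElementTree and that plugins call on/off style methods such as paragraph(on), strong(on) and text(text). The class name TreeFormatter and the element names are hypothetical, and only three methods are shown.

```python
import xml.etree.ElementTree as ET

class TreeFormatter:
    """Hypothetical formatter that builds a tree instead of returning markup."""

    def __init__(self):
        self.root = ET.Element("page")
        self._stack = [self.root]

    def _open(self, tag):
        self._stack.append(ET.SubElement(self._stack[-1], tag))
        return ""  # callers concatenate the return values, so return a string

    def _close(self):
        self._stack.pop()
        return ""

    def paragraph(self, on):
        return self._open("p") if on else self._close()

    def strong(self, on):
        return self._open("strong") if on else self._close()

    def text(self, text):
        parent = self._stack[-1]
        if len(parent):
            # Text after a child element is stored in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + text
        else:
            parent.text = (parent.text or "") + text
        return ""

f = TreeFormatter()
output = (f.paragraph(1) + f.text("Hello ") + f.strong(1) + f.text("world")
          + f.strong(0) + f.text("!") + f.paragraph(0))
```

The plugin still receives (empty) strings and can concatenate them as before, while the real result accumulates in f.root. A plugin that writes raw HTML or calls request.write bypasses the formatter and therefore never reaches the tree.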
Macros
- AbandonedPages
See RecentChanges
- Action
- unused?
- AdvancedSearch
- Raw HTML
- Anchor
- Formatter only, unknown
- AttachInfo
- unknown
- AttachList
- unknown
- BR
- Move to parser
- Date, DateTime
- Formatter only
- EditedSystemPages
- Formatter only
- EditTemplates
- unknown
- EmbedObject
- Raw HTML
- FootNote
- Move to parser
- FullSearch
- Raw HTML
- FullSearchCached
See FullSearch
- GetText
- Formatter only
- GetText2
- Formatter only
- GetVal
- Formatter only
- GoTo
- Raw HTML
- Hits
- Raw text
- Icon
- Formatter only
- Include
- Move to parser
- InterWiki
- Formatter only
- LikePages
- Formatter only, recheck
- MailTo
- Formatter only
- MonthCalendar
- Raw HTML
- Navigator
- Raw HTML
- NewPage
- Raw HTML
- OrphanedPages
- Formatter only
- PageCount
- Formatter only
- PageHits
- Formatter only
- PageList
- Raw HTML
- PageSize
- Formatter only
- RandomPage
- Formatter only
- RandomQuote
- unknown
- RecentChanges
- Raw HTML
- ShowSmileys
widget.browser.DataBrowserWidget
- StatsChart
- unknown
- SystemAdmin
- Formatter, raw HTML
- SystemInfo
- Raw HTML
- TableOfContents
- Move to parser
- TemplateList
- Formatter only
- TitleIndex
- unknown
- TeudView
- Raw HTML
- TitleSearch
- unknown
- Verbatim
- Formatter only
- WantedPages
- Formatter, raw HTML
- WordIndex
- unknown
Parser
- text/*
- Formatter only
- text/cplusplus
ParserBase based.
- text/creole
- Own tree, formatter
- text/csv
widget.browser.DataBrowserWidget
- text/diff
ParserBase based.
- text/docbook
- Reuses wiki parser.
- text/html
- Raw HTML
- text/irssi
- Formatter
- text/java
ParserBase based
- text/moin-wiki
- Formatter only
- text/pascal
ParserBase based
- text/python
- To be removed (compiles into python code)
- text/rst
- unknown
- text/xslt
- unknown, raw HTML
In-memory tree format
There are two approaches to implementing tree structures. One is low-level like DOM, which only defines elementary types like text, comment and node. The other is a high-level tree which includes nodes for paragraphs, links and so on. My intention was to use a low-level tree because I want extensibility. In the discussion I wrote the following:
There is no usual way. DOM uses a low-level set of items (node, attribute, text and some more) which can represent the whole set of inputs, see [DOM]. Encoding the node types into the classes only works if you know all possible inputs; otherwise you end up with a catch-all node again.
Let's make an example: I want to include MathML, see [MathML]. MathML is an XML application. There are two ways to do that:
- Use it literally as text. (This IMHO contradicts the reason for this whole project.)
- Parse it and make it part of the tree.
- If using special nodes, you need to either create a node type for every MathML element or have a catch-all node which includes the name.
- If using low-level nodes nothing special needs to be done.
[DOM] - http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/
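The MathML case with low-level nodes might look like this. The sketch uses xml.etree.ElementTree as a stand-in for the tree; the page and p element names and the helper namespaces_used are assumptions. The point is that the formula becomes part of the tree without any new node class.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"

page = ET.Element("page")
para = ET.SubElement(page, "p")
para.text = "A variable: "

# The formula is parsed like any other XML and appended as ordinary elements.
formula = ET.fromstring('<math xmlns="%s"><mi>x</mi></math>' % MATHML)
para.append(formula)

def namespaces_used(tree):
    """Collect the namespace URIs of all elements in the tree."""
    found = set()
    for node in tree.iter():
        if node.tag.startswith("{"):
            found.add(node.tag[1:].split("}", 1)[0])
    return found
```

A converter that does not know MathML can recognize the foreign namespace from the tag and pass the subtree through or skip it.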
I also think the tree should have a stable "dump" format. XML would be a standardized option. Someone also mentioned that it may be easier to compare the dumps in unit tests than to inspect the tree directly.
Or use XPath / other XML tools for the tests.
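Both test styles can be sketched with the standard library, assuming the tree is an ElementTree. The functions dump and build_sample and the element names are hypothetical; findall only supports a limited XPath subset.

```python
import xml.etree.ElementTree as ET

def dump(tree):
    """Serialize the tree to an XML string that tests can compare."""
    return ET.tostring(tree, encoding="unicode")

def build_sample():
    # Stand-in for the output of a wiki-to-tree converter.
    page = ET.Element("page")
    ET.SubElement(page, "p").text = "First"
    link = ET.SubElement(ET.SubElement(page, "p"), "a", href="FrontPage")
    link.text = "home"
    return page

# Style 1: compare the complete dump against an expected string.
expected = '<page><p>First</p><p><a href="FrontPage">home</a></p></page>'
dump_matches = dump(build_sample()) == expected

# Style 2: query only the part of the tree the test cares about.
hrefs = [a.get("href") for a in build_sample().findall(".//a")]
```

Dump comparison catches every difference but breaks on unrelated changes; path queries are more selective.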
There are several XML and tree implementations: xml, xml.etree, xml.dom.minidom and lxml.
- xml
- AFAIK unmaintained, libxml as dependency.
- xml.etree (ElementTree)
- Actively maintained, one large API problem: no text nodes.
- xml.dom.minidom
- Old, was never really usable.
- Old, was never really usable.
- lxml
- libxml as dependency.
Because of this, I think the best solution is an ElementTree fork which fixes the API problem. I don't really like forking software, but anything else would introduce compiled extensions.
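The "no text nodes" problem can be shown with the stock xml.etree.ElementTree: text that follows an element is stored in that element's tail attribute, so removing the element silently removes the text as well. The helper remove_keep_tail is a hypothetical workaround, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

SOURCE = "<p>Hello <b>world</b>!</p>"

# Naive removal: the "!" lives in b.tail and disappears together with b.
naive = ET.fromstring(SOURCE)
naive.remove(naive[0])
naive_result = ET.tostring(naive, encoding="unicode")

def remove_keep_tail(parent, child):
    """Remove `child` but keep the text that followed it."""
    index = list(parent).index(child)
    tail = child.tail or ""
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)

careful = ET.fromstring(SOURCE)
remove_keep_tail(careful, careful[0])
careful_result = ET.tostring(careful, encoding="unicode")
```

With real text nodes, as in DOM, the "!" would be a sibling of the b element and no such bookkeeping would be needed.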
Cacheability
The initial tree only depends on the input page. It should be cached directly after the edit. It is also possible to already expand all "stable" (non-volatile) macros at this time. The tree can be converted to HTML in this half-expanded state and cached.
How can we convert to HTML without fully expanding? E.g. if there are Include and TOC macros, this could be a problem IMHO.
The converter to HTML may be applied several times to the tree; it will only touch things it knows and will leave the already existing HTML intact.
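One way to picture the repeated conversion, as a sketch only: elements live in namespaces, the converter renames the elements it knows into the XHTML namespace and ignores everything else, including unexpanded macros and elements that are already XHTML. The namespace urn:example:moin-page, the macro element and the TAG_MAP table are invented for this example.

```python
import xml.etree.ElementTree as ET

MOIN = "urn:example:moin-page"  # hypothetical namespace of the tree format
XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical table of elements the converter knows how to translate.
TAG_MAP = {
    "{%s}p" % MOIN: "{%s}p" % XHTML,
    "{%s}strong" % MOIN: "{%s}strong" % XHTML,
}

def convert_known(tree):
    """Convert known elements in place and return how many were touched."""
    count = 0
    for node in tree.iter():
        new_tag = TAG_MAP.get(node.tag)
        if new_tag is not None:
            node.tag = new_tag
            count += 1
    return count

tree = ET.fromstring(
    '<m:p xmlns:m="%s">Quote: <m:macro name="RandomQuote"/></m:p>' % MOIN
)

# First pass (cacheable): the paragraph is converted, the macro is left alone.
first_pass = convert_known(tree)

# Later, per request: the volatile macro is expanded into tree content ...
macro = tree[0]
macro.tag = "{%s}strong" % MOIN
macro.attrib.clear()
macro.text = "some quote"

# ... and the second pass only converts what the expansion added.
second_pass = convert_known(tree)
```

This does not answer the question about Include and TOC by itself; it only shows that a pass over a half-expanded tree can be idempotent for the parts that are already HTML.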
Project stages
Plan for GSOC 2008, Plan for extending this project
Further possible projects
- Section editing: Embed the page source section-wise into the tree. This makes it possible to replace one section and dump the source after that.
- Conversion between different wiki markups.
Refs
[DCMI] - Dublin Core Metadata Element Set, Version 1.1: Reference Description, http://www.dublincore.org/documents/dces/, Dublin Core Metadata Initiative, 2003
[ODF] - Open Document Format for Office Applications, Version 1.1, http://docs.oasis-open.org/office/v1.1/, OASIS, 2007
[XHTML1.1] - XHTML 1.1 - Module-based XHTML - Second Edition, http://www.w3.org/TR/xhtml11, W3C, 2007
[XInclude] - XML Inclusions (XInclude) Version 1.0 (Second Edition), http://www.w3.org/TR/xinclude/, W3C, 2006