Motivation
Generating valid HTML output has proven to be difficult. Because XML is much less forgiving than a browser, it might be a good idea to do the harder job first.
- "Hard work now leads to less work FULLSTOP" -- Alan Cox
Wiki XML
The exact format of Wiki XML is still open. It would be nice to have a WikiDtd and, as an extension, the MoinMoinDtd. See also: WikiXml
DOM tree vs tags
Because we want to reuse the result, the implementation should not be too DOM centric. There is an obvious analogy between generating a text/xml document and building a DOM tree. To make this clearer:
XML document | DOM operation |
process line by line | depth first traversal |
open tag | add node as last child at "current" position node |
close tag | move "current" one node up |
If the formatter is implemented in terms of .new_tag() and .close_tag(), it should be easy to reuse the code to produce either text/xml or text/html.
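The analogy in the table can be sketched in Python as two interchangeable back ends. This is only an illustration, not the MoinMoin formatter: the class names TextEmitter and DomEmitter, the .text() and .result() methods, the render() helper and the "page" root element are all assumptions made for the example. Only .new_tag() and .close_tag() come from the text above.

```python
from xml.dom.minidom import Document


class TextEmitter:
    """Hypothetical back end: writes tags as text, piece by piece."""

    def __init__(self):
        self.parts = []

    def new_tag(self, name):
        self.parts.append("<%s>" % name)

    def text(self, data):
        # No escaping here; real output would need it.
        self.parts.append(data)

    def close_tag(self, name):
        self.parts.append("</%s>" % name)

    def result(self):
        return "".join(self.parts)


class DomEmitter:
    """Hypothetical back end: builds a DOM tree with the same calls."""

    def __init__(self, root="page"):
        self.doc = Document()
        self.current = self.doc.createElement(root)
        self.doc.appendChild(self.current)

    def new_tag(self, name):
        # open tag: add node as last child at the "current" position
        node = self.doc.createElement(name)
        self.current.appendChild(node)
        self.current = node

    def text(self, data):
        self.current.appendChild(self.doc.createTextNode(data))

    def close_tag(self, name):
        # close tag: move "current" one node up
        self.current = self.current.parentNode

    def result(self):
        return self.doc.documentElement.toxml()


def render(emitter):
    """The same sequence of calls drives either back end."""
    emitter.new_tag("p")
    emitter.text("Hello ")
    emitter.new_tag("strong")
    emitter.text("world")
    emitter.close_tag("strong")
    emitter.close_tag("p")
    return emitter.result()
```

Calling render(TextEmitter()) yields the string <p>Hello <strong>world</strong></p>, while render(DomEmitter()) serializes the tree as the same markup wrapped in the assumed <page> root.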
Ideas/Problems
- Keep stack of all open tags as state (don't use the DOM tree directly for this)
<strong>text<h1>text</strong>text</h1> must be converted to <strong>text</strong><h1><strong>text</strong>text</h1>
- distinguish between
- "breakable" tags (may be closed and reopened again)
<strong>, <em>, <highlight>
<sub>, <sup>, <code>, <underlined>
- "unbreakable" tags (reopen creates a new entity we don't want)
<hN>
<table>, <td>, <tr>
<ol>, <ul>, <li>
<p>, <pre>
<a>
- unbreakable tags must be opened and closed with correct nesting by the parser (enforce this by formatter)
- breakable tags must be closed and reopened if needed by the formatter
<p> needs special treatment
- it must be opened and closed automatically if needed
- parser should only need to say: new paragraph, please.
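A minimal sketch of the <p> handling, assuming the formatter tracks a single "paragraph is open" flag. The class name ParagraphWriter and its methods are hypothetical; the only idea taken from the notes is that the parser just asks for a new paragraph and never opens or closes <p> itself.

```python
class ParagraphWriter:
    """Hypothetical helper: opens and closes <p> automatically."""

    def __init__(self):
        self.out = []
        self.in_paragraph = False

    def text(self, data):
        # Open a paragraph on demand, the first time text arrives.
        if not self.in_paragraph:
            self.out.append("<p>")
            self.in_paragraph = True
        self.out.append(data)

    def new_paragraph(self):
        # "New paragraph, please": close the open one, if any.
        # The next call to text() opens a fresh <p>.
        if self.in_paragraph:
            self.out.append("</p>")
            self.in_paragraph = False

    def result(self):
        self.new_paragraph()  # close a trailing paragraph
        return "".join(self.out)


writer = ParagraphWriter()
writer.text("one")
writer.new_paragraph()
writer.new_paragraph()  # repeated requests do not create empty paragraphs
writer.text("two")
demo = writer.result()  # "<p>one</p><p>two</p>"
```

Because <p> is only emitted when text arrives, repeated blank lines in the source cannot produce empty paragraphs.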
Opening a tag
Unbreakable tag
- ensure that tag is not already open if not allowed (raise error)
- collect all breakable tags (close and remember)
close some unbreakable tags (<p>) ????
- check if enclosing tag is correct (????)
- insert tag
- reopen collected tags
Breakable tag
- ensure that tag is not already open (raise error)
- insert tag
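The opening steps can be sketched as a function over an explicit stack of open tags. This is a sketch under assumptions, not a finished design: the name open_tag, the list-based stack and out arguments, and the NESTABLE set (tags assumed to be allowed inside themselves) are hypothetical, and the two steps marked "????" above are left out because they are still open questions.

```python
# Breakable tags as listed in the notes above.
BREAKABLE = frozenset(
    ["strong", "em", "highlight", "sub", "sup", "code", "underlined"]
)
# Assumption: unbreakable tags that may legally nest inside themselves.
NESTABLE = frozenset(["table", "tr", "td", "ol", "ul", "li"])


def open_tag(stack, out, tag):
    """Open `tag`, keeping `stack` (open tags, outermost first) in sync."""
    if tag in BREAKABLE:
        if tag in stack:
            raise ValueError("breakable tag already open: %s" % tag)
        stack.append(tag)
        out.append("<%s>" % tag)
        return

    # Unbreakable tag.
    if tag in stack and tag not in NESTABLE:
        raise ValueError("tag already open: %s" % tag)

    # Collect all breakable tags: close them and remember them.
    # They always sit on top of the stack, because every unbreakable
    # tag is opened through this function.
    collected = []
    while stack and stack[-1] in BREAKABLE:
        closed = stack.pop()
        out.append("</%s>" % closed)
        collected.append(closed)

    # Insert the tag itself.
    stack.append(tag)
    out.append("<%s>" % tag)

    # Reopen the collected tags in their original nesting order.
    for reopened in reversed(collected):
        stack.append(reopened)
        out.append("<%s>" % reopened)


stack, out = [], []
open_tag(stack, out, "strong")
out.append("text")
open_tag(stack, out, "h1")
demo = "".join(out)  # "<strong>text</strong><h1><strong>"
```

The demo reproduces the first half of the <strong>/<h1> example from Ideas/Problems: the open <strong> is closed before <h1> and reopened inside it.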
Closing a tag
- collect breakable tags
perhaps close some open unbreakable tags (</table> closes <td> and <tr>)
- close tag
- reinsert collected tags
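A matching sketch for the closing steps, again over an explicit stack. The name close_tag and the IMPLICIT_CLOSE table are hypothetical. Treating </table> as the only tag that silently closes other unbreakable tags (<td>, <tr>) is an assumption taken from the "perhaps" item above; any other unbreakable tag still open inside the one being closed is reported as a nesting error.

```python
# Breakable tags as listed in the notes above.
BREAKABLE = frozenset(
    ["strong", "em", "highlight", "sub", "sup", "code", "underlined"]
)
# Assumption: unbreakable tags that a closing tag may close implicitly.
IMPLICIT_CLOSE = {"table": frozenset(["tr", "td"])}


def close_tag(stack, out, tag):
    """Close the innermost open `tag`; reopen breakable tags it cut through."""
    if tag not in stack:
        raise ValueError("tag is not open: %s" % tag)

    # Position of the innermost open occurrence of `tag`.
    pos = len(stack) - 1 - stack[::-1].index(tag)
    above = stack[pos + 1:]

    # Check nesting before writing anything.
    allowed = IMPLICIT_CLOSE.get(tag, frozenset())
    for name in above:
        if name not in BREAKABLE and name not in allowed:
            raise ValueError(
                "bad nesting: <%s> still open inside <%s>" % (name, tag)
            )

    # Collect breakable tags (outermost first) so they can be reopened.
    collected = [name for name in above if name in BREAKABLE]

    # Close everything above the tag, innermost first, then the tag.
    for name in reversed(above):
        out.append("</%s>" % name)
    out.append("</%s>" % tag)
    del stack[pos:]

    # Reinsert the collected tags.
    for name in collected:
        stack.append(name)
        out.append("<%s>" % name)


stack, out = ["table", "tr", "td", "em"], []
close_tag(stack, out, "table")
demo = "".join(out)  # "</em></td></tr></table><em>"
```

Checking the nesting before emitting anything keeps the output and the stack consistent when an error is raised.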
Comments
I do not see the problem with opening and closing tags. Nobody should want to write a new parser. I think there are a lot of tools that can be embedded that know how to handle this. If we use a DTD or a schema, all should work by definition. The problem arises if tags are written by a new parser that cannot read DTDs or schemas. -- ThiloPfennig 2005-12-18 10:03:33