Current State : WIP

DocBook to DOM Conversion

Equivalences

See DOM DocBook and HTML 2010/DocBook-DOM Equivalences.

About this converter

The idea is to display all the information, with some basic styling so that the different elements can be told apart. However, some information is inevitably lost when converting from DocBook to the DOM Tree, since the latter cannot carry any semantic meaning.

Exceptions

At this time, the converter correctly handles two kinds of error in your DocBook document:

Invalid XML Document: if the XML parser cannot parse your document, you will see this error. This usually happens when you forget to close a tag or make a typo in a tag name.
NameSpaceError: if your DocBook document does not have the correct namespace declaration (xmlns='http://docbook.org/ns/docbook'), you will get this error. This usually happens when you forget to put xmlns='http://docbook.org/ns/docbook' in the attributes of the root element, or when you make a mistake in the namespace address.

At this time, the DocBook_IN converter is not strict about the DocBook specification: you will not get any indication of whether your document is valid according to the DocBook schema, even if the converter can convert it into our internal DOM Tree. However, for best results, the converter expects a valid DocBook 5 document.
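The two checks above can be sketched with Python's standard xml.etree.ElementTree module. This is a minimal illustration under that assumption, not the DocBook_IN code itself: parse_docbook() and the InvalidXMLDocument class are hypothetical names chosen to mirror the two error messages.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = 'http://docbook.org/ns/docbook'


class InvalidXMLDocument(Exception):
    """Hypothetical: raised when the input cannot be parsed as XML."""


class NameSpaceError(Exception):
    """Hypothetical: raised when the root is not in the DocBook 5 namespace."""


def parse_docbook(text):
    """Parse DocBook 5 source and return its root element.

    Raises InvalidXMLDocument for malformed XML (unclosed or misspelled
    tags) and NameSpaceError for a missing or wrong xmlns declaration.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        raise InvalidXMLDocument(str(err)) from err
    # ElementTree stores namespaced tags as '{uri}localname'.
    if not root.tag.startswith('{' + DOCBOOK_NS + '}'):
        raise NameSpaceError('root element %r is not in %s' % (root.tag, DOCBOOK_NS))
    return root
```

Note that this only checks well-formedness and the namespace, which matches the behaviour described above: no validation against the DocBook schema is done.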

Status

Current State : WIP (Early stage)

Page Structure

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<article>	<page> or <div>	Done	Done	Need to check if <page> can contain other <page>
<simpara>	<p>	Done	Done
<para>	<p>	Done	Done
<formalpara>	<p title=>	Done	Done

Sections

In DocBook, a section is the part of a document between headings. There are two systems for sections: recursive and numbered. The first uses only the <section> tag and lets the processor work out the level. The second explicitly defines a level from 1 to 5 with tags like <sectN>.

The two systems are mutually exclusive: you cannot have numbered sections inside recursive ones, and vice versa. However, both systems can be used in the same document as long as this rule is respected.

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<sectX>	<h outline-level=X> if <title>	Done	Done
<section>	<h outline-level=X> if <title>	Done	Done
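A minimal sketch of how both section systems can map to the outline level of a heading. It assumes Python's xml.etree.ElementTree as the tree API, and heading_levels() is a hypothetical name; the real converter's code may differ. Recursive <section> elements take their level from nesting depth, while <sectN> uses N directly, and a heading is only produced when a <title> is present.

```python
import xml.etree.ElementTree as ET

DB = '{http://docbook.org/ns/docbook}'
# '{ns}sect1' -> 1, ..., '{ns}sect5' -> 5
NUMBERED = {DB + 'sect%d' % n: n for n in range(1, 6)}


def heading_levels(element, section_depth=0):
    """Yield (outline_level, title_text) for every titled section below element."""
    for child in element:
        if child.tag == DB + 'section':
            level = section_depth + 1          # recursive system: nesting depth
        elif child.tag in NUMBERED:
            level = NUMBERED[child.tag]        # numbered system: explicit N
        else:
            # Not a section: look further down at the same depth.
            yield from heading_levels(child, section_depth)
            continue
        title = child.find(DB + 'title')
        if title is not None:
            yield level, ''.join(title.itertext())
        yield from heading_levels(child, level)
```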

Lists

Here is a DocBook file with an example of each list: List.xml

And here is the PDF obtained by converting it with two classic tools, XSLT and FOP: List.pdf

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<itemizedlist>	<list item-label-generate='unordered'>	Done	Done
<orderedlist>	<list item-label-generate='ordered'>	Done	Done
<variablelist>	<list>	Done	Done
<varlistentry>	<list-item>	Done	Done
<term>	<list-item-label>	Done	Done
<glosslist>	<list>	Done	Done
<glossentry>	<list-item>	Done	Done
<glossterm>	<list-item-label>	Done	Done
<glossdef>	<list-item-body>	Done	Done
<procedure>	<list item-label-generate='ordered'>	Done	Done	$/!\$ The official XSL stylesheet adds a Procedure title, but we don't
<step>	<list-item><list-item-body>	Done	Done
<stepalternatives>	<list-item><list-item-body>	Done	Done
<substeps>	<list item-label-generate='ordered'>	Done	Done
<qandaset>	<list>	Done	Done	See QandAset.
<question>	quite complex, need translation	Done	Done	See QandAset.
<answer>	quite complex, need translation	Done	Done	See QandAset.
<segmentedlist>	<list>	Done	Done	Create variable list with pre-defined label.
<segtitle>	Just save it	Done	Done
<seglistitem>	<list-item>	Done	Done
<seg>	Saved label+<list-item-body>	Done	Done
<simplelist>	<list item-label-generate='unordered'>	WIP	WIP	Does not support the `type` attribute yet. It also uses bullets for rendering.
<member>	<list-item><list-item-body>	Done	Done
<listitem>	<list-item><list-item-body>	WIP	WIP	Different conversion depending on the parent list.
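As an illustration of the variablelist rows in the table above, here is a minimal sketch assuming Python's xml.etree.ElementTree and the namespace URI http://moinmo.in/namespaces/page for the moin_page DOM (both are assumptions; convert_variablelist() is a hypothetical name). It keeps only the text of each <term> and attaches the <listitem> children without converting them.

```python
import xml.etree.ElementTree as ET

DB = '{http://docbook.org/ns/docbook}'
MOIN = '{http://moinmo.in/namespaces/page}'  # assumed moin_page namespace URI


def convert_variablelist(varlist):
    """<variablelist> -> <list>, <varlistentry> -> <list-item>,
    <term> -> <list-item-label>, <listitem> -> <list-item-body>."""
    out = ET.Element(MOIN + 'list')
    for entry in varlist.findall(DB + 'varlistentry'):
        item = ET.SubElement(out, MOIN + 'list-item')
        for term in entry.findall(DB + 'term'):
            # Only the text content of the term is kept as the label.
            label = ET.SubElement(item, MOIN + 'list-item-label')
            label.text = ''.join(term.itertext())
        listitem = entry.find(DB + 'listitem')
        if listitem is not None:
            body = ET.SubElement(item, MOIN + 'list-item-body')
            # Children are attached as-is; a real converter would visit them.
            body.extend(list(listitem))
    return out
```

The same item / label / body shape applies to <glosslist> (glossentry, glossterm, glossdef), which is why both map to a plain <list>.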

QandA set

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<qandadiv>	<list>	Nothing	Nothing
`defaultlabel='number'`	<list item-label-generate='unordered'>	Done	Done
<question><answer>	<list-item><list-item-body><p>Q</p><p>A</p></></>	Done	Done
`defaultlabel='qanda'`	<list>	Done	Done
<question>	<list-item><list-item-label>Q:</><list-item-body>Q Body</></>	Done	Done
<answer>	<list-item><list-item-label>A:</><list-item-body>A Body</></>	Done	Done
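The defaultlabel='qanda' rows can be sketched as follows, again assuming Python's xml.etree.ElementTree and the moin_page namespace URI http://moinmo.in/namespaces/page (assumptions; convert_qandaset_qanda() is hypothetical). Since <qandadiv> is not handled yet, this sketch simply flattens it by searching all descendant <qandaentry> elements.

```python
import xml.etree.ElementTree as ET

DB = '{http://docbook.org/ns/docbook}'
MOIN = '{http://moinmo.in/namespaces/page}'  # assumed moin_page namespace URI
LABELS = {DB + 'question': 'Q:', DB + 'answer': 'A:'}


def convert_qandaset_qanda(qandaset):
    """<qandaset defaultlabel='qanda'> -> <list>; each <question> and
    <answer> becomes its own <list-item> with a Q:/A: label."""
    out = ET.Element(MOIN + 'list')
    for entry in qandaset.iter(DB + 'qandaentry'):
        for part in entry:
            label = LABELS.get(part.tag)
            if label is None:
                continue  # e.g. a <label> child, ignored in this sketch
            item = ET.SubElement(out, MOIN + 'list-item')
            ET.SubElement(item, MOIN + 'list-item-label').text = label
            body = ET.SubElement(item, MOIN + 'list-item-body')
            body.extend(list(part))  # content attached unconverted
    return out
```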

Tables

There are two kinds of tables in DocBook: db.html.table and db.cals.table.

Html.Table

Since db.html.table is the same as the usual HTML table, the code, tests and equivalences are broadly similar to those of the HTML_IN converter.

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<informaltable>	<table>	Done	WIP	$/!\$ see title
<table>	<table>	Done	WIP	$/!\$ see title
<thead>	<table-header>	Done	Done
<tfoot>	<table-footer>	Done	Done
<tbody>	<table-body>	Done	Done
<tr>	<table-row>	Done	Done
<td>	<table-cell>	Done	Done
<th>	<table-cell>	Done	Done
<col>	Save attribute to put it on the col	Nothing	Nothing
<colspec>	Save attribute to put it on the col	Nothing	Nothing

db.cals.table

The converter does not support rowspan and colspan with db.cals.table.

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<informaltable>	<table>	Done	WIP	$/!\$ see title
<table>	<table>	Done	WIP	$/!\$ see title
<row>	<table-row>	Done	Done
<entry>	<table-cell>	Done	Done
<entrytbl>	<table-cell><table>	Done	Done
<colgroup>	Save attribute to put it on the col	Nothing	Nothing

Misc

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<footnote>	<note note-class="footnote">	Done	Done
<quote>	<quote>	Done	Done
<blockquote>	<blockquote>	Done	Done	$/!\$ attribution element converted to source.
<attribution>	`source` attribute of blockquote	Done	Done	See blockquote above
<trademark>	<span element=trademark>	Done	Done	$/!\$ adds a trademark symbol at the end
<sbr>	<line-break>	Done	Done
<email>	Add the corresponding macro in the DOM	Nothing	Nothing
<tag namespace class>	<span class="db-tag-class">{namespace}tag	Done	Done

Links

There are three different kinds of links in the DocBook reference. All of them use the xlink namespace for their attributes, except for the linkend attribute, which defines a link within the document.

So they are simple to convert into the DOM Tree, since the DOM Tree also uses the xlink namespace for links.

I also propose to handle the old DocBook 4.x link syntax, even though the converter does not directly support DocBook 4.x. Many applications still use <ulink url=url> for links; in particular, the Moin 1.x DocBook formatter outputs links this way, so it is worth supporting.

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<link `xlink attr`>	<a `xlink attr`>	Done	Done
<link `linkend`>	See anchor support	Done	Done
<ulink url="url:test">	<a xlink:href="url:test">	Done	Done	DocBook v4 link
<olink `targetdoc` `targetptr`>	<a xlink:href="targetdoc#targetptr">	Done	Done

$/!\$ The endterm attribute is not supported for any of these elements.
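The three external link forms in the table can be sketched as below, assuming Python's xml.etree.ElementTree and the moin_page namespace URI http://moinmo.in/namespaces/page (assumptions; convert_link() is a hypothetical name). The xlink namespace URI is the standard one. Links that only carry linkend belong to anchor support and are not covered here.

```python
import xml.etree.ElementTree as ET

DB = '{http://docbook.org/ns/docbook}'
XLINK = '{http://www.w3.org/1999/xlink}'
MOIN = '{http://moinmo.in/namespaces/page}'  # assumed moin_page namespace URI


def convert_link(element):
    """Map <link>, <ulink> (DocBook 4 style) and <olink> to a moin <a>."""
    a = ET.Element(MOIN + 'a')
    if element.tag == DB + 'link':
        # DocBook 5 links already use xlink: copy those attributes unchanged.
        for key, value in element.attrib.items():
            if key.startswith(XLINK):
                a.set(key, value)
    elif element.tag == DB + 'ulink':
        a.set(XLINK + 'href', element.get('url', ''))
    elif element.tag == DB + 'olink':
        href = element.get('targetdoc', '')
        if element.get('targetptr'):
            href += '#' + element.get('targetptr')
        a.set(XLINK + 'href', href)
    else:
        raise ValueError('not a link element: %s' % element.tag)
    # Keep the link text and any inline children.
    a.text = element.text
    a.extend(list(element))
    return a
```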

Object

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<inlinemediaobject>	<span element="inlinemediaobject">	Done	Done
<mediaobject>	<div html:class="mediaobject">	Done	Done
<audioobject>	See *Object conversion	Done	Done
<imageobject>	See *Object conversion	Done	Done
<textobject>	See *Object conversion	Done	Done
<videoobject>	See *Object conversion	Done	Done
<imagedata>	<object type='image/'>	Done	Done
<audiodata>	<object type='audio/'>	Done	Done
<videodata>	<object type='video/'>	Done	Done

Code

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<screen>	<blockcode>	Done	Done	Need to check whether the DOM Tree supports `linenumbering` and `language`
<programlisting>	<blockcode>	Done	Done	Need to check whether the DOM Tree supports `linenumbering` and `language`
<literal>	<code>	Done	Done
<literallayout>	<blockcode html:class="db-literallayout">	Done	Done
<code>	<code>	Done	Done	$/!\$ Check language attribute
<computeroutput>	<code>	Done	Done
<markup>	<code>	Done	Done

Style Elements

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<emphasis>	<emphasis>	Done	Done
<emphasis role="strong">	<strong>	Done	Done
<phrase>	<span>	Done	Done
<subscript>	<span baseline-shift>	Done	Done
<superscript>	<span baseline-shift>	Done	Done
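The style table can be read as a small lookup keyed on the element name and its role. This is a sketch assuming Python's xml.etree.ElementTree and the moin_page namespace URI http://moinmo.in/namespaces/page; the baseline-shift values 'sub' and 'super' and the namespace of that attribute are also assumptions, since the table only names the attribute. convert_style() is hypothetical.

```python
import xml.etree.ElementTree as ET

MOIN = '{http://moinmo.in/namespaces/page}'  # assumed moin_page namespace URI

# (DocBook local name, role) -> (moin tag, extra attributes).
# A role of None is the fallback for any other role value.
STYLE_MAP = {
    ('emphasis', 'strong'): ('strong', {}),
    ('emphasis', None): ('emphasis', {}),
    ('phrase', None): ('span', {}),
    ('subscript', None): ('span', {'baseline-shift': 'sub'}),      # assumed value
    ('superscript', None): ('span', {'baseline-shift': 'super'}),  # assumed value
}


def convert_style(element):
    """Convert one DocBook style element; raises KeyError for other elements."""
    local = element.tag.split('}')[-1]
    key = (local, element.get('role'))
    if key not in STYLE_MAP:
        key = (local, None)
    name, extra = STYLE_MAP[key]
    out = ET.Element(MOIN + name, {MOIN + k: v for k, v in extra.items()})
    out.text = element.text
    out.extend(list(element))
    return out
```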

Admonitions

DocBook element	Moin_page equivalence	Test	Conversion	Comments
<caution>	<admonition type='caution'>	Done	Done
<important>	<admonition type='important'>	Done	Done
<note>	<admonition type='note'>	Done	Done
<tip>	<admonition type='tip'>	Done	Done
<warning>	<admonition type='warning'>	Done	Done

Standard Attribute

xml:base

Ignored tags

The following tags are ignored completely by the converter: their children are not processed either.

['abstract', 'artpagenums', 'annotation', 'author', 'authorgroup',
'authorinitials', 'bibliocoverage', 'biblioid', 'bibliomisc', 'bibliomset', 'bibliorelation',
'biblioset', 'bibliosource', 'collab', 'confdates', 'confgroup', 'confnum', 'confsponsor',
'conftitle', 'contractnum', 'contractsponsor', 'contrib', 'copyright',
'cover', 'edition', 'editor', 'extendedlink', 'issuenum', 'itermset', 'keyword',
'keywordset', 'legalnotice', 'org', 'orgname', 'orgdiv', 'otheraddr', 'othercredit', 'pagenums', 'personblurb', 'printhistory',
'productname', 'productnumber', 'pubdate', 'publisher', 'publishername', 'releaseinfo', 'titleabbrev',
'revhistory', 'seriesvolnums', 'subjectset', 'volumenum', 'bibliodiv', 'biblioentry', 'bibliography',
'bibliolist', 'bibliomixed', 'biblioref', 'citation', 'callout', 'calloutlist', 'co', 'imageobjectco', 'area',
'areaset', 'areaspec', 'classname', 'classsynopsis', 'classsynopsisinfo', 'constructorsynopsis', 'fieldsynopsis',
'funcdef', 'funcparams', 'funcprototype', 'funcsynopsis', 'funcsynopsisinfo', 'function', 'group', 'initializer',
'interfacename', 'methodname', 'methodparam', 'methodsynopsis', 'ooclass', 'ooexception', 'oointerface', 'varargs', 'void',
'guibutton', 'guiicon', 'guilabel', 'guimenu', 'guimenuitem', 'guisubmenu',
'info', 'bridgehead', 'constraint', 'constraintdef', 'lhs', 'nonterminal', 'rhs',
'msg', 'msgaud', 'msgentry', 'msgexplan', 'msginfo', 'msglevel', 'msgmain', 'msgorig', 'msgrel', 'msgset', 'msgsub', 'msgtext',
'refclass', 'refdescriptor', 'refentry', 'refentrytitle', 'reference', 'refmeta', 'refmiscinfo', 'refname', 'refnamediv',
'refpurpose', 'refsect1', 'refsect2', 'refsect3', 'refsection', 'refsynopsisdiv',
'toc', 'tocdiv', 'tocentry', 'arc', 'spanspec', 'xref',
'index', 'indexdiv', 'indexentry', 'indexterm',
'primary', 'primaryie', 'secondary', 'secondaryie', 'see', 'seealso',
'tertiary', 'tertiaryie']

The ignored tags are mainly the "info" elements and the bibliography elements. For the info elements, the DocBook documentation states the following:

Processing expectations

Suppressed. Many of the elements in this wrapper may be used in presentation, but they are not generally printed as part of the formatting of the wrapper. The wrapper merely serves to identify where they occur.

This is why we do not process the "info" elements. A metadata processor for MoinMoin could later be written to extract this kind of data.

We do not support bibliographies either, since they would only be useful with a full environment for handling them. Bibliography support could also be added to MoinMoin later.
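Dropping an ignored element together with all of its children can be sketched as a recursive copy that skips those subtrees. This assumes Python's xml.etree.ElementTree; strip_ignored() is hypothetical and IGNORED_TAGS below is only an excerpt of the list above. Keeping the text that follows an ignored element (its tail) is a design choice of this sketch, so that a sentence containing an <indexterm> stays readable; the actual converter may behave differently.

```python
import xml.etree.ElementTree as ET

# Excerpt of the ignored-tag list above; the real list is much longer.
IGNORED_TAGS = frozenset(['info', 'author', 'abstract', 'bibliography',
                          'revhistory', 'indexterm', 'primary'])


def local_name(tag):
    """'{namespace}name' -> 'name'."""
    return tag.split('}')[-1]


def strip_ignored(element):
    """Return a copy of element with every ignored subtree removed.

    Children of an ignored element are never visited.
    """
    kept = ET.Element(element.tag, element.attrib)
    kept.text = element.text
    kept.tail = element.tail
    for child in element:
        if local_name(child.tag) in IGNORED_TAGS:
            # Re-attach the text that followed the dropped element.
            if child.tail:
                if len(kept):
                    kept[-1].tail = (kept[-1].tail or '') + child.tail
                else:
                    kept.text = (kept.text or '') + child.tail
            continue
        kept.append(strip_ignored(child))
    return kept
```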

Inline Elements not handled

$/!\$ WIP $/!\$

The following elements are simply converted to a <span element="element-name">.

abbrev
address
accel
acronym
affiliation
alt
anchor
city
command
constant
country
database
date
errorcode
errorname
errortext
errortype
exceptionname
fax
filename
firstname
firstterm
foreignphrase
hardware
holder
honorific
jobtitle
keycap
keycode
keycombo
keysym
lineannotation
manvolnum
mousebutton
option
optional
package
person
personname
phone
pob
postcode
prompt
remark
replaceable
returnvalue
shortaffil
shortcut
state
street
surname
symbol
systemitem
termdef
type
uri
userinput
varname
wordasword
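The generic span conversion above can be sketched like this, assuming Python's xml.etree.ElementTree and the moin_page namespace URI http://moinmo.in/namespaces/page; putting the element attribute in that namespace is also an assumption. convert_inline() is hypothetical and INLINE_SPAN is an excerpt of the list.

```python
import xml.etree.ElementTree as ET

MOIN = '{http://moinmo.in/namespaces/page}'  # assumed moin_page namespace URI
# Excerpt of the element list above.
INLINE_SPAN = frozenset(['abbrev', 'command', 'filename', 'option',
                         'replaceable', 'userinput', 'varname'])


def convert_inline(element):
    """<filename>...</filename> -> <span element="filename">...</span>.

    Nested elements from the same list are converted recursively;
    anything else is attached unchanged.
    """
    local = element.tag.split('}')[-1]
    span = ET.Element(MOIN + 'span', {MOIN + 'element': local})
    span.text = element.text
    for child in element:
        if child.tag.split('}')[-1] in INLINE_SPAN:
            converted = convert_inline(child)
            converted.tail = child.tail  # keep text following the child
            span.append(converted)
        else:
            span.append(child)
    return span
```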

Block Elements not handled in the DOM Tree

Some elements do not have a direct equivalent in our DOM Tree; to keep their meaning, we convert the following tags using <div class="db_tag.name">.

The converter also checks whether the first child element is a title; if so, the title is stored in the html:title attribute of the <div> element.

acknowledgements
appendix
caption
chapter
cmdsynopsis
colophon
dedication
epigraph
example
figure
equation
part
partintro
screenshot
set
setindex
sidebar
simplesect
subtitle
synopsis
synopfragment
task
taskprerequisites
taskrelated
tasksummary
title $/!\$ or <h> if child of a section.
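A sketch of the block conversion, including the leading-title check, assuming Python's xml.etree.ElementTree, the moin_page namespace URI http://moinmo.in/namespaces/page and the XHTML namespace for the html: attributes (all assumptions). The exact class value is also an assumption: this sketch uses 'db-' + element name, modelled on the db-literallayout class in the Code table. convert_block() is hypothetical.

```python
import xml.etree.ElementTree as ET

DB = '{http://docbook.org/ns/docbook}'
MOIN = '{http://moinmo.in/namespaces/page}'  # assumed moin_page namespace URI
HTML = '{http://www.w3.org/1999/xhtml}'      # assumed namespace of html: attributes


def convert_block(element):
    """<example>, <chapter>, ... -> <div html:class="db-example">.

    A <title> that is the first child element becomes the html:title
    attribute instead of a child of the div.
    """
    local = element.tag.split('}')[-1]
    div = ET.Element(MOIN + 'div', {HTML + 'class': 'db-' + local})
    div.text = element.text
    children = list(element)
    if children and children[0].tag == DB + 'title':
        title = children.pop(0)
        div.set(HTML + 'title', ''.join(title.itertext()))
        # Keep any text that followed the title.
        div.text = (div.text or '') + (title.tail or '')
    div.extend(children)  # remaining children attached unconverted
    return div
```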

$/!\$ Informal*

$/!\$ InlineEquation

ToDo

Decide whether we should care about the DocType.
Check whether we can support DocBook v4.0 within DocBook v5.

MoinMoin: DOM DocBook and HTML 2010/DocBook-DOM (last edited 2010-08-10 19:25:46 by ValentinJaniaut)