Serialize to XML / Deserialize from XML (moin 2.x)
We need (de-)serialization in quite a few places:
- (complete) wiki backup / restore: data and user items
- import/export of single items or sets of items (see page packages in moin 1.x)
- system / help page packs (+ i18ned versions)
- wikisync for inter-wiki transfer of item revisions
- wiki xmlrpc (v3?)
- a format usable to write importers / exporters for other wiki engines (markup conversion shall not be part of this feature request)
Ideas for serialization format
Complete dump
<wiki>
  <meta> ... (wiki-level metadata) ... </meta>
  <items> ... (items) ... </items>
</wiki>
Item serialization (1 complete item)
<item type="user|data" name="...">
  <meta> ... (item-level metadata) ... </meta>
  <revision revno="0">
    <meta> ... (revision-level metadata) ... </meta>
    <data> ... (revision data) ... </data>
  </revision>
  ... (more revisions) ...
</item>
Metadata representation
<meta>
  <entry name="key1">value1</entry>
  <entry name="key2">value2</entry>
  ...
</meta>
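The metadata layout above could be produced and read back roughly as follows. This is only a minimal sketch using the standard library's xml.etree.ElementTree; the function names and the example keys are hypothetical, and it assumes every metadata value can be represented as a string (typed values would need an extra convention that is not decided here).

```python
import xml.etree.ElementTree as ET


def meta_to_element(meta):
    """Build a <meta> element from a dict (hypothetical helper)."""
    meta_el = ET.Element("meta")
    for key, value in sorted(meta.items()):
        entry = ET.SubElement(meta_el, "entry", name=key)
        entry.text = str(value)  # assumption: values are stored as plain strings
    return meta_el


def element_to_meta(meta_el):
    """Read a <meta> element back into a dict; all values come back as str."""
    return {entry.get("name"): entry.text or "" for entry in meta_el.findall("entry")}


# Example keys are made up for illustration only.
meta = {"mimetype": "text/plain", "comment": "a < b & c"}
xml_text = ET.tostring(meta_to_element(meta), encoding="unicode")
restored = element_to_meta(ET.fromstring(xml_text))
```

The serializer escapes XML special characters in the values, so arbitrary text in a metadata value survives the round trip.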
Data representation and dealing with large data
As we store arbitrary binary items, revision data can easily get rather big (e.g. if someone attaches a CD or DVD image, maybe even in multiple revisions).
Embedding chunked revision data
ElementTree (and also EmeraldTree) do not seem to support data streaming (i.e. we can't read item data chunk-wise and write it into a single XML element).
But we can create multiple data-chunk elements, like:
<data>
  <chunk>... 100kB data ...</chunk>
  <chunk>... 100kB data ...</chunk>
  <chunk>... 42kB data (== the rest)...</chunk>
</data>
The data inside the chunk could be base64-encoded to support all item mimetypes in the same way.
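A rough sketch of the chunked, base64-encoded idea, under some assumptions: the function names and the 100 kB chunk size are arbitrary choices, the writer emits the elements directly as text instead of going through a tree API, and the reader uses the standard library's ElementTree iterparse. Whether EmeraldTree offers an equivalent is not checked here.

```python
import base64
import io
import xml.etree.ElementTree as ET

CHUNK_SIZE = 100 * 1024  # raw bytes per <chunk>; an arbitrary choice


def write_data_element(src, out, chunk_size=CHUNK_SIZE):
    """Copy binary stream `src` into text stream `out` as <data><chunk>...</chunk></data>."""
    out.write("<data>")
    while True:
        raw = src.read(chunk_size)
        if not raw:
            break
        # The base64 alphabet contains no XML special characters, so no escaping is needed.
        out.write("<chunk>%s</chunk>" % base64.b64encode(raw).decode("ascii"))
    out.write("</data>")


def read_data_element(xml_src, dst):
    """Decode every <chunk> from binary XML stream `xml_src` into binary stream `dst`."""
    for _event, elem in ET.iterparse(xml_src, events=("end",)):
        if elem.tag == "chunk":
            dst.write(base64.b64decode(elem.text or ""))
            elem.clear()  # drop the chunk text so it is not kept in memory


payload = bytes(range(256)) * 1000  # 256000 bytes of test data
out = io.StringIO()
write_data_element(io.BytesIO(payload), out)
restored = io.BytesIO()
read_data_element(io.BytesIO(out.getvalue().encode("ascii")), restored)
```

Each chunk is encoded and decoded on its own, so only one chunk of revision data has to be held in memory at a time. Base64 makes the embedded data about a third larger than the raw bytes.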
Referring to revision data
- put an href: reference to the revision data into the XML dump (the source wiki needs to be reachable to fetch the data later)
- put a file: reference to the revision data into the XML dump and write the revision data to separate files (no access to the source wiki needed)
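The file-reference variant might look roughly like this. It is a sketch only: the helper name, the `file` attribute and the file naming scheme are all hypothetical, since the notes above do not fix how the reference is expressed.

```python
import io
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def externalize_revision_data(src, dump_dir, filename):
    """Write binary stream `src` to dump_dir/filename and return a <data> element referring to it."""
    path = os.path.join(dump_dir, filename)
    with open(path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies block-wise, never loading everything at once
    # "file" is a hypothetical attribute name; the path is relative to the XML dump.
    return ET.Element("data", file=filename)


with tempfile.TemporaryDirectory() as dump_dir:
    data_el = externalize_revision_data(
        io.BytesIO(b"revision bytes"), dump_dir, "item-0001.rev0.data"
    )
    xml_text = ET.tostring(data_el, encoding="unicode")
    with open(os.path.join(dump_dir, "item-0001.rev0.data"), "rb") as f:
        stored = f.read()
```

With this approach the XML dump itself stays small, but the dump is only complete together with the directory of data files.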
Integrity checking
Data integrity can be checked using the hash we have in the metadata.
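A minimal sketch of such a check, assuming the metadata holds a hex digest and that the hash algorithm is known (sha1 is used here only as an example default; the function name and parameters are hypothetical):

```python
import hashlib
import io


def data_matches_hash(src, expected_hexdigest, algorithm="sha1", block_size=64 * 1024):
    """Hash binary stream `src` block-wise and compare with the digest from the metadata."""
    h = hashlib.new(algorithm)
    for block in iter(lambda: src.read(block_size), b""):
        h.update(block)
    return h.hexdigest() == expected_hexdigest.lower()


data = b"hello wiki"
expected = hashlib.sha1(data).hexdigest()  # stands in for the hash stored in the metadata
ok = data_matches_hash(io.BytesIO(data), expected)
tampered = data_matches_hash(io.BytesIO(data + b"!"), expected)
```

Because the hash is updated block by block, the check also works for large revision data without reading it into memory as a whole.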
Serializer features
- complete wiki (data and users)
- sets of (named) items
  - with all revisions
  - just latest revision
  - with some revisions (since timestamp? all revision hashes we do not have yet?)
- revisions need to be serialized in-order (0,1,2,...)
  - in any case: per item
  - globally in order of timestamp (see "backend.history()" output)?
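The revision selection and ordering options above can be sketched as follows. The data model is a deliberate simplification and not the backend API: each item is assumed to be a list of (revno, mtime) tuples, and the function names and the `mode` values are hypothetical.

```python
def select_revisions(revisions, mode="all", since=None):
    """Return the revisions of one item to serialize, ordered by revno.

    `revisions` is a list of (revno, mtime) tuples; `mode` is "all", "latest" or "since".
    """
    ordered = sorted(revisions)  # tuples sort by revno first: 0, 1, 2, ...
    if mode == "all":
        return ordered
    if mode == "latest":
        return ordered[-1:]
    if mode == "since":
        return [rev for rev in ordered if rev[1] >= since]
    raise ValueError("unknown mode: %r" % (mode,))


def global_order(items):
    """Return (mtime, name, revno) tuples for all items, ordered by timestamp."""
    return sorted(
        (mtime, name, revno)
        for name, revisions in items.items()
        for revno, mtime in revisions
    )


# Made-up example data: item name -> list of (revno, mtime)
items = {"Home": [(1, 300), (0, 100)], "Help": [(0, 200)]}
latest = select_revisions(items["Home"], mode="latest")
recent = select_revisions(items["Home"], mode="since", since=200)
history = global_order(items)
```

As long as the timestamps within one item increase with the revision number, the global timestamp order also keeps each item's revisions in order.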