Design
Fourth iteration
Items and their Revisions are stored in Mercurial's internal directory, .hg. Operations on Items are done in memory, using new Mercurial features, memchangectx and memfilectx, which allow easy manipulation of changesets without a working copy. The advantage is fewer I/O operations.
Before commit, Revision data is also kept in memory, using StringIO. This works well for small Items, but bigger ones that do not fit into memory will fail.
Revision metadata is stored by Mercurial internally, in the dictionary bound to each changeset: extra. This gives cleaner code, and Mercurial takes care of storing it efficiently.
Item Metadata is not versioned and is stored in a separate directory.
This implementation does not identify Items internally by name. Instead, Items have unique IDs, which are currently MD5 hashes. The name-to-ID translation is stored in a name-mapping file. This mapping is a text file so that it can be merged on repository synchronization. Hashes are computed from the Item's name, the current timestamp and a random integer.
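The ID scheme described above can be sketched roughly as follows. This is only an illustration, not the backend's actual code: the function name make_item_id and the way the three inputs are joined into one string are assumptions. Only the ingredients (Item name, current timestamp, random integer, MD5) come from the description.

```python
import hashlib
import random
import time


def make_item_id(name):
    """Return a 32-character hex ID for a new Item (hypothetical helper)."""
    # Assumed input format: name, timestamp and random integer joined as text.
    seed = "%s%f%d" % (name, time.time(), random.randint(0, 2 ** 31 - 1))
    return hashlib.md5(seed.encode("utf-8")).hexdigest()


item_id = make_item_id("FrontPage")
```

Because the timestamp and the random integer enter the hash, two Items created with the same name at different times get different IDs, which is what lets a name be relinked later without touching stored data.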
Renames are done by relinking the name to the hash. The Item itself does not move in hg. Thus 'hg rename' is not used, and renames will not be possible 'on console' without dedicated hg extensions.
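Renaming by relinking can be pictured as an update to the name-to-ID table only. The sketch below is an illustration under stated assumptions: the class NameMapping, its methods, and the one-entry-per-line "ID name" text form are all hypothetical, chosen because a sorted, line-oriented text file is easy to merge. The real format of the name-mapping file is not specified here.

```python
class NameMapping(object):
    """Hypothetical in-memory name-to-ID table with a line-based text form."""

    def __init__(self):
        self._ids = {}  # Item name -> Item ID

    def add(self, name, item_id):
        if name in self._ids:
            raise KeyError("Item already exists: %r" % name)
        self._ids[name] = item_id

    def lookup(self, name):
        return self._ids[name]

    def rename(self, old_name, new_name):
        # Only the link changes; the ID (and the file stored in hg) stays put.
        if new_name in self._ids:
            raise KeyError("Item already exists: %r" % new_name)
        self._ids[new_name] = self._ids.pop(old_name)

    def dumps(self):
        # Assumed format: "ID name", one entry per line, sorted by name.
        return "".join("%s %s\n" % (item_id, name)
                       for name, item_id in sorted(self._ids.items()))

    @classmethod
    def loads(cls, text):
        mapping = cls()
        for line in text.splitlines():
            if line:
                item_id, name = line.split(" ", 1)
                mapping.add(name, item_id)
        return mapping


mapping = NameMapping()
mapping.add("FrontPage", "0f4eac723857aa118122c08f534fcf56")
mapping.rename("FrontPage", "StartPage")
```

After the rename, the Item is still stored under the same ID; only the table entry differs, which is why plain 'hg rename' never comes into play.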
Repository layout
- Item Revisions are versioned by Mercurial and go to data/rev/.hg/...
- The data of an Item's last Revision can appear in data/rev/ after hg update, stored as a file named 'ID'.
- Item Metadata is stored in meta/ as a file named 'ID.meta'.
- The name-mapping database is kept in the rev/.name-mapping file (versioned just before hg pull/push/backup requests).
data/
+-- rev/
    +-- .hg/
    +-- .name-mapping
    +-- 0f4eac723857aa118122c08f534fcf56  (after hg update)
    +-- 4c4712a4141d261ec0ca8f9037950685  (after hg update)
    +-- ...
+-- meta/
    +-- 0f4eac723857aa118122c08f534fcf56.meta
    +-- 4c4712a4141d261ec0ca8f9037950685.meta
    +-- ...
+-- cache/
    +-- 0f4eac723857aa118122c08f534fcf56.cache
    +-- 4c4712a4141d261ec0ca8f9037950685.cache
    +-- ...
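Given this layout, every on-disk location for an Item follows from its ID. A small sketch, assuming a hypothetical helper item_paths and a base directory supplied by the caller, with rev/, meta/ and cache/ taken to be siblings under that base (directory names and suffixes are taken from the layout; nothing else is):

```python
import os


def item_paths(base, item_id):
    """Map an Item ID to its locations in the layout (hypothetical helper)."""
    return {
        # Present on disk only after 'hg update' has populated data/rev/.
        "revision": os.path.join(base, "rev", item_id),
        "meta": os.path.join(base, "meta", item_id + ".meta"),
        "cache": os.path.join(base, "cache", item_id + ".cache"),
    }


paths = item_paths("data", "0f4eac723857aa118122c08f534fcf56")
```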
Prerequisites
This backend uses a development version of Mercurial (077f1e637cd8 from http://selenic.com/repo/hg). Because the API imposes a feature that the current iteration cannot otherwise provide, Mercurial must be patched.
This feature is multiple empty revisions in a row. Currently, Item revisions are stored as file revisions, so to store consecutive empty revisions we have to force Mercurial to make filelog records with empty data.
A patch for that behaviour can be found in the source tree: MoinMoin/storage/backends/research/repo_force_changes.diff
Limitations
- large attachments won't be committed: the maximum revision data size is min(StringIO buffer size, free system memory), a limitation of StringIO/memfilectx
- does not run with out-of-the-box Mercurial, and the development version also needs patching
Known broken
- GraphInfo can produce nodes with wrong parents, so that the graph is split into parts
- almost all push/pull functionality is not working
- RecentChanges produces tracebacks after heavy load
Current problems
- the name mapping has to be consistent after synchronization (low-level hg pull) and for backup (hg clone)
- the patch: troublesome installation, and it does not work as expected when merges occur (getting rid of the patch needs a change of the revision metadata implementation and probably dropping multiple empty revisions in a row)
- memory commits: good for avoiding a working copy (fewer I/O operations, no data duplication), but they disable merge (which relies on a working copy)
- no get_revision(-1) optimization (no working copy)
- large files that don't fit into memory won't be committed (memfilectx limitation)
These problems lead to another iteration, planned after SoC.