Design ideas
What didn't work, and why
Changelog, Filelog and empty commits
Even if you're not familiar with Mercurial internals, you probably know what the changelog is. The changelog contains information about each changeset: each revision records who committed the change, the changeset comment, and several other pieces of changeset-related information.
Every change to a wiki page (item) is committed as a changeset containing one file, which represents that particular page in the backend repository. To retrieve the history of a page (item), we have to traverse the changelog and find the changesets that contain that item. This is a very expensive operation! In such an item-centric approach it would be much easier and more efficient to iterate over the item's filelog.
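To make the cost difference concrete, here is a minimal toy model in Python. ToyRepo, Changeset and both history methods are hypothetical illustrations, not Mercurial classes: the changelog scan has to look at every changeset in the repository, while the per-item log only touches that item's own revisions.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    # Hypothetical toy changeset: only the fields needed for the comparison.
    rev: int
    user: str
    description: str
    files: list[str]


@dataclass
class ToyRepo:
    # Hypothetical toy repository, not Mercurial's real data structures.
    changelog: list[Changeset] = field(default_factory=list)
    # item name -> changelog revisions that touched it
    # (a stand-in for a filelog and its links back to the changelog)
    filelogs: dict[str, list[int]] = field(default_factory=dict)

    def commit(self, item: str, user: str, description: str) -> int:
        rev = len(self.changelog)
        self.changelog.append(Changeset(rev, user, description, [item]))
        self.filelogs.setdefault(item, []).append(rev)
        return rev

    def history_via_changelog(self, item: str) -> list[int]:
        # Cost grows with the total number of changesets in the repository.
        return [cs.rev for cs in self.changelog if item in cs.files]

    def history_via_filelog(self, item: str) -> list[int]:
        # Cost grows only with the number of revisions of this item.
        return list(self.filelogs.get(item, []))


repo = ToyRepo()
repo.commit("FrontPage", "alice", "create")
repo.commit("HelpContents", "bob", "create")
repo.commit("FrontPage", "alice", "fix typo")
```

Both methods return the same revisions; the difference is how much data they have to read to find them.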
The filelog is where all modifications of a single file are tracked. Each entry in the filelog contains enough information to reconstruct one revision of the tracked file. Filelogs are stored as files in the .hg/store/data directory. A filelog contains two kinds of information: revision data, and an index that helps Mercurial find a revision efficiently.
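The split between an index and revision data can be sketched with a hypothetical ToyFilelog. This is a deliberately simplified model, not the real revlog format: real revlogs store compressed deltas and parent pointers, and Mercurial's node hash also covers the parents, whereas this sketch stores full texts and uses a plain SHA-1 of the text.

```python
import hashlib
from typing import NamedTuple


class IndexEntry(NamedTuple):
    offset: int   # where this revision's bytes start in the data blob
    length: int   # how many bytes belong to this revision
    linkrev: int  # changelog revision that introduced this file revision
    node: str     # content hash identifying the revision (simplified)


class ToyFilelog:
    """Hypothetical toy: an append-only data blob plus an index."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.index: list[IndexEntry] = []

    def add(self, text: bytes, linkrev: int) -> int:
        # Simplification: real Mercurial hashes the parents together with the text.
        node = hashlib.sha1(text).hexdigest()
        self.index.append(IndexEntry(len(self.data), len(text), linkrev, node))
        self.data += text
        return len(self.index) - 1

    def revision(self, rev: int) -> bytes:
        # The index gives the exact byte range, so no scanning of the data is needed.
        entry = self.index[rev]
        return bytes(self.data[entry.offset:entry.offset + entry.length])


fl = ToyFilelog()
fl.add(b"v1", linkrev=0)
fl.add(b"version two", linkrev=2)
```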
So far everything is straightforward and easy to implement, but there is a serious problem to overcome. The storage API defines two types of item changes that may trigger the creation of a new revision:
- data change
- metadata change
It is not necessary at this point to know exactly how the metadata store is implemented in the Mercurial backend. What matters is that item metadata is not stored within the item data (for various reasons, but mainly because item data may not be text).
If the data changed, Mercurial politely inserts a new entry into the item's filelog on commit, and later we can retrieve that entry along with the previous revisions of the item. If only the metadata changed and the pending revision contains no data change, we're boned. Mercurial doesn't allow commits without a data change. Even if we force such a commit (force=True in the commit parameters), all we get is an entry in the changelog; the filelog isn't touched at all. Therefore we can't rely on the filelog as a valid source of information about wiki item revisions.
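The behaviour described above can be modelled in a few lines. This is a hypothetical toy (ToyRepo and its commit method are illustrations, not Mercurial's API): a metadata-only change is refused unless forced, and a forced commit produces a changelog entry with no files while the filelog stays unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    # Hypothetical toy: one tracked item, its filelog, and the changelog.
    changelog: list[list[str]] = field(default_factory=list)  # files per changeset
    filelog: list[bytes] = field(default_factory=list)        # the item's revisions

    def commit(self, data: bytes, force: bool = False) -> bool:
        data_changed = not self.filelog or self.filelog[-1] != data
        if not data_changed and not force:
            return False  # nothing changed in the data: the commit is refused
        # A forced commit still creates a changeset, but it touches no files.
        self.changelog.append(["item"] if data_changed else [])
        if data_changed:
            self.filelog.append(data)  # only data changes reach the filelog
        return True


repo = ToyRepo()
first = repo.commit(b"page text")               # data change: accepted
refused = repo.commit(b"page text")             # metadata-only change: refused
forced = repo.commit(b"page text", force=True)  # changelog entry only
```

After the forced commit the item has two revisions from the wiki's point of view, but its filelog still holds only one.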
Of course, one could say the solution is to patch Mercurial. In fact, a patch has been written and it is quite small, but it won't get into an official hg release. And if it is not official, it violates one of the design goals: compatibility with a widespread (stable, vanilla) version of Mercurial.
Idea fail.
Filelog and metadata
While looking for a way to store revision metadata, I came across this post about revlog metadata in Mercurial. It turns out that Mercurial provides a mechanism for storing file metadata in filelog entries, and strips that metadata when extracting data to the working directory. Currently the only metadata stored there is used to mark renames.
This information can be read with filelog._readmeta(self, node). That's right, there is no _writemeta counterpart. This metadata can be written only from within the body of the commit method, and there is no way to pass your own set of parameters. What's more interesting: even if no data has changed but metadata is set, a new entry is created in the filelog on commit. How sweet it would be to define our own metadata!
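The metadata lives in a small header at the start of the stored filelog text: the header is delimited by "\1\n" markers and holds "key: value" lines, and Mercurial uses it for rename information (the copy and copyrev keys). The functions below are a standalone sketch of that convention, not Mercurial's own code; pack_meta, parse_meta and the "mtime" key are hypothetical. The sketch also shows why a metadata-only change yields a new filelog entry: the stored text differs, so its hash differs.

```python
META_MARKER = b"\x01\n"


def pack_meta(meta: dict[str, str], text: bytes) -> bytes:
    """Hypothetical helper: prepend a "\\1\\n"-delimited metadata header."""
    # Text that itself starts with the marker gets an empty header so that
    # it cannot be mistaken for metadata when read back.
    if not meta and not text.startswith(META_MARKER):
        return text
    # Assumption: keys and values contain no newlines, and keys contain no ": ".
    header = b"".join(
        f"{key}: {value}\n".encode("utf-8") for key, value in sorted(meta.items())
    )
    return META_MARKER + header + META_MARKER + text


def parse_meta(stored: bytes) -> tuple[dict[str, str], bytes]:
    """Hypothetical helper: split stored text into (metadata, data)."""
    if not stored.startswith(META_MARKER):
        return {}, stored
    end = stored.index(META_MARKER, 2)  # closing marker after the header
    meta = {}
    for line in stored[2:end].decode("utf-8").splitlines():
        key, value = line.split(": ", 1)
        meta[key] = value
    return meta, stored[end + 2:]


old = pack_meta({}, b"page body")
new = pack_meta({"mtime": "1214380800"}, b"page body")
```

The data part of old and new is identical, yet the stored texts differ, which is exactly the property that would let metadata-only changes show up in the filelog.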
Some time ago I prepared this patch for writing metadata. It might be slightly out of date now, but it still shows the idea. Once again, it violates one of the design goals: compatibility with a widespread (stable, vanilla) version of Mercurial.
Idea fail.
Memory commits
Memory commits are a feature used by the convert extension. They allow Mercurial commits to be made from data stored in memory rather than from files in the working copy. This seems to offer some benefits:
- code no longer has to deal with file locking in the working copy
- there are fewer I/O operations (no working copy)
- no data duplication (in .hg and in the working copy)
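In Mercurial itself, in-memory commits are built with mercurial.context.memctx, which asks a filectxfn callback for the contents of each file; its exact constructor signature has changed between releases. So rather than guess at one version's API, here is a hypothetical toy that models only the idea: the commit pulls file contents from a callback instead of reading a working directory. ToyStore, memory_commit and the "None means removed" convention are assumptions of this sketch.

```python
from typing import Callable, Optional

# Hypothetical callback type: returns a file's contents, or None if removed.
FileFn = Callable[[str], Optional[bytes]]


class ToyStore:
    """Hypothetical toy store; each revision is a full snapshot of all files."""

    def __init__(self) -> None:
        self.revisions: list[dict[str, bytes]] = []

    def memory_commit(self, files: list[str], filefn: FileFn) -> int:
        # Start from the previous snapshot; no working copy is involved.
        snapshot = dict(self.revisions[-1]) if self.revisions else {}
        for path in files:
            data = filefn(path)
            if data is None:
                snapshot.pop(path, None)  # in this toy, None marks a removed file
            else:
                snapshot[path] = data
        self.revisions.append(snapshot)
        return len(self.revisions) - 1


store = ToyStore()
pending = {"FrontPage": b"Hello"}
rev0 = store.memory_commit(list(pending), pending.get)
rev1 = store.memory_commit(["FrontPage"], lambda path: None)
```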
On the other hand, the size of a committed file is limited by available system memory. And while we gain fewer I/O operations, we lose the get_revision(-1) optimization. get_revision(-1) returns the latest revision of an item, and the data for this revision can be read directly from the working directory without extracting it from .hg. I assume this is a rather important speedup.
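The optimization can be sketched as a fast path for revno == -1. ToyItemStore is hypothetical, and its get_revision(item, revno) signature is a simplification of the storage API mentioned above; "extracting from .hg" is modelled by reading from an in-memory history and counting those reads. Without a working copy, as with memory commits, every call would have to take the slow path.

```python
import tempfile
from pathlib import Path


class ToyItemStore:
    """Hypothetical backend: full history in a 'store', tip checked out as a file."""

    def __init__(self, workdir: Path) -> None:
        self.workdir = workdir
        self.history: dict[str, list[bytes]] = {}
        self.store_reads = 0  # counts the (expensive) extractions from the store

    def commit(self, item: str, data: bytes) -> None:
        # Assumption: item names are plain file names without path separators.
        self.history.setdefault(item, []).append(data)
        (self.workdir / item).write_bytes(data)  # the working copy holds the tip

    def get_revision(self, item: str, revno: int) -> bytes:
        if revno == -1:
            # Fast path: the latest revision is already in the working copy.
            return (self.workdir / item).read_bytes()
        # Slow path: older revisions must be extracted from the store.
        self.store_reads += 1
        return self.history[item][revno]


with tempfile.TemporaryDirectory() as tmp:
    items = ToyItemStore(Path(tmp))
    items.commit("FrontPage", b"first")
    items.commit("FrontPage", b"second")
    latest = items.get_revision("FrontPage", -1)
    oldest = items.get_revision("FrontPage", 0)
```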
Finally, Mercurial's merge operation relies on the working copy, so memory commits are a no-go.