Observations on Mercurial internals

changeset.extra() dictionary

Every changeset has a special dictionary named extra. It stores things such as the internal branch name ("branch": "default"). When a new changeset is committed, the items of this dictionary are passed through "\0".join(), so any value you store there yourself always comes back as a string on retrieval.
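The round trip can be sketched in plain Python. This is only an illustration, not Mercurial's code: the function names encode_extra and decode_extra are hypothetical, the "key:value" item layout is an assumption, and any escaping of special characters is left out.

```python
def encode_extra(extra):
    # Hypothetical helper: each item becomes "key:value" (assumed layout).
    # The %s formatting turns every value into text, whatever its type.
    items = ["%s:%s" % (key, extra[key]) for key in sorted(extra)]
    return "\0".join(items)


def decode_extra(text):
    # Hypothetical helper: split the stored text back into a dictionary.
    extra = {}
    for item in text.split("\0"):
        if item:
            key, value = item.split(":", 1)
            extra[key] = value
    return extra


stored = encode_extra({"branch": "default", "revision_count": 42})
restored = decode_extra(stored)
print(restored)
```

The integer stored under the made-up key revision_count comes back as text, so a caller that needs the number has to convert it explicitly, for example with int(restored["revision_count"]).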

Encoding file names

Mercurial uses its own algorithm for escaping file names so that they can be stored safely on different filesystems. If you are working close to your filesystem's name length limit, the safest approach is to use at most half of the possible name length, minus 2 for the file extension. The reason is that Mercurial escapes an underscore to __ and a capital letter such as A to _a, which doubles the space used by each such character. The repository stores a file's data in escaped_filename.d, so the escaped name plus that extension is what has to fit within the filesystem limit. Pre-escaping names with this algorithm before adding them to Mercurial is also risky: every underscore you produced gets escaped once more.
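The doubling effect and the double-escaping risk can be shown with a simplified sketch. It covers only the two rules described above (underscores and ASCII capital letters); the function name escape_name is hypothetical, Mercurial's real encoder handles more cases, and the 255 character limit is an assumed example value.

```python
def escape_name(name):
    # Hypothetical, simplified escaper: only underscores and ASCII capitals.
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


once = escape_name("README")   # what the repository would store
twice = escape_name(once)      # what happens to a pre-escaped name

NAME_LIMIT = 255                     # assumed filesystem limit
safe_length = NAME_LIMIT // 2 - 2    # one reading of "half, minus 2"
worst_case = escape_name("A" * safe_length) + ".d"

print(once, len(once))
print(twice, len(twice))
print(safe_length, len(worst_case))
```

A name made entirely of capital letters is the worst case for these two rules, so checking that worst_case still fits under the limit is a quick way to validate a chosen maximum length.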

HGENCODING

The commit message and author are converted from the current locale's encoding to utf-8 on commit. This causes trouble if you use utf-8 internally yourself but the user's locale is set to something else. You can override it by setting the environment variable HGENCODING, but it has to be set before any mercurial modules are imported, because it is checked in the "top level" code of mercurial.util.

    import os
    # Must be set before the first mercurial module is imported.
    os.environ['HGENCODING'] = 'utf-8'
    from mercurial import ...
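
Why the mismatch matters can be seen without Mercurial at all. The sketch below is an assumption-laden illustration: it pretends the user's locale is iso-8859-1 and mimics a conversion from that locale to utf-8 on text that was already utf-8. The author name is just sample data.

```python
# Text that the application already holds as utf-8 bytes.
original = "Paweł".encode("utf-8")

# Assumed locale encoding of the user; any non-utf-8 value shows the problem.
locale_encoding = "iso-8859-1"

# Mimic "decode from the locale's encoding, then store as utf-8".
misread = original.decode(locale_encoding)
stored = misread.encode("utf-8")

print(original)
print(stored)
print(stored == original)
```

The stored bytes no longer match the original ones, because every non-ASCII byte was treated as a separate character and encoded again. With the encoding forced to utf-8, the same conversion leaves the bytes unchanged.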

MoinMoin: PawelPacana/MercurialBackend/MercurialNotes (last edited 2010-04-27 11:15:55 by gwnl)