Idea for a FileSystemStorage implementation that simplifies the directory structure and the code handling pages and logs.

(!) pages as bundles (without sub pages) is implemented in moin--main--1.3--patch-101..104

See also StorageRefactoring, LogfileRefactoring

What is a Bundle

In Mac OS X (the concept actually came from NeXT), applications and documents come in a Bundle. A Bundle is a directory with a certain structure which the end user sees as a regular file. For example, an application can be distributed as a 50MB bundle containing the executable code, resources, help files, libraries and meta data - all in what looks like a regular file you can drag to your Applications folder and double click.

How is this related to moin?

Instead of holding text, backup and pages directories, each containing a huge number of files that have to be kept synchronized, we can hold each wiki page, together with its sub pages, as a simple directory structure containing the text, the backup versions and the attachments of a single page or of a tree of sub pages.

moin 1.3 pagebundle layout

PageName
    edit-lock
    edit-log
    current (contains 00000003)
    revisions/
        00000001 (revision number)
        00000002
        00000003
    cache/
        pagelinks
        text_html
        hitcounts
    attachments/
        attached_file.txt
    pages/
        SubPage (not used yet, we store sub pages ''flat'')
            (same structure as above repeats here)

Page path using this system: data/pages/ParentPage/pages/SubPage/pages/SubSubPage
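The mapping from a page name to its bundle path can be sketched as below. This is only an illustration: the function name `bundle_path` and the use of "/" as the sub page separator in page names are assumptions, not part of the moin code.

```python
import os


def bundle_path(data_dir, page_name):
    """Map 'Parent/Sub/SubSub' to data_dir/pages/Parent/pages/Sub/pages/SubSub."""
    path = data_dir
    # Every level of the page name adds one "pages/<Name>" step.
    for part in page_name.split("/"):
        if part:
            path = os.path.join(path, "pages", part)
    return path


print(bundle_path("data", "ParentPage/SubPage/SubSubPage"))
```

A top-level page simply becomes data/pages/PageName, which is the same path the flat layout uses.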

Each page is totally independent of its parent: you can copy it to another parent page or drop it into another wiki, just like a single file.
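Because everything a page needs lives inside its bundle, reading the current text only needs the bundle directory. A minimal sketch, assuming that `current` holds nothing but the revision number as text (as in the layout above); the function name `read_current_revision` is made up for this example:

```python
import os
import tempfile


def read_current_revision(bundle_dir):
    """Return (revision number, text) for the revision named in 'current'."""
    with open(os.path.join(bundle_dir, "current"), encoding="utf-8") as f:
        revno = f.read().strip()
    with open(os.path.join(bundle_dir, "revisions", revno), encoding="utf-8") as f:
        return revno, f.read()


# Usage: build a throwaway bundle with three revisions and read it back.
with tempfile.TemporaryDirectory() as tmp:
    bundle = os.path.join(tmp, "PageName")
    os.makedirs(os.path.join(bundle, "revisions"))
    for number in (1, 2, 3):
        rev_path = os.path.join(bundle, "revisions", "%08d" % number)
        with open(rev_path, "w", encoding="utf-8") as f:
            f.write("text of revision %d" % number)
    with open(os.path.join(bundle, "current"), "w", encoding="utf-8") as f:
        f.write("00000003\n")
    revno, text = read_current_revision(bundle)
```

Nothing here looks outside `bundle_dir`, so the same call works after the bundle was copied to another parent page or another wiki.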

The virtual rootpage and doing SubPages by subdirectories

We can do SubPages by using subdirectories - or not, that is our choice. See HierarchicalWiki on why we should (or should not) and how we could use it (or rather not).

If we do, we may need some caching to make it easier to get complete page lists.

The first implementation, done in moin 1.3, still uses a flat namespace, so subpages are directories on the same level as their parent pages.

But even looking at that still flat implementation, we already have a kind of pages by subdirectories: all pages are subpages of a virtual root page living in data/.

A page MainPage has MainPage/pages/ to hold its subpages.

If you look at the data_dir directory, you also find a pages directory, data/pages/, holding all of your wiki's pages. So one can think of data/ as a virtual root page, with all wiki pages being subpages of it. The root page (and ultimately any page) then provides a getPageList() function listing its subpages.

And if we think of all (currently) global wiki functions as functions of a rootpage object, we could also have subwikis by applying those functions to other pages that have subpages, just by passing rootpage=OtherPage to the appropriate functions.
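A rough sketch of the virtual root page idea follows. The `Page` class and the `title_index` function are hypothetical stand-ins written for this example, not moin's real classes; only the name getPageList() is taken from the text above.

```python
import os
import tempfile


class Page:
    """A page is identified by its bundle directory; data/ is the root page."""

    def __init__(self, path):
        self.path = path

    def getPageList(self):
        """List the names of the direct subpages found in <path>/pages/."""
        pages_dir = os.path.join(self.path, "pages")
        if not os.path.isdir(pages_dir):
            return []
        return sorted(
            name for name in os.listdir(pages_dir)
            if os.path.isdir(os.path.join(pages_dir, name))
        )


def title_index(rootpage):
    """A formerly global function, now parameterised by the page it runs on."""
    return rootpage.getPageList()


# Usage: the same function serves the whole wiki and a "subwiki".
with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data")
    os.makedirs(os.path.join(data, "pages", "FrontPage"))
    os.makedirs(os.path.join(data, "pages", "MainPage", "pages", "SubPage"))
    whole_wiki = title_index(Page(data))
    sub_wiki = title_index(Page(os.path.join(data, "pages", "MainPage")))
```

The root page needs no special casing here: data/ has a pages/ directory just like any other bundle.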

When doing subpages by subdirectories, a page's edit-log should include subpage changes. The root object's change log would then be the wiki's recent changes, but we could also do a local recent changes on any page with subpages.
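One way to get a local recent changes is sketched below. Note that this is a different technique from the one proposed above: it merges the edit-logs of all subpages when reading, instead of writing subpage changes into the parent's edit-log. It assumes each edit-log line starts with an integer timestamp followed by a tab; the function names and the sample log lines are invented for illustration.

```python
import os
import tempfile


def collect_changes(bundle_dir):
    """Return (timestamp, line) pairs from this bundle and all bundles below it."""
    entries = []
    log_path = os.path.join(bundle_dir, "edit-log")
    if os.path.isfile(log_path):
        with open(log_path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line:
                    # Assumption: the first tab-separated field is the timestamp.
                    entries.append((int(line.split("\t", 1)[0]), line))
    pages_dir = os.path.join(bundle_dir, "pages")
    if os.path.isdir(pages_dir):
        for name in os.listdir(pages_dir):
            entries.extend(collect_changes(os.path.join(pages_dir, name)))
    return entries


def recent_changes(bundle_dir):
    """Newest change first, for the given page and everything below it."""
    return sorted(collect_changes(bundle_dir), reverse=True)


# Usage: a parent page with one subpage, each with its own edit-log.
with tempfile.TemporaryDirectory() as tmp:
    parent = os.path.join(tmp, "ParentPage")
    child = os.path.join(parent, "pages", "SubPage")
    os.makedirs(child)
    with open(os.path.join(parent, "edit-log"), "w", encoding="utf-8") as f:
        f.write("1000\tParentPage\tSAVE\n3000\tParentPage\tSAVE\n")
    with open(os.path.join(child, "edit-log"), "w", encoding="utf-8") as f:
        f.write("2000\tSubPage\tSAVE\n")
    changes = recent_changes(parent)
```

Merging on read walks the whole subtree for every request, which is the kind of cost that writing subpage changes into the parent's log (or some caching) would avoid.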

/!\ Please note that the "subpage by subdirectory" code was removed in moin 1.6 to make StorageRefactoring easier.

1.4 itembundle layout

ItemName
    meta
    cache/
    revisions/
        00000001
        00000002
        ...
    revisions_meta/
        00000001
        00000002
        ...
    sub/
        SubItemName/
            (same structure repeated)

Meta file

At bundle level, meta contains some meta infos about the wiki, page, file stored here:

name: ItemNameInUtf8Encoding
current: 1
bundle-version: 1

revisions_meta/XXXXXXXX contains some meta infos about the page/file/item stored in the revisions/XXXXXXXX file:

name: ItemNameInUtf8Encoding
Content-Type: text/X-moin-wiki;charset=utf-8 (or application/octet-stream or similar)
acl: WikiAdmin:read,write,delete,revert,admin All:
action: SAVE
editor_user: WhoEver
editor_uid: 123123123.1231.123
editor_ip: 12.34.56.78
editor_dns: xyz.wtf.com
comment: bla bla bla
mtime: UTC timestamp * 1000000
...
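Reading such a file could look roughly like this. The sketch assumes one "key: value" pair per line and splits only on the first colon, so that values like the acl line keep their own colons; `parse_meta` is a name made up for this example.

```python
def parse_meta(text):
    """Parse 'key: value' lines into a dict, ignoring lines without a colon."""
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only; the value may contain more colons.
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip()
    return meta


sample = """name: ItemNameInUtf8Encoding
Content-Type: text/X-moin-wiki;charset=utf-8
acl: WikiAdmin:read,write,delete,revert,admin All:
action: SAVE
"""
meta = parse_meta(sample)
print(meta["acl"])
```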

We need some means to edit that meta data. For a normal user, some fields of the page editor go to the meta file (and everything else is auto-generated). For an admin, there could be a normal editor for meta data, too.

The big advantage of this scheme is that it totally gets rid of special-casing attachments. PDF files, for example, can be part of the wiki without any problems, and you can link to them without jumping through hoops (as they're considered normal wiki pages). The conversion script automatically makes subitems out of a page's attachments.

For the pagelists (picture lists, file lists, etc.), we can choose to only list items that match (or do not match) some special Content-Type. Or add some link to optionally include other stuff like it is now with system pages.
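Such a filter might be sketched as follows, working on already parsed meta data. The helper names and the sample items are hypothetical, and falling back to application/octet-stream when no Content-Type is present is an assumption of this sketch.

```python
def media_type(meta):
    """Return the lower-cased Content-Type without parameters like charset."""
    content_type = meta.get("Content-Type", "application/octet-stream")
    return content_type.split(";", 1)[0].strip().lower()


def list_items(items, prefix, exclude=False):
    """List item names whose media type starts (or does not start) with prefix."""
    prefix = prefix.lower()
    return sorted(
        name for name, meta in items.items()
        if media_type(meta).startswith(prefix) != exclude
    )


# Hypothetical items: name -> meta data of the current revision.
items = {
    "FrontPage": {"Content-Type": "text/X-moin-wiki;charset=utf-8"},
    "Logo": {"Content-Type": "image/png"},
    "Manual": {"Content-Type": "application/pdf"},
}
pictures = list_items(items, "image/")
non_wiki = list_items(items, "text/x-moin-wiki", exclude=True)
```

Matching on a prefix lets one call cover a whole family such as all image/ types, while `exclude=True` gives the "do not match" case.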

Bundle ideas

Sketchy ideas on how we can benefit from bundles in other unexpected ways:

User pages

Now that pages can contain meta data, we don't need a user directory any more. We can have all the user account data as sub pages under the user's home page: ThomasWaldmann/AccountData. The format of this file is very similar to the meta files (but the account settings are NOT stored as meta).

A user page will contain the user preferences data and password using mimetype text/X-moin-account-user. This kind of page might get special processing for displaying the user picture and other data, like FOAF data.

We can make the life of new wiki users much simpler by making a homepage for them when they register. The page can show all the special user stuff without containing any macros. The user will see a blank page they can write in; everything else will be added by the system, because this page is a user page.

Meta-Infos and ACLs

This is the perfect opportunity to store a facets file or extended meta-info as well as improving handling of the following:

Sub pages can inherit the facets from their parent. I'm not sure it makes sense.

Facets: Not in first implementation. This can be done in a 2nd step. We then can also do hierarchical ACLs and inheritance on subpages, processing ACLs as we encounter them going deeper in subpage directory hierarchy. We can also define new ACLs for the namespace, on the root object or on any other object, like whether creation/removing of pages is allowed and for whom.

If we define an acl on the root page, we can use this acl setting for all subpages. Every new page will then inherit its parent's acl, which makes sense in most cases. If a sub page defines its own acl, we will have a local acl file, which could just override some settings.
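The inheritance could work roughly like this: start with the root acl and apply each local acl found on the way down to the page. The sketch uses a deliberately simplified acl syntax (space-separated "Who:right,right" entries, as in the meta example) and ignores the rest of moin's real acl syntax; the function names and the sample pages are assumptions.

```python
def parse_acl(acl_string):
    """Parse 'WikiAdmin:read,write All:' into {'WikiAdmin': [...], 'All': []}."""
    acl = {}
    for entry in acl_string.split():
        who, _, rights = entry.partition(":")
        acl[who] = [right for right in rights.split(",") if right]
    return acl


def effective_acl(local_acls, page_name):
    """Combine the root acl with every local acl on the path to page_name.

    local_acls maps page names ('' for the root page) to acl strings.
    """
    effective = parse_acl(local_acls.get("", ""))
    path = ""
    for part in page_name.split("/"):
        path = part if not path else path + "/" + part
        if path in local_acls:
            # A local acl overrides only the entries it mentions.
            effective.update(parse_acl(local_acls[path]))
    return effective


local_acls = {
    "": "WikiAdmin:read,write,delete,revert,admin All:read",
    "Private": "All:",
}
public = effective_acl(local_acls, "FrontPage")
private = effective_acl(local_acls, "Private/Notes")
```

Here Private/Notes has no acl of its own, so it ends up with the root settings plus the "All:" override from its parent Private.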

Namespace by SubPage

Using sub pages for system pages

Using bundles, we can put all system pages under the SystemPages page, and inside it, a LanguagePages page for all languages of the system pages.

In this way we can separate the system pages from the wiki pages, and separate the different languages of system pages from each other. Accessing pages in a specific language could be simple string work: if the page join("SystemPages", "LanguagePages") exists, you can fetch "SpecificPage" from it. If not, you fetch the default "SpecificPage" in English.

We can also override system pages by creating the same page in the root pages directory, without changing the system pages bundle.

This doesn't work due to different page names in different languages. But the general idea of doing separate namespaces (e.g. one for system pages) with this is ok. Kind of another implementation of underlay_dir.

MoinMoin: StorageRefactoring/PagesAsBundles (last edited 2008-02-14 17:52:20 by ThomasWaldmann)