Description
From https://bugs.launchpad.net/ubuntu/+source/moin/+bug/217191
moin creates each of its pages as a directory in the data/pages directory. It also flattens any hierarchy in the process, e.g. Testing/Cases/UMEdesktop-config-date becomes:
drwxrwx--- 4 www-data www-data 4096 2008-04-10 00:05 Testing(2f)Cases(2f)UMEdesktop(2d)config(2d)date
This is compounded by moin versions before 1.6, which also create a ${User}(2f)MoinEditorBackup 'page' (i.e. another directory) for anyone who edits a page.
The above in combination with ext2/3's '32K-directories in one directory' limit caused http://wiki.ubuntu.com/ to break recently.
Even if you discount this as an 'ext2/3 FS limit', it's not exactly a great plan in terms of scalability.
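The flattening described above can be sketched in a few lines of Python. This is a simplified approximation, not moin's actual code: the function name `quote_page_name` is hypothetical, and the rule it applies (spaces become underscores, each run of characters outside `[a-zA-Z0-9_]` becomes its hex bytes in parentheses) is an assumption inferred from the directory listing shown above.

```python
import re

# Assumption: everything outside this set is treated as unsafe for a directory name.
UNSAFE = re.compile(r"[^a-zA-Z0-9_]+")


def quote_page_name(name: str) -> str:
    """Flatten a hierarchical page name into a single directory name."""
    name = name.replace(" ", "_")

    def encode_run(match):
        # Encode a whole run of unsafe characters as hex bytes inside one (...) group.
        raw = match.group().encode("utf-8")
        return "(" + "".join("%02x" % byte for byte in raw) + ")"

    return UNSAFE.sub(encode_run, name)


# The page from this report ends up as one entry directly under data/pages:
print(quote_page_name("Testing/Cases/UMEdesktop-config-date"))
# Testing(2f)Cases(2f)UMEdesktop(2d)config(2d)date
```

Because every "/" is encoded rather than turned into a nested directory, every page and subpage counts against the same per-directory limit.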
Steps to reproduce
- Create 31,998 pages
- Try to create a new page
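Rather than creating tens of thousands of pages, you can check how close an existing wiki is to the limit. The following is a minimal sketch: the helper name `pagedir_headroom` is hypothetical, and the constant is simply the 31,998 figure quoted in the steps above, not a value read from the filesystem.

```python
import os

# Figure quoted in this report: page creation fails once 31,998 pagedirs exist.
EXT3_MAX_SUBDIRS = 31998


def pagedir_headroom(pages_dir):
    """Return (used, remaining) subdirectory counts for a data/pages directory."""
    with os.scandir(pages_dir) as entries:
        # Only real directories count; plain files and symlinks are ignored.
        used = sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))
    return used, max(EXT3_MAX_SUBDIRS - used, 0)
```

Calling `pagedir_headroom("data/pages")` from the wiki instance directory (adjust the path to your setup) tells you how many more pages can be created before the limit is hit.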
Example
Component selection
- general
Details
MoinMoin Version | all moin versions at least up to 1.7.x
OS and Version | Linux with ext3 filesystem
Python Version |
Server Setup |
Server Details |
Language you are using the wiki in (set in the browser/UserPreferences) |
Discussion
About the problem:
- this is a design problem in the moin storage code, not a bug (a bug is when software does not do what it was designed to do). moin was not designed to work around a filesystem's limit on directory entries.
- that 'flat' design has advantages in other areas, e.g. it is much easier to list pages matching some regex or search for a string in a page name.
- in fact it is a limitation of ext3. with another usable filesystem that has a higher limit, you would get that higher limit. moin itself does not limit the number of pages. I can't advise on other filesystems as I use ext3 myself.
- having large directories is no problem for modern filesystems; even ext3 can use an index for directories.
About the solution:
- we know the problem and are working on it with high priority, but this is a lot of work and no quick fix is possible. we can't tell exactly when the new storage code will be production ready, but I expect somewhere between end of 2008 and mid 2009. If you like, look at the "moin-storage" branch in our repo at http://hg.moinmo.in/ and follow what's happening during GoogleSoc2008.
- we even planned to introduce hierarchical page storage some time ago, but postponed it until we have a sane storage API.
Workaround
Of course, knowing the above is not very helpful if you run into this problem, so let's add some hints about how to delay it:
- remove all */MoinEditorBackup pages if you run moin < 1.6 (if you migrate to >= 1.6 this will be done automatically)
- use moin maint cleanpages to remove trash and deleted pages (please review the script it outputs before running it). be careful if your users often revert page deletions: once you remove the pagedirs of deleted pages, they won't be able to. your options in that case are:
  - run it, but keep the folder "deleted" just in case and restore pagedirs from there on user request. you could even set up another wiki with those pages so users can do it themselves.
  - grep the script that moin maint cleanpages outputs, extract the lines where it removes the trash pages (trash pages are pagedirs with no content) and only remove the trash (those trash pagedirs are often created by bots, spammers, users doing strange things, or bugs in old moin versions).
- move your wiki user homepages to a separate wiki (see user_homewiki = 'OtherWiki' in recent moin releases)
- clean up: delete pages that are not needed any more (and run maint cleanpages)
- use another filesystem (be careful: reliability is even more important than impressively high limits)
you may try out the xfs filesystem, should allow "millions" of files/directories (because inodes are allocated dynamically) and should be a bit faster (B+ tree structure). -- MarcelHäfner 2008-04-15 14:42:58
Do you have XFS in production server use yourself? With LVM/Kernel-RAID? We had very bad experiences with this combination about 2y ago. -- ThomasWaldmann 2008-04-15 18:44:09
no! just a thought to workaround the 32k limits of ext3, but maybe ext4 will be in the future also a solution ("an additional scalability improvement is eliminating the 32,000 subdirectory limit in ext4", but only to 64k... see here) -- MarcelHäfner 2008-04-16 05:55:50
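For the first hint in the workaround list (removing */MoinEditorBackup pages on moin < 1.6), a read-only sketch like the one below can list the candidate pagedirs before anything is deleted. The helper name `find_editor_backups` is hypothetical, and the "(2f)MoinEditorBackup" suffix is an assumption: it is the quoted form of "/MoinEditorBackup", given that "/" appears as "(2f)" in the directory listing in the description. Check a few of the names it reports against your own data/pages directory before acting on them.

```python
import os

# Assumption: "/" is stored as "(2f)", so "User/MoinEditorBackup" ends with this suffix.
BACKUP_SUFFIX = "(2f)MoinEditorBackup"


def find_editor_backups(pages_dir):
    """Return the sorted names of pagedirs that look like editor backup pages."""
    with os.scandir(pages_dir) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_dir(follow_symlinks=False)
            and entry.name.endswith(BACKUP_SUFFIX)
        )
```

The function only reports names and never removes anything, so the result can be reviewed in the same way as the script that moin maint cleanpages outputs.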
Plan
- Priority:
- Assigned to:
- Status: we are working on a storage backend api and backend plugins to make this more flexible (there was a successful SOC2007 project with all the base work and there will likely be a SOC 2008 project that adds some features and makes it production ready)