Short description
After a few thousand pages, the file-system-based storage backend becomes slow. For example, having thousands of files in one directory makes it hard even to browse the directory on Windows.
For larger wikis, it might be a good idea to use a SQL database instead (MySQL, Oracle XE, or similar).
Notes:
- there was a Google SoC 2008 backend rewrite project, see http://hg.moinmo.in/moin/1.8-storage/ - we just need someone to write a SQL backend for that
- for a short-term solution:
- there are faster and better filesystems than those on Windows
- if you must do it on Windows, of course you do not use FAT32
- don't judge filesystem performance by Explorer performance - I guess NTFS is much better optimized than Explorer
- Filesystems are basically databases optimized for lookup of content (files) via (file) name. This is exactly what a wiki does most of the time: you have a (page) name and you want to see its content.
Creating a SQL query, transmitting it to the database process, interpreting it there, and sending the results back takes time. So, for simple queries, like name -> content lookup, I think you'll get the response from the FS quicker than from the SQL DB.
- Of course a SQL DB is much better if you want to run complex queries, because a filesystem just does not directly support that.
- So using the file system is not as far-fetched as some people who are more used to database storage may think.
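To make the name -> content comparison concrete, here is a minimal sketch of the same lookup done both ways. It is not MoinMoin code: the `pages` table, the function names, and the sample pages are made up for illustration, and SQLite runs in-process, so it does not include the cost of talking to a separate database server that the note above describes.

```python
import os
import sqlite3
import tempfile

def fs_lookup(root, name):
    """Filesystem variant: the page name is the file name."""
    with open(os.path.join(root, name), encoding="utf-8") as f:
        return f.read()

def sql_lookup(conn, name):
    """SQL variant: the page name is the primary key of a (hypothetical) pages table."""
    row = conn.execute(
        "SELECT content FROM pages WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

def build_demo(root, conn, pages):
    """Store the same pages as files under root and as rows in the database."""
    conn.execute("CREATE TABLE pages (name TEXT PRIMARY KEY, content TEXT)")
    for name, content in pages.items():
        with open(os.path.join(root, name), "w", encoding="utf-8") as f:
            f.write(content)
        conn.execute("INSERT INTO pages VALUES (?, ?)", (name, content))
    conn.commit()

# Illustrative sample data, not real wiki content.
pages = {"FrontPage": "Welcome", "HelpContents": "Help index"}
root = tempfile.mkdtemp()
conn = sqlite3.connect(":memory:")
build_demo(root, conn, pages)
```

Both functions can be timed with `timeit` on a realistic number of pages; the result will depend heavily on the filesystem, the database, and whether the database runs in a separate process.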
Agreed. Yet, since MoinMoin stores all revisions of all pages in one directory, the total size of the wiki including its history is bounded by the disk partition size, whereas a database such as Oracle can easily and transparently use several hard disks and manage several terabytes of data.
It is not clear to me why you need a database such as Oracle to transparently use several hard disks and manage several terabytes of data. -- ReimarBauer 2008-08-29 12:45:33
Full text search on MoinMoin is unbearably slow once there are some thousands of pages; this really limits MoinMoin's usefulness. I just tried a Full Text search on moinmo.in which took 18 long seconds to search 10,000 pages, and my wiki's content (for example) is much larger. (In practice this is especially an issue for macros that have to implicitly do fulltext searches.) Whether or not the pages are stored as files, the option of having the title, text, and metadata in a database (whether additionally or instead) that's configured with full text indexing would dramatically improve scalability. In fact, if this were to supplement rather than replace a file-based system, the database could hold only current page versions, reducing its size dramatically as archives would still be stored in files. But whether a supplement or a replacement, lack of a database search is the biggest thing standing in the way of MoinMoin's scalability -- David Bell 2008-10-22 18:46:00
Hmm, if I search using Xapian search it is quite fast: less than 3 seconds for a word searched on 5000 pages of http://moinmo.in. The time needed can vary depending on how busy the server hosting the wiki is. However, there was a storage refactoring Summer of Code 2008 project; the result is at http://hg.moinmo.in/moin/1.8-storage/ - see the MoinMoin.storage package. We currently don't plan to write a SQL backend ourselves (lots of other stuff to do), but if someone else wants to, we would help, of course. -- ReimarBauer 2008-10-23 06:15:26
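The suggestion of keeping only the current page text in a database with full text indexing, as a supplement to the files, could look roughly like the sketch below. It is an illustration only, not MoinMoin or Xapian code: the `page_index` table and the function names are made up, and it assumes SQLite's FTS5 extension is compiled in, falling back to a plain substring search if it is not.

```python
import sqlite3

def make_index(conn):
    """Create the (hypothetical) page_index table; return True if FTS5 is available."""
    try:
        conn.execute("CREATE VIRTUAL TABLE page_index USING fts5(name, body)")
        return True
    except sqlite3.OperationalError:
        # FTS5 not compiled in: fall back to an ordinary table.
        conn.execute("CREATE TABLE page_index (name TEXT, body TEXT)")
        return False

def index_page(conn, name, body):
    """Store only the current revision: replace any older entry for this page."""
    conn.execute("DELETE FROM page_index WHERE name = ?", (name,))
    conn.execute("INSERT INTO page_index (name, body) VALUES (?, ?)", (name, body))

def search(conn, word, use_fts):
    """Return the names of pages containing word, sorted by name."""
    if use_fts:
        # Quote the word so FTS5 treats it as a plain term, not query syntax.
        param = '"' + word.replace('"', '""') + '"'
        sql = "SELECT name FROM page_index WHERE page_index MATCH ? ORDER BY name"
    else:
        # Fallback is a substring scan of the body, not a real full text index.
        param = "%" + word + "%"
        sql = "SELECT name FROM page_index WHERE body LIKE ? ORDER BY name"
    return [row[0] for row in conn.execute(sql, (param,))]

conn = sqlite3.connect(":memory:")
use_fts = make_index(conn)
# Illustrative sample pages.
index_page(conn, "FrontPage", "Welcome to the wiki")
index_page(conn, "StorageNotes", "The storage backend keeps pages in files")
index_page(conn, "SearchNotes", "Xapian makes search fast")
```

Page history would stay in the files; `index_page` would be called whenever a page is saved, so the index never holds more than one revision per page.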
Another reason for this: due to security concerns, my provider does not allow programs running on its servers to create or modify files. The only way to store data is to use the MySQL server they provide, which currently prevents me from transferring my current wiki to them.
An sqla (SQLAlchemy) backend is already implemented in moin-2.0 development, see MoinMoinTodo/Release 2.0/short. -- ReimarBauer 2009-09-02 23:03:45
If you have no fs access at all, the current sqla backend is not enough. Moin still accesses the filesystem for caching and misc. other stuff, thus more work is needed in that area (same is true for GAE compatibility). -- ThomasWaldmann 2009-09-03 06:56:58
Multiple storage methods, like with auth?
Would it be workable to support multiple storage methods, the way different auth methods can already be configured? Filesystem storage could then be converted in the background to (SQL/NoSQL/ORDBMS), or static pages could stay in the filesystem while dynamic, often-revised pages are stored in e.g. a PostgreSQL ORDBMS or a MongoDB NoSQL store.
That way, a good deal of flexibility would be created IMO. --~~~~
MoinMoin2.0 has modular storage code and a storage API.
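As a rough illustration of the idea of several storage methods side by side, the sketch below routes each page to a backend by name prefix. None of this is the MoinMoin 2.0 storage API: `DictBackend`, `RouterBackend`, and the `"Help"` prefix rule are hypothetical, and the in-memory dictionaries merely stand in for a filesystem store and a SQL store.

```python
class DictBackend:
    """Stand-in for a real storage backend (filesystem, SQL, ...)."""

    def __init__(self, label):
        self.label = label
        self._pages = {}

    def get(self, name):
        return self._pages.get(name)

    def put(self, name, content):
        self._pages[name] = content


class RouterBackend:
    """Dispatch each page to a backend chosen by page-name prefix."""

    def __init__(self, routes, default):
        self.routes = routes    # list of (prefix, backend), checked in order
        self.default = default  # used when no prefix matches

    def backend_for(self, name):
        for prefix, backend in self.routes:
            if name.startswith(prefix):
                return backend
        return self.default

    def get(self, name):
        return self.backend_for(name).get(name)

    def put(self, name, content):
        self.backend_for(name).put(name, content)


# Hypothetical setup: static help pages in one store, everything else in another.
static_store = DictBackend("filesystem")
dynamic_store = DictBackend("sql")
wiki = RouterBackend([("Help", static_store)], default=dynamic_store)
wiki.put("HelpContents", "static help text")
wiki.put("MeetingNotes", "often revised")
```

Callers only see `get` and `put`, so a page could later be moved to a different backend by changing the routing rule and copying its content, without touching the calling code.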