Summer of Code Proposal 2006
Design Proposal
I propose a project to design and implement an abstraction layer for the MoinMoin storage engine. Its purpose is to allow multiple storage system implementations to be integrated. Examples could include the current flat-file system or new engines built on systems such as SVN, DB2 or Mercurial.
This abstraction layer would also include a second, inward-facing interface that the rest of MoinMoin connects to, one that is independent of the chosen storage engine implementation.
The overall architecture would be something akin to this. The interfaces are numerically labelled to avoid confusion later:
```
---------------------
|  Moin Moin Wiki   |
|                   |
---------------------
          |   (Interface 1)
         \ /
---------------------
|   Moin Abstract   |
|  Storage Engine   |
---------------------
   |        |        |    (Interface 2)
  \ /      \ /      \ /
-------  -------  -------
| Flat|  |MySQL|  | Mer.|
| File|  |     |  |     |
-------  -------  -------
```
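To make the two interfaces concrete, here is a minimal Python sketch. All class and method names (`StorageBackend`, `load`, `save`, `list_names`, `MemoryBackend`, `AbstractStorageEngine`) are hypothetical and chosen for illustration; they are not existing MoinMoin APIs. Interface 2 is modelled as an abstract base class that every backend implements, and Interface 1 as an engine object that depends only on that contract.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Interface 2 (hypothetical): the contract every storage implementation meets."""

    @abstractmethod
    def load(self, name: str) -> bytes:
        """Return the stored data for item `name`; raise KeyError if absent."""

    @abstractmethod
    def save(self, name: str, data: bytes) -> None:
        """Store `data` under `name`, replacing any previous value."""

    @abstractmethod
    def list_names(self) -> list[str]:
        """Return the names of all stored items."""


class MemoryBackend(StorageBackend):
    """Toy backend keeping everything in a dict (stands in for flat file, MySQL, ...)."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def load(self, name: str) -> bytes:
        return self._items[name]

    def save(self, name: str, data: bytes) -> None:
        self._items[name] = data

    def list_names(self) -> list[str]:
        return sorted(self._items)


class AbstractStorageEngine:
    """Interface 1 (hypothetical): what the rest of the wiki calls.

    It never names a concrete backend class, only the StorageBackend contract.
    """

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend

    def read_page(self, name: str) -> str:
        return self._backend.load(name).decode("utf-8")

    def write_page(self, name: str, text: str) -> None:
        self._backend.save(name, text.encode("utf-8"))


engine = AbstractStorageEngine(MemoryBackend())
engine.write_page("FrontPage", "Hello, wiki")
print(engine.read_page("FrontPage"))  # prints Hello, wiki
```

Swapping in another backend means passing a different `StorageBackend` subclass to the engine; nothing above Interface 1 changes.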
I am aware that there have been attempts to rewrite the engine in the past, but the changes were monolithic and the branch quickly became dated. It is desirable to merge branches often, to keep the code up to date and to steer development in the same direction. For this to happen, iterations must be small, and each must end in a working state that passes all tests.
MetaData is currently stored by specific modules within MoinMoin, such as 'user' and 'Page'. These are invoked from all over the wiki and in turn interface with the flat-file storage engine. Migrating to the new APIs for both Interface 1 and Interface 2 is therefore a central task.
I will approach this by devising the specification for Interface 1, with plenty of input and collaboration from the other developers. Then an adapter will be created that maps calls coming from the old code (to Page, user, etc.) onto the API of the new interface. That way, extensive changes to code all over MoinMoin will not be necessary.
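A minimal sketch of the adapter idea, assuming a hypothetical Interface 1 with `read_item`/`write_item` methods. `LegacyPage` keeps an old-style page API so existing call sites do not change; its method names (`get_raw_body`, `save_text`) are illustrative and not claimed to match MoinMoin's actual `Page` class.

```python
from __future__ import annotations


class Interface1:
    """Hypothetical stand-in for the new Interface 1."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def read_item(self, name: str) -> str:
        return self._store.get(name, "")

    def write_item(self, name: str, text: str) -> None:
        self._store[name] = text


class LegacyPage:
    """Adapter: keeps an old-style page API but delegates to Interface 1.

    Method names are illustrative; existing callers keep calling them unchanged.
    """

    def __init__(self, storage: Interface1, page_name: str) -> None:
        self._storage = storage
        self.page_name = page_name

    def get_raw_body(self) -> str:
        return self._storage.read_item(self.page_name)

    def save_text(self, text: str) -> None:
        self._storage.write_item(self.page_name, text)


storage = Interface1()
page = LegacyPage(storage, "HelpContents")
page.save_text("Some help text")
print(page.get_raw_body())  # prints Some help text
```

Once every caller has been moved to Interface 1, an adapter like this can simply be deleted.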
Initially, the storage engine would simply route calls to the existing flat-file storage engine.
The next stage is writing the specification for Interface 2. When this is complete, an adapter should be written to translate calls from that API to the pre-existing flat-file system code.
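Below is a sketch of such an Interface 2 adapter, under the simplifying assumption that the existing flat-file code stores each item as a single file in a data directory. The real MoinMoin layout (revisions, edit logs and so on) is richer; `FlatFileAdapter`, `_path` and the `load`/`save`/`list_names` methods are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from urllib.parse import quote, unquote


class FlatFileAdapter:
    """Interface 2 adapter (hypothetical) onto a one-file-per-item layout."""

    def __init__(self, data_dir: str) -> None:
        self.data_dir = data_dir
        os.makedirs(data_dir, exist_ok=True)

    def _path(self, name: str) -> str:
        # Quote the name so a "/" in it cannot create subdirectories.
        return os.path.join(self.data_dir, quote(name, safe=""))

    def load(self, name: str) -> bytes:
        try:
            with open(self._path(name), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(name) from None

    def save(self, name: str, data: bytes) -> None:
        with open(self._path(name), "wb") as f:
            f.write(data)

    def list_names(self) -> list[str]:
        return sorted(unquote(fn) for fn in os.listdir(self.data_dir))


with tempfile.TemporaryDirectory() as tmp:
    adapter = FlatFileAdapter(tmp)
    adapter.save("Sub/Page", b"nested page")
    print(adapter.list_names())  # prints ['Sub/Page']
    print(adapter.load("Sub/Page"))  # prints b'nested page'
```

Raising `KeyError` for a missing item, rather than leaking `FileNotFoundError`, keeps the error behaviour identical across backends.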
```
--------------    ---------    ------------    ---------    ------------    -----------    -----------
|  MoinMoin  |    |Adapter|    |Interface1|    |Routing|    |Interface2|    | Adapter |    |Original |
|  Code Not  |--->|       |--->|          |--->| Stub  |--->|          |--->|         |--->|Flatfile |
| Refactored |    |       |    |          |    |       |    |          |    |         |    |BackEnd  |
--------------    ---------    ------------    ---------    ------------    -----------    -----------
```
At this stage the code is essentially unchanged, but it now runs through the interfaces.
The abstract storage engine would provide the interface between the wiki and the storage engine implementation, as well as the business logic for the storage engine. It would comprise the generic item class tree representing all forms of data the wiki needs to store, such as Pages and Users. It would be accessible to the main code through a proxy/delegator pattern, so that changes can be made to the architecture with less impact on pre-existing code that uses the storage engine.
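A sketch of the generic item class tree behind a proxy, assuming hypothetical names (`Item`, `Page`, `User`, `SimpleEngine`, `EngineProxy`). Callers hold only the proxy, which delegates to whichever engine object it currently wraps, so the engine can be replaced without touching callers.

```python
from __future__ import annotations


class Item:
    """Generic base for anything the wiki stores (hypothetical)."""

    kind = "item"

    def __init__(self, name: str, metadata: dict[str, str] | None = None) -> None:
        self.name = name
        self.metadata = dict(metadata or {})


class Page(Item):
    kind = "page"


class User(Item):
    kind = "user"


class SimpleEngine:
    """Concrete engine object; keeps items keyed by (kind, name)."""

    def __init__(self) -> None:
        self._items: dict[tuple[str, str], Item] = {}

    def store(self, item: Item) -> None:
        self._items[(item.kind, item.name)] = item

    def fetch(self, kind: str, name: str) -> Item:
        return self._items[(kind, name)]


class EngineProxy:
    """Proxy/delegator: forwards attribute lookups to the wrapped engine."""

    def __init__(self, target: object) -> None:
        self._target = target

    def swap(self, target: object) -> None:
        # Callers keep their reference to the proxy; only the target changes.
        self._target = target

    def __getattr__(self, attr: str):
        # Only called for attributes not found on the proxy itself.
        return getattr(self._target, attr)


storage = EngineProxy(SimpleEngine())
storage.store(Page("FrontPage", {"language": "en"}))
storage.store(User("JulianRomero"))
print(storage.fetch("page", "FrontPage").metadata)  # prints {'language': 'en'}
```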
I am also proposing a loosely coupled interface between the abstract storage engine and the storage implementation, to reduce dependencies and limit how much code has to change during refactoring. From a future development perspective, separating them will also help co-ordination: developers can work on improving individual areas without worrying about the impact elsewhere, since everyone codes against a specified API.
Draft ideas for some of the classes and how they would interact:
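One way to keep Interface 2 loosely coupled in Python is structural typing: the abstract engine declares the methods it needs with `typing.Protocol` (standard library, Python 3.8+), and a backend satisfies it simply by providing them, without importing or subclassing anything from the engine. The names `Backend`, `DictBackend` and `copy_item` are hypothetical.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class Backend(Protocol):
    """Interface 2 as a structural contract (hypothetical method names)."""

    def load(self, name: str) -> bytes: ...

    def save(self, name: str, data: bytes) -> None: ...


class DictBackend:
    """Written independently: no import of, or inheritance from, Backend."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def load(self, name: str) -> bytes:
        return self.data[name]

    def save(self, name: str, data: bytes) -> None:
        self.data[name] = data


def copy_item(src: Backend, dst: Backend, name: str) -> None:
    """Engine-side logic that depends only on the contract."""
    dst.save(name, src.load(name))


a, b = DictBackend(), DictBackend()
a.save("FrontPage", b"text")
copy_item(a, b, "FrontPage")
print(isinstance(b, Backend), b.load("FrontPage"))  # prints True b'text'
```

An `isinstance` check against a runtime-checkable protocol only confirms that the methods exist, not their signatures or behaviour, so a shared test suite is still needed to enforce the contract.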
```
------------------
| AbstractEngine |
|     Proxy      |  <---------- Interface 1
------------------
         |
       CALLS    (proxy to items to perform higher-level tasks)
        \ /
------------------
|  ItemAbstract  |
|   Superclass   |
------------------
        / \
         |      ----------
         |------|  ACL   |----->Calls>-------.
         |      ----------                   |
         |      ----------                   |
         |------|  Page  |----->Calls>-------|
         |      ----------                   |
         |      ----------                   |
         |------|  User  |                   |
                ----------                  \ /
                                    -------------------
                                    |  StorageEngine  |  <---- Interface 2
                                    | AbstractFactory |
                                    -------------------
                                            / \
                                             |      ---------------------
                                             |------|  FlatFileFactory  |
                                             |      ---------------------
                                             |      ---------------------
                                             |------| SubversionFactory |
                                                    ---------------------
                                                              |
                                                            CREATE
                                                              |
                                    -------------------       |
                                    |  StorageEngine  |<------'
                                    | AbstractAdapter |
                                    -------------------
                                            / \
                                             |      ------------------
                                             |------|   Subversion   |
                                             |      | StorageAdapter |
                                             |      ------------------
                                             |      ------------------
                                             |------|    FlatFile    |
                                                    | StorageAdapter |
                                                    ------------------

------------    ------------
| MetaData |    |   Data   |
------------    ------------
                     / \
                      |      -------------
                      |------| ImageData |
                      |      -------------
                      |      -------------
                      |------| Text Data |
                             -------------
```
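The relationships in the class sketch can be expressed in Python roughly as follows. Class names follow the diagram where possible; the method names, the in-memory storage inside `FlatFileStorageAdapter` and the omission of a Subversion implementation are simplifications for illustration only. Each concrete factory creates its matching storage adapter, and items are stored as a `MetaData` part plus a typed `Data` part.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class MetaData:
    values: dict[str, str] = field(default_factory=dict)


@dataclass
class Data:
    content: bytes = b""


class TextData(Data):
    def text(self) -> str:
        return self.content.decode("utf-8")


class ImageData(Data):
    pass


class StorageEngineAbstractAdapter(ABC):
    @abstractmethod
    def write(self, name: str, meta: MetaData, data: Data) -> None: ...

    @abstractmethod
    def read(self, name: str) -> tuple[MetaData, Data]: ...


class FlatFileStorageAdapter(StorageEngineAbstractAdapter):
    """Simplified: keeps items in memory instead of real files."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[MetaData, Data]] = {}

    def write(self, name: str, meta: MetaData, data: Data) -> None:
        self._items[name] = (meta, data)

    def read(self, name: str) -> tuple[MetaData, Data]:
        return self._items[name]


class StorageEngineAbstractFactory(ABC):
    """Interface 2 entry point: hands out adapters for one backend family."""

    @abstractmethod
    def create_adapter(self) -> StorageEngineAbstractAdapter: ...


class FlatFileFactory(StorageEngineAbstractFactory):
    def create_adapter(self) -> StorageEngineAbstractAdapter:
        return FlatFileStorageAdapter()


# A SubversionFactory would subclass StorageEngineAbstractFactory the same way.
adapter = FlatFileFactory().create_adapter()
adapter.write("FrontPage", MetaData({"acl": "All:read"}), TextData(b"Welcome"))
meta, data = adapter.read("FrontPage")
print(meta.values["acl"], data.text())  # prints All:read Welcome
```

The factory is the only place that knows which adapter class to instantiate, so choosing a backend becomes a matter of choosing a factory.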
The final outcome would be something like the diagram below. The adapter can be removed later, once all MoinMoin code has been ported to Interface 1. Any caching would be implemented within the individual storage engine implementations.
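Since caching is left to individual implementations, a backend can add it internally without Interface 2 changing. A minimal sketch, assuming a hypothetical backend whose file reads are expensive: a read-through cache that is invalidated on every `save`. The `reads_from_disk` counter exists only to make the effect visible.

```python
from __future__ import annotations


class CachingFlatFileBackend:
    """Hypothetical backend with a private read-through cache."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}  # stands in for files on disk
        self._cache: dict[str, bytes] = {}
        self.reads_from_disk = 0

    def _read_file(self, name: str) -> bytes:
        self.reads_from_disk += 1
        return self._files[name]

    def load(self, name: str) -> bytes:
        if name not in self._cache:
            self._cache[name] = self._read_file(name)
        return self._cache[name]

    def save(self, name: str, data: bytes) -> None:
        self._files[name] = data
        self._cache.pop(name, None)  # invalidate so the next load sees new data


backend = CachingFlatFileBackend()
backend.save("FrontPage", b"v1")
backend.load("FrontPage")
backend.load("FrontPage")
print(backend.reads_from_disk)  # prints 1
backend.save("FrontPage", b"v2")
print(backend.load("FrontPage"), backend.reads_from_disk)  # prints b'v2' 2
```

This simple scheme is only safe if a single process writes to the store; a SAN shared by several wikis would need a different invalidation strategy.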
```
--------------      ---------      ------------      ----------      ------------  1    *  --------------
|  MoinMoin  |      |Adapter|      |Interface1|      |Abstract|      |Interface2|          |Implemented |
|  Code Not  |----->|       |----->|          |----->|Storage |----->|          |--------->|Storage     |
| Refactored |      |       |      |          |      |Engine  |      |          |          |Engine      |
--------------      ---------      ------------      ----------      ------------          --------------
                                        / \
----------------                         |
|   MoinMoin   |                         |
|  Refactored  |-------------------------|
|   New Code   |
----------------
```
Other ideas worth considering:
- The storage engine could optionally be hooked up to the logging module, to provide a means of debugging the engine.
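The logging idea could be a thin wrapper around any backend using Python's standard `logging` module, so it can be switched on without modifying the backend itself. `LoggingBackend`, `DictBackend` and the `load`/`save` methods are hypothetical names for this sketch.

```python
from __future__ import annotations

import logging
import time

log = logging.getLogger("storage")


class LoggingBackend:
    """Wraps a backend and logs each method call with its duration at DEBUG level."""

    def __init__(self, inner) -> None:
        self._inner = inner

    def __getattr__(self, attr: str):
        target = getattr(self._inner, attr)
        if not callable(target):
            return target  # plain attributes pass through unlogged

        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return target(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                log.debug("%s%r took %.2f ms", attr, args, elapsed_ms)

        return wrapper


class DictBackend:
    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    def load(self, name: str) -> bytes:
        return self.data[name]

    def save(self, name: str, data: bytes) -> None:
        self.data[name] = data


logging.basicConfig(level=logging.DEBUG)
backend = LoggingBackend(DictBackend())
backend.save("FrontPage", b"hello")  # logged with its arguments and duration
backend.load("FrontPage")
```

Wrapping rather than subclassing means the same logger works for every backend, and turning it off is just a matter of not wrapping.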
Timescale
| Week | Date | Activity |
|------|------|----------|
| 1 | 22/05/06 | Spend most of this week getting familiar with the wiki's code and identifying which modules talk to the current storage engine, and how. |
| 2 | 29/05/06 | Devise the specifications for both storage engine APIs, with input from the developers. |
| 3 | 05/06/06 | Write Interface 1 and wire up the engine to use the existing storage code. Write an adapter that replaces the current storage modules (under their existing names) and talks to the new Interface 1. |
| 4 | 12/06/06 | Continue as previous week. |
| 5 | 19/06/06 | Once tests (unittest?) confirm there are no problems, write the specification for the second interface. Decide whether the Interface 1 specification needs refactoring to address any problems and further feedback. Merge the development branch. |
| 6 | 26/06/06 | Mid-programme evaluation. Implement Interface 2. Create an Interface 2 adapter for the existing flat-file system code. |
| 7 | 03/07/06 | Run end-to-end (E2E) testing through the new abstraction layer, with both adapters in place. Once successful, merge branches. |
| 8 | 10/07/06 | Start work on the main implementation, creating the Item, MetaData and Data classes. Create test cases and prove they pass. |
| 9 | 17/07/06 | Create the abstract factory and implement the interface. Run the test cases over this section, and the full suite together with the other code. |
| 10 | 24/07/06 | Start work on a flat-file system mimicking the current flat-file setup. Implement test cases. Start E2E testing. |
| 11 | 31/07/06 | Continue working on the flat-file system. |
| 12 | 07/08/06 | Finish end-to-end testing with the flat-file system, remove the old module backend and the adapter on Interface 2, and merge branches. Any remaining time can go towards trying another storage implementation or refactoring MoinMoin to use Interface 1 directly instead of the adapter. |
| 13 | 14/08/06 | End of programme. The last week is contingency week! |
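Since the plan relies on tests passing before each merge, one option (an assumption, as the timetable leaves the framework open with "unittest?") is a single `unittest` contract suite that every Interface 2 implementation must pass; each concrete test class only supplies a factory method. `MemoryBackend`, `BackendContract` and the `load`/`save` contract are hypothetical.

```python
import unittest


class MemoryBackend:
    def __init__(self):
        self._items = {}

    def load(self, name):
        return self._items[name]

    def save(self, name, data):
        self._items[name] = data


class BackendContract:
    """Mixin holding the tests every backend must pass; not a TestCase by itself."""

    def make_backend(self):
        raise NotImplementedError

    def test_round_trip(self):
        backend = self.make_backend()
        backend.save("FrontPage", b"text")
        self.assertEqual(backend.load("FrontPage"), b"text")

    def test_overwrite(self):
        backend = self.make_backend()
        backend.save("FrontPage", b"old")
        backend.save("FrontPage", b"new")
        self.assertEqual(backend.load("FrontPage"), b"new")

    def test_missing_item_raises_key_error(self):
        with self.assertRaises(KeyError):
            self.make_backend().load("NoSuchPage")


class MemoryBackendTest(BackendContract, unittest.TestCase):
    def make_backend(self):
        return MemoryBackend()


# A FlatFileBackendTest would combine BackendContract and TestCase the same way.
```

Running `python -m unittest` on the module collects only the concrete `TestCase` subclasses, so adding a backend means adding one short class.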
Why me?
- I have been programming since the age of seven and have quite a lot of experience behind me. My interest in this particular project stems from a general interest in elegant design and in system interoperability through implementation-neutral abstraction layers (see next point).
- One key feature of one of my own projects is an external communication interface that handles both incoming and outgoing messages. Its job is to translate messages from an external format, such as ebXML, SOAP or REST (and specific APIs, for example PayPal's and Google's many APIs), into the program's own internal messaging system. This is achieved by having multiple implementations of the interface, each handling one outside format; the URL a request targets determines which format is used. This is very similar to the internal interfaces I am proposing for this project: developers need not write code for each storage engine implementation, and users gain the freedom to select the storage implementation that suits their requirements.
- PygmyGallery, an ultra-light Python gallery I am developing, uses a MetaData storage engine to hold the gallery during generation. Although it is primitive compared to the requirements of MoinMoin, and the store typically exists for only 10-20 seconds, it has given me some insight into the problems of developing MetaData storage engines and how to avoid them. This experience would be useful for this project.
Comments
Please add comments, criticism and ideas here. Cheers
Question 1 - JulianRomero
I would mentor this... but Thomas was faster. MoinMoin really needs to support several storage backends. One doubt: will a mix of storage backends in the same wiki server be supported? For example, having attachments in the filesystem and pages in a DB.
Answer 1
Ideally, allowing this would be exceptionally beneficial, as it would let multiple wikis read from the same SAN, for example. Different media could also be integrated from a range of different sources. Given my current unfamiliarity with the rest of the MoinMoin architecture, I am unsure whether this could be pulled off easily.
I suppose it can be, but there must also be a mechanism for managing the assignment of filestores. In fact, thinking about it, filestores would ideally be a form of Item and, just like users, could be accessed by a kind of wiki keyword, with an ACL assigned to the filestore itself. However, are we prepared to let users add filestores dynamically, at will, in the WikiWay? Or should this be a privilege, perhaps held in users' MetaData?
I think the debate on its implementation would require significant input from everyone involved with MM, and I left it out of scope because I was unsure whether there was sufficient time (taking test-driven development into account) to complete it adequately to production level. I'm open to ideas: as I said, I personally believe you're right, but I'm skeptical about the development time. Perhaps we can iron out some ideas relating to the actual implementation?
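As one possible design for Julian's example (attachments on the filesystem, pages in a DB), a routing engine could pick a backend per item kind, with the assignments held in a table that might itself be stored and ACL-protected like any other item. The mapping format and class names below are hypothetical, and dicts stand in for the real filesystem and database backends.

```python
from __future__ import annotations


class DictBackend:
    """Stand-in for a real backend (filesystem, database, ...)."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.items: dict[str, bytes] = {}

    def load(self, name: str) -> bytes:
        return self.items[name]

    def save(self, name: str, data: bytes) -> None:
        self.items[name] = data


class RoutingEngine:
    """Chooses a backend per item kind, falling back to a default backend."""

    def __init__(self, routes: dict[str, DictBackend], default: DictBackend) -> None:
        self.routes = routes
        self.default = default

    def backend_for(self, kind: str) -> DictBackend:
        return self.routes.get(kind, self.default)

    def save(self, kind: str, name: str, data: bytes) -> None:
        self.backend_for(kind).save(name, data)

    def load(self, kind: str, name: str) -> bytes:
        return self.backend_for(kind).load(name)


filesystem, database = DictBackend("filesystem"), DictBackend("database")
engine = RoutingEngine({"attachment": filesystem, "page": database}, default=database)
engine.save("attachment", "logo.png", b"\x89PNG")
engine.save("page", "FrontPage", b"Welcome")
print(sorted(filesystem.items), sorted(database.items))  # prints ['logo.png'] ['FrontPage']
```

Who may edit the `routes` table is exactly the privilege question raised above; the routing itself stays simple either way.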
Many thanks!