A proposal for page name handling, planned for version 1.3. If you want to develop for MoinMoin, you should read this.
See also: MoinMoinBugs/CanCreateBadNames, MoinDev/Storage, QuotingWikiNames
Different name representations
As of version 1.3, moin uses Unicode internally and a page name can be any Unicode string.
Page names have 3 representations:
- as a URL, in links in the wiki or anywhere else
- as an internal Unicode string
- as a file name on any file system
In the future pages might be saved in a database, with or without name limits.
Accepting URLs
URL -> [unquote %] -> [decode from config.charset] -> Unicode -> [normalize] -> [quote for storage] -> Storage
Unquote characters using %hex quoting - done by cgi
- Replace "_" with " " - done today by unquoteWikiname, but we must do it in a different place - urls should not be unquoted with the storage unquoter. We should not support internal storage format in url.
Decode from config.charset to Unicode - wikiutil.decodeUserInput
Normalizing page names - see MoinMoinBugs/CanCreateBadNames
- Use Unicode for processing, like matching the list of page names
Quote name for storage if you want to save a page - wikiutil.quoteWikinameFS (using new quoting, patch-78)
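A minimal sketch of the first two steps, assuming Python 3's urllib.parse; the function name and signature are illustrative, not MoinMoin's actual code, and normalization plus storage quoting then follow as described below:

{{{#!python
from urllib.parse import unquote

def url_to_pagename(url_path, charset="utf-8"):
    """Illustrative: URL path fragment -> Unicode page name (not yet normalized)."""
    # Unquote %XX escapes and decode the resulting bytes; "charset" here
    # stands in for config.charset.
    name = unquote(url_path, encoding=charset)
    # Replace "_" with " " -- a URL convention, distinct from the storage quoting.
    return name.replace("_", " ")

# url_to_pagename("Page_Name/Sub_Page") -> "Page Name/Sub Page"
}}}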
Notes:
- Quoted file names will not be unquoted; we don't want to expose the internal quoting to the client.
This will break existing links, but Google is learning fast.
- Check that all code uses same functions for URL handling.
Normalizing page names
Normalizing page names (and user names) includes (in this order; a code sketch follows below):
- Remove silly white space:
Page Name / Sub Page -> Page Name/Sub Page
- Remove multiple slashes
PageName/ / /SubPage -> PageName/SubPage
- Remove leading and trailing "/" from page names, because it's used for sub pages.
/PageName/ -> PageName
- Check length of name:
- Limited by file name quoting, so we have to quote the name for this
- Limited by offline wiki, need to add ".html" to each file name
- Limited by temporary pages, which currently add "#PageName.timestamp#" around the quoted name - I think we should fix this broken tempfile handling by using the tempfile module or a temp directory.
I think we can restrict page names more, maybe use only Unicode alphanumeric characters, but other developers don't like the idea. -- NirSoffer 2004-09-06 23:36:59
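A minimal sketch of the normalization rules above (illustrative only, not the actual implementation; the length limit is an assumed placeholder, and the real check has to run on the storage-quoted name as noted above):

{{{#!python
import re

MAX_NAME_LENGTH = 250  # assumed placeholder, see the length notes above

def normalize_pagename(name):
    """Illustrative: apply the normalization rules in the order listed above."""
    # Remove silly white space: collapse runs of spaces, strip spaces around "/".
    name = " ".join(name.split())
    name = re.sub(r"\s*/\s*", "/", name)
    # Remove multiple slashes.
    name = re.sub(r"/+", "/", name)
    # Remove leading and trailing "/" (reserved for sub pages).
    name = name.strip("/")
    # Check the length of the name (the real check belongs on the quoted name).
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("page name too long")
    return name

# normalize_pagename("Page Name / Sub Page") -> "Page Name/Sub Page"
# normalize_pagename("PageName/ / /SubPage") -> "PageName/SubPage"
# normalize_pagename("/PageName/")           -> "PageName"
}}}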
Normalizing user and group names
User and group names are used in ACLs, so they can't contain the ACL-reserved characters ":" and ",".
- We might want to add more reserved ACL characters, like the ACL entry separator.
Users are planned to become pages, with the user data in UserName/AccountData, so user names must not contain slashes.
User and group names will be restricted to Unicode alphanumeric characters, plus one optional space character between words.
If our users want fewer restrictions, we can loosen them in the next release or a bug-fix release, or make it a config option.
Normalizing user and group names includes (in this order):
- Normalize as a page name (see above), because groups are pages, and users are usually pages and might become pages in the future.
- Replace non-alphanumeric characters with the replacement character "-":
Page:Name -> Page-Name
Page,Name -> Page-Name
User/Name -> User-Name
(Code removed; it was not updated for the new description.)
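In its place, here is a minimal replacement sketch of the rules above (illustrative only; it reuses normalize_pagename from the sketch in the previous section):

{{{#!python
def normalize_username(name):
    """Illustrative: normalize a user or group name."""
    # Normalize as a page name first (see the sketch above).
    name = normalize_pagename(name)
    # Replace anything that is not alphanumeric or a space with "-".
    name = "".join(c if (c.isalnum() or c == " ") else "-" for c in name)
    # Keep at most one space between words.
    return " ".join(name.split())

# normalize_username("Page:Name") -> "Page-Name"
# normalize_username("Page,Name") -> "Page-Name"
# normalize_username("User/Name") -> "User-Name"
}}}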
Generating URLs
From a file name: Storage -> [unquote file name] -> Unicode -> ...
From Unicode: Unicode -> [encode to config.charset] -> [quote for URL] -> URL
unquote from storage - wikiutil.unquoteWikiname (using new quoting, patch-78)
- If needed, decode to do any processing on the Unicode name, then encode to config.charset
quote for URL - wikiutil.quoteWikinameURL
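A minimal sketch of the Unicode -> URL direction, assuming Python 3's urllib.parse; it is a stand-in for wikiutil.quoteWikinameURL, not the real code, and the " " -> "_" step is an assumption mirroring the accepting side:

{{{#!python
from urllib.parse import quote

def pagename_to_url(name, charset="utf-8"):
    """Illustrative: Unicode page name -> URL path fragment."""
    # Assumed symmetric to the accepting side: spaces become "_" in URLs.
    name = name.replace(" ", "_")
    # Encode with config.charset, then %-quote the bytes; "/" stays literal
    # so sub page URLs remain readable.
    return quote(name.encode(charset), safe="/")

# pagename_to_url(u"\u05d0") -> "%D7%90"
# (Hebrew Alef in utf-8, cf. the %d7%90 example in the notes below)
}}}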
Notes:
Create standard URLs according to http://www.ietf.org/rfc/rfc2396.txt - so one can copy a URL from the browser location box, and paste it anywhere.
- Or use readable URL using config.charset, and add a link on the page that one can copy and paste.
- The URL uses config.charset and encodes characters like this: %d7%90 (this is a Hebrew Alef in utf-8)
- This works with both utf8 and iso-8859-1 - should check other encodings
Use "/" for sub pages - this is already done in wikiutil.quoteWikinameURL
- Fix clients of unquoteWikiname that call it with Unicode name instead of quoted filename.
- Check that all code uses one URL generating function
Offline Wiki
Using moin_dump, we can save the wiki as a collection of html pages. In this version, links should use file names.
There are two options:
- Dump the wiki using standard quoted file names. The links will use the ugly quoted names, but you can dump the wiki to any file system. This is the current solution.
- Dump the wiki to Unicode file names (utf16/utf8) - if the file system supports this, with only special characters quoted. In this case, the links can use nice readable links or standard URL quoted strings.
Notes:
- Update moin_dump to work with new quoting
- It seems it should work with the new quoting without changes.
- Validate name length; we should leave room to append ".html" - this is an issue in Hebrew and even worse in Asian languages.
If we implement sub pages as subdirectories (see StorageRefactoring/PagesAsBundles), we should also convert the HTML output for pages and sub pages to a directory structure like this:
ParentPage/
    index.html
    SubPage.html
Now both http://domain/ParentPage/ and http://domain/ParentPage/SubPage.html will work, and we don't have any problem with long wiki names like VeryLongChineseParentPage/VeryLongChineseSubPage, which could easily reach 254 characters using utf-8 encoding.
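A minimal sketch of such a mapping (illustrative; the function name and the has_subpages flag are assumptions made for the example):

{{{#!python
import posixpath

def dump_path(pagename, has_subpages=False):
    """Illustrative: map a page name to a path in the dumped directory tree."""
    if has_subpages:
        # A parent page is rendered as the index of its own directory.
        return posixpath.join(pagename, "index.html")
    # A leaf page becomes a file inside its parent directories.
    return pagename + ".html"

# dump_path("ParentPage", has_subpages=True) -> "ParentPage/index.html"
# dump_path("ParentPage/SubPage")            -> "ParentPage/SubPage.html"
}}}

With this layout each page name contributes only one path component, so even long CJK names stay well under typical file name limits.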
Obtaining page lists
The Page class now contains getPageList and getPageDict, returning a list of page names or a dict of {pagename: Page object} for user-readable pages (either for a given user, for request.user by default, or all pages for user=""). /MoinEditorBackup pages are filtered out.
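A hedged usage sketch based only on the description above; the import path, the way the Page instance and request are obtained, and the exact keyword usage are assumptions, not confirmed API:

{{{#!python
from MoinMoin.Page import Page  # assumed import path

def list_pages(request):
    # "request" is the MoinMoin request object available in handlers.
    page = Page(request, u"FrontPage")       # assumed constructor form

    names = page.getPageList()               # pages readable by request.user
    all_names = page.getPageList(user=u"")   # all pages, ignoring ACLs
    pages = page.getPageDict()               # {pagename: Page object}
    return names, all_names, pages
}}}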