Short description
It would be very helpful for the project wikis in our company to have some kind of broken link management feature in MoinMoin that reports broken links to missing pages and to files on network servers. This feature request concentrates on the problems that occur in company environments, where not everything can be stored inside the wiki and some information lives on the Web or on project network drives.
Broken links?
Broken links are links which don't work anymore. If you click them, you end up on an empty page. You get broken links whenever the target of the link has moved or disappeared, for any of these reasons:
- a web page doesn't exist anymore or has been moved
- your project leader has restructured the wiki without adjusting the links afterwards
- a project leader or member of a different group, who stores their data on a network drive, has changed its structure, and you have linked to that data in your wiki
- someone has deleted an attachment but not adjusted links to it
- ...
Broken links are annoying and can create a big mess in your wiki. So you should take care of them from time to time and fix them with the help of your wiki system. That's what this feature request is about.
Four types of broken links
There are four kinds of links which can be broken. These are links to ...
- ... missing pages (MoinMoin provides the system page "WantedPages" for that)
- ... missing attachments
- ... missing web links
- ... missing file links to files stored on the local hard disk or a network drive (then we also have to provide methods to connect to several network services like SMB, Novell Directory Services, ... as well)
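As a rough illustration of the four kinds listed above, a link checker would first have to sort each link target into one of them. The following is only a sketch: the function name `classify_link`, the category names and the prefixes it looks for (such as `attachment:` or a UNC path starting with `\\`) are assumptions made for this example, not existing MoinMoin code.

```python
import re
from urllib.parse import urlparse

# Hypothetical category names; they mirror the four kinds listed above.
PAGE, ATTACHMENT, WEB, FILE = "page", "attachment", "web", "file"

# Matches Windows drive paths such as C:\docs\plan.odt or C:/docs/plan.odt.
DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def classify_link(target):
    """Return which of the four link kinds a raw link target belongs to."""
    # UNC paths (\\server\share\...) and drive paths point at files.
    if target.startswith("\\\\") or DRIVE_PATH.match(target):
        return FILE
    # Assumption: attachment links carry an "attachment:" prefix.
    if target.startswith("attachment:"):
        return ATTACHMENT
    scheme = urlparse(target).scheme.lower()
    if scheme in ("http", "https", "ftp"):
        return WEB
    if scheme in ("file", "smb"):
        return FILE
    # Anything else is treated as a wiki page name.
    return PAGE
```

For example, `classify_link("ProjectX/Status")` falls through to the page case, while `classify_link("file:///srv/share/plan.odt")` is treated as a file link. A real implementation would have to take the targets from the wiki parser instead of guessing from the raw string.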
Tasks of "broken link management" (BLM)
BLM - how we abbreviate "broken link management" from now on - should be able to do these tasks for you ...
- find the four types of broken links
- list the broken links so the user can at least manually fix them
- assist the user in fixing broken links
- provide a group renaming feature
- search for moved items
- ...
Environment
The picture above shows a "real life" environment for a company wiki.
Wiki server
Normally the wiki server runs on a dedicated machine. It runs the wiki framework and has its own storage where pages and some attachments are located. The pages contain links to every kind of information: not only to other pages but also to attachments, internet sources and the project's network drive.
Don't assume that all information should be stored in your wiki. There are other projects which have only weak connections to your project, but maybe you also want to link to them. If another project's data is stored on a network drive, you have to offer a link to it on your page. The wiki server does not necessarily have a direct connection to the internet or to the network drives. It only serves the wiki pages to the clients - the users - and those pages in turn contain the links.
With this information we can say about the wiki server's ability to check for broken links on its own, that ...
... the wiki server can always detect and fix broken links to:
- wiki pages (if user has the proper access rights)
- attachments (if user has the proper access rights)
... the wiki server can't always detect and fix broken links to:
- internet web pages
- files on network drives
because we can't ensure that the wiki server always has a connection to the internet, or a connection and the access rights to all network drives.
User
Users work with the information provided by the wiki server. Users should normally have access rights not only to certain wiki pages but also to the information which is linked from there.
In company environments users normally work on computers which are separate from the computer the wiki server runs on.
Users log in to the wiki server through a web browser and can only see pages and follow links for which they have the proper access rights. Likewise, they can only access files on the network drive if they have the proper access rights. The access rights are normally managed through the login information on their computers.
With this information we can say about the users' ability to check for broken links on their own, that ...
... the users with the proper access rights can always detect and fix broken links to:
- wiki pages
- attachments
- files stored on network drives
- information located in the internet
... the web browser the users are working with also has access to the information the wiki page links to.
So why not use the user's environment somehow for broken link management?
Workflow(s)
When should BLM be done?
Since BLM can take quite some time to process every link on every page you have access to in the wiki, it is not a task which should be run every few minutes. Maybe it's better to do it once a day, perhaps started automatically by some kind of cron job.
Who is allowed to perform BLM?
Since successful broken link management requires full access to all information which should be checked, only users who have ...
- read/write access to all pages and attachments which have to be checked
- read access to the internet and network drive where information is linked to from the pages
should perform BLM. This could be some kind of wiki maintainer team or at least the project leader / project admin.
Ways to automatically find broken links
A user who wants to perform BLM must have read access to all pages and to all link destinations. If the user doesn't have access, links could be falsely reported as broken.
a trial read access is performed by a link checker for every link on every page in the wiki. If no access is possible, the link will be reported as broken.
links to pages and attachments can be checked by the wiki server on its own
links to web sites and network drives are better checked on the client side, on the user's computer, because the user will have a connection to both the internet and the network drive. Without that access, the user couldn't work with the wiki even if there were no broken links. So we assume that the user who performs the BLM has access to all resources whose links are to be fixed.
- some code is necessary which runs on the user's computer, either outside or inside the user's browser. If the link checker runs inside the browser, it could be sent by the wiki server to the user. The user wouldn't have to install special software for that.
a Java applet could do this task. Since MoinMoin is not a Java project, there is no one who could maintain this code. So we should think about using some Python -> Java solution like Jython to ...
- have the link checker running in the browser
- let its code be maintainable by the MoinMoin community
all broken links will be accessible from the wiki server after searching for them. Since this can't happen in real time, the results have to be stored there somehow (on a page and attachment?). There should be a message that the information could be outdated until the next run of the link checker.
- list the results in a user friendly way
- list the results in a way that computers can further process this list easily either now or later
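The trial read access and the stored result list described above could look roughly like the following sketch. All names here (`file_is_readable`, `url_is_reachable`, `check_links`, the report keys) are hypothetical and chosen for this example. It also assumes that links of a kind for which no probe is available are listed as "unchecked" instead of broken, so that missing access does not produce false reports, and that the report carries a timestamp so readers can see how old it is.

```python
import json
import os
import time
import urllib.error
import urllib.request


def file_is_readable(path):
    """Trial read access for a local or mounted network path."""
    return os.access(path, os.R_OK)


def url_is_reachable(url, timeout=10):
    """Trial read access for a web link using an HTTP HEAD request.

    Note: some servers reject HEAD, so this may report false negatives.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, ValueError, OSError):
        return False


def check_links(links, probes):
    """Probe every link and collect the ones that cannot be read.

    links:  iterable of (page, kind, target) tuples
    probes: dict mapping a link kind to a callable(target) -> bool
    """
    report = {
        "checked_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "broken": [],
        "unchecked": [],
    }
    for page, kind, target in links:
        entry = {"page": page, "kind": kind, "target": target}
        probe = probes.get(kind)
        if probe is None:
            # No way to test this kind from here; do not call it broken.
            report["unchecked"].append(entry)
        elif not probe(target):
            report["broken"].append(entry)
    return report
```

A client-side run might call `check_links(links, {"file": file_is_readable, "web": url_is_reachable})` and write `json.dumps(report, indent=2)` somewhere the wiki can display it. JSON is used here only because it is easy for both people and programs to read; the feature request itself does not prescribe a storage format.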
Ways to fix broken links
Semi automatic method:
- on a special wiki page there is a macro which lists these broken links
- the macro also serves a form which has two input fields:
field A is the location of the file before it was moved
field B is the location of the file after it has been moved
- User fills out both fields with the required data (maybe assisted by some kind of file/folder dialog)
- User presses a button which could be named "change links"
after pressing the button some kind of search/replace occurs which changes all links to the file(s) from folder A to folder B and also maybe the file name if requested
- after that again all pages are listed with broken links.
- if the list is empty, everything should be fine
- if the list is still filled with broken link entries, the search/replace algorithm failed because of conflicting edit requests to certain pages. Then the "change links" button has to be pressed again until the list of broken links is empty.
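The search/replace step behind the "change links" button could be sketched as follows. This is a deliberately naive illustration: `change_links` is a hypothetical name, the pages are held in a plain dict standing in for the wiki's storage, and the replacement is a simple substring replace that knows nothing about wiki markup or edit conflicts.

```python
def change_links(pages, old_location, new_location):
    """Replace old_location (field A) with new_location (field B).

    pages: dict mapping page name -> page text (a stand-in for the
    wiki's page storage). Returns the names of the changed pages.
    """
    changed = []
    # Iterate over a copy so the dict can be updated safely in the loop.
    for name, text in list(pages.items()):
        new_text = text.replace(old_location, new_location)
        if new_text != text:
            pages[name] = new_text
            changed.append(name)
    return changed
```

Calling `change_links(pages, "file://srv/old/", "file://srv/new/")` rewrites every occurrence of the old folder, including occurrences that are not links. A real implementation would therefore have to work on the parsed links of each markup format and save the pages through the wiki's normal edit path.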
Discussion
Some advice:
- do small steps, maybe break down this FR into multiple pieces:
just adding missing file attachment detection / display it on WantedPages
- try detecting external broken links in a generically doable way, keep in mind:
- not all wiki servers have outbound http (or whatever other protocol) connectivity
- client side code needs installation, not quite the wiki way
- try supporting fixing of broken links, starting with page links and attachment links, keep in mind:
- moin supports multiple built-in parsers: moin wiki, creole wiki, rst, docbook + arbitrary plugged parsers
- regex search and replace is likely not enough to cope with that
- try supporting fixing external broken links
- keep in mind that 1.9 will be a stable release soon, which means stuff requiring big and widespread changes is not acceptable there
- smaller, rather local changes not touching core stuff are more likely to get in
- pages and attachments are already dead in moin 2.0, superseded by mimetype items