Merging users into shared user_dir
Merging wikis that used to run separately into a wiki farm with a shared/common user_dir (so that a single login works for all wikis and all wikis use the same user profile for a given user) raises several problems.
What we have to solve
Merge one separate wiki into an existing shared user_dir (which might be empty or might already contain user profiles).
Note: if there are multiple separate wikis, we can just repeat the procedure.
What we need to consider
- user names: the user name in the user profile might need to change
  - e.g. when switching from moin login to LDAP login, we need to use the LDAP user names
  - e.g. when users had (slightly) different names in different wikis
  - change user names in on-page ACL lines (automatically)
  - change user names in config ACL lines (manually)
- userids in all edit-log entries for page revisions/attachments need to be mapped from the current id(s) to the shared id
- duplicates: even within a single wiki, some users will have multiple ids (e.g. they forgot their password and just created a new account)
- we might want to kill some userids (e.g. spammers); it would be nice to also remove those userids from the edit-log
- we might want to disable some userids (e.g. users that are no longer active)
- quicklinks: should work, as they use the interwiki:pagename form (assuming that cfg.interwiki_name was set correctly)
- subscriptions: normal page subscriptions should be OK, as they use the interwiki:pagename form
  - TODO: check regex subscriptions
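The userid remapping and on-page ACL renaming above could be scripted roughly as follows. This is a minimal sketch, not existing moin code, and it rests on assumptions: the moin 1.x edit-log layout (tab-separated fields with the userid in the 7th field), on-page ACL lines that start with `#acl ` and hold space-separated `names:rights` entries, and user names without spaces. `remap_editlog_line`, `rename_in_acl_line`, `id_map` and `name_map` are hypothetical names; the two maps would come from the identity sets and renames decided in the UI step.

```python
from typing import Dict

# Assumption: moin 1.x edit-log lines are tab-separated, userid is the 7th field.
USERID_FIELD = 6


def remap_editlog_line(line: str, id_map: Dict[str, str]) -> str:
    """Replace the userid in one edit-log line with its shared userid, if mapped."""
    ending = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > USERID_FIELD:
        old_id = fields[USERID_FIELD]
        fields[USERID_FIELD] = id_map.get(old_id, old_id)
    return "\t".join(fields) + ending


def rename_in_acl_line(line: str, name_map: Dict[str, str]) -> str:
    """Rename users in an on-page '#acl' line, leaving the rights parts unchanged.

    Note: runs of whitespace between ACL entries are collapsed to single spaces.
    """
    if not line.startswith("#acl "):
        return line
    ending = "\n" if line.endswith("\n") else ""
    entries = []
    for entry in line[len("#acl "):].split():
        # Keep a leading +/- modifier as-is.
        prefix = entry[0] if entry[0] in "+-" else ""
        names, sep, rights = entry[len(prefix):].partition(":")
        renamed = ",".join(name_map.get(n, n) for n in names.split(","))
        entries.append(prefix + renamed + sep + rights)
    return "#acl " + " ".join(entries) + ending
```

For killed users, mapping their userid to an empty string in `id_map` would blank it in the edit-log; whether moin then shows such entries as anonymous edits is an assumption worth checking first.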
User interface
How do we provide a user interface:
- for building the identity sets (multiple userids for the same user)
- for changing the name (if wanted)
- for killing/disabling user profiles
Options:
- text files (e.g. CSV): export the data, process it manually, then feed the edited files back in as input
- web UI (implement as moin action)
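For the text-file option, the export step might look like this sketch. It assumes the moin 1.x user_dir layout, where each profile is a file named after its userid containing `key=value` lines such as `name=` and `email=`, and that side files ending in `.trail` or `.bookmark` are not profiles. The CSV columns and the `keep` default in the `action` column are hypothetical conventions: an editor would change `action` (e.g. to kill, disable or a target userid to merge into) and fill in `new_name` where a rename is wanted.

```python
import csv
import os
from typing import Dict, IO

# Assumption: non-profile side files that may live in a moin 1.x user_dir.
SKIP_SUFFIXES = (".trail", ".bookmark")


def read_profile(path: str) -> Dict[str, str]:
    """Parse a key=value profile file into a dict (assumed moin 1.x format)."""
    data: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.rstrip("\r\n").partition("=")
            if sep:
                data[key] = value
    return data


def export_profiles_csv(user_dir: str, out: IO[str]) -> int:
    """Write one CSV row per profile and return the number of profiles written."""
    writer = csv.writer(out)
    writer.writerow(["userid", "name", "email", "action", "new_name"])
    count = 0
    for userid in sorted(os.listdir(user_dir)):
        path = os.path.join(user_dir, userid)
        if userid.endswith(SKIP_SUFFIXES) or not os.path.isfile(path):
            continue
        profile = read_profile(path)
        # "keep" is only a default; the manual step edits action and new_name.
        writer.writerow([userid, profile.get("name", ""), profile.get("email", ""), "keep", ""])
        count += 1
    return count
```

Reading the edited file back with `csv.DictReader` would give one dict per userid, keyed by the header row, as input for the merge step.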
Duplicate detection
- same name, same email address
- fuzzy matching?
- manual detection
- if the duplicates are in the same wiki, data from the older profile is not wanted in the new profile
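The automatic part of duplicate detection could be sketched like this, treating a shared name or a shared email address (compared case-insensitively) as a link between two userids and using union-find to build the identity sets. The fuzzy step swaps in Python's `difflib.SequenceMatcher` as one possible similarity measure; the 0.85 threshold is an arbitrary assumption, and the resulting pairs are meant for manual review rather than automatic merging. Profiles are assumed to be dicts with `name` and `email` keys; `identity_sets` and `fuzzy_name_pairs` are hypothetical names.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def identity_sets(profiles: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Group userids that share a (case-insensitive) name or email address."""
    parent = {uid: uid for uid in profiles}

    def find(uid: str) -> str:
        while parent[uid] != uid:
            parent[uid] = parent[parent[uid]]  # path halving
            uid = parent[uid]
        return uid

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_seen: Dict[Tuple[str, str], str] = {}
    for uid, prof in sorted(profiles.items()):
        for key in ("name", "email"):
            value = prof.get(key, "").strip().lower()
            if not value:
                continue  # empty fields must not link unrelated users
            other = first_seen.setdefault((key, value), uid)
            if other != uid:
                union(uid, other)

    groups: Dict[str, List[str]] = {}
    for uid in sorted(profiles):
        groups.setdefault(find(uid), []).append(uid)
    # Only sets with more than one userid are duplicates.
    return [g for g in groups.values() if len(g) > 1]


def fuzzy_name_pairs(profiles: Dict[str, Dict[str, str]],
                     threshold: float = 0.85) -> List[Tuple[str, str, float]]:
    """Return pairs of userids with similar names, for manual review only."""
    pairs = []
    for (a, pa), (b, pb) in combinations(sorted(profiles.items()), 2):
        name_a = pa.get("name", "").lower()
        name_b = pb.get("name", "").lower()
        if not name_a or not name_b:
            continue  # two empty names would otherwise score 1.0
        ratio = SequenceMatcher(None, name_a, name_b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```

Because links are transitive, one wrong or shared email address can join two different people into one set, so the output is best treated as a proposal for the manual detection step.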
Kill candidates
- userids with no edit-log entries
- userids whose edits got reverted (spam?)
- manual selection
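The first kill criterion (userids with no edit-log entries) reduces to a set difference, as in this minimal sketch. It again assumes the moin 1.x tab-separated edit-log with the userid in the 7th field, and that the caller passes the lines of every relevant edit-log (e.g. the global one plus the per-page ones). `userids_in_editlog` and `kill_candidates` are hypothetical names.

```python
from typing import Iterable, List, Set

# Assumption: moin 1.x edit-log lines are tab-separated, userid is the 7th field.
USERID_FIELD = 6


def userids_in_editlog(lines: Iterable[str]) -> Set[str]:
    """Collect every non-empty userid that appears in the given edit-log lines."""
    ids: Set[str] = set()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > USERID_FIELD and fields[USERID_FIELD]:
            ids.add(fields[USERID_FIELD])
    return ids


def kill_candidates(profile_ids: Iterable[str], editlog_lines: Iterable[str]) -> List[str]:
    """Userids that have a profile but no edit-log entry, sorted."""
    edited = userids_in_editlog(editlog_lines)
    return sorted(set(profile_ids) - edited)
```

A userid without edits may still belong to a real reader who relies on subscriptions or quicklinks, so the result is a candidate list for the manual selection, not a list to kill blindly.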
Note: getting rid of the junk users should be done first, just to reduce the number of users to consider in the following steps. Internet-facing wikis might have a lot of them.