A HoneypotPage is a page that is intentionally left open for editing to attract WikiSpam bots, so they can be easily detected and banned. It's meant to be a spam prevention feature that is simple and automagical.
HoneypotPage can be used on:
- Pages that should not be edited very often
- Pages that are usually attacked by WikiSpam bots
A good example of such a page is the FrontPage, but many other pages can be used.
Most admins lock their wiki's FrontPage to prevent WikiSpam and WikiVandalism. However, by intentionally leaving the FrontPage open for editing to attract WikiSpam bots to spam it, it is possible to detect WikiSpam bots as they arrive and then ban their IPs from editing any other pages for a limited time.
Possibilities
- Can be used for any page that is edited by direct post, i.e. not by clicking the edit link on the page. We already use such a system for the rename and delete page actions.
- Can be used for any page to which you do not have write access. If someone tries to use action=edit on a page they do not have write access to, instead of giving the "You are not allowed" error, block that IP for the next 30 minutes. We can serve an "Immutable page" view to users, with a hidden "Edit" link that is visible only to bots scanning the HTML code. Edits from these links will be rejected and the editor will be banned, as in the sketch below.
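Here is a minimal sketch of that second idea in Python (MoinMoin's implementation language); the hook name, request object, and in-memory ban store are assumptions for illustration, not MoinMoin's actual API:

{{{
import time

# Hypothetical in-memory ban store: IP address -> time the ban expires.
BANNED_IPS = {}
BAN_DURATION = 30 * 60  # 30 minutes, in seconds


def is_banned(ip):
    """Return True if this IP is currently banned from editing."""
    expires = BANNED_IPS.get(ip)
    return expires is not None and expires > time.time()


def handle_edit_request(page, request):
    """Hypothetical hook called for every action=edit request."""
    ip = request.remote_addr

    if is_banned(ip):
        return request.deny("You are temporarily banned from editing.")

    if not page.writable_by(request.user):
        # Instead of only showing "You are not allowed", treat the
        # attempt as bot behaviour and ban the IP for 30 minutes.
        BANNED_IPS[ip] = time.time() + BAN_DURATION
        return request.deny("This page is immutable.")

    return request.proceed_with_edit(page)
}}}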
Basic Use Case
Here's a basic use case of this feature, using the FrontPage as an example:
- The admin makes the FrontPage editable by anyone
- When the FrontPage is edited by a regular user, a warning message is displayed:
  "Do not edit this page or you will be banned for 30 minutes and lose all of your changes made to the entire wiki in the past 5 minutes"
- A WikiSpam bot arrives on the FrontPage and tries to add spam links to it
- As soon as that happens, the spam bot's IP address is recorded and banned for 30 minutes
- The FrontPage edit is not saved
- The user gets a message: "This page is protected, you will not be able to edit any other page in this wiki for the next 30 minutes."
- The WikiEngine finds all pages edited in the last 5 minutes whose last edit came from the same IP as the spam bot, and reverts those edits (a sketch of this save-time check follows the list)
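A rough sketch of that save-time check, again in Python; the hook and the wiki/page methods (find_pages_edited_since, last_editor_ip, revert_last_edit) are made up for illustration:

{{{
import time

HONEYPOT_PAGES = {"FrontPage"}
BAN_DURATION = 30 * 60      # ban length: 30 minutes
REVERT_WINDOW = 5 * 60      # revert edits from the last 5 minutes

banned_ips = {}             # IP address -> ban expiry timestamp


def on_page_save(page, new_text, request, wiki):
    """Hypothetical hook called just before a page edit is saved."""
    ip = request.remote_addr

    if page.name not in HONEYPOT_PAGES:
        return True  # normal page, save as usual

    # The honeypot page was edited: ban the IP and discard the edit.
    banned_ips[ip] = time.time() + BAN_DURATION

    # Revert other recent edits made from the same IP.
    since = time.time() - REVERT_WINDOW
    for other in wiki.find_pages_edited_since(since):
        if other.last_editor_ip == ip:
            other.revert_last_edit()

    request.notify("This page is protected, you will not be able to edit "
                   "any other page in this wiki for the next 30 minutes.")
    return False  # do not save the honeypot edit
}}}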
Additional Cleanup
Triggered from the SecurityPolicy save method:
- Make a diff between the unsaved HoneypotPage edit and the original revision
- Scan the diff for URLs, e.g. http://somespamsite.com/
- Search the entire wiki for the existence of these URLs
- Alert the admin about all pages containing the spam links
- Possibly revert all spammed pages if the links were added recently and from the same IP
This may be a different feature, useful any time you revert a spam edit. A sketch of the URL extraction and wiki-wide search follows.
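The sketch below is in Python; the wiki and page objects (wiki.all_pages, page.get_text) are hypothetical, only the regular-expression part is plain Python:

{{{
import re

# Crude URL pattern; good enough to pull http/https links out of a diff.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_new_urls(old_text, new_text):
    """Return URLs that appear in the rejected honeypot edit
    but not in the original revision."""
    old_urls = set(URL_RE.findall(old_text))
    new_urls = set(URL_RE.findall(new_text))
    return new_urls - old_urls


def find_spammed_pages(wiki, spam_urls):
    """Search the entire wiki for pages containing any of the spam URLs."""
    hits = []
    for page in wiki.all_pages():          # hypothetical page iterator
        text = page.get_text()
        if any(url in text for url in spam_urls):
            hits.append(page.name)
    return hits
}}}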
Advantages
- Automagical - this mechanism should detect spam as it happens and remove it accordingly, with very little human intervention
- Low Maintenance - no huge blacklists (like BadContent) to maintain and download periodically, and no need for central authorities to maintain them
- Low Impact - it doesn't scan all pages every time they're edited. Normal pages are not scanned unless an intrusion has been detected recently, and even then only recently-edited pages are scanned
- Users can edit other pages as usual. It should be entirely transparent to the user.
- IPs are only banned temporarily, and spam URLs are only recorded for a one-time cleanup and then discarded (rather than blacklisting the URL for all future edits). The impact of false positives should be minimal.
- Behaviour-centric (rather than content-centric)
  - It targets bad behaviour (content editing by non-human bots), rather than BadContent
  - There's no central authority to decide what's a good link and what's a spam link (the definition of a spam site is entirely open to interpretation, and not black-and-white)
  - A content-centric detection method only works when someone has already identified and decided what's spam and what's not, so new, unknown spam links slip past the radar. This mechanism cleans up any spam even when it doesn't know that the link is a spam link.
- Soft Security (see SoftSecurity)
  - It's not entirely non-violent, but it's far from authoritarian. Its effects are all temporary and completely reversible. It certainly follows the SoftSecurity philosophy more than most of the other methods proposed on AntiSpamFeatures.
Problems
- A human user may have missed the warning message on the FrontPage and edited it in good faith, then lose all their recent edits elsewhere as a result (but really, how many pages can a real human edit in 5 minutes?)
- Any edits made recently by other people from the same IP will also be lost. This includes people on the same corporate network, or people who happened to be connecting via the same anonymous proxy as the spambot (this can be a big issue, since most spambots use anonymous proxy servers for obvious reasons).
- It's hard to revert an edit if it is not the last edit of the page. If a spam bot added spam and another user then added some other text, it's hard to remove only the spam.
- The current system already supports local bad content; one can ignore the central bad content and use one's own LocalBadContent.
- Yes, the current system is *better* than this system -- but I suspect that doing *both* will be even better, each one catching spam that the other lets through.
Due to these reasons, only very recent edits (say, in the last 5 minutes) from the same IP should be reverted automatically.
Alternatively, the WikiEngine could be more aggressive and scan all edits in, say, the last 30 minutes, if it limits reverts to only those pages where a spam link was added.
One possibility for providing a safety net is to show the careless user a list of pages that were reverted. A spambot will disregard the list (or try to spam every page on the list, which will be rejected). The careless user acting in good faith will try to un-revert all the pages, even if some of them are other people's changes (other people connected from the same proxy server). MoinMoin has separate permissions for revert and write, so it's entirely possible for a user banned from editing to revert pages. On WikiEngines with no separate revert rights, the careless user will have to wait no more than 5-10 minutes until the ban expires, which is a pretty reasonable wait.
If this method becomes popular, spammers may rewrite their spambots to skip the FrontPage entirely and post spam only to other pages, rendering this feature useless. However, nothing stops the admin from setting up other honeypot pages on their wiki as a workaround.
(Should we rename this feature simply HoneypotPage?)
See Also
A commonly suggested antispam method is URL blacklisting (see BlackList, BadContent).
Also see AntiSpamFeatures.
The original WikiFeatures proposal can be found here. There may be discussions there you want to read. - GoofRider
Contributors
Discussions
Does not work for this wiki
This concept does not work for 95% of the spam that goes into this wiki, simply because the spam is not submitted by bots. The antispam/BadContent system seems to defeat most bot attacks (or they don't attack MoinMoin at all).
Safer way to detect behavior
To make it work, we need a safe way to detect a bot edit. This could be a page that can only be reached by "clicking" a hidden link, for example an HTML link like this:
<a href="/SomePage?ban_flag=1" style="display: none !important">HoneyPot</a>
Or with a CSS rule like this: .hidden { display: none; } applied via class="hidden" on the link.
Even this kind of hiding might be ignored by some browsers, for example a screen reader, so the link could still be presented to real users.
The ban_flag value should be dynamic so a bot can't detect it easily; one way to generate such a value is sketched below.
Only if we have a safe way to create such a link can we say that we can detect behaviour, and then reject edits from bots or scan them.
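A minimal sketch of making the ban_flag dynamic, assuming the server derives it from a secret key with an HMAC over the client IP and a timestamp; the parameter name, secret handling, and validity window are all assumptions:

{{{
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"        # server-side secret, never sent to clients
FLAG_LIFETIME = 60 * 60          # a flag stays valid for one hour


def make_ban_flag(client_ip):
    """Generate a ban_flag value to embed in the hidden honeypot link."""
    timestamp = str(int(time.time()))
    digest = hmac.new(SECRET_KEY,
                      f"{client_ip}:{timestamp}".encode(),
                      hashlib.sha256).hexdigest()
    return f"{timestamp}-{digest}"


def check_ban_flag(client_ip, flag):
    """Return True only if the flag was generated by us for this IP recently."""
    try:
        timestamp, digest = flag.split("-", 1)
        age = time.time() - int(timestamp)
    except ValueError:
        return False
    if age > FLAG_LIFETIME:
        return False
    expected = hmac.new(SECRET_KEY,
                        f"{client_ip}:{timestamp}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(digest, expected)
}}}

In this scheme, an edit request that carries a valid ban_flag must have come through the hidden link and can be treated as bot behaviour, while a forged or stale flag can simply be ignored.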