Redirecting External Links to Defuse Link Spam
As you may have already noticed, many wikis are being vandalized with link spam. Some MoinMoin wikis have even had dozens of pages vandalized at once, probably by a bot script. The spammers do this in the hope of getting a better Google PageRank, so that their spam sites are listed higher in search results.
This MoinMoinPatch makes your wiki redirect external links through Google's new page redirect service. This means an external link on your pages no longer contributes to the PageRank of the site it points to.
So, in a nutshell, spammers can still add links to your wiki pages, but doing so does nothing to help their site's exposure on Google. If you want to go further and try to block spammers from adding links in the first place, see the BlackList page. You can also block IP addresses in your httpd.conf or .htaccess file if you are using the Apache web server. For example:
<Directory "/usr/share/moin">
    Options Indexes Includes FollowSymLinks MultiViews ExecCGI
    AllowOverride All
    Order allow,deny
    # global wiki ban list
    deny from ip.ad.re.ss1
    deny from ip.ad.re.ss2
    ...
    Allow from all
</Directory>
Lastly, if your wiki has been severely damaged by spam and you would like it repaired quickly, please leave a note here. I have a script that can remove link spam and reverse other forms of damage, such as escaped HTML entities and excess whitespace.
I assume it was someone from here who cleaned up my history.dcs.ed.ac.uk wiki over recent days - you did a great job, no problems with updated pages or pages where I had already manually reverted. Whoever did it should drop me a mail so we can keep in touch (and so I can thank you properly). Meanwhile I wonder if you could run the same script against my other wiki at http://www.gtoal.com/software/ which has also been recently spammed? Thanks - gtoal@gtoal.com (GrahamToal)
Instructions
Instead of using these patches, you can just use url_mappings - described on HelpMiscellaneous (remapping URLs).
Here are the changes to make:
First, add this line to your wiki's moin_config.py file:
redirect_links = 1
Now we have to change the source code of MoinMoin. Open this file for editing (the exact path may differ on your system): /usr/lib/python2.3/site-packages/MoinMoin/parser/wiki.py
Search for the def _url_bracket_repl(self, word): method. You need to insert two lines just before the last line of the method, so that the URL is rewritten before it reaches the formatter (make sure you indent correctly).
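The two inserted lines effectively wrap the link target in a Google redirect before the method emits it. As a rough sketch of that rewrite only (the redirector URL form and the names redirect_links and redirect_url are assumptions for illustration, not the literal patch text):

# Illustrative sketch, not the literal patch: it shows the rewrite the two
# inserted lines perform just before the parser hands the URL to the formatter.
redirect_links = 1   # stands in for the option read from moin_config.py

def redirect_url(url):
    """Route external http links through Google's redirector when enabled."""
    if redirect_links and url.startswith('http'):
        # assumed form of Google's redirect service URL
        return 'http://www.google.com/url?sa=D&q=' + url
    return url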
That takes care of [[bracketed]] external links. We also need to take care of non-bracketed external links like http://wherever
Search for the def _url_repl(self, word): method. Its last lines need the same kind of change, so that bare URLs are wrapped in the same redirect.
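Assuming the illustrative redirect_url() from the sketch above, the effect on a bare external link would be:

redirect_url('http://spammer.example.com/')
# -> 'http://www.google.com/url?sa=D&q=http://spammer.example.com/'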
Optional
To properly complete this patch, you should add a default value for the new redirect_links config property. Edit the main config.py file under /usr/lib/python2.3/site-packages/MoinMoin/ and add this to the _cfg_defaults dictionary:
'redirect_links': 0,
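For orientation, the new key simply sits alongside the other entries in that dictionary; a placeholder sketch (the surrounding contents of config.py are not shown here):

# Placeholder sketch of the _cfg_defaults dictionary in MoinMoin/config.py;
# only the 'redirect_links' line is the actual addition.
_cfg_defaults = {
    # ... the existing default entries stay as they are ...
    'redirect_links': 0,   # off unless a wiki's moin_config.py turns it on
}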
Testing
Now when you view a wiki page, external links should redirect through Google. Some pages may have been cached earlier though and will not show redirected links until you refresh the cache for the page.
To delete all page caches, empty these folders under your wiki:
rm /usr/share/moin/mywiki/data/cache/pagelinks/*
rm /usr/share/moin/mywiki/data/cache/Page.py/*
Comments
And what about my good links, which should affect the page rank of the good site I link to? We should not change our content because of some spammers. We should find another way to avoid them.
First, we should not allow robots to edit pages - we can identify real users by a cookie. This can be a problem for people who turn their cookies off.
We can think about an editor rank system, where the system remembers the signature of editors whose edits the wiki admins reverted. When an editor collects too many bad points from reverts, he will not be allowed to add external links, or to edit at all. -- NirSoffer 2004-06-07 00:26:53
I've been working on anti-spam in the email field for a while, with SpamAssassin. Nir's suggested fixes won't help.
Robots can use cookies, no problem -- see Perl's LWP for a very well-established robot scraping API that supports cookies.
An editor rank system assumes that the spammer robots will do us the courtesy of using the same editing user account -- if any -- twice in a row, which is very unlikely. Nowadays in email spam, they don't even use the same IP address, due to open proxies!
The alternative would be to not allow new users to create working external links until the admins think they're "OK" -- but that seems even worse. I suggest that a redirect-through-CGI is better than that. -- JustinMason
When I wrote "signature" I meant user agent/IP/other data that can identify the robot or human spammer when he makes edits. Is this really impossible? -- NirSoffer 2004-06-07 22:20:11
Nir, most referrer-log spamming I've seen uses random user-agent strings, chosen from the most common ones used by humans (e.g. normal-looking MSIE strings and so on). Regarding IP addresses, the open proxy problem means that spammers can buy lists of proxies with thousands of individual IP addresses quite easily. So using user-agent/IP as a key at least wouldn't be useful in my opinion; it would not be long before that was subverted. -- JustinMason 2004-12-27 01:51:55
Personally, I think that something like Advogato's ranking system is going to work. New users enter as newbies and are restricted in what they can do (for example, just add text). If new users sign up and confirm through email, they may edit low-profile pages; if others start ranking them (according to Advogato's model yadayada) they get more privileges, like being able to edit high-profile pages (FrontPage), add external links, etcetera. It turns an open community into a half-open community, in my eyes the only way to keep the bad guys out... -- CeesDeGroot
Hi... another suggestion: How about using JavaScript to challenge the client to some arbitrary protocol? This would perhaps exclude Lynx/w3m users, but they could be manually authenticated. The wiki server sends some JavaScript with the edit page that computes something - say, the power of two arbitrary numbers or the sum of the ASCII values of a specific sentence, or whatever - and puts this in a hidden form field that gets submitted with the edited page. If the value matches, the edit is accepted. The catch would be that the JS code should _not_ be easily (machine-)parseable, otherwise calculating the expected value could be automated as well. -- JensBenecke
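A minimal server-side sketch of that idea (all names here are invented for illustration; MoinMoin provides no such hook out of the box). It assumes the wiki keeps the expected answer server-side, e.g. in a session, between rendering the edit form and receiving the edit:

# Minimal sketch of the challenge/response idea above. Invented names,
# not part of MoinMoin.
import random

def make_challenge():
    """Return (javascript_snippet, expected_answer) for the edit form."""
    a, b = random.randint(2, 9), random.randint(2, 9)
    js = 'document.forms[0].challenge.value = %d * %d;' % (a, b)
    return js, str(a * b)

def check_challenge(submitted_value, expected_answer):
    """Accept the edit only if the browser actually ran the JavaScript."""
    return submitted_value == expected_answer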
JS code is, by definition of being a computer language, easily machine-parseable, so I don't quite see the point. -- JohannesBerg 2005-02-04 18:30:01
Writing a JS engine that is able to cope with such output is not very easy. It is even harder for dumb spammers. But I do not like JavaScript for mandatory functions ... -- AlexanderSchremmer 2005-02-04 19:05:27
Has no-one suggested using a 'captcha' yet? Seems like the obvious way to slow down automated wiki spamming. Of course it also slows down legitimate wiki posting, but at this point I'm willing to live with that :-/ -- GrahamToal
The link above is dead: HelpMiscellaneous (remapping URLs) -- eMuede
You may want to read HelpOnConfiguration nowadays.