As you may have already noticed, many wikis are being vandalized with link spam. Some MoinMoin wikis have had dozens of pages vandalized at once, probably by a bot script. The spammers do this in the hope of getting a better Google pagerank, so that their spam sites are listed higher in search results.

This MoinMoinPatch makes your wiki send external links through Google's URL redirect service. As a result, an external link on your wiki no longer influences the pagerank of the site it points to.
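For example, a link to http://example.com/ is then rendered as http://www.google.com/url?sa=D&q=http://example.com/, so search engine crawlers see Google's redirector rather than a direct link to the target site.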

So, in a nutshell, spammers can still add links to your wiki pages, but doing so no longer helps their sites' exposure on Google. If you want to go further and block spammers from adding links at all, see the BlackList page. If you are using the Apache web server, you can also block IP addresses in your httpd.conf or .htaccess file. For example:

<Directory "/usr/share/moin">
  Options Indexes Includes FollowSymLinks MultiViews ExecCGI
  AllowOverride All
  Order allow,deny

  #global wiki ban list
  deny from ip.ad.re.ss1
  deny from ip.ad.re.ss2
  ...
  
  Allow from all
</Directory>

Lastly, if your wiki has been severely damaged by spam and you would like it repaired quickly, please leave a note here. I have a script that can remove link spam and reverse other forms of damage such as escaped HTML entities and excess whitespace.

Instructions

/!\ Instead of using these patches, just use url_mappings, described on HelpMiscellaneous (remapping URLs).

Here are the changes to make:

First, add this line to your wiki's moin_config.py file:

redirect_links = 1

Now we have to change the MoinMoin source code. Open this file for editing (the exact path may differ on your system): /usr/lib/python2.3/site-packages/MoinMoin/parser/wiki.py

Search for the def _url_bracket_repl(self, word): method. You need to insert two lines before its last line, so that the end of the method looks like this (make sure you indent correctly):

        if config.redirect_links:
            words[0] = 'http://www.google.com/url?sa=D&q=' + words[0]
        return self.formatter.url(words[0], text, 'external', pretty_url=1, unescaped=1)

That takes care of [[bracketed]] external links. We also need to handle non-bracketed external links like http://wherever.

Search for the def _url_repl(self, word): method. The last lines need to look like this:

        if config.redirect_links:
            redirect = 'http://www.google.com/url?sa=D&q=' + word
        else:
            redirect = word
        return self.formatter.url(redirect, text=self.highlight_text(word))
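To sanity-check the rewrite rule by itself before editing the parser, here is a minimal standalone sketch (plain Python; the function name is made up for illustration and is not part of MoinMoin):

GOOGLE_REDIRECT = 'http://www.google.com/url?sa=D&q='

def redirect_url(url, redirect_links=1):
    """Return the address the formatter should emit for an external link."""
    if redirect_links:
        return GOOGLE_REDIRECT + url
    return url

# Example:
#   redirect_url('http://example.com/')
#   -> 'http://www.google.com/url?sa=D&q=http://example.com/'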

Optional

To properly complete this patch, you should add a default value for the new redirect_links config property. Edit the main config.py file under /usr/lib/python2.3/site-packages/MoinMoin/ and add this to the _cfg_defaults dictionary:

    'redirect_links': 0,
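For context, the new entry simply sits alongside the existing defaults in that dictionary; the placeholder comments below stand in for the other keys, which differ between MoinMoin versions:

_cfg_defaults = {
    # ... existing default values, left unchanged ...
    'redirect_links': 0,
    # ... more existing default values ...
}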

Testing

Now when you view a wiki page, external links should redirect through Google. Pages that were cached earlier, however, will not show redirected links until their cache is refreshed.

To delete all page caches, empty these folders under your wiki:

rm /usr/share/moin/mywiki/data/cache/pagelinks/*
rm /usr/share/moin/mywiki/data/cache/Page.py/*
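To verify the patch worked, view the HTML source of a page containing an external link and check that its href now begins with http://www.google.com/url?sa=D&q=.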

Comments


And what about my good links, which should affect the pagerank of the good sites I link to? We should not change our content because of some spammers. We should find another way to stop them.

First, we should not allow robots to edit pages - we can identify real users by a cookie. This could be a problem for people who turn their cookies off.

We could think about an editor rank system, where the system remembers the signatures of editors whose edits the wiki admins have reverted. When an editor collects too many bad points from reverts, he would not be allowed to add external links, or to edit at all. -- NirSoffer 2004-06-07 00:26:53

I've been working on anti-spam in the email field for a while, with SpamAssassin. Nir's suggested fixes won't help.

Robots can use cookies, no problem -- see Perl's LWP for a very well-established robot-scraping API that supports cookies.

An editor rank system assumes that the spammer robots will do us the courtesy of using the same editing user account -- if any -- twice in a row, which is very unlikely. Nowadays in email spam, they don't even use the same IP address, due to open proxies!

The alternative would be to not allow new users to create working external links until the admins think they're "OK" -- but that seems even worse. I suggest that a redirect-through-CGI is better than that. -- JustinMason

When I wrote "signature" I meant user agent/IP/other stuff that can identify the robot or human spammer when he makes edits. Is this really impossible? -- NirSoffer 2004-06-07 22:20:11

Nir, most referrer-log spamming I've seen uses random user-agent strings, chosen from the most common ones used by humans (e.g. normal-looking MSIE strings and so on). Regarding IP addresses, the open proxy problem means that spammers can buy lists of proxies with thousands of individual IP addresses quite easily. So using user-agent/IP as a key at least wouldn't be useful in my opinion; it would not be long before that was subverted. -- JustinMason 2024-03-29 15:09:03

Personally, I think that something like Advogato's ranking system is going to work. New users enter as newbies and are restricted in what they can do (for example, just add text). If new users sign up and confirm through email, they may edit low-profile pages; if others start ranking them (according to Advogato's model yadayada) they get more privileges, like being able to edit high-profile pages (FrontPage), add external links, etcetera. It turns an open community into a half-open community, which in my eyes is the only way to keep the bad guys out... -- CeesDeGroot

Hi... another suggestion: how about using JavaScript to challenge the client with some arbitrary protocol? This would perhaps exclude Lynx/w3m users, but they could be authenticated manually. The wiki server sends some JavaScript with the edit page that computes something - say, the power of two arbitrary numbers or the sum of the ASCII values of a specific sentence, or whatever - and puts the result in a hidden form field that gets submitted with the edited page. If the value matches, the edit is accepted. The catch is that the JS code should _not_ be easily (machine-)parseable, otherwise calculating the expected value could be automated as well. -- JensBenecke
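A minimal sketch of that idea (the form-field and function names are hypothetical, and this is not MoinMoin code): the server embeds a small computation in the edit page and checks the submitted result when the edit is saved.

import random

def make_challenge():
    """Return (javascript, expected_answer) for one rendering of the edit form."""
    a, b = random.randint(10, 99), random.randint(10, 99)
    # The script fills a hidden form field when the page runs in a real browser.
    js = 'document.forms["editform"]["jschallenge"].value = %d * %d;' % (a, b)
    return js, a * b

def check_challenge(submitted, expected):
    """Accept the edit only if the hidden field holds the expected value."""
    try:
        return int(submitted) == expected
    except (TypeError, ValueError):
        return False

The expected value would have to be remembered (or signed) between rendering and saving, and, as the comment notes, anything a browser can compute a sufficiently determined scraper can compute too, so this only raises the bar.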

Has no-one suggested using a 'captcha' yet? Seems like the obvious way to slow down automated wiki spamming. Of course it also slows down legitimate wiki posting, but at this point I'm willing to live with that :-/ -- GrahamToal

The link above is dead: HelpMiscellaneous (remapping URLs) -- eMuede

You may want to read HelpOnConfiguration nowadays.
