Description

By default, MoinMoin blocks wget from downloading attachments (and doing anything else other than viewing pages), because it considers wget to be a web spider. Although wget can act as a spider, it's not generally used as one. I think wget should be allowed to at least download attachments. (And perhaps other spiders as well?)

Also, the error message "You are not allowed to access this!" could use some improvement.

Steps to reproduce

  1. wget "http://moinmoin.wikiwikiweb.de/OliverGraf/VimColor?action=AttachFile&do=get&target=VimColor.py"

Details

$ telnet moinmoin.wikiwikiweb.de 80
Trying 83.137.100.43...
Connected to host43.thinkmo.de.
Escape character is '^]'.
GET /OliverGraf/VimColor?action=AttachFile&do=get&target=VimColor.py HTTP/1.0
User-Agent: Wget/1.9
Host: moinmoin.wikiwikiweb.de
Accept: */*
Connection: Keep-Alive

HTTP/1.0 403 Forbidden
Date: Tue, 28 Dec 2004 15:48:41 GMT
Status: 403 FORBIDDEN
Content-type: text/plain
Server: TwistedWeb/1.3.0rc1

You are not allowed to access this!
Connection closed by foreign host.

$ telnet moinmoin.wikiwikiweb.de 80
Trying 83.137.100.43...
Connected to host43.thinkmo.de.
Escape character is '^]'.
GET /OliverGraf/VimColor?action=AttachFile&do=get&target=VimColor.py HTTP/1.0
User-Agent: SomethingElse
Host: moinmoin.wikiwikiweb.de
Accept: */*
Connection: Keep-Alive

HTTP/1.0 200 OK
Date: Tue, 28 Dec 2004 15:55:29 GMT
Content-length: 9111
Content-type: text/plain
Content-disposition: inline; filename="VimColor.py"
Server: TwistedWeb/1.3.0rc1

...

Workaround

You can use the --user-agent="..." option to override the User-Agent header for wget.

Alternatively, you can configure your wiki not to treat wget as a spider by overriding the ua_spiders configuration variable (see the Discussion section below).
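A minimal sketch of the second workaround: it takes the default ua_spiders value quoted in the Discussion section and drops the wget alternative. The helper without_agents and the name DEFAULT_UA_SPIDERS are hypothetical and exist only for this illustration. How the resulting value is placed in the wiki configuration is an assumption shown in the trailing comment, so check it against your MoinMoin version.

```python
import re

# Default value copied from the multiconfig.py excerpt in the Discussion section.
DEFAULT_UA_SPIDERS = ('archiver|crawler|curl|google|htdig|httrack|jeeves|larbin|leech|'
                      'linkbot|linkmap|linkwalk|mercator|mirror|nutbot|robot|scooter|'
                      'search|sitecheck|spider|wget')


def without_agents(pattern, *names):
    """Return the '|'-separated pattern with the named alternatives removed.

    Hypothetical helper, not part of MoinMoin.
    """
    drop = set(name.lower() for name in names)
    return '|'.join(part for part in pattern.split('|')
                    if part.lower() not in drop)


# Same list as the default, minus wget.
ua_spiders = without_agents(DEFAULT_UA_SPIDERS, 'wget')

# Assumed usage: set the value as an attribute of the wiki's config class,
# for example
#
#     class Config(DefaultConfig):
#         ua_spiders = '...'   # the string computed above
```

Deriving the value from the default rather than retyping it keeps the other entries intact, so only wget stops being treated as a spider.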

Discussion

Is this bug caused by Twisted or MoinMoin?

Is including wget among these defaults justified? I can understand blocking spiders from editing pages, but shouldn't spiders be allowed to download/view attachments?

/usr/lib/python2.3/site-packages/MoinMoin/multiconfig.py line 212:

# a regex of HTTP_USER_AGENTS that should be excluded from logging
# and receive a FORBIDDEN for anything except viewing a page
ua_spiders = ('archiver|crawler|curl|google|htdig|httrack|jeeves|larbin|leech|'
              'linkbot|linkmap|linkwalk|mercator|mirror|nutbot|robot|scooter|'
              'search|sitecheck|spider|wget')
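To illustrate why the two requests in the Details section were answered differently, here is a small sketch that applies the pattern above to both User-Agent strings. It assumes the pattern is applied as a case-insensitive search, which is consistent with "Wget/1.9" being rejected. The function is_spider is hypothetical and is not MoinMoin's actual code.

```python
import re

# Default pattern copied from the multiconfig.py excerpt above.
ua_spiders = ('archiver|crawler|curl|google|htdig|httrack|jeeves|larbin|leech|'
              'linkbot|linkmap|linkwalk|mercator|mirror|nutbot|robot|scooter|'
              'search|sitecheck|spider|wget')


def is_spider(user_agent):
    """Return True if the User-Agent string matches the spider pattern.

    Assumption: the match is a case-insensitive substring search.
    """
    return re.search(ua_spiders, user_agent, re.IGNORECASE) is not None


# The two User-Agent values used in the telnet sessions.
for agent in ("Wget/1.9", "SomethingElse"):
    print(agent, "->", "spider" if is_spider(agent) else "not a spider")
```

Because the pattern is a plain substring match, any client whose User-Agent happens to contain one of the listed words (for example "search" or "mirror") would be classified as a spider as well.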

This is not a bug but a feature, and the wiki admin is free to configure this.

We could also consider allowing spiders to view and download attachments.

-- NirSoffer 2004-12-29 04:48:28

Plan


CategoryMoinMoinBugFixed

  1. standalone always answers with status code 200 (1)

MoinMoin: MoinMoinBugs/SpidersCanOnlyViewPages (last edited 2007-10-29 19:20:56 by localhost)