Accessing pages with lots of attachments is too expensive
Our wiki (wiki.ubuntuusers.de) has a page with lots of file attachments (5000+ files).
Apparently a crawler or some other bot found it, and the resulting load took the wiki down.
Steps to reproduce
- Create a big Wiki.
- Access /Wiki/Cache?action=AttachFile&do=view&target=_some_file_in_the_cache.png
- Watch the Wiki twiddle its thumbs for a *long* time.
Details
Each access to a page with attachments generates a <link rel=...> tag for every one of its attachments. With 5000+ attachments, every single request becomes expensive enough to take the wiki down.
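To illustrate why the cost grows with the number of attachments, here is a minimal standalone sketch. It is not MoinMoin's code: the helper name `build_link_rel_tags`, the `rel="Appendix"` value and the exact URL layout are assumptions made for illustration. The only point is that one tag is built and escaped per attachment on every request.

```python
from html import escape
from urllib.parse import quote, quote_plus


def build_link_rel_tags(script_name, pagename, filenames):
    """Return one <link> tag per attachment (hypothetical helper)."""
    page_quoted = quote(pagename)
    tags = []
    for name in filenames:
        # Assumed URL layout, modelled on the request in "Steps to reproduce".
        url = "%s/%s?action=AttachFile&do=view&target=%s" % (
            script_name, page_quoted, quote_plus(name))
        tags.append('<link rel="Appendix" title="%s" href="%s">'
                    % (escape(name), escape(url)))
    return tags


# 5000 made-up file names, roughly the size of the page in this report.
names = ["file%04d.png" % i for i in range(5000)]
tags = build_link_rel_tags("", "Wiki/Cache", names)

# Number of tags and the characters they add to every page's <head>.
header_chars = sum(len(t) + 1 for t in tags)
print(len(tags), header_chars)
```

The work and the output size are both linear in the number of attachments, and they are paid again on each page view, whether or not the visitor cares about the attachments.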
(Version: 1.5.1.)
The memory problem still exists in 1.5.5a: it occurs when uploading files and prevents uploading large files on servers with little memory. This should be moved to a separate bug report with more details.
The parts of this bug report about "slow download speeds" and "huge memory consumption on download" issues, including steps to reproduce it and all platform details, were moved to ../AttachmentDownloadsSlowAndConsumeTooMuchMemory.
Workaround
For now, I have patched moin thus:
diff -rub MoinMoin/action/AttachFile.py /tmp/MoinMoin/action/AttachFile.py
--- MoinMoin/action/AttachFile.py 2006-01-21 17:48:04.000000000 +0100
+++ /tmp/MoinMoin/action/AttachFile.py 2006-02-09 13:31:23.000000000 +0100
@@ -211,6 +211,10 @@
     str = ""
     if files:
+        if len(files) > 100:
+            if showheader:
+                return '%s<p>%s</p>' % (str, _("Too many attachments stored for %(pagename)s") % {'pagename': pagename})
+
         if showheader:
             str = str + _(
                 "To refer to attachments on a page, use '''{{{attachment:filename}}}''', \n"
@@ -322,6 +326,7 @@
 #############################################################################
 def send_link_rel(request, pagename):
     files = _get_files(request, pagename)
+    if len(files) > 100: return
     if len(files) > 0 and not htdocs_access(request):
         scriptName = request.getScriptname()
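The idea behind the patch can be shown in isolation with a simplified sketch. The function `build_filelist` below is a hypothetical stand-in, not the function from AttachFile.py: it ignores the showheader flag and the translation call, and only demonstrates the threshold check that replaces a long listing with a short notice.

```python
from html import escape

ATTACHMENT_LIMIT = 100  # same threshold as in the patch above


def build_filelist(pagename, files, limit=ATTACHMENT_LIMIT):
    """Return an HTML list of attachments, or a short notice when there
    are more than `limit` of them (hypothetical, simplified helper)."""
    if len(files) > limit:
        # Constant-size answer, independent of the number of attachments.
        return "<p>Too many attachments stored for %s</p>" % escape(pagename)
    items = "".join("<li>%s</li>" % escape(name) for name in files)
    return "<ul>%s</ul>" % items


small = build_filelist("Wiki/Cache", ["a.png", "b.png"])
large = build_filelist("Wiki/Cache", ["f%d.png" % i for i in range(5000)])
print(small)
print(large)
```

A hard limit like this caps the per-request cost, at the price of hiding the listing on exactly the pages that have the most attachments. That is why it is only a workaround.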
Discussion
This is mainly a DoS problem. I admit that the <link> tags are not really sensible. Does anybody object to removing them altogether?
The AttachList macro does read (the beginning of) every file (the call to iszipfile).
- It should do some reads on every file when you are in the attachment view, to find out whether a file is a package file. But bots are not allowed to see that view, so you cannot be DoSed by that.
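The trade-off discussed here can be sketched as follows, assuming the package check is something like Python's `zipfile.is_zipfile`, which has to open every file it inspects. The function `list_attachments` and its `detect_packages` flag are hypothetical names, not MoinMoin API. The sketch only shows how to restrict the per-file check to the view that actually needs it.

```python
import os
import tempfile
import zipfile


def list_attachments(attach_dir, detect_packages=False):
    """Return (name, is_package) pairs; is_package is None when the
    per-file check was skipped (hypothetical helper)."""
    result = []
    for name in sorted(os.listdir(attach_dir)):
        path = os.path.join(attach_dir, name)
        # The zip check opens the file, so only do it when asked to.
        is_pkg = zipfile.is_zipfile(path) if detect_packages else None
        result.append((name, is_pkg))
    return result


# Small demonstration with one zip file and one plain file.
with tempfile.TemporaryDirectory() as d:
    with zipfile.ZipFile(os.path.join(d, "pkg.zip"), "w") as z:
        z.writestr("hello.txt", "hi")
    with open(os.path.join(d, "plain.txt"), "w") as f:
        f.write("not a zip")
    cheap = list_attachments(d)                       # directory listing only
    full = list_attachments(d, detect_packages=True)  # opens every file
    print(cheap)
    print(full)
```

With the flag off, the cost is a single directory listing. With it on, there is one file open per attachment, which is acceptable only in a view that crawlers cannot reach.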
Plan
- Priority:
- Assigned to:
- Status: