Differences between revisions 13 and 14
Revision 13 as of 2008-05-04 19:19:55
Size: 7443
Comment: fixed for wsgi in 1.7. fastcgi patch needs testing.
Revision 14 as of 2008-05-04 20:07:39
Size: 7414
Comment:
Line 68:
  Revision 13:
   * 1.7 http://hg.moinmo.in/moin/1.7/rev/1052c105b16f send_file patch
  Revision 14:
   * 1.6 send_file patch http://hg.moinmo.in/moin/1.6/rev/ac6e3ce989ad
   * 1.7 send_file patch http://hg.moinmo.in/moin/1.7/rev/1052c105b16f
Line 71 (revision 13) / line 72 (revision 14):
  Revision 13:
  Due to the way WSGI support is implemented, the whole downloaded file gets pumped into a StringIO object and then returned by the moin WSGI app to the WSGI server. The good news is that we have a WSGI refactoring project this summer that will cleanup this stuff. -- ThomasWaldmann <<DateTime(2008-05-04T16:16:48+0100)>>
  Revision 14:
  Due to the way WSGI support is implemented, the whole downloaded file gets pumped into a StringIO object and then returned by the moin WSGI app to the WSGI server.
   * 1.6 fix: http://hg.moinmo.in/moin/1.6/rev/17ff68ef3be7

Description

Our users also use their MoinMoin wiki to share files. When they download big files (200 to 400 MByte) from the wiki, the download usually doesn't even start, or at best runs very slowly (~10 kb/s) on a 100 MBit LAN. Memory consumption of the MoinMoin instance serving the file quickly grows to the size of the file being served. Downloading the same file over the same network to the same computer via FTP works perfectly.

I remember this working better some time ago, but since I had trouble with Apache and reconfigured it every few days, I can't remember which method worked best (standalone server, mod_wsgi, mod_python or FastCGI; I tried them all). Note that I am now using lighttpd instead of Apache.

Steps to reproduce

  1. Start downloading a big (~ 420 MByte) attachment from a Wiki page
  2. After a few seconds the download starts, but runs very slowly (~10 kb/s) most of the time
  3. Notice a very big increase in memory consumption on the server:
    • Before:

      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      www-data  9811  0.0  0.0   4664  1672 ?        SN   07:37   0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
      [...]
      www-data  9855  0.0  0.1   8904  6436 ?        SN   07:37   0:00  \_ python /var/www/moin/moin.fcgi
      www-data  9856  0.0  0.1   8900  6432 ?        SN   07:37   0:00  \_ python /var/www/moin/moin.fcgi
      www-data  9857  0.0  0.2  19312  9016 ?        SN   07:37   0:01  \_ python /var/www/moin/moin.fcgi
    • While downloading:

      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      www-data  9811  0.0  0.0   4664  1692 ?        SN   07:37   0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
      [...]
      www-data  9855  0.0  0.1   8904  6436 ?        SN   07:37   0:00  \_ python /var/www/moin/moin.fcgi
      www-data  9856  0.0  0.1   8900  6432 ?        SN   07:37   0:00  \_ python /var/www/moin/moin.fcgi
      www-data  9857  0.0 27.4 1494732 1131516 ?     SNl  07:37   0:16  \_ python /var/www/moin/moin.fcgi
  4. Notice that after the request is aborted, MoinMoin doesn't free all of the memory:

    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    www-data  9811  0.0  0.0   4664  1692 ?        SN   07:37   0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
    [...]
    www-data  9855  0.0  0.1   8904  6436 ?        SN   07:37   0:00  \_ python /var/www/moin/moin.fcgi
    www-data  9856  0.0  0.1   8900  6432 ?        SN   07:37   0:00  \_ python /var/www/moin/moin.fcgi
    www-data  9857  0.1 11.4 584556 469992 ?       SN   07:37   0:27  \_ python /var/www/moin/moin.fcgi

Example

Unfortunately this is an internal wiki, but if it would really help, I could set up another wiki instance on the server and upload a big attachment.

Component selection

  • general (attachment handling)

Details

MoinMoin Version

1.6.3

OS and Version

Ubuntu Linux 8.04 32bit, Debian Etch 32bit

Python Version

Python 2.5.2, 2.4.4

Server Hardware

Athlon 64 X2 4600+ (2.4 GHz), 4 GByte RAM, two 1 TB hard disks (running as RAID-1)

Server Setup

Lighttpd 1.4.19 (Ubuntu package), Apache 2.2.3/mod_wsgi 2.0

Server Details

MoinMoin is running under FastCGI (lighttpd) and as a WSGI app (Apache/mod_wsgi)

Language you are using the wiki in

English

Workaround

  • Best: use the standalone server behind mod_proxy; this configuration does not suffer from the problem
  • Plan A: provide attachment downloads via FTP
  • Plan B: use direct attachment serving (which apparently will be removed at some point; see RemovingAttachmentsDirectServing)

Discussion

  • On my hosted wiki (MoinMoin 1.6.3 with Apache/2.0.52 (Red Hat), mod_wsgi/2.0 and Python/2.5) I can confirm that downloading a 100MB tar.gz file pushed my memory usage from ~80MB to 220MB shortly after the download started (you may want to look at the screenshot memory_usage_download.png). Uploading a 100MB file also adds about 100MB of memory usage, but only after the upload has completed. The download runs at normal speed, with no cancellation or failure... well, modules like fastcgi and python like to cache data, so this may be a feature/behavior rather than a real MoinMoin bug :-) -- MarcelHäfner 2008-05-04 12:38:05

    (!) We don't discuss uploads in this bug report (see the title, and consider that uploads use POST and some servers might keep all POST data in memory). If you would like to analyze uploads, please do so in a separate bug report and split the discussion there by web server / server method. -- ThomasWaldmann 2008-05-04 13:00:54

General notes

In AttachFile.do_get (the code responsible for sending file attachments) we use shutil.copyfileobj.

The shutil.copyfileobj stdlib code looks a bit suspicious (doesn't it generate lots of string garbage, since it creates a new string object on every loop iteration?), but a small test program that copies large files with it shows no large memory consumption; it seems the Python interpreter is good at reusing strings of the same size.
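A test along those lines can be re-created in modern Python 3. This is only an approximation, since the original test program is not shown: it copies a temporary file into a sink that discards the data (standing in for the network socket) and uses `tracemalloc` to record the peak Python memory during the copy. If copyfileobj behaves as described, the peak stays near one or two read blocks regardless of file size. The helper names `CountingSink` and `copy_and_measure` are illustrative.

```python
import shutil
import tempfile
import tracemalloc


class CountingSink:
    """Write-only file-like object: counts bytes but keeps nothing,
    standing in for the client socket."""

    def __init__(self):
        self.total = 0

    def write(self, data):
        self.total += len(data)
        return len(data)


def copy_and_measure(megabytes, bufsize=64 * 1024):
    """Copy a temp file of `megabytes` MiB with shutil.copyfileobj.

    Returns (bytes_copied, peak_traced_bytes) so the peak memory during
    the copy can be compared with the file size.
    """
    with tempfile.TemporaryFile() as src:
        # Build the source file in 1 MiB pieces before tracing starts.
        piece = b"x" * (1024 * 1024)
        for _ in range(megabytes):
            src.write(piece)
        src.seek(0)
        del piece

        sink = CountingSink()
        tracemalloc.start()
        try:
            shutil.copyfileobj(src, sink, bufsize)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
    return sink.total, peak


if __name__ == "__main__":
    copied, peak = copy_and_measure(32)
    print(f"copied {copied} bytes, peak traced memory {peak} bytes")
```

Only the loop's current (and just-replaced) block is alive at any moment, so the peak should be a small multiple of `bufsize` rather than anything close to the 32 MiB copied.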

/!\ Small code restructuring: moving the send_file code to the request object (needed as the base for all fixes):
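A minimal sketch of what that restructuring could look like. The class names `BaseRequest` and `StandaloneRequest`, the `write()` hook and the 64 KiB default block size are assumptions for illustration; moin's actual request classes differ. The idea is that `send_file` becomes a request method containing a copyfileobj-style loop, so a backend only has to supply `write()`, or can override `send_file` entirely when it needs special handling.

```python
import io


class BaseRequest:
    """Hypothetical base request: backends only need to provide write()."""

    def write(self, data):
        raise NotImplementedError

    def send_file(self, fileobj, bufsize=64 * 1024):
        """Copy fileobj to the client block by block, like shutil.copyfileobj.

        Only one block is held here at a time; memory stays bounded as long
        as the backend's write() does not accumulate the data itself.
        """
        while True:
            buf = fileobj.read(bufsize)
            if not buf:
                break
            self.write(buf)


class StandaloneRequest(BaseRequest):
    """Hypothetical backend writing straight to an output stream, similar in
    spirit to the standalone server writing to sareq.wfile."""

    def __init__(self, wfile):
        self.wfile = wfile

    def write(self, data):
        self.wfile.write(data)


if __name__ == "__main__":
    out = io.BytesIO()  # stands in for the socket file of a real server
    StandaloneRequest(out).send_file(io.BytesIO(b"attachment data" * 1000), bufsize=4096)
    print(len(out.getvalue()))  # 15000
```

With the loop living on the request object, a backend whose `write()` buffers (such as a FastCGI one) can change the behavior in a single overriding method.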

moin 1.6 WSGI

Due to the way WSGI support is implemented, the whole downloaded file gets pumped into a StringIO object and then returned by the moin WSGI app to the WSGI server.
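For comparison, a WSGI app does not have to collect the whole response in a StringIO: the WSGI spec (PEP 3333) lets it return any iterable, so the file can be handed to the server block by block. The sketch below uses the optional `wsgi.file_wrapper` when the server provides one and otherwise falls back to a generator. `make_attachment_app`, `iter_file` and `BLOCK_SIZE` are hypothetical names, and this is not moin's implementation.

```python
import os
import tempfile
from wsgiref.util import FileWrapper

BLOCK_SIZE = 64 * 1024  # assumed block size; any moderate value works


def iter_file(fileobj, block_size=BLOCK_SIZE):
    """Yield the file in blocks; close it when iteration finishes or the
    server closes a partially consumed iterator."""
    try:
        while True:
            block = fileobj.read(block_size)
            if not block:
                break
            yield block
    finally:
        fileobj.close()


def make_attachment_app(path):
    """Return a WSGI app that streams the file at `path`."""

    def app(environ, start_response):
        size = os.path.getsize(path)
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(size)),
        ])
        fileobj = open(path, "rb")
        file_wrapper = environ.get("wsgi.file_wrapper")
        if file_wrapper is not None:
            # Optional server extension from PEP 3333; may use sendfile().
            return file_wrapper(fileobj, BLOCK_SIZE)
        # Portable fallback: a generator keeps only one block in memory.
        return iter_file(fileobj)

    return app


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"a" * 200000)
    os.close(fd)

    def start_response(status, headers):
        print(status, headers)

    app = make_attachment_app(path)
    chunks = list(app({}, start_response))  # generator fallback
    print([len(c) for c in chunks])  # [65536, 65536, 65536, 3392]
    wrapped = app({"wsgi.file_wrapper": FileWrapper}, start_response)
    print(sum(len(c) for c in wrapped))
    wrapped.close()
    os.remove(path)
```

Either way the server pulls one block at a time, so memory use no longer scales with the attachment size.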

moin 1.6 FastCGI

Similar problem: MoinMoin.support.thfcgi uses StringIO to capture the written data.

(!) This could probably be fixed by calling fcgreq.flush_out() after every block (or every few blocks). To do that, we would have to copy code similar to copyfileobj into moin's request object and override it with a FastCGI-specific method that does the flushing.
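The flushing idea can be illustrated with a self-contained simulation. `BufferedOutput` is a hypothetical stand-in for a thfcgi request that captures writes in a StringIO-like buffer; its `flush_out()` only mimics the role of `fcgreq.flush_out()` and is not thfcgi's real API. The `flush_every` parameter is likewise an assumption.

```python
import io


class BufferedOutput:
    """Hypothetical stand-in for a FastCGI request whose write() only
    appends to an in-memory buffer until flush_out() is called."""

    def __init__(self, sock):
        self.sock = sock            # where flushed data goes (the web server)
        self.buffer = io.BytesIO()  # accumulates data until flush_out()
        self.max_buffered = 0       # largest buffer size seen, for the demo

    def write(self, data):
        self.buffer.write(data)
        self.max_buffered = max(self.max_buffered, self.buffer.tell())

    def flush_out(self):
        # Hand the buffered bytes over and start with an empty buffer.
        self.sock.write(self.buffer.getvalue())
        self.buffer = io.BytesIO()


def send_file(out, fileobj, bufsize=64 * 1024, flush_every=4):
    """copyfileobj-style loop that flushes after every `flush_every` blocks."""
    blocks = 0
    while True:
        buf = fileobj.read(bufsize)
        if not buf:
            break
        out.write(buf)
        blocks += 1
        if blocks % flush_every == 0:
            out.flush_out()
    out.flush_out()  # send whatever is left


if __name__ == "__main__":
    data = b"z" * (10 * 64 * 1024)  # a 640 KiB "attachment"
    for every in (4, 10**9):
        out = BufferedOutput(io.BytesIO())
        send_file(out, io.BytesIO(data), flush_every=every)
        print(every, out.max_buffered)
```

With 64 KiB blocks and `flush_every=4`, at most 256 KiB is buffered at any time. A very large `flush_every` reproduces the reported behavior, where the buffer grows to the size of the whole attachment.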

moin 1.6 CGI

Directly writes to sys.stdout. No idea whether a problem exists here.

moin 1.6 mod_python

Writes using mpyreq.write(). No idea whether a problem exists here.

moin 1.6 standalone

Writes using sareq.wfile.write(). The problem does not exist here (I verified it): memory consumption does not increase when I download a 400 MB attachment, and the download runs at full speed.

moin 1.6 Twisted

Writes using twistd.write(). No idea whether a problem exists here.

Plan

  • Priority:
  • Assigned to:
  • Status: confirmed that big memory consumption happens; now searching for the root cause


CategoryMoinMoinBugConfirmed

MoinMoin: MoinMoinBugs/AttachmentDownloadsSlowAndConsumeTooMuchMemory (last edited 2008-05-24 15:17:10 by ThomasWaldmann)