Description
Our users also use their MoinMoin wiki to share files. When they try to download large files (200 - 400 MByte) from the wiki, the download usually either does not start at all or runs very slowly (~ 10 kb/s) on a 100 MBit LAN. The memory consumption of the MoinMoin instance serving the file quickly grows to the size of the file being served. Downloading the same file over the same network to the same computer via FTP works perfectly.
I remember that this worked better some time ago, but because I had trouble with Apache and reconfigured it every few days, I can't recall which method worked best (standalone server, mod_wsgi, mod_python or FastCGI ... I tried them all). Note that I am now using lighttpd instead of Apache.
Steps to reproduce
- Start downloading a big (~ 420 MByte) attachment from a Wiki page
- After some seconds the download starts, running very slowly (10 kb/sec) most of the time
- Notice a very big increase in memory consumption on the server:
Before:
USER       PID %CPU %MEM    VSZ   RSS TTY STAT START TIME COMMAND
www-data  9811  0.0  0.0   4664  1672 ?   SN   07:37 0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
[...]
www-data  9855  0.0  0.1   8904  6436 ?   SN   07:37 0:00  \_ python /var/www/moin/moin.fcgi
www-data  9856  0.0  0.1   8900  6432 ?   SN   07:37 0:00  \_ python /var/www/moin/moin.fcgi
www-data  9857  0.0  0.2  19312  9016 ?   SN   07:37 0:01  \_ python /var/www/moin/moin.fcgi
While downloading:
USER       PID %CPU %MEM     VSZ     RSS TTY STAT START TIME COMMAND
www-data  9811  0.0  0.0    4664    1692 ?   SN   07:37 0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
[...]
www-data  9855  0.0  0.1    8904    6436 ?   SN   07:37 0:00  \_ python /var/www/moin/moin.fcgi
www-data  9856  0.0  0.1    8900    6432 ?   SN   07:37 0:00  \_ python /var/www/moin/moin.fcgi
www-data  9857  0.0 27.4 1494732 1131516 ?   SNl  07:37 0:16  \_ python /var/www/moin/moin.fcgi
Notice that after aborting the request, MoinMoin doesn't free all the memory:
USER       PID %CPU %MEM    VSZ    RSS TTY STAT START TIME COMMAND
www-data  9811  0.0  0.0   4664   1692 ?   SN   07:37 0:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
[...]
www-data  9855  0.0  0.1   8904   6436 ?   SN   07:37 0:00  \_ python /var/www/moin/moin.fcgi
www-data  9856  0.0  0.1   8900   6432 ?   SN   07:37 0:00  \_ python /var/www/moin/moin.fcgi
www-data  9857  0.1 11.4 584556 469992 ?   SN   07:37 0:27  \_ python /var/www/moin/moin.fcgi
Example
Unfortunately this is an internal wiki, but if it's really helpful I could create another Wiki instance on the server and upload a big attachment...
Component selection
- general (attachment handling)
Details
MoinMoin Version | 1.6.3
OS and Version | Ubuntu Linux 8.04 32bit, Debian Etch 32bit
Python Version | Python 2.5.2, 2.4.4
Server Hardware | Athlon 64 X2 4600+ (2,4 GHz), 4 GByte RAM, two 1TB Harddisks (running as RAID-1)
Server Setup | Lighttpd 1.4.19 (Ubuntu package), Apache 2.2.3/mod_wsgi 2.0
Server Details | Moinmoin is running as FastCGI, running as WSGI app
Language you are using the wiki in | English
Workaround
- Best: Use the standalone server behind mod_proxy; this configuration does not suffer from the problem
- Plan A: Provide attachment downloads using FTP
- Plan B: Use direct attachment serving (which appears to be slated for removal in the future, see RemovingAttachmentsDirectServing)
Discussion
On my hosted wiki (MoinMoin 1.6.3 with Apache/2.0.52 (Red Hat), mod_wsgi/2.0 and Python/2.5) I can confirm that downloading a 100MB tar.gz file pushed my memory usage from ~80MB to 220MB shortly after the download started (see the screenshot memory_usage_download.png if you like). Uploading a 100MB file also added about 100MB of memory usage, but only after the upload had completed. The download itself runs at normal speed, with no aborts or failures. Modules like fastcgi and python like to cache data, so this may be a feature/intended behavior and not really a MoinMoin bug. -- MarcelHäfner 2008-05-04 12:38:05
We don't discuss uploads in this bug report (see the title, and consider that uploads use POST and some servers might keep all POST data in memory). If you would like to analyze uploads, please open a different bug report and split the discussion there by web server / server method. -- ThomasWaldmann 2008-05-04 13:00:54
General notes
In AttachFile.do_get (the code responsible for sending file attachments) we use shutil.copyfileobj.
The shutil.copyfileobj stdlib code looks a bit suspicious (doesn't it generate a lot of string garbage because it uses a new string object on every loop iteration?), but a small test program that copies large files with it shows no large memory consumption. It seems the Python interpreter is good at reusing strings of the same size.
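A test program of that kind could look roughly like the sketch below. It is not the program used for this report: it is written for a current Python 3, it uses tracemalloc (which did not exist in Python 2.5) to measure peak Python-level allocations rather than the RSS shown by ps, and the name copy_and_measure is made up for illustration.

```python
import os
import shutil
import tempfile
import tracemalloc


def copy_and_measure(size_bytes, bufsize=16 * 1024):
    """Copy a temporary file of size_bytes with shutil.copyfileobj.

    Returns (bytes_copied, peak_traced_bytes), where the peak is the
    highest amount of Python-allocated memory seen during the copy.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        src_path = os.path.join(tmpdir, "src.bin")
        dst_path = os.path.join(tmpdir, "dst.bin")

        # Create the source file in 1 MiB blocks so that the setup itself
        # never holds the whole file in memory.
        block = b"\0" * (1024 * 1024)
        remaining = size_bytes
        with open(src_path, "wb") as f:
            while remaining > 0:
                n = min(remaining, len(block))
                f.write(block[:n])
                remaining -= n

        # Only the copy itself is traced.
        tracemalloc.start()
        with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
            shutil.copyfileobj(fsrc, fdst, bufsize)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        return os.path.getsize(dst_path), peak


if __name__ == "__main__":
    copied, peak = copy_and_measure(64 * 1024 * 1024)
    print("copied %d bytes, peak traced memory %d bytes" % (copied, peak))
```

If the peak stays near the buffer size regardless of the file size, the copy loop itself is not what makes the server process grow.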
Small code restructuring, moving send_file code to request object (needed as base for all fixes):
1.6 send_file patch http://hg.moinmo.in/moin/1.6/rev/ac6e3ce989ad
1.7 send_file patch http://hg.moinmo.in/moin/1.7/rev/1052c105b16f
moin 1.6 WSGI
Due to the way WSGI support is implemented, the whole downloaded file gets pumped into a StringIO object and then returned by the moin WSGI app to the WSGI server.
(See also the report under FastCGI/Lighttpd in the next section.) On Red Hat Linux with Apache2 (prefork) and mod_wsgi 2, I can confirm that a big download (600MB) "only" raised the httpd prefork process from 1m to 72m of memory, at around 2% CPU. I guess this is maybe not a MoinMoin problem, but rather a matter of how the mod_wsgi daemon and the Apache process work together. A download served directly by apache2 doesn't consume the same amount of memory (at least in my latest test). I wonder if there are some Apache/WSGI tweaks.... -- MarcelHäfner 2008-05-22 07:53:55
26588 lotek 22 0 0 0:01.63 0.3 141m  11m  3620 S httpd <-- wsgi 1
26589 lotek 23 0 0 0:01.00 0.3 141m  11m  3588 S httpd <-- wsgi 2
23787 lotek 16 0 0 0:00.39 0.0 8096  1880 1428 S sshd
26616 lotek 16 0 0 0:00.26 0.0 2648  1176 780  R top
26593 lotek 15 0 0 0:00.21 1.8 83584 72m  1376 S httpd <-- http process (normally under 1m)
I think you can't compare static file serving with wsgi stuff, but I guess you have to ask the mod_wsgi developer for details about that. I think that by using the offered wsgi filewrapper (or a simple one of our own if none is offered) we have done everything in moin that a wsgi application can do. -- ThomasWaldmann 2008-05-24 15:17:09
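As an illustration of the file wrapper approach mentioned above, here is a minimal sketch of a WSGI application that returns the file as an iterable instead of collecting it in a StringIO first. It is not the moin code: make_download_app, iter_blocks and BLOCK_SIZE are hypothetical names, the block size is an arbitrary assumption, and headers such as Content-Length are left out for brevity. Only the optional environ key wsgi.file_wrapper comes from the WSGI specification.

```python
import io

BLOCK_SIZE = 64 * 1024  # assumed chunk size, not taken from moin


def iter_blocks(fileobj, block_size=BLOCK_SIZE):
    """Fallback: yield the file in fixed-size blocks, then close it."""
    try:
        while True:
            block = fileobj.read(block_size)
            if not block:
                break
            yield block
    finally:
        fileobj.close()


def make_download_app(open_file, content_type="application/octet-stream"):
    """Build a WSGI app that streams the file returned by open_file()."""
    def app(environ, start_response):
        fileobj = open_file()
        start_response("200 OK", [("Content-Type", content_type)])
        # Use the server's own wrapper if it offers one ...
        wrapper = environ.get("wsgi.file_wrapper")
        if wrapper is not None:
            return wrapper(fileobj, BLOCK_SIZE)
        # ... otherwise fall back to a simple generator of our own.
        return iter_blocks(fileobj)
    return app


if __name__ == "__main__":
    # Demo with an in-memory stand-in for an attachment; no server needed.
    payload = b"x" * (200 * 1024)
    app = make_download_app(lambda: io.BytesIO(payload))
    body = app({}, lambda status, headers: None)
    print(sum(len(block) for block in body))
```

With this shape the application never holds more than one block at a time; whether the server or a front-end proxy buffers the response afterwards is outside the application's control.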
moin 1.6 FastCGI
Similar problem: MoinMoin.support.thfcgi uses StringIO to capture the write data.
I guess it could be fixed by calling fcgreq.flush_out() after each block or every few blocks. To be able to do this, we would have to copy code similar to copyfileobj to moin's request object and override it with a special method for fastcgi that does the flushing.
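The idea can be sketched as a copyfileobj-like loop that flushes after every block. This is only an illustration, not the actual patch: send_file_flushing and BufferingRequest are hypothetical names, and BufferingRequest merely imitates a request object that buffers writes until flushed, with its flush_out method standing in for fcgreq.flush_out().

```python
import io


def send_file_flushing(fsrc, write, flush, bufsize=8192):
    """Copy fsrc to the client in blocks, flushing after each block.

    write and flush are callables supplied by the server adapter.
    Returns the number of bytes sent.
    """
    total = 0
    while True:
        block = fsrc.read(bufsize)
        if not block:
            break
        write(block)
        flush()  # hand the block on instead of accumulating it
        total += len(block)
    return total


class BufferingRequest:
    """Hypothetical stand-in for a request that buffers until flushed."""

    def __init__(self):
        self.pending = io.BytesIO()
        self.sent = 0
        self.max_pending = 0  # largest amount ever held in the buffer

    def write(self, data):
        self.pending.write(data)
        self.max_pending = max(self.max_pending, self.pending.tell())

    def flush_out(self):
        self.sent += self.pending.tell()
        self.pending = io.BytesIO()


if __name__ == "__main__":
    req = BufferingRequest()
    attachment = io.BytesIO(b"y" * 100000)
    send_file_flushing(attachment, req.write, req.flush_out)
    print(req.sent, req.max_pending)
```

Flushing every few blocks instead of every block would trade a slightly larger buffer for fewer flush calls.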
Testing using the fix on Moin 1.6.3 / lighttpd, memory usage while downloading:
www-data 13060 10.9 12.9 535128 532232 ? S 07:57 0:07 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
www-data 13061  0.4  0.1   8904   6480 ? S 07:57 0:00  \_ python /var/www/moin/moin.fcgi
www-data 13074  8.7  0.3  24748  14428 ? S 07:57 0:05  \_ python /var/www/moin/moin.fcgi
It looks like MoinMoin no longer consumes the memory, but Lighttpd does, probably because it buffers the complete output of MoinMoin in order to free the backend as fast as possible. Please also notice the 100% CPU usage of the lighttpd process; it suddenly drops to 0% after some time, maybe once MoinMoin has finished sending the attachment data to Lighttpd. The download now runs at full speed. Unfortunately, Lighttpd doesn't free all the memory after the download is finished; its memory usage keeps increasing with every request. (I suspect some Lighttpd bugs are to blame.)
In any case, that's much better than before in my opinion, but I guess I'll still resort to the mod_proxy solution (though I'll test anything you need).
moin 1.6 standalone
Writes using sareq.wfile.write().
The problem does not exist here (I verified it): memory consumption does not increase when I download a 400 MB attachment, and the download runs at full speed.
moin 1.6 CGI
Directly writes to sys.stdout. No idea whether a problem exists here.
moin 1.6 mod_python
Writes using mpyreq.write(). No idea whether a problem exists here.
moin 1.6 Twisted
Writes using twistd.write(). No idea whether a problem exists here.
Plan
- Priority:
- Assigned to:
- Status:
- fixed for wsgi and fastcgi
- no fix needed for standalone
- I guess no fix is needed for cgi/mod_python/Twisted either, so I am closing this bug. If someone thinks more fixes are needed, please reopen the bug and provide more data.