Description
After an upgrade to 1.8.2, attaching a file (32KB .odt) the wiki returns an error:
BadZipfile Bad magic number for central directory * args = ('Bad magic number for central directory',)
Attachments for that page can no longer be viewed. Checking the filesystem shows the attachment was uploaded successfully. Deleting the attachment via the filesystem restores the attachment view function.
Steps to reproduce
- Upload an attachment to Moin 1.8.2
Example
Component selection
action: AttachFile.py
Details
HTML traceback page attached: moinbugBadZipfile.tar.gz
Traceback (most recent call last): File "/usr/lib/python2.4/site-packages/MoinMoin/request/__init__.py", line 1311, in run handler(self.page.page_name, self) File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 513, in execute msg = handler(pagename, request) File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 601, in _do_upload upload_form(pagename, request, msg) File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 533, in upload_form send_uploadform(pagename, request) File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 493, in send_uploadform request.write(_get_filelist(request, pagename)) File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 386, in _get_filelist return _build_filelist(request, pagename, 1, 0) File "/usr/lib/python2.4/site-packages/MoinMoin/action/AttachFile.py", line 337, in _build_filelist is_package = packages.ZipPackage(request, fullpath).isPackage() File "/usr/lib/python2.4/site-packages/MoinMoin/packages.py", line 486, in __init__ self.zipfile = zipfile.ZipFile(filename) File "zipfile.py", line 210, in __init__ self._GetContents() File "zipfile.py", line 230, in _GetContents self._RealGetContents() File "zipfile.py", line 262, in _RealGetContents raise BadZipfile, "Bad magic number for central directory" BadZipfile: Bad magic number for central directory
MoinMoin Version |
1.8.2 |
OS and Version |
Debian 4.0 etch |
Python Version |
2.4.4 |
Server Setup |
debian etch, apache2, libapache2-mod-wsgi, moinmoin 1.8.2 |
Server Details |
web server |
Language you are using the wiki in (set in the browser/UserPreferences) |
English |
Workaround
Discussion
I found this reference to a python zipfile bug that may be related: http://bugs.python.org/issue3997, but I'm not uploading zip files.
Made a test testdocument.odt and could not reproduce this error on this server. maybe your mimetype config on the server says that odt are zip files and that's why moinmoin tries to unpack it (sure the error message is not nice, this "unzip" should be more fault tolerant and display just a nice message)... -- MarcelHäfner 2009-04-22 06:51:11
please can you tell what you get for
>:/etc > grep odt mime.types
it should be
application/vnd.oasis.opendocument.text odt
Likly no bug. Please check if you have a recent MoinMoin/wikiutil.py. You should have in line 841: '.odt': 'application/vnd.oasis.opendocument.text', This means we do already extend the mimetype mapping.
Do you have used the --force parameter for your update installation. Was the old py cache code *.pyc file removed?
- Hi all and thanks for looking at this. The mime.type is as expected:
service:/etc# grep odt mime.types application/vnd.oasis.opendocument.text odt service:/etc#
wikiutil.py shows this:
service:~# grep -n odt /usr/lib/python2.4/site-packages/MoinMoin/wikiutil.py 919: '.odt': 'application/vnd.oasis.opendocument.text', service:~# ls -l /usr/lib/python2.4/site-packages/MoinMoin/wikiutil.py* -rw-r--r-- 1 root root 90056 2009-02-08 05:25 /usr/lib/python2.4/site-packages/MoinMoin/wikiutil.py -rw-r--r-- 1 root root 80796 2009-04-20 22:17 /usr/lib/python2.4/site-packages/MoinMoin/wikiutil.pyc service:~#
The problem appears to be local to that page, in that on other pages I can upload attachments and they are displayed correctly in the attachment view without the BadZipfile error. Also the SystemAdmin macro attachment viewer has no issue. I've ensured the permissions and ownership of that page and subfolders is the same as any other Page (owned by www-data and rw). To clarify, the attachment is uploaded ok, despite the badzipfile error message. I can also refer to it with an attachment: link and download it from the attachment page. It is just the attachment list view that causes the error on this page.
I'll outline my install/upgrade procedure in case I've missed something:
moin 1.7.2# python -v setup.py install --record=/root/moininstall.log moin 1.7.2# ../createinstance.sh /srv/var/moin/hcwiki config moin config apache backup INSTANCE/data mv INSTANCE INSTANCE.bak moin 1.8.2# for i in `cat /root/moininstall.log` ; do rm $i ; done moin 1.8.2# python -v setup.py install --record=/root/moininstall.log moin 1.8.2# ../createinstance.sh /srv/var/moin/hcwiki restore edit-log event-log data/pages data/users to $INSTANCE # su - www-data www-data$ moin --config-dir=/etc/moin/hcwiki --wiki-url=service.harmsconsulting.com/hcwiki migration data www-data$ moin --config-dir=/etc/moin/hcwiki --wiki-url=service.harmsconsulting.com/hcwiki maint cleancache config moin config apache
Is that sufficient to replace the old installation? I can't find any reference to the --force parameter (presumably to setup.py) - I looked in BasicInstallation and grep force setup.py. Cheers and regards, -- HarmsCon 2009-04-23 00:55:57
Do you have stopped the wiki server process at the time you cleaned the cache?
Do you use custom macros for your attachment list. The standard macros work, see here;
There are 6 attachment(s) stored for this page.
Please check whether you can unpack that odt file triggering the problem with unzip command. Also check whether you can load that file with openoffice. If you succeed with both, please attach the file here so we can reproduce the problem.
Thomas, you are a legend! I never realised odt files were zipped, and trying to unzip it as suggested resulted in the start of central directory not found error message. I repaired the doc and re-uploaded it with a clean result. Thanks to you all for your time, josh. -- HarmsCon 2009-04-27 01:05:52
OK, good that we have found the problem being in the document.
But anyway: moin should not completely fall over such a problem, but catch it somehow. This is why we still need some defective zip file to reproduce the problem. I am leaving the bug open for fixing the exception handler to deal with such stuff nicely.
/me has got this problem too, with PPT (PowerPoint file). Reediting it, saving and uploading fixed the problem, but MoinMoin shouldn't choke on bad attachments: it renders the entire attachment management subpage to be inaccessible and that's bad. The following patch helps in my case (MoinMoin 1.9.5). -- rea[at]codelabs[dot].ru, 17.10.2012
Can you give us such a PPT file please.
Here it is: badfile.ppt It is a sanitized version of our PPT file, since it contains some material I don't really want to share with people, so it was filled with zeros from the beginning. For me, it happens with Python 2.7, whose zipfile.py has the following stanzas
def _EndRecData(fpin): """Return data from the "End of Central Directory" record, or None. The data is a list of the nine items in the ZIP "End of central dir" record followed by a tenth item, the file seek offset of this record.""" # Determine file size fpin.seek(0, 2) filesize = fpin.tell() # Check to see if this is ZIP file with no archive comment (the # "end of central directory" structure should be the last item in the # file if this is the case). try: fpin.seek(-sizeEndCentDir, 2) except IOError: return None data = fpin.read() if data[0:4] == stringEndArchive and data[-2:] == "\000\000": # the signature is correct and there's no comment, unpack structure endrec = struct.unpack(structEndArchive, data) endrec=list(endrec) # Append a blank comment and record start offset endrec.append("") endrec.append(filesize - sizeEndCentDir) # Try to read the "Zip64 end of central directory" structure return _EndRecData64(fpin, -sizeEndCentDir, endrec) # Either this is not a ZIP file, or it is a ZIP file with an archive # comment. Search the end of the file for the "end of central directory" # record signature. The comment is the last item in the ZIP file and may be # up to 64K long. It is assumed that the "end of central directory" magic # number does not appear in the comment. maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0) fpin.seek(maxCommentStart, 0) data = fpin.read() start = data.rfind(stringEndArchive) if start >= 0: # found the magic number; attempt to unpack and interpret recData = data[start:start+sizeEndCentDir] endrec = list(struct.unpack(structEndArchive, recData)) commentSize = endrec[_ECD_COMMENT_SIZE] #as claimed by the zip file comment = data[start+sizeEndCentDir:start+sizeEndCentDir+commentSize] endrec.append(comment) endrec.append(maxCommentStart + start) # Try to read the "Zip64 end of central directory" structure return _EndRecData64(fpin, maxCommentStart + start - filesize, endrec) # Unable to find a valid end of central directory structure return
and what happens here is that the function tries to lookup the string "PK\005\006" in the last 64K bytes of the file in question. It finds that, because there is some ZIP structure within that file, but since it is not really the last record, it won't represent the correct end-of-central-directory record. However, _RealGetContets from the same package tries to interpret it and chokes on the bad signature (should be "PK\001\002"):
def _RealGetContents(self): """Read in the table of contents for the ZIP file.""" fp = self.fp try: endrec = _EndRecData(fp) except IOError: raise BadZipfile("File is not a zip file") if not endrec: raise BadZipfile, "File is not a zip file" if self.debug > 1: print endrec size_cd = endrec[_ECD_SIZE] # bytes in central directory offset_cd = endrec[_ECD_OFFSET] # offset of central directory self.comment = endrec[_ECD_COMMENT] # archive comment # "concat" is zero, unless zip was concatenated to another file concat = endrec[_ECD_LOCATION] - size_cd - offset_cd if endrec[_ECD_SIGNATURE] == stringEndArchive64: # If Zip64 extension structures are present, account for them concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator) if self.debug > 2: inferred = concat + offset_cd print "given, inferred, offset", offset_cd, inferred, concat # self.start_dir: Position of start of central directory self.start_dir = offset_cd + concat fp.seek(self.start_dir, 0) data = fp.read(size_cd) fp = cStringIO.StringIO(data) total = 0 while total < size_cd: centdir = fp.read(sizeCentralDir) if centdir[0:4] != stringCentralDir: raise BadZipfile, "Bad magic number for central directory"
This problem seems to be specific to Python 2.7, since 2.5 and 2.5 give me no problems:
$ python2.5 zf.py badfile.ppt ib-specification-1.2-vol1.zip ib-specification-1.2-vol1.zip is a ZIP file! $ python2.5 zf.py badfile.ppt ib-specification-1.2-vol1.zip ib-specification-1.2-vol1.zip is a ZIP file! $ python2.7 zf.py badfile.ppt ib-specification-1.2-vol1.zip badfile.ppt is a ZIP file! Got exception: Bad magic number for central directory
Script I used for testing.
I had updated my patch to catch two more places where not every Zip-related exception was handled. (patch was applied to moin/1.9 repo, see: http://hg.moinmo.in/moin/1.9/rev/b9450db6c129 )
Please, note that this problem isn't ODT/PPT/WhateverFormat-related. It is just a kind of DoS for the given page where one can upload attachment that contains malformed file that will be errorneously recognized as ZIP file by the Python's zipfile module.
And it lies in the fact that zipfile.{is_zipfile,_check_zipfile} just checks for the result for _EndRecData() without any validation for the returned content, though it is kinda documented.
Plan
- Priority:
- Assigned to:
- Status: exception handler was fixed to catch the corrupt zipfile related exceptions, thanks for the patch!