Description
Junk non-utf-8 characters are inserted into the user's MoinEditorBackup page and later cause Unicode errors on many operations.
Steps to reproduce
- Create a wiki user.
- Create homepage for user.
- Create a test page.
- Delete the test page.
- A Unicode error occurs during deletion.
After that, Unicode errors occur on RecentChanges, etc. for all users.
Example
Details
MoinMoin Version | Release 1.3.0 [Revision patch-399]
OS and Version | Linux (SuSE 9.0)
Python Version | 2.3
Server Setup | Apache 2.0.48
Server Details |
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/MoinMoin/request.py", line 756, in run
    handler(page.page_name, self)
  File "/usr/lib/python2.3/site-packages/MoinMoin/wikiaction.py", line 588, in do_savepage
    comment=comment)
  File "/usr/lib/python2.3/site-packages/MoinMoin/PageEditor.py", line 864, in saveText
    backup_url = self._make_backup(newtext, **kw)
  File "/usr/lib/python2.3/site-packages/MoinMoin/PageEditor.py", line 730, in _make_backup
    backuppage._write_file(intro + newtext)
  File "/usr/lib/python2.3/site-packages/MoinMoin/PageEditor.py", line 774, in _write_file
    was_deprecated = self._get_pragmas(self.get_raw_body()).has_key("deprecated")
  File "/usr/lib/python2.3/site-packages/MoinMoin/Page.py", line 499, in get_raw_body
    text = file.read()
  File "/usr/lib/python2.3/codecs.py", line 380, in read
    return self.reader.read(size)
  File "/usr/lib/python2.3/codecs.py", line 253, in read
    return self.decode(self.stream.read(), self.errors)[0]
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 41: unexpected code byte
Workaround
None
Discussion
If the directory containing the page is deleted, the errors on RecentChanges etc. disappear. However, the user is then unable to edit any further pages without a Unicode error appearing. This is fixed by removing the user's MoinEditorBackup directory.
The obvious workaround in this situation is either not to delete pages, or for users not to have a WikiHomePage. I assume this is because if a user doesn't have a homepage the MoinEditorBackup is not created. --OriginalReporter
I can't reproduce this on either this wiki or my test wikis, all running the latest code, using my old account or a newly created account and following the steps above.
This smells like a wiki which was not migrated to utf-8 from iso-8859-1 or another non-utf-8 charset. In that case, you will get Unicode errors when you try to read a page.
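To illustrate the suspected failure mode, here is a minimal sketch in modern Python 3 (not MoinMoin code, and not the Python 2.3 used in this report). The sample text and the helper name `try_decode` are assumptions made up for the example. It shows that bytes written in iso-8859-1 generally fail to decode as utf-8.

```python
# Text as a wiki still using iso-8859-1 would store it (assumed sample data).
legacy_bytes = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'


def try_decode(data, charset="utf-8"):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(charset), None
    except UnicodeDecodeError as err:
        return None, err


# Reading the legacy bytes as utf-8 fails at the non-ASCII byte.
text, err = try_decode(legacy_bytes)

# Reading them in the charset they were written in works.
text_ok, err_ok = try_decode(legacy_bytes, "iso-8859-1")
```

The error object's `start` attribute gives the byte offset of the first undecodable byte, which corresponds to the "position" reported in the traceback above.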
Please give more details about this wiki. Is it new, or upgraded, and if so from which version? Did you run all the migration scripts? Did you get errors while running the mig scripts?
-- DeletePageTest 2004-12-10 15:29:48
I've created a new wiki especially for testing this problem. I did have a version 1.1 wiki which I updated to 1.2.4 (no problems) and then to 1.3 this week. The only errors I received in migration were (I think) during mig3 because my original had no cache files. As for the new wiki (url above) I've only just created it so that rules out migration problems (that was my first guess).
I've attached two files: 00000002 is the revision of a deleted page, and 00000000 was created as the MoinEditorBackup. Should these files not be empty?
- The problem is clear - both files have non utf-8 data in them. You can see this as junk in the attachment view. The question is how this junk got into the page.
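A check like the one described here can be automated. The following is a rough sketch in modern Python 3, not part of MoinMoin; the function name `find_invalid_utf8` is hypothetical. It walks a directory tree and reports every file whose bytes are not valid utf-8, with the offset of the first bad byte.

```python
import os


def find_invalid_utf8(root):
    """Yield (path, byte_offset) for each file under root that is not valid utf-8."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the offset of the first undecodable byte.
                yield path, err.start
```

Running it over the wiki's page directory (for example `list(find_invalid_utf8("data/pages"))`, where the path is an assumption about the local layout) after each operation would show which action first introduces the junk.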
I've just installed the latest tarball from the ArchRepository (reports Release 1.3.0 [Revision 1.3.0 release]) and I get the same problem. I've also run with the standalone server to rule out Apache and I get the same error.
Please try this test:
- install a new wiki instance
- copy the data and underlay dir from the distribution
- setup config, permission etc.
- create a new account
- create a page for yourself
- create another page
- delete the other page
Now make a tarball or zip of your wiki directory, including your config, data directory and server script. Attach it to this page so we can inspect all the details and pages.
Also very important: add details about the language you use in your preferences, or in your browser, and which browser you use to edit the pages.
-- NirSoffer 2004-12-10 16:41:01
Done that. However I have also determined that the bug seems related to my Linux installation. It occurs on both of my SuSE 9.0 servers (python 2.3), but not on my SuSE 9.1 workstation (python 2.3.3).
I have English selected as my preferred language in my user preferences, and en-us then en in my browser, which is Firefox 0.9.3.
-- JonathanBrady 2004-12-10 18:47:52
I checked your wiki attachment. Everything is fine except the last line of rev 0 of TestUser/MoinEditorBackup, which contains a few junk characters. I have no idea where those characters came from. While they are there, any access to that page revision causes the expected UnicodeError. This situation should never happen: there should be no way to insert data that is not in the wiki charset, other than editing the file manually.
After I deleted those lines from rev 0, there was no problem with this wiki running on current code.
Next step:
The goal is to find the action that inserts that junk line. Check your editor backup page after each operation.
Clean the junk characters out of your TestUser/MoinEditorBackup, or just remove that directory; it's just a backup of the user's last edited page.
- Create a new page and save it: enter the page name in the URL box, press Enter, select "create new page", and save the page without changing its content.
- If you created the page in a different way before, describe that way and the template you chose.
Check whether the junk is in TestUser/MoinEditorBackup again.
-- NirSoffer 2004-12-10 19:36:46
Deleted TestUser/MoinEditorBackup directory, and created a new page called NewPage without modifying the content. Previously I changed the content to be the same as the page name. TestUser/MoinEditorBackup/revisions/00000001 contains the same as NewPage/revisions/00000001 except for the acl.
-- JonathanBrady 2004-12-10 20:17:38
This is expected and correct. Try to delete that NewPage now, then we see if the junk is created by the delete operation.
With the deletion of NewPage (no reason for deletion specified) TestUser/MoinEditorBackup/revisions/00000000 now contains an acl followed by invalid data. I get the following (I'm now using the standalone server for these tests):
melon.home - - [10/Dec/2004 21:01:04] "GET /NewPage HTTP/1.1" 200 -
melon.home - - [10/Dec/2004 21:01:09] "GET /NewPage?action=DeletePage HTTP/1.1" 200 -
Traceback (most recent call last):
  File "/usr/lib/python2.3/site-packages/MoinMoin/Page.py", line 1171, in getPageLinks
    Page(request, self.page_name).send_page(request, content_only=1)
  File "/usr/lib/python2.3/site-packages/MoinMoin/Page.py", line 684, in send_page
    body = self.get_raw_body()
  File "/usr/lib/python2.3/site-packages/MoinMoin/Page.py", line 499, in get_raw_body
    text = file.read()
  File "/usr/lib/python2.3/codecs.py", line 380, in read
    return self.reader.read(size)
  File "/usr/lib/python2.3/codecs.py", line 253, in read
    return self.decode(self.stream.read(), self.errors)[0]
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 2-3: invalid data
melon.home - - [10/Dec/2004 21:01:14] "POST /NewPage?action=DeletePage HTTP/1.1" 200 -
I do not however see any visual indication in my browser. -- JonathanBrady 2004-12-10 20:38:02
OK, now repeat the same steps, but before you delete the page, open the file MoinMoin/PageEditor.py and add the raise... line at line 704:
intro += _('## backup of page "%(pagename)s" submitted %(date)s') % {
    'pagename': pagename, 'date': date,} + '\n'
raise (intro + newtext).encode('utf-8')
backuppage._write_file(intro + newtext)
return backuppage.url(self.request)
Save the file and restart moin.py! Now try to delete again. On my system everything is fine at this step. You should get this output: "#acl TestUser:read,write,delete All: deleted: None"
Yes, this time I get a traceback in my browser with "#acl TestUser:read,write,delete All: deleted: None" at the end, MoinEditorBackup is unchanged from the page creation.
-- JonathanBrady 2004-12-10 21:09:20
You can remove that raise, the problem is not there anyway.
It looks like a problem with this specific system, as it works on every other system.
- Try upgrading Python: 2.3.4 is the latest bug-fix version of 2.3, and you might want to try 2.4.
- Check that your servers have the latest updates/fixes and are configured correctly.
OK, the error does actually occur around this part of the code. It appears to be in _write_file.
If I change it to:
# save to page file
pagefile = os.path.join(revdir, revstr)
f = codecs.open(pagefile, 'wb', config.charset)
# Write the file using text/* mime type
f.write(self.encodeTextMimeType(text+'\n'))
f.close()
Then the contents of the MoinEditorBackup are no longer corrupted. It seems my version of Python has a problem which results in corruption if the file is not terminated with a newline.
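The workaround can be sketched as follows in modern Python 3. This is an illustration only, not the MoinMoin code above: `write_page` and `read_page` are hypothetical helpers, and unlike the change above, the newline is appended only when it is missing. Reading the file back in the same charset confirms that what was written is still decodable.

```python
import os
import tempfile


def write_page(path, text, charset="utf-8"):
    """Write text to path, making sure the file ends with a newline."""
    if not text.endswith("\n"):
        text += "\n"
    # newline="\n" keeps the bytes on disk identical on every platform.
    with open(path, "w", encoding=charset, newline="\n") as f:
        f.write(text)


def read_page(path, charset="utf-8"):
    """Read the file back; raises UnicodeDecodeError if it holds junk bytes."""
    with open(path, "r", encoding=charset, newline="") as f:
        return f.read()


# Round trip with the generated body of a deleted page.
path = os.path.join(tempfile.mkdtemp(), "00000001")
write_page(path, "deleted")
round_trip = read_page(path)
```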
-- JonathanBrady 2004-12-10 22:28:50
Very nice, Jonathan! But that hack is not the correct fix; I will put a patch here soon. The problem in this case is that a useless backup is being made. Here the page body is empty, and we use a default "describe" line. It does not make sense to back up such page content.
Problem fixed in my branch; the fix will be available in our tla archive soon. In DeletePage, the page editor is now created with do_editor_backup=0, because it does not make sense to make an editor backup of generated page content containing "deleted". -- NirSoffer 2004-12-11 16:31:46
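The idea behind the fix can be shown with a toy model in modern Python 3. Everything here is hypothetical (`ToyPageEditor`, `save_text`, `delete_page`) and only mirrors the do_editor_backup flag named above; it is not the actual patch or MoinMoin's PageEditor.

```python
class ToyPageEditor:
    """Hypothetical stand-in for a page editor, used only for illustration."""

    def __init__(self, page_name, do_editor_backup=True):
        self.page_name = page_name
        self.do_editor_backup = do_editor_backup
        self.backups = []  # stands in for the user's MoinEditorBackup page

    def save_text(self, text):
        # Only back up text when the caller asked for it.
        if self.do_editor_backup:
            self.backups.append(text)
        return text


def delete_page(page_name):
    """Save generated 'deleted' content without making an editor backup."""
    editor = ToyPageEditor(page_name, do_editor_backup=False)
    editor.save_text("deleted")
    return editor
```

With the flag off, deleting a page leaves the backup untouched, so the code path that produced the junk is never reached for generated content.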
The fix will be available in moin-1.3.1 soon.
Plan
- Priority:
- Assigned to:
- Status: fixed in patch-424