Description

The SyncPages action fails on pages whose names contain non-ASCII characters. (Unicode page content transfers normally.) This behaviour occurs in Moin 1.9.3 with Python 2.7.

Steps to reproduce

  1. Create a page with a name containing non-ASCII characters (e.g., СудалгааныТанилцуулга)

  2. Create a sync job page with a pageMatch or pageList field matching that page (e.g., pageList:: СудалгааныТанилцуулга)

  3. Attempt to run the sync (using action=SyncPages)

Example

See traceback output below.

Component selection

Details

2012-01-11 12:11:44,458 INFO MoinMoin.web.serving:41 127.0.0.1 "GET /SyncTest?action=SyncPages HTTP/1.1" 200 -
2012-01-11 12:11:56,230 ERROR MoinMoin.wsgiapp:293 An exception has occurred [http://localhost:8080/SyncTest?action=SyncPages].
Traceback (most recent call last):
  File "/home/edt/moinmoin/MoinMoin/wsgiapp.py", line 282, in __call__
    response = run(context)
  File "/home/edt/moinmoin/MoinMoin/wsgiapp.py", line 88, in run
    response = dispatch(request, context, action_name)
  File "/home/edt/moinmoin/MoinMoin/wsgiapp.py", line 136, in dispatch
    response = handle_action(context, pagename, action_name)
  File "/home/edt/moinmoin/MoinMoin/wsgiapp.py", line 195, in handle_action
    handler(context.page.page_name, context)
  File "/home/edt/moinmoin/MoinMoin/action/SyncPages.py", line 511, in execute
    ActionClass(pagename, request).render()
  File "/home/edt/moinmoin/MoinMoin/action/SyncPages.py", line 220, in render
    self.sync(params, local, remote)
  File "/home/edt/moinmoin/MoinMoin/action/SyncPages.py", line 279, in sync
    r_pages = remote.get_pages(exclude_non_writable=direction != DOWN)
  File "/home/edt/moinmoin/MoinMoin/wikisync.py", line 286, in get_pages
    tokres, pages = m()
  File "/usr/lib/python2.7/xmlrpclib.py", line 997, in __call__
    return MultiCallIterator(self.__server.system.multicall(marshalled_list))
  File "/usr/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1575, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1292, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1439, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 809, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 550: ordinal not in range(128)

MoinMoin Version

1.9.3

OS and Version

Ubuntu 11.10

Python Version

2.7

Server Setup

wikiserver.py

Server Details

Language you are using the wiki in (set in the browser/UserPreferences)

English

Workaround

Use ASCII-only page names.

Discussion

If you debug this with PyDev, please read "PyDev, Python and system default Unicode encoding problem" first.

Something like this could solve it.

diff -r 56eaf32027f4 wikiserver.py
--- a/wikiserver.py     Tue Feb 07 21:48:50 2012 +0100
+++ b/wikiserver.py     Fri Feb 17 22:30:22 2012 +0100
@@ -7,6 +7,8 @@
 """
 
 import sys, os
+reload(sys)
+sys.setdefaultencoding("utf-8")
 
 # a) Configuration of Python's code search path
 #    If you already have set up the PYTHONPATH environment variable for the

http://stackoverflow.com/questions/3828723/why-we-need-sys-setdefaultencodingutf-8-in-py-scipt

Or apply the same change in wikiconfig.py:

diff -r 56eaf32027f4 wikiconfig.py
--- a/wikiconfig.py     Tue Feb 07 21:48:50 2012 +0100
+++ b/wikiconfig.py     Fri Feb 17 22:40:50 2012 +0100
@@ -8,7 +8,7 @@
 
 import sys, os
 
-from MoinMoin.config import multiconfig, url_prefix_static
+from MoinMoin.config import multiconfig, url_prefix_static, charset
 
 
 class LocalConfig(multiconfig.DefaultConfig):
@@ -43,7 +43,8 @@
 
     # Add your configuration items here.
     secrets = 'This string is NOT a secret, please make up your own, long, random secret string!'
-
+    reload(sys)
+    sys.setdefaultencoding(charset)
 # DEVELOPERS! Do not add your configuration items there,
 # you could accidentally commit them! Instead, create a
 # wikiconfig_local.py file containing this:

/!\ Changing the default encoding is evil. We should rather fix wrong data types (str vs. unicode) so it does not need to do implicit encoding/decoding. -- ThomasWaldmann 2012-02-18 15:18:27
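The comment above argues for fixing the data types instead of changing the default encoding. A minimal sketch of that idea is to encode text explicitly at the point where it leaves the application, so that no implicit conversion is ever needed. The helper name `to_wire_bytes` and its `charset` parameter are hypothetical and are not part of MoinMoin; the sketch is written so that it runs on both Python 2.7 and Python 3.

```python
def to_wire_bytes(value, charset="utf-8"):
    """Return `value` as a byte string, encoding text explicitly with `charset`.

    Hypothetical helper for illustration only; not MoinMoin API.
    """
    if isinstance(value, bytes):
        return value  # already a byte string: pass through unchanged
    return value.encode(charset)  # text: encode explicitly, never implicitly


# Both parts are converted before concatenation, so the result is always bytes.
url = to_wire_bytes(u"http://localhost:8080/") + to_wire_bytes("?action=xmlrpc2")

# First three letters of the example page name from the steps to reproduce.
page = to_wire_bytes(u"\u0421\u0443\u0434")
```

With this pattern the encoding used is visible at every call site, which also makes it easier to spot places where a hardcoded charset should instead come from configuration.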

diff -r 1ddf7d88c53d MoinMoin/wikisync.py
--- a/MoinMoin/wikisync.py      Thu Mar 01 00:15:41 2012 +0100
+++ b/MoinMoin/wikisync.py      Sun Mar 04 21:00:10 2012 +0100
@@ -166,7 +166,7 @@
         _ = self.request.getText
 
         wikitag, wikiurl, wikitail, wikitag_bad = wikiutil.resolve_interwiki(self.request, interwikiname, '')
-        self.wiki_url = wikiutil.mapURL(self.request, wikiurl)
+        self.wiki_url = wikiutil.mapURL(self.request, wikiurl.encode("utf-8"))
         self.valid = not wikitag_bad
         self.xmlrpc_url = self.wiki_url + "?action=xmlrpc2"
         if not self.valid:

(!) We can encode the url too

In Python 2.7, httplib does:

        if isinstance(message_body, str):
            msg += message_body
            message_body = None
        self.send(msg)

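This assumes that if message_body is an instance of str, msg is a str as well. That does not hold in our current code: without the change the URL is unicode, so msg becomes unicode and Python 2 implicitly decodes the UTF-8 encoded message_body as ASCII, which raises the UnicodeDecodeError shown in the traceback.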
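The failure in the httplib snippet can be reproduced in isolation. The sketch below is not httplib code: `py2_concat` is a hypothetical helper that emulates how Python 2 evaluates `msg += message_body` (a byte string is implicitly decoded with the default ASCII encoding when it is added to a unicode string), so the example also runs on Python 3. The header text and request body are made up for illustration.

```python
def py2_concat(msg, message_body, default_encoding="ascii"):
    """Mimic `msg += message_body` as Python 2 would evaluate it.

    Hypothetical emulation for illustration; not part of httplib.
    """
    if isinstance(msg, bytes) and isinstance(message_body, bytes):
        return msg + message_body  # str + str: plain byte concatenation
    # Mixed types: Python 2 decodes the byte string with the default encoding.
    if isinstance(message_body, bytes):
        message_body = message_body.decode(default_encoding)
    if isinstance(msg, bytes):
        msg = msg.decode(default_encoding)
    return msg + message_body


# Request body containing a UTF-8 encoded non-ASCII character (U+0421).
# Its first encoded byte is 0xd0, the byte named in the traceback.
body = u"<string>\u0421</string>".encode("utf-8")

# Headers built from a byte-string URL: concatenation succeeds.
ok = py2_concat(b"POST /?action=xmlrpc2 HTTP/1.1\r\n\r\n", body)

# Headers built from a unicode URL: the body is decoded as ASCII and fails.
try:
    py2_concat(u"POST /?action=xmlrpc2 HTTP/1.1\r\n\r\n", body)
    failed = False
except UnicodeDecodeError:
    failed = True
```

This is why the type of the URL matters even though the non-ASCII bytes are in the request body: the URL ends up in the header string, and the header string determines whether the concatenation is done on bytes or on unicode.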

In SyncPages we have often hardcoded utf-8 for decoding instead of using config.charset. We also encode URL parts in many other places.

Please paste your full wikisync config page. -- AlexanderSchremmer

Proposed solution (please try it; I have not tested it):

diff -r 1ddf7d88c53d MoinMoin/wikisync.py
--- a/MoinMoin/wikisync.py      Thu Mar 01 00:15:41 2012 +0100
+++ b/MoinMoin/wikisync.py      Sun Mar 04 21:00:10 2012 +0100
@@ -166,7 +166,7 @@
         _ = self.request.getText
 
         wikitag, wikiurl, wikitail, wikitag_bad = wikiutil.resolve_interwiki(self.request, interwikiname, '')
         self.wiki_url = wikiutil.mapURL(self.request, wikiurl)
         self.valid = not wikitag_bad
-        self.xmlrpc_url = self.wiki_url + "?action=xmlrpc2"
+        self.xmlrpc_url = str(self.wiki_url + "?action=xmlrpc2")  # url MUST be str. unicode would lead to msg_body decoding issues in py 2.7 httplib.
         if not self.valid:

Plan


CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/XmlrpcSyncFailsOnUnicodePagenames (last edited 2012-09-17 06:57:40 by ReimarBauer)