Description
Some really evil browsers (MoinMoinBugs/InternetExplorer, notably) send the referer as plain ISO-8859-1. Moin seems to expect it to be encoded in the wiki charset (config.charset, UTF-8 here) and tries to decode it as such, which causes a crash when visiting some pages while logged in.
Example
- create an account
- log in
- change preferences to have "French" as the default language, or change browser settings to have the wiki in French
- visit any other page
You get this backtrace:
UnicodeDecodeError
'utf8' codec can't decode bytes in position 26-28: invalid data

Please include this information in your bug reports!:

Python Python 2.3.4: /usr/local/bin/python
Linux homere.koumbit.net 2.4.27 #2 SMP Sun Aug 8 23:35:47 EDT 2004 i686
MoinMoin Release 1.3rc1 [Revision patch-351]
...
/usr/local/lib/python2.3/site-packages/MoinMoin/logfile/eventlog.py in add(self=<EventLog instance>, request=<MoinMoin.request.RequestCGI instance>, eventtype='VIEWPAGE', values={'pagename': u'RechercherUnePage'}, add_http_info=1, mtime_usecs=1101839558149262L)
   33 if add_http_info:
   34     for key in [u'remote_addr', u'http_user_agent', u'http_referer']:
   35         val = unicode(request.__dict__.get(key, ''), config.charset)
   36         if val:
   37             kvlist.append((key.upper(), val)) # HTTP stuff is UPPERCASE
val = u'Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)', unicode undefined, request = <MoinMoin.request.RequestCGI instance>, request.__dict__ = {'_all_pages': None, '_available_actions': None, '_footer_fragments': {}, '_known_actions': None, '_page_headings': {}, '_page_ids': {}, 'accepted_charsets': [], 'args': {}, 'auth_username': None, 'cfg': <wikiconfig.Config instance>, ...}, request.__dict__.get = <built-in method get of dict object>, key = u'http_referer', global config = <module 'MoinMoin.config' from '/usr/local/lib/python2.3/site-packages/MoinMoin/config.pyc'>, config.charset = 'utf-8'

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 26-28: invalid data
    __doc__ = 'Unicode decoding error.'
    __getitem__ = <bound method UnicodeDecodeError.__getitem__ of <exceptions.UnicodeDecodeError instance>>
    __init__ = <bound method UnicodeDecodeError.__init__ of <exceptions.UnicodeDecodeError instance>>
    __module__ = 'exceptions'
    __str__ = <bound method UnicodeDecodeError.__str__ of <exceptions.UnicodeDecodeError instance>>
    args = ('utf8', 'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur', 26, 29, 'invalid data')
    encoding = 'utf8'
    end = 29
    object = 'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur'
    reason = 'invalid data'
    start = 26
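The failure in the backtrace can be reproduced outside Moin with the referer bytes it reports. This is only a sketch in Python 3 (the wiki runs Python 2.3, where unicode(raw, charset) plays the role of raw.decode(charset)); the helper name try_decode is made up for illustration.

```python
# Python 3 sketch; try_decode is a hypothetical helper, not part of MoinMoin.

# Referer bytes exactly as shown in the backtrace (ISO-8859-1, unquoted).
raw_referer = b'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur'


def try_decode(raw, charset):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError as err:
        return err


# Decoding with the wiki charset fails at the first \xe9 byte.
utf8_result = try_decode(raw_referer, 'utf-8')

# Decoding as Latin-1 cannot fail: every byte maps to one character.
latin1_result = try_decode(raw_referer, 'latin-1')

print(type(utf8_result).__name__, 'at byte', utf8_result.start)
print(latin1_result)
```

The error starts at byte 26, the same offset as in the backtrace above.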
Details
MoinMoin Version | 1.3rc1
OS and Version | Debian GNU/Linux 2.4.27 Woody
Python Version | Python 2.3.4
Server Setup and Version | Apache 1.3.26-0woody6
Oddly enough, I cannot reproduce this here.
- is the appdefaultencoding-hack still active?
Workaround
None known.
Discussion
- Is UTF-8 allowed in the HTTP request? Even if it is not, we should fix our encoding problem here.
It is not. HTTP headers should use ASCII only, with non-ASCII characters encoded and quoted.
Fixed by recoding the referer to ASCII, replacing bad characters from broken browsers.
Since we don't use this extreme setup (IE on W98?!), please test the fix in the next beta.
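For comparison, this is roughly what a correctly quoted referer for the same page would look like. A Python 3 sketch only; it assumes the browser percent-encodes the path as UTF-8 (the wiki charset shown in the backtrace), which is an assumption about a well-behaved client, not something Moin enforces.

```python
from urllib.parse import quote

# Page URL from the backtrace, written as text.
referer = 'http://koumbit.net/wiki/Pr\u00e9f\u00e9rencesUtilisateur'

# Percent-encode everything outside ASCII, leaving ':' and '/' alone.
# quote() encodes as UTF-8 by default (assumed here to match the wiki charset).
quoted = quote(referer, safe=':/')

print(quoted)

# The quoted form is pure ASCII, so this encode cannot raise.
header_bytes = quoted.encode('ascii')
```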
I have applied the patch and had the following error:
Traceback (most recent call last):
  File "/var/alternc/html/r/root/moinmoin/moin.cgi", line 32, in ?
    request = RequestCGI()
  File "/usr/local/lib/python2.3/site-packages/MoinMoin/request.py", line 1028, in __init__
    self._setup_vars_from_std_env(os.environ)
  File "/usr/local/lib/python2.3/site-packages/MoinMoin/request.py", line 187, in _setup_vars_from_std_env
    self.http_referer = unicode(referer, 'ascii', 'replace').encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 26: ordinal not in range(128)
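The reason this first fix fails: decoding with 'replace' substitutes U+FFFD for each bad byte, and U+FFFD is itself not ASCII, so the strict encode that follows raises. A minimal Python 3 sketch of the same two steps (raw.decode(...) standing in for Python 2's unicode(raw, ...)):

```python
# Referer bytes as reported in the original backtrace.
raw_referer = b'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur'

# Step 1: decode as ASCII, replacing undecodable bytes with U+FFFD.
text = raw_referer.decode('ascii', 'replace')

# Step 2: encode back to ASCII with the default (strict) error handling.
try:
    text.encode('ascii')
    failed_at = None
except UnicodeEncodeError as err:
    # U+FFFD has no ASCII representation, so strict encoding fails here.
    failed_at = err.start

print(repr(text[26]), failed_at)
```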
I am now successfully using the following patch:
--- orig/MoinMoin/request.py
+++ mod/MoinMoin/request.py
@@ -182,7 +182,9 @@
         self.server_name = env.get('SERVER_NAME', 'localhost')
         self.server_port = env.get('SERVER_PORT', '80')
         self.http_host = env.get('HTTP_HOST','localhost')
-        self.http_referer = env.get('HTTP_REFERER', '')
+        # Make sure http referer use only ascii, IE might sent it unquoted!
+        referer = env.get('HTTP_REFERER', '')
+        self.http_referer = unicode(referer, 'latin-1', 'replace').encode('latin-1')
         self.saved_cookie = env.get('HTTP_COOKIE', '')
         self.script_name = env.get('SCRIPT_NAME', '')
         path_info = env.get('PATH_INFO', '')
That patch is still broken. I suggest either:
unicode(referer, 'ascii','replace').encode('ascii', 'replace')
or
unicode(referer, 'latin-1','replace').encode('raw_unicode_escape', 'replace')
The Unicode replacement character '\ufffd' is now replaced again when encoding to ASCII. This should finally fix it; please test.
-- NirSoffer 2004-12-01 18:57:56
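The two suggested variants can be compared on the referer from the backtrace. Again a Python 3 sketch, not the code in patch-373; the function names are made up for illustration, and raw.decode(...) stands in for Python 2's unicode(raw, ...).

```python
# Hypothetical helper names; neither function exists in MoinMoin.

raw_referer = b'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur'


def recode_ascii(raw):
    """Variant 1: ASCII in, ASCII out, 'replace' on both steps."""
    # Bad bytes become U+FFFD on decode, then '?' on encode.
    return raw.decode('ascii', 'replace').encode('ascii', 'replace')


def recode_raw_escape(raw):
    """Variant 2: decode as Latin-1, encode with raw_unicode_escape."""
    return raw.decode('latin-1', 'replace').encode('raw_unicode_escape', 'replace')


ascii_result = recode_ascii(raw_referer)
escape_result = recode_raw_escape(raw_referer)

print(ascii_result)
print(escape_result)
```

Neither variant raises. The first yields pure ASCII with '?' in place of each bad byte. As far as I can tell, raw_unicode_escape writes code points below 256 as Latin-1 bytes, so the second variant hands back the original bytes unchanged rather than ASCII.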
Plan
- Priority:
- Assigned to:
- Status: finally fixed in patch-373
I confirm the fix of patch-373 on IE + win98. Thanks! -- TheAnarcat 2004-12-01 19:31:18