Description

Some really evil browsers (MoinMoinBugs/InternetExplorer, notably) send the referer as plain ISO-8859-1. It seems Moin expects it to be Unicode, and this causes a crash when visiting some pages while logged in.

Example

URL: http://koumbit.net/wiki/

  1. create an account
  2. login
  3. change preferences to have "French" as the default language, or change browser settings to have the wiki in French
  4. visit any other page

You get this backtrace:

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 26-28: invalid data

Please include this information in your bug reports!:
Python Python 2.3.4: /usr/local/bin/python
Linux homere.koumbit.net 2.4.27 #2 SMP Sun Aug 8 23:35:47 EDT 2004 i686
MoinMoin Release 1.3rc1 [Revision patch-351]
...
 /usr/local/lib/python2.3/site-packages/MoinMoin/logfile/eventlog.py in add(self=<EventLog instance>, request=<MoinMoin.request.RequestCGI instance>, eventtype='VIEWPAGE', values={'pagename': u'RechercherUnePage'}, add_http_info=1, mtime_usecs=1101839558149262L) 
   33         if add_http_info:
   34             for key in [u'remote_addr', u'http_user_agent', u'http_referer']:
   35                 val = unicode(request.__dict__.get(key, ''), config.charset)
   36                 if val:
   37                     kvlist.append((key.upper(), val)) # HTTP stuff is UPPERCASE
 
val = u'Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)', unicode undefined, request = <MoinMoin.request.RequestCGI instance>, request.__dict__ = {'_all_pages': None, '_available_actions': None, '_footer_fragments': {}, '_known_actions': None, '_page_headings': {}, '_page_ids': {}, 'accepted_charsets': [], 'args': {}, 'auth_username': None, 'cfg': <wikiconfig.Config instance>, ...}, request.__dict__.get = <built-in method get of dict object>, key = u'http_referer', global config = <module 'MoinMoin.config' from '/usr/local/lib/python2.3/site-packages/MoinMoin/config.pyc'>, config.charset = 'utf-8' 

UnicodeDecodeError: 'utf8' codec can't decode bytes in position 26-28: invalid data 
      __doc__ = 'Unicode decoding error.' 
      __getitem__ = <bound method UnicodeDecodeError.__getitem__ of <exceptions.UnicodeDecodeError instance>> 
      __init__ = <bound method UnicodeDecodeError.__init__ of <exceptions.UnicodeDecodeError instance>> 
      __module__ = 'exceptions' 
      __str__ = <bound method UnicodeDecodeError.__str__ of <exceptions.UnicodeDecodeError instance>> 
      args = ('utf8', 'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur', 26, 29, 'invalid data') 
      encoding = 'utf8' 
      end = 29 
      object = 'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur' 
      reason = 'invalid data' 
      start = 26 
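
The decoding failure can be reproduced outside Moin. This is a minimal sketch, assuming Python 3, where `bytes.decode` stands in for the Python 2 `unicode(val, config.charset)` call shown in eventlog.py. The referer bytes are copied from the backtrace above; the variable names are illustrative only.

```python
# Referer bytes as sent by the browser: each 'e acute' is the single
# ISO-8859-1 byte 0xE9, which is not a valid UTF-8 sequence on its own.
referer = b'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur'

try:
    referer.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    # The first offending byte sits at offset 26, as in the backtrace.
    error = exc

# The same bytes decode without error when treated as ISO-8859-1.
as_latin1 = referer.decode('latin-1')
```

Python 3 words the error message differently from Python 2.3, but the offending offset is the same.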

Details

MoinMoin Version

1.3rc1

OS and Version

Debian GNU/Linux Woody (kernel 2.4.27)

Python Version

Python 2.3.4

Server Setup and Version

Apache 1.3.26-0woody6

Oddly enough, I cannot reproduce this here. :(

Workaround

None known.

Discussion

It does not. HTTP headers should use ASCII only, with non-ASCII characters encoded and quoted.

Fixed by recoding the referer to ASCII, replacing bad characters sent by broken browsers.

Since we don't use this extreme setup (IE on W98?!), please test the fix in the next beta.
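
For comparison, here is what a quoted, ASCII-only referer for the same page could look like. This is a sketch, assuming Python 3 and assuming the page name is quoted from its UTF-8 bytes (matching `config.charset = 'utf-8'` in the backtrace). `urllib.parse.quote` is used here for illustration and is not what Moin 1.3 itself calls.

```python
from urllib.parse import quote

# Page name taken from the backtrace, written as Unicode text.
page = 'Pr\u00e9f\u00e9rencesUtilisateur'

# quote() percent-encodes the UTF-8 bytes of every non-ASCII character,
# so the resulting URL contains ASCII characters only.
quoted = 'http://koumbit.net/wiki/' + quote(page)

# An ASCII-only value can be encoded strictly without raising.
header_bytes = quoted.encode('ascii')
```

A referer in this form decodes cleanly with either the ASCII or the UTF-8 codec.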

I have applied the patch and got the following error:

Traceback (most recent call last):
  File "/var/alternc/html/r/root/moinmoin/moin.cgi", line 32, in ?
    request = RequestCGI()
  File "/usr/local/lib/python2.3/site-packages/MoinMoin/request.py", line 1028, in __init__
    self._setup_vars_from_std_env(os.environ)
  File "/usr/local/lib/python2.3/site-packages/MoinMoin/request.py", line 187, in _setup_vars_from_std_env
    self.http_referer = unicode(referer, 'ascii', 'replace').encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 26: ordinal not in range(128)
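
The traceback follows from how the two steps on line 187 interact. This is a minimal sketch, assuming Python 3, where `bytes.decode` and `str.encode` stand in for the Python 2 `unicode(...)` and `.encode(...)` calls. The variable names are illustrative.

```python
referer = b'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur'

# Step 1: decoding as ASCII with 'replace' turns every non-ASCII byte
# into the Unicode replacement character U+FFFD.
decoded = referer.decode('ascii', 'replace')

# Step 2: a strict ASCII encode cannot represent U+FFFD, so it raises
# instead of producing a cleaned-up string.
try:
    decoded.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc
```

The replacement character introduced by the decode step is itself non-ASCII, so the encode step needs its own error handler.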

I am now successfully using the following patch:

--- orig/MoinMoin/request.py
+++ mod/MoinMoin/request.py
@@ -182,7 +182,9 @@
         self.server_name = env.get('SERVER_NAME', 'localhost')
         self.server_port = env.get('SERVER_PORT', '80')
         self.http_host = env.get('HTTP_HOST','localhost')
-        self.http_referer = env.get('HTTP_REFERER', '')
+        # Make sure http referer use only ascii, IE might sent it unquoted!
+        referer = env.get('HTTP_REFERER', '')
+        self.http_referer = unicode(referer, 'latin-1', 'replace').encode('latin-1')
         self.saved_cookie = env.get('HTTP_COOKIE', '')
         self.script_name = env.get('SCRIPT_NAME', '')
         path_info = env.get('PATH_INFO', '')
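
It helps to see what the Latin-1 round trip in this patch actually produces. This is a sketch, assuming Python 3 equivalents of the Python 2 calls in the patch; the variable names are illustrative.

```python
referer = b'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur'

# Latin-1 maps every byte 0x00-0xFF to the code point of the same value,
# so decoding never fails and re-encoding restores the original bytes.
round_tripped = referer.decode('latin-1', 'replace').encode('latin-1')

# The non-ASCII bytes are therefore still present in the result.
still_non_ascii = any(byte > 127 for byte in round_tripped)
```

The round trip cannot raise, but it also leaves the referer unchanged, so the value is not reduced to ASCII as the patch comment intends.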

That patch is still broken. I suggest either:

unicode(referer, 'ascii','replace').encode('ascii', 'replace')

or

unicode(referer, 'latin-1','replace').encode('raw_unicode_escape', 'replace')

The Unicode replacement character u'\ufffd' is now replaced again when encoding to ASCII. This should finally fix the problem; please test.

-- NirSoffer 2004-12-01 18:57:56
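
The two conversions suggested above give different results. This is a sketch, assuming Python 3 equivalents of the Python 2 calls; the variable names are illustrative, and the behaviour of `raw_unicode_escape` shown here is that of Python 3.

```python
referer = b'http://koumbit.net/wiki/Pr\xe9f\xe9rencesUtilisateur'

# Option 1: ASCII in both directions. Each bad byte becomes U+FFFD on
# decode, and each U+FFFD becomes '?' on encode.
ascii_only = referer.decode('ascii', 'replace').encode('ascii', 'replace')

# Option 2: decode as Latin-1, then encode with raw_unicode_escape.
# Code points below 256 are written out as single Latin-1 bytes, so
# the accented characters pass through unchanged.
escaped = referer.decode('latin-1', 'replace').encode('raw_unicode_escape', 'replace')
```

Only the first option yields a pure ASCII value here, which is consistent with the fix described above.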

Plan

I confirm the fix of patch-373 on IE + win98. Thanks! -- TheAnarcat 2004-12-01 19:31:18


CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/NonUnicodeReferer (last edited 2007-10-29 19:12:14 by localhost)