Description

In some places we get UnicodeErrors, usually for one of these reasons:

UnicodeDecodeError: the code implicitly tries to use the ASCII decoder to decode a non-ASCII string.
UnicodeEncodeError: the code tries to encode text in an encoding (like ASCII) that cannot represent some Unicode characters.
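
The two error types can be reproduced with a minimal sketch like the one below. It is illustrative only: the codec is named explicitly here, whereas the failures described above come from an implicit ASCII conversion. The helper names try_ascii_decode and try_ascii_encode and the sample Hebrew word are assumptions, not MoinMoin code.

```python
# Sample data (assumed for illustration): the same Hebrew word as bytes and as text.
utf8_bytes = b'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'   # UTF-8 encoded byte string
text = u'\u05e9\u05dc\u05d5\u05dd'                  # Unicode text


def try_ascii_decode(data):
    """Return the name of the exception raised by an ASCII decode, or None."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:
        return type(exc).__name__
    return None


def try_ascii_encode(value):
    """Return the name of the exception raised by an ASCII encode, or None."""
    try:
        value.encode('ascii')
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    return None


decode_result = try_ascii_decode(utf8_bytes)   # non-ASCII bytes, ASCII decoder
encode_result = try_ascii_encode(text)         # non-ASCII text, ASCII encoder
```

Pure ASCII input passes through both helpers without an error, which is why such problems tend to stay hidden until non-ASCII data shows up.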

Example

Details

MoinMoin Version

1.3

Discussion

We need a global plan for how to deal with Unicode. Does anybody have something more concrete than "decode data entering the program into Unicode as early as possible; encode data leaving the program to UTF-8 as late as possible"?
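
As a rough sketch of what "decode early, encode late" could look like in practice, the snippet below keeps all conversions at the program boundaries. The function names decode_input, encode_output and make_title are hypothetical, and the assumption that incoming data is UTF-8 is made only for the example.

```python
def decode_input(raw, encoding='utf-8'):
    """Boundary in: turn incoming bytes into Unicode text right away."""
    return raw.decode(encoding)


def encode_output(text, encoding='utf-8'):
    """Boundary out: turn Unicode text into bytes just before writing."""
    return text.encode(encoding)


def make_title(page_name):
    """Internal code: works on Unicode text only, never on raw bytes."""
    return u'Page: %s' % page_name


# Bytes as they might arrive from a request (assumed to be UTF-8).
raw_request = b'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
page_name = decode_input(raw_request)              # decode as early as possible
response = encode_output(make_title(page_name))    # encode as late as possible
```

With this layout, code between the two boundaries never mixes byte strings and Unicode text, so no implicit ASCII conversion can be triggered there.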

bad unicode calls

All unicode calls are valid.
All decode calls are valid.

Use of u'string %s' % object

Use of u'string %s' % object is dangerous. This expression calls __str__ on object, inserts the result into the string, then converts the string to Unicode using the file coding. If the object is using UTF-8 and the file coding is ISO-8859-1, this code will fail.

If object is unicode, this call will try to encode object as ASCII, which works fine if object contains only ASCII characters, but fails with UnicodeEncodeError when object contains non-ASCII characters, such as Hebrew.

This is the most dangerous situation: code that works fine for months during development, then fails five minutes after someone downloads the beta and tries it with non-ASCII data.
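
One possible way to make such formatting calls safe is to convert every argument to Unicode text explicitly before applying %. The sketch below is only an illustration: to_text and label are hypothetical helpers, not existing MoinMoin functions, and it assumes that byte strings are UTF-8 encoded and that the bytes type is available (Python 2.6 or later).

```python
def to_text(value, encoding='utf-8'):
    """Return value as Unicode text, decoding byte strings explicitly.

    Assumption: byte strings are encoded with `encoding` (UTF-8 by default).
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def label(value):
    """Format value into a Unicode string without any implicit conversion."""
    return u'string %s' % to_text(value)


from_bytes = label(b'\xd7\xa9')    # UTF-8 bytes for one Hebrew letter
from_text = label(u'\u05e9')       # the same letter as Unicode text
```

Both calls produce the same Unicode string, because the byte string is decoded with a named codec instead of whatever default the interpreter would pick.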

This issue still has to be checked and then we should search the code to find problem spots.

Here is test code that checks some of the cases: uni.py

Here is test code for using the modulo operator % with unicode values: modulo.py

Here is test code for using the add operator + with unicode values: add.py

http://www.pycs.net/users/0000323/stories/14.html - has some nice links about unicode.

Plan

Priority: high
Assigned to: NirSoffer
Status: All found unicode errors fixed.

CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/UnicodeErrors (last edited 2007-10-29 19:06:14 by localhost)