Description

The spellchecker does not catch UnicodeErrors.

Steps to reproduce

  1. Create a dictionary in MM/dict with a non-UTF-8 encoding
  2. Delete the dict cache in the wiki instance
  3. Run spellcheck
  4. Bombs with a UnicodeDecodeError ('utf8' codec) stack trace (sorry, I fixed it before I could save the trace)

It should give a better error message, such as "Your file isn't in UTF-8, please see the admin".
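
A minimal sketch of what such a message could look like, written for Python 3 rather than the Python 2 that MoinMoin used at the time. The names read_words and DictionaryEncodingError are hypothetical and are not part of MoinMoin; the sketch also assumes the dictionary is a whitespace-separated word list.

```python
class DictionaryEncodingError(Exception):
    """Hypothetical error type for a word list that is not valid UTF-8."""


def read_words(path):
    """Read a word list, turning a decode failure into a readable message."""
    # Read raw bytes so that decoding happens explicitly below.
    with open(path, "rb") as f:
        data = f.read()
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode.
        raise DictionaryEncodingError(
            "Dictionary file %r is not in UTF-8 (bad byte at offset %d); "
            "please see the admin." % (path, err.start)
        ) from err
    # Assumption: words are separated by whitespace.
    return text.split()
```

The caller can then catch DictionaryEncodingError and show its text to the user instead of a stack trace.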

Example

Details

This wiki.

Workaround

Use iconv words --to-code=utf-8 > words-utf8 and delete the cache (add --from-code if the file's encoding differs from your locale's encoding)

(Just a note to say that removing the link and then typing iconv /usr/share/dict/words --from-code=ISO_8859-1 --to-code=UTF-8 > words-utf8 did the magic on my Debian GNU/Linux box. Nick Bailey, http://cmt.gla.ac.uk)

Discussion

Is the words file part of the system or part of the distribution? If it's part of the system and always uses iso-8859-1, we can accept this encoding. Generally it's easy to accept both utf-8 and iso-8859-1 in the same code, using:

words = file(wordsfile).read()
try:
    words = unicode(words, 'utf-8')
except UnicodeError:
    words = unicode(words, 'iso-8859-1')
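
For comparison, the same fallback can be sketched in Python 3, where bytes are decoded with .decode() instead of unicode(). The function name decode_words is hypothetical. UTF-8 has to be tried first: ISO-8859-1 assigns a character to every byte value, so it would accept any input and the UTF-8 branch would never run.

```python
def decode_words(data):
    """Decode raw dictionary bytes, preferring UTF-8 over ISO-8859-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail;
        # a file in some other legacy encoding would decode to wrong letters.
        return data.decode("iso-8859-1")
```

One consequence of this design is that the fallback never raises, so a file in a third encoding is silently misread rather than reported.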

Plan


CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/SpellCheckUnicodeError (last edited 2007-10-29 19:20:23 by localhost)