Description
The spellchecker does not catch UnicodeErrors.
Steps to reproduce
- Create a dictionary in MM/dict with a non-UTF-8 encoding
- Delete dict cache in wiki instance
- Run spellcheck
Bombs with a UnicodeDecodeError: 'utf8' stack trace (sorry, I fixed it before I could save the trace).
It should give a better error message, e.g. "Your dictionary file is not UTF-8 encoded, please contact the wiki admin."
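A minimal sketch of the suggested behaviour, in the same Python 2 style as the snippet under Discussion; the load_dict_words name and the exact wording are illustrative, not the actual MoinMoin code:

    # Sketch only: wrap the dictionary decode so a wrongly encoded file
    # produces a readable error instead of a bare UnicodeDecodeError traceback.
    def load_dict_words(wordsfile):
        raw = file(wordsfile).read()
        try:
            return unicode(raw, 'utf-8')
        except UnicodeDecodeError:
            raise RuntimeError("Spellcheck dictionary %r is not UTF-8 encoded; "
                               "please ask the wiki admin to convert it." % wordsfile)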
Example
Details
This wiki.
Workaround
Use iconv --to-code=UTF-8 words > words-utf8 and delete the cache (see the full command in the note below).
(Just a note to say that removing the link and then running iconv /usr/share/dict/words --from-code=ISO_8859-1 --to-code=UTF-8 > words-utf8 did the magic on my Debian GNU/Linux box. Nick Bailey, http://cmt.gla.ac.uk)
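If iconv is not at hand, the same conversion can be done with a few lines of Python 2; the paths here are just examples:

    # Re-encode a words file from ISO-8859-1 to UTF-8, like the iconv command above.
    src = file('/usr/share/dict/words', 'rb').read()
    out = file('words-utf8', 'wb')
    out.write(src.decode('iso-8859-1').encode('utf-8'))
    out.close()

Then delete the dict cache as before.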
Discussion
Is the words file part of the system or part of the distribution? If it's part of the system, and it always uses ISO-8859-1, we can accept this encoding. Generally it's easy to accept both UTF-8 and ISO-8859-1 in the same code, using:
    words = file(wordsfile).read()
    try:
        words = unicode(words, 'utf-8')
    except UnicodeError:
        words = unicode(words, 'iso-8859-1')
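The fix noted under Plan reportedly goes one step further and gives up with a clear message when neither encoding works; a sketch of that fallback chain (not the exact Debian patch):

    words = file(wordsfile).read()
    try:
        words = unicode(words, 'utf-8')
    except UnicodeError:
        try:
            # ISO-8859-1 maps every byte to a character, so this branch
            # practically always succeeds; the final error is a safety net.
            words = unicode(words, 'iso-8859-1')
        except UnicodeError:
            raise RuntimeError("Spellcheck dictionary %r is neither UTF-8 "
                               "nor ISO-8859-1 encoded." % wordsfile)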
Plan
- Priority:
- Assigned to:
- Status: fixed in 1.3.5 by applying the Debian patch (first tries UTF-8, then ISO-8859-1, then gives up)