Description

If a page name starts with an accented character, the page appears at the end of the list after a search (e.g., a category search), rather than where it would belong if the sort order were alphabetical.

The same applies to page names that start with a lowercase letter. Although I agree that such names go against the naming convention, they do appear now and then in my wiki.
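
A minimal sketch of the effect, written in plain Python 3 rather than taken from the MoinMoin code: a default sort compares code points, so names with an uppercase ASCII initial come first, then lowercase initials, then accented initials. The page names are made up for illustration.

```python
# Hypothetical page names; "\u00c4pfel" is "Äpfel" with a precomposed A-umlaut.
page_names = ["Zebra", "\u00c4pfel", "apple", "Banana"]

# sorted() compares strings code point by code point:
# "B" (66) < "Z" (90) < "a" (97) < "\u00c4" (196).
by_code_point = sorted(page_names)
print(by_code_point)
```

A reader would expect the umlaut name and "apple" near the top of the list; instead both land after "Zebra".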

Steps to reproduce

Run a wiki in an environment where accented characters are common (Germany, Switzerland, …)
Create a page whose name starts with an accented character and add it to some category
List the pages in that category

Example

Component selection

general

Details

MoinMoin Version	1.8.1
OS and Version	Windows XP SP2
Python Version	2.5.4
Server Setup	Apache 2.2 / mod_wsgi
Language you are using the wiki in (set in the browser/UserPreferences)	German (de)

Workaround

I've modified the file MoinMoin/search/results.py to use the system's default locale. Diff to the original:

--- search/results.orig.py      2009-01-06 23:22:35.218750000 +0100
+++ search/results.py   2009-01-06 23:24:13.796875000 +0100
@@ -15,6 +15,10 @@
 from MoinMoin import wikiutil
 from MoinMoin.Page import Page
 
+import locale
+locale.setlocale(locale.LC_ALL, "")
+localized_cmp=lambda p1, p2: locale.strcoll(p1[0], p2[0])
+
 ############################################################################
 ### Results
 ############################################################################
@@ -257,7 +261,7 @@
     def _sortByPagename(self):
         """ Sorts a list of found pages alphabetical by page name """
         tmp = [(hit.page_name, hit) for hit in self.hits]
-        tmp.sort()
+        tmp.sort(cmp=localized_cmp)
         self.hits = [item[1] for item in tmp]
 
     def stats(self, request, formatter, hitsFrom):

/!\ You need to run diff -u orig new.

done, thanks
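
For readers on Python 3, where sort() no longer accepts a cmp argument, the same idea might be sketched with a key function. This is an assumption-laden illustration, not MoinMoin code: use_system_collation and sort_hits_by_pagename are hypothetical names, and the hits are stand-in tuples. It sets only LC_COLLATE instead of LC_ALL, which should leave character classification (LC_CTYPE) untouched.

```python
import locale


def use_system_collation():
    """Try to switch collation to the environment's default locale.

    Returns True on success, False if that locale is not available.
    """
    try:
        locale.setlocale(locale.LC_COLLATE, "")
        return True
    except locale.Error:
        return False


def sort_hits_by_pagename(hits):
    """Sort (page_name, hit) pairs by the active locale's collation rules."""
    return sorted(hits, key=lambda pair: locale.strxfrm(pair[0]))


use_system_collation()

# Hypothetical hits; the second element stands in for a search hit object.
hits = [("Zebra", 1), ("Banana", 2), ("Apple", 3)]
sorted_hits = sort_hits_by_pagename(hits)
print([name for name, _hit in sorted_hits])
```

Where accented names end up still depends on which locale the server process runs under, so the result is only as good as that setting.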

/!\ How is the locale of the system used for the server related to the content in the wiki?

That's a good question… Since I'm in a very confined intranet environment (and because this is the "Workaround" section), it was of no concern to me: the server locale is fine for all clients.

Suggestions:

Language setting of the browser, if available (I realize that the sort function currently has no access to the request). I'm not sure, though, whether the client language should have an influence on the sort order of the wiki content: I'd expect a sort to correspond to the main language of the site I'm browsing.
A locale based on the language_default setting in the wiki configuration
A configurable sort order / sort locale (i.e. a new config setting)
At the least, I'd make the sort case insensitive.
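
As a rough sketch of the last two suggestions (Python 3, not MoinMoin code), a sort key could fold case and strip accents without depending on the server locale at all. The names fold_for_sorting and sort_page_names are hypothetical, and treating "Ä" like "A" is only an approximation of real collation rules, which differ between languages.

```python
import unicodedata


def fold_for_sorting(page_name):
    """Return a key that ignores case and drops combining accents."""
    decomposed = unicodedata.normalize("NFKD", page_name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def sort_page_names(page_names, key=fold_for_sorting):
    """Sort names by a pluggable key (the 'configurable sort order' idea)."""
    # The original name is a tie-breaker so the order stays deterministic.
    return sorted(page_names, key=lambda name: (key(name), name))


# Hypothetical page names; "\u00c4pfel" is "Äpfel".
names = ["Zebra", "\u00c4pfel", "apple", "Banana"]
result = sort_page_names(names)
print(result)
```

Passing a different key function (for instance one based on locale.strxfrm) would switch the sort order without touching the callers.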

The above fix breaks Xapian searching for attachments of MIME type application/octet-stream, which relies on string.letters being strictly ASCII. Another modification fixes this:

--- filter/application_octet_stream.orig.py     2008-08-31 22:00:52.000000000 +0200
+++ filter/application_octet_stream.py  2009-02-16 22:29:58.127190800 +0100
@@ -36,7 +36,7 @@
 norm = string.maketrans('', '')
 
 # builds a list of all non-alphanumeric characters:
-non_alnum = string.translate(norm, norm, string.letters+string.digits)
+non_alnum = string.translate(norm, norm, string.ascii_letters+string.digits)
 
 # translate table that replaces all non-alphanumeric by blanks:
 trans_nontext = string.maketrans(non_alnum, ' '*len(non_alnum))

/!\ You need to run diff -u orig new.

done, thanks
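
To illustrate what the filter's translation table is meant to do, here is a Python 3 approximation (the original module is Python 2, and strip_nontext is a hypothetical helper name). Building the table from string.ascii_letters keeps it independent of any locale setting.

```python
import string

# Bytes to keep: ASCII letters and digits only, regardless of locale.
keep = (string.ascii_letters + string.digits).encode("ascii")

# All byte values that are not ASCII alphanumerics.
non_alnum = bytes(b for b in range(256) if b not in keep)

# Translation table that replaces every non-alphanumeric byte by a blank.
trans_nontext = bytes.maketrans(non_alnum, b" " * len(non_alnum))


def strip_nontext(data):
    """Blank out everything except ASCII letters and digits."""
    return data.translate(trans_nontext)


cleaned = strip_nontext(b"abc\x00\xe4-42")
print(cleaned)
```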

Discussion

Plan

Priority:
Assigned to:
Status:

CategoryMoinMoinBug

MoinMoin: MoinMoinBugs/SearchResultSortedByAsciiNotAlphabet (last edited 2009-02-17 16:55:58 by securemail3)