Description

I installed MoinMoin 1.9 at rev/a728d059c78e (9 November 2009) together with Xapian 1.0.7 (1.0.6 is required, see changelog). The problem is that page titles cannot always be found when searching for stemmed words. Stemming sometimes works, but more often it does not.

Steps to reproduce

To make this easier to reproduce, I ran all tests on this wiki, which runs the current MoinMoin version.

Example

see above...

Component selection

Details

MoinMoin Version

Version 1.9.0rc1 [Revision release] | this wiki, too

OS and Version

ubuntu linux server 9.04 | and also this wiki

Python Version

2.6.2

Server Setup

Apache mod_wsgi

Server Details

Xapian 1.0.7

Language you are using the wiki in (set in the browser/UserPreferences)

de and en

Workaround

Discussion

Locking

While testing I realized that even the unstemmed form could not be found!

So this seems to be not only a stemmer problem; something may also be wrong with the "tokenizer". In any case, some "default" help pages could not be found either (see /Test... -- MarcelHäfner 2009-11-10 15:52:33

Multilingual

Multilingual stemming applies morphological rules of two or more languages simultaneously instead of rules for only a single language when interpreting a search query. Commercial systems using multilingual stemming exist.

There may also be a fundamental problem if you want a multilingual wiki (e.g. with pages in fr, de and en). The concept of multilingual stemmers exists (see some docs about it here), but it seems that the default Xapian & Snowball stemming is not multilingual. That is where the problem arises.

In my view there are only two possibilities:

  1. Set up the wiki and configure one main and only stemming language, so that the indexer and the query parser always use the same stemming algorithm (like xapian_stemming_language = 'DE').

  2. Or use a stemming algorithm that is capable of multilingual stemming, so that the indexer correctly stems words from different languages and the query parser does the same.
    Having a MoinMoin developer build this could be a big task, so Xapian / Snowball would need to support it, or some additional library would have to be integrated to make this kind of technology available as an option.
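
To illustrate why option 1 requires the indexer and the query parser to agree, here is a minimal sketch in plain Python. It does not use Xapian or MoinMoin code: `toy_stem_en`, `toy_stem_de`, `build_index` and `search` are hypothetical helpers, and the toy suffix strippers are only stand-ins for real Snowball stemmers.

```python
# Toy suffix strippers: hypothetical stand-ins for real Snowball stemmers.
def toy_stem_en(word):
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_stem_de(word):
    for suffix in ("en", "er", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_index(titles, stem):
    """Map each stemmed title token to the set of titles containing it."""
    index = {}
    for title in titles:
        for token in title.lower().split():
            index.setdefault(stem(token), set()).add(title)
    return index


def search(index, query, stem):
    """Stem the query term and look it up in the index."""
    return index.get(stem(query.lower()), set())


titles = ["Sorting Tables", "Seiten sortieren"]
index_en = build_index(titles, toy_stem_en)

# Same stemmer at index and query time: the title is found.
same = search(index_en, "tables", toy_stem_en)
# Different stemmer at query time: the stemmed term is not in the index.
mixed = search(index_en, "tables", toy_stem_de)
```

With these toy rules the index holds the term "table", so a query stemmed the same way matches, while a query stemmed with the other rule set keeps "tables" and finds nothing. Whether this kind of mismatch is what happens in MoinMoin is not established here.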

Some Links:

Test/Review results

xapian.Stem test

I have briefly tested xapian.Stem() and it seems to behave mostly correctly for en and de. So the problem we have is likely not the stemmer itself, but how we use it.
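
A check of this kind could look roughly like the sketch below. It is not the test that was actually run: `stem_words` is a hypothetical helper, the word lists are only examples, and the sketch assumes the Xapian Python bindings are installed (it returns None otherwise).

```python
try:
    import xapian  # third-party Xapian Python bindings; may not be installed
except ImportError:
    xapian = None


def stem_words(words, lang):
    """Return {word: stem} for the given language, or None without xapian."""
    if xapian is None:
        return None
    stemmer = xapian.Stem(lang)
    stems = {}
    for word in words:
        stem = stemmer(word)
        # Assumption: some binding versions return bytes rather than str.
        if isinstance(stem, bytes):
            stem = stem.decode("utf-8")
        stems[word] = stem
    return stems


if __name__ == "__main__":
    # Snowball stemmers expect lowercase input.
    print(stem_words(["sort", "sortable", "sorting"], "en"))
    print(stem_words(["seite", "seiten"], "de"))
```

Comparing the stems printed for related words such as "sort" and "sortable" shows whether the stemmer maps them to the same term, which is a precondition for one query matching both.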

Code review

Current Example or Different Problem?

Was about to open a bug report but found this one. A title search for "sortable" on this wiki results in 3 hits. A search for "sort" yields 30 hits, but the 3 hits for "sortable" are NOT included.

Plan


CategoryMoinMoinBug

MoinMoin: MoinMoinBugs/1.9XapianStemmingNotWorkingCorrectly (last edited 2010-02-25 20:33:11 by RogerHaase)