Description
If Xapian search is enabled, the FullSearchCached / FullSearch macros behave differently from the built-in MoinMoin search for "category searches". In MoinMoin you'll find syntax like:
<<FullSearchCached(category:MoinMoinBugs/1.6.3FullSearchWithXapianForCategory)>> or <<FullSearchCached(category:CategoryLinux)>> or <<FullSearchCached(cat:CategoryLinux)>>
http://master16.moinmo.in/CategoryTemplate?action=raw
With Xapian enabled, only a search without the "Category" prefix in the name works. I'm not sure why; maybe it only works because it is searching in fulltext mode:
<<FullSearchCached(cat:Linux)>>
Another side effect (of not supporting category:CategoryLinux) is that the FindPage form/wiki page does not work for category searches.
Url with categories=CategoryApache:
FindPage?action=fullsearch&advancedsearch=1&and_terms=&or_terms=&not_terms=&mtime=&categories=CategoryApache&language=&mimetype=
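For reproducing the request outside the browser, the query string can be assembled programmatically. This is only a sketch: the parameter names are copied from the URL above, and the relative `FindPage` path is kept as-is rather than guessing a host name.

```python
from urllib.parse import urlencode

# Parameter names and values as they appear in the FindPage URL above.
params = [
    ("action", "fullsearch"),
    ("advancedsearch", "1"),
    ("and_terms", ""),
    ("or_terms", ""),
    ("not_terms", ""),
    ("mtime", ""),
    ("categories", "CategoryApache"),
    ("language", ""),
    ("mimetype", ""),
]

# Relative URL; prepend the wiki's base URL when sending the request.
url = "FindPage?" + urlencode(params)
print(url)
```

Changing only the `categories` value makes it easy to compare results for names with and without the "Category" prefix.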
Steps to reproduce
Enable Xapian and observe that category search does not work with either FullSearch or FindPage.
Example
FullSearch
- Category Search
- Attachment
Logfile if you call this page above:
2008-05-06 15:59:50,754 INFO request.find_remote_addr: addrs == ['83.78.138.141', '127.0.0.1']
2008-05-06 15:59:50,807 INFO xapianSearch: query = 'Xapian::Query(XCAT:linux)'
2008-05-06 15:59:51,241 INFO xapianSearch: query = 'Xapian::Query(XCAT:linux)'
2008-05-06 15:59:51,674 INFO xapianSearch: query = 'Xapian::Query(XCAT:categorylinux)'
2008-05-06 15:59:51,678 INFO xapianSearch: query = 'Xapian::Query(XCAT:categorylinux)'
FindPage
- FindPage
- Attachment
Logfile if you try to search for CategoryApache with the FindPage:
2008-05-06 16:01:09,767 INFO request.find_remote_addr: addrs == ['83.78.138.141', '127.0.0.1']
2008-05-06 16:01:09,774 INFO xapianSearch: query = 'Xapian::Query(XCAT:categoryapache)'
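The log entries suggest why the lookups fail: the query term contains the full lowercased category name, while the index apparently holds a shorter term. The following is a toy model only, not Xapian's API. A plain set stands in for the index, the `XCAT:` prefix and lowercasing are taken from the log, and the stored term `XCAT:linux` is an assumption based on the capture-group regex shown in the quick fix further down this page.

```python
# Toy model: a set of strings stands in for the Xapian term index.
# Assumption: before the fix, a page in CategoryLinux was indexed as "XCAT:linux".
indexed_terms = {"XCAT:linux"}


def category_query_term(name):
    """Build the term seen in the log for a cat:/category: search (hypothetical helper)."""
    return "XCAT:" + name.lower()


for name in ("Linux", "CategoryLinux"):
    term = category_query_term(name)
    # A search only finds the page if its term is present in the index.
    print(name, term, term in indexed_terms)
```

Under these assumptions, cat:Linux hits the stored term while cat:CategoryLinux does not, which matches the behaviour described above.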
Component selection
- Xapian
Details
MoinMoin Version | 1.6.3 included changeset 2642 fc9439999597 hg.moinmo.in
OS and Version | RedHat Linux
Python Version | 2.5.2
Server Setup | Apache WSGI
Server Details |
Language you are using the wiki in (set in the browser/UserPreferences) |
Workaround
Use <<FullSearchCached(cat:Linux)>> or <<FullSearchCached(linkto:CategoryLinux)>>
- quick fix:
diff -r 3376df1919e3 MoinMoin/search/Xapian.py
--- a/MoinMoin/search/Xapian.py Tue May 06 22:23:54 2008 +0200
+++ b/MoinMoin/search/Xapian.py Tue May 06 23:52:55 2008 +0200
@@ -436,7 +436,8 @@ class Index(BaseIndex):
             return []
         return [cat.lower()
-                for cat in re.findall(r'Category([^\s]+)', body[pos:])]
+                for cat in re.findall(r'Category[^\s]+', body[pos:])] # XXX needs i18n / configurability
+                # we have page_category_regex there, but it doesn't match the complete category tag
     def _get_domains(self, page):
         """ Returns a generator with all the domains the page belongs to
I installed the latest "changeset 2643 3376df1919e3" and then your patch. I also had to rebuild the index first (it did not work without that). The Xapian search now accepts only the correct syntax "cat:CategoryLinux"; the old syntax "cat:Linux" no longer works (tested with FullSearch and with the results of FindPage).
- It works for English category names.
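The change in behaviour follows from how `re.findall` treats capture groups. A minimal sketch comparing the two patterns from the patch; the sample page body is made up for illustration:

```python
import re

# Made-up page body: text, a horizontal rule, then the category line.
body = "Some page text\n----\nCategoryLinux CategoryApache\n"

# Pattern before the quick fix: with a capture group, findall returns only
# the group, so the "Category" prefix is dropped.
old_terms = [cat.lower() for cat in re.findall(r'Category([^\s]+)', body)]

# Pattern after the quick fix: without a group, findall returns the whole match.
new_terms = [cat.lower() for cat in re.findall(r'Category[^\s]+', body)]

print(old_terms)  # ['linux', 'apache']
print(new_terms)  # ['categorylinux', 'categoryapache']
```

Because the stored terms change, an index built with the old pattern has to be rebuilt before "cat:CategoryLinux" can match.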
Another problem: if you use a German category name such as "Kategorie", it does not seem to work (you can test this at http://rock.heavy.ch/KategorieNews).
config file:
page_category_regex = u'^Kategorie[A-Z]'
-- MarcelHäfner 2008-05-07 07:12:22
Needs more work:
currently hardcoded to search for Category..., no good for i18n
- we have cfg.page_category_regex, but the default does not match the full term
- same is true for some other regexes there
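Both points can be illustrated with the German configuration quoted above. This is a sketch, not the indexer's code: the first two patterns come from this page, while `full_regex` is a hypothetical pattern showing what "matching the full term" would look like.

```python
import re

# Pattern taken from the config line quoted above.
page_category_regex = re.compile(u'^Kategorie[A-Z]')

name = u'KategorieNews'

# The hardcoded English pattern from the quick fix finds nothing here.
hardcoded_hits = re.findall(r'Category[^\s]+', name)
print(hardcoded_hits)  # []

# The configured pattern matches, but only the prefix plus one capital letter.
prefix = page_category_regex.search(name).group(0)
print(prefix)  # KategorieN

# Hypothetical pattern that captures the complete category name.
full_regex = re.compile(u'Kategorie[^\\s]+')
full_hits = full_regex.findall(name)
print(full_hits)  # ['KategorieNews']
```

A regex meant only for testing whether a page name is a category is therefore not directly usable for extracting the term to index.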
Discussion
So I'm on 1.8.5 and, after switching to Xapian, if a page links to a category with brackets, like [[CategoryABCStuff]], the CategoryABCStuff page doesn't list it. It is not clear to me from the above discussion what the proper workaround for 1.8.5 is. Has this problem really remained unfixed since 1.6.3? Should this page be renamed to reflect that it's still a bug in 1.8.5? -- JohnGoerzen 2009-10-16 21:38:17
Might this not be a problem with the category regular expression in the configuration? This tends to persist through upgrades, even when new installations might use a different set of regular expressions for certain syntax features. -- PaulBoddie 2009-10-26 19:26:57
I'm using the default regex on a fresh install. -- JohnGoerzen 2009-10-26 20:30:01
I think I've seen the sort of behaviour you're talking about. When the category page doesn't form a link, the inclination is to make it one: CategoryABCStuff becomes [[CategoryABCStuff]]; then you might think that the page is now in the category (which it might be, I'm not too sure). But when Moin parses pages and extracts the category details for indexing, it won't recognise these page names in link syntax as indications of category membership (search for page_category_regex in MoinMoin.config.multiconfig). I had a similar problem with a category page called CategoryFAQ which actually appears as [[CategoryFAQ]] in the Add to: menu when editing a page - Moin seems to be trying to help - but I remain skeptical that actually adding an explicit link really does what we want.
I think the category infrastructure needs some tidying to avoid these contradictory mechanisms. -- PaulBoddie 2009-10-26 22:37:23
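One possible explanation for the link-syntax behaviour, sketched under a loud assumption: that categories are extracted with a "run of non-whitespace" pattern like the one in the quick fix. The regex actually used by 1.8.5 may differ, so this shows the mechanism rather than a confirmed diagnosis.

```python
import re

# Assumption: a non-whitespace-run pattern as in the 1.6 quick fix;
# the pattern used in 1.8.5 may be different.
category_pattern = re.compile(r'Category[^\s]+')

plain = "----\nCategoryABCStuff\n"
linked = "----\n[[CategoryABCStuff]]\n"

plain_terms = [c.lower() for c in category_pattern.findall(plain)]
linked_terms = [c.lower() for c in category_pattern.findall(linked)]

print(plain_terms)   # ['categoryabcstuff']
print(linked_terms)  # ['categoryabcstuff]]']
```

With such a pattern the closing brackets become part of the extracted term, so a search for the plain category name would not find the page.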
Plan
- Priority:
- Assigned to:
- Status: quick fix applied, but needs more work for non-default category regex / non-English
1.6 quick fix for English: http://hg.moinmo.in/moin/1.6/rev/b5cab2999450
1.7 quick fix for English: http://hg.moinmo.in/moin/1.7/rev/7593b6b4590c
1.7 fix: http://hg.moinmo.in/moin/1.7/rev/124d0ef138aa
- please help testing all page_*_regex based stuff (dicts, groups, categories, templates)
- 1.6: opinions wanted on whether that change should be backported. It fixes non-English categories for Xapian, but requires a different configuration (and would break the wikis of careless admins who do not make the required config changes).
1.9 fixed by refactoring of Xapian2009