Description
Search fails to find any matches if you perform a title search on certain fragments of a page title. (Some fragments get hits as you would expect, but some don't. There doesn't seem to be a very logical pattern...)
The results listed below were inexplicable to me at first (see the Discussion). In earlier versions (1.5.x for sure, and I think 1.6 as well), title searches were simply taken at face value.
Steps to reproduce
Single out a page you want to find and do a title search on some substring of the page title.
Examples
Search for | Results
m | Finds 2 attachments on page MoinDev/MoinMoinLogo
mo | Only 1 match, MoinMoInDomain (goes directly to page)
moi | None
moin | Matches all (3996)
r:moin | 3952 matches (odd, this is fewer than without r:). Seems to exclude only certain attachments (???)
MoinM .. MoinMoi | None
MoinMoin | 53 hits, all on attachment names
r:MoinMoin | 3657 hits, almost as many as the "moin" search
MoinMoinB | None
MoinBugs | None
MoinMoinBugs | Works as you would expect
Moin Bugs | Same as MoinMoinBugs, but also includes 6 pages that contain "bug" (singular) but not the plural
mo in bugs | None!
delete | 47 matches
deletes, deleting, deleted | Same as delete. Aha! Xapian is doing stemming...
CutAndPaste | None!
cut and paste | None!
cut paste | None!
1.8_GuiEditorCutAndPasteOfWikiImage | Opens that page directly
r:CutAndPaste | Also finds the above page
titlesonly | Matched 3 pages, where "Titlesonly" was Oneword
titles only | Matches 2 (different) pages, with "TitlesOnly" as TwoWords
Component selection
- Xapian / Xapian config
Details
MoinMoin Version | This wiki (1.8 beta/rc, on 2008-09-25)
Server Setup | Using Xapian search...
Workaround
Prepend r: to the title fragment (a regular expression search bypasses Xapian).
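The Python sketch below shows why the r: prefix helps. A regular expression is matched against the raw title text, so any substring can hit. The indexed search can only find whole index terms. The sketch assumes case-insensitive matching, which fits the r:moin row above. `regex_title_search` and the sample title list are hypothetical illustrations, not MoinMoin's code.

```python
import re


def regex_title_search(pattern, titles):
    """Return every title in which `pattern` matches anywhere.

    Hypothetical helper: it models a non-indexed r: title search as a plain
    case-insensitive regex scan over the title strings.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [title for title in titles if rx.search(title)]


# Hypothetical sample of page titles (names taken from the examples above).
titles = [
    "MoinMoinBugs",
    "MoinDev/MoinMoinLogo",
    "1.8_GuiEditorCutAndPasteOfWikiImage",
    "FrontPage",
]

print(regex_title_search("moi", titles))          # ['MoinMoinBugs', 'MoinDev/MoinMoinLogo']
print(regex_title_search("CutAndPaste", titles))  # ['1.8_GuiEditorCutAndPasteOfWikiImage']
```

The fragment is treated as a regular expression, so characters such as `.` match any character. Wrap the fragment in `re.escape(...)` when it should be matched literally.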
Discussion
OK, while researching this I deduced that this site is configured to use Xapian, which mostly explains the behavior I'm seeing. A lot of what Xapian is doing makes perfect sense and is rather beneficial (including the performance gain). Since Xapian use is optional and configurable, this is probably not a bug in the MoinMoin engine, though there may still be a problem with this site's setup.
However, I thought it was still worth filing this as a bug, because:
- Not all of what I'm seeing makes sense, especially:
  - The "cut and paste" example. Apparently the words "cut" and "paste" are not both indexed?
  - The "titles only" example. Depending on whether the words are written separately or as CamelCase, you find mutually exclusive sets of matches. This sort of difference should not matter.
- If you don't know that a search index/engine is manipulating your searches, the behavior can be rather unintuitive. Xapian gets only a minor mention in the help pages, and most users probably don't know what Xapian is anyway. I therefore think the fact that Xapian is in use should be made obvious to the user. (If Xapian is enabled, add a line at the top of the search results that states this, with a link to a Xapian-specific help page.)
- How does Xapian decide what counts as a "word" worth indexing, and how can words be combined in a single search term? I assert that this should be explainable in a way that lets wiki users grasp how it works and how it affects their searches.
I'd like this to get a sanity check to make sure everything really is behaving as it should be.
In my experience, doing partial title searches can be invaluable, especially when you only remember a small part of the title.
Indexed search works like this:
- an indexing run goes through all pages and attachments, extracts words from them and puts all those words into the index
- extraction of the words is done by a component called the analyzer:
  - the analyzer first splits the text at word boundaries
  - additionally, it splits up words like CamelCase into Camel and Case (but not elCa)
  - it also splits foo42 into foo and 42
  - it runs the words through the stemmer, yielding the word stems
- if you search for something, you will only be able to find stuff that was put into the index by the indexing run
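The steps above can be sketched as a toy analyzer in Python. Everything here is an assumption made for illustration:
- the regexes;
- the names `index_terms`, `stem` and `can_find`;
- the suffix-stripping stemmer, which is a crude stand-in for the real stemmer Xapian uses.

The sketch only shows why a fragment such as "moi" cannot hit: it never becomes an index term.

```python
import re

# Toy analyzer following the steps described above. NOT MoinMoin's or
# Xapian's actual code: the regexes and the suffix-stripping "stemmer"
# are simplified, hypothetical stand-ins.

WORD_RE = re.compile(r"[A-Za-z0-9]+")  # split the text at word boundaries
# CamelCase parts, upper-case runs (e.g. "GUI") and digit runs (e.g. "42")
PART_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")
SUFFIXES = ("ing", "ed", "es", "e", "s")  # crude stand-in for a real stemmer


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Return the set of terms this toy indexing run would store for `text`."""
    terms = set()
    for token in WORD_RE.findall(text):
        terms.add(stem(token.lower()))  # the whole word
        parts = PART_RE.findall(token)
        if len(parts) > 1:  # CamelCase or letters+digits: add the parts too
            terms.update(stem(part.lower()) for part in parts)
    return terms


def can_find(query_word, terms):
    """A query word only hits if its stem is exactly one of the index terms."""
    return stem(query_word.lower()) in terms


terms = index_terms("MoinMoinBugs")
print(sorted(terms))                 # ['bug', 'moin', 'moinmoinbug']
print(can_find("moin", terms))       # True
print(can_find("moi", terms))        # False: "moi" was never indexed
print(can_find("bugs", terms))       # True: "bugs" stems to "bug"
print(sorted(index_terms("foo42")))  # ['42', 'foo', 'foo42']
```

This toy also keeps the whole word as a term alongside its CamelCase parts. That is an assumption too, because the description above does not say exactly which terms the real analyzer keeps.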
If, after reading the above, you still think you found a bug, please remove everything that is not a bug and reopen it.
Plan
- Priority:
- Assigned to:
- Status: no bug