Description

Search fails to find any matches if you perform a title search on certain fragments of a page title. (Some fragments get hits as you would expect, but some don't. There doesn't seem to be a very logical pattern...)

The results I list below are (or rather were, see Discussion) inexplicable to me. In earlier versions (1.5.x for sure, and I think 1.6 as well), title searches were simply taken at face value.

Steps to reproduce

Single out a page you want to find and do a title search on some substring of the page title.

Examples

|| Search for: || Results ||
|| m || Finds 2 attachments on page MoinDev/MoinMoinLogo ||
|| mo || Only 1 match, MoinMoInDomain (goes directly to page) ||
|| moi || none ||
|| moin || matches all (3996) ||
|| r:moin || 3952 matches (odd, this is less than without r:). Seems to exclude only certain attachments (???) ||
|| MoinM .. MoinMoi || none ||
|| MoinMoin || 53 hits, all on attachment names ||
|| r:MoinMoin || 3657 hits, almost as many as "moin" search ||
|| MoinMoinB || none ||
|| MoinBugs || none ||
|| MoinMoinBugs || works like you would expect ||
|| Moin Bugs || Same as MoinMoinBugs, except also includes 6 pages that contain bug (singular) but not the plural ||
|| mo in bugs || None! ||
|| delete || 47 matches ||
|| deletes, deleting, deleted || Same as delete. Aha! Xapian is doing stemming... ||
|| CutAndPaste || None! ||
|| cut and paste || None! ||
|| cut paste || None! ||
|| 1.8_GuiEditorCutAndPasteOfWikiImage || <-- opens that page ||
|| r:CutAndPaste || Also finds above page ||
|| titlesonly || Matched 3 pages, where "Titlesonly" was Oneword ||
|| titles only || Matches 2 (different) pages, with "TitlesOnly" as TwoWords ||

Component selection

Details

MoinMoin Version

This wiki (1.8 beta/rc, on 2008-09-25)

Server Setup

using Xapian search...

Workaround

Prepend r: to the title fragment (regular expression search bypasses Xapian).
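To illustrate why the r: form finds fragments, here is a minimal sketch of a regular expression scan over page titles. It is not MoinMoin's actual code: the function name `regex_title_search`, the sample titles, and the case-insensitive matching are all assumptions made for the example.

```python
import re


def regex_title_search(titles, pattern):
    """Return the titles that contain a match for pattern anywhere.

    Hypothetical helper: every title is scanned directly, so any
    substring can match, whether or not it is a whole word.
    Case-insensitive matching is an assumption of this sketch.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    return [title for title in titles if rx.search(title)]


# Sample titles, chosen for illustration only.
titles = [
    "1.8_GuiEditorCutAndPasteOfWikiImage",
    "MoinMoinBugs",
    "HelpOnSearching",
]

print(regex_title_search(titles, "CutAndPaste"))
print(regex_title_search(titles, "MoinMoi"))
```

Because no index is consulted, a scan like this has to look at every title, which is the trade-off against an indexed search.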

Discussion

OK, while researching this I deduced that this site is configured to use Xapian, which mostly explains the behavior I'm seeing. A lot of what Xapian is doing makes perfect sense and is rather beneficial (including the performance gain). Since Xapian use is optional and configurable, this would probably not be a bug of the MoinMoin engine, but perhaps there's still a problem with this site's setup.

However, I thought it was still worth filing this as a bug, because:

  1. not all of what I'm seeing makes sense. Especially...
    1. The "cut and paste" example. Apparently the words "cut" and "paste" are not both indexed??
    2. The "titles only" example. Depending on whether the words are together or CamelCase, you find mutually exclusive sets of matches. This sort of difference should not matter.
    3. If you don't know that some search index/engine is manipulating your searches, the behavior can be rather unintuitive. Since Xapian gets a rather minor mention in the help pages (and most users probably don't know what Xapian is anyway), I think that the fact Xapian is in use should be made obvious to the user. (If Xapian is enabled, add a line at the top of the search results that states this, with a link to a Xapian-specific help page.)
    4. How does Xapian decide what is a "word" worth indexing, and how can words be combined in a single search term? I assert that this should be explainable in a way that lets wiki users grasp how it works and how it affects their searches.
  2. I'd like this to get a sanity check to make sure everything really is behaving as it should be.

In my experience, doing partial title searches can be invaluable, especially when you only remember a small part of the title.


Indexed search works like this:

  1. an indexing run goes through all pages and attachments, extracts words from them and puts all those words into the index.

    • extraction of the words is done by a component called the analyzer
    • the analyzer first splits the text at word boundaries
    • additionally it splits up words like CamelCase into Camel and Case (but not elCa)
    • it also splits foo42 into foo and 42
    • it runs the words through the stemmer, yielding the word stems
  2. if you search for something, you will only be able to find stuff that was put into the index by the indexing run
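The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MoinMoin's or Xapian's code: `toy_stem` is a made-up suffix stripper standing in for the real stemmer, `analyze`, `build_index` and `lookup` are hypothetical names, and the sketch indexes only the split parts of a word. It does not try to reproduce every result from the table in the report.

```python
import re


def toy_stem(word):
    """Toy stand-in for a real stemmer: strip one common English suffix."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Yield the lowercased stems of all word parts found in text."""
    # Split at word boundaries; letter runs and digit runs are kept
    # apart, so foo42 becomes foo and 42.
    for word in re.findall(r"[A-Za-z]+|[0-9]+", text):
        # Split CamelCase into Camel and Case (never into elCa).
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", word):
            yield toy_stem(part.lower())


def build_index(titles):
    """Indexing run: map each stem to the set of titles containing it."""
    index = {}
    for title in titles:
        for stem in analyze(title):
            index.setdefault(stem, set()).add(title)
    return index


def lookup(index, term):
    """Find titles for one term; only stems stored in the index can hit."""
    return index.get(toy_stem(term.lower()), set())


index = build_index(["MoinMoinBugs", "CamelCase", "foo42"])
print(lookup(index, "camel"))  # {'CamelCase'}
print(lookup(index, "bug"))    # {'MoinMoinBugs'}
print(lookup(index, "elCa"))   # set(): "elca" was never put into the index
print(lookup(index, "moi"))    # set(): only the full part "moin" was indexed
```

In this sketch a lookup is a single dictionary access on a whole stem, which is why fragments such as moi or elCa find nothing even though they occur inside a title.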

If you still think you found a bug, please remove everything that's not a bug after reading the above and reopen the bug.

Plan


CategoryMoinMoinNoBug

MoinMoin: MoinMoinBugs/NoHitsOnPartialTitleSearch (last edited 2008-09-25 21:24:16 by ThomasWaldmann)