## Template for submitting bug reports
## Note: All bugs pagename should be: MoinMoinBugs/BugName
## Note: Don't forget you can improve this bug template - edit the page called BugTemplate.


= Description =

Bugs in the new search engine:

 * Searching for "Nir" finds interve'''nir'''e, defi'''nir'''e, ve'''nir'''e etc.
It makes sense that "term" should find only "term", just as "two words" finds only "two words".

 * Searching for "nir une" finds "Pour obte'''nir une''' liste de toutes les"
It makes sense that "term" should find only whole words.

IIRC the old search did exactly the same. -- FlorianFesti <<DateTime(2004-10-02T13:50:14Z)>>

== Details ==
## if the bug is in this wiki, just kill the table and write: This Wiki.

## Fill in the relevant details - os, browser, python version on the server etc.
## Uncomment and fill only the relevant lines

|| '''!MoinMoin Version''' || 1.3--patch-152 ||
## || '''OS and Version''' || example: Gentoo Linux kernel 2.4.24  ||
## || '''Python Version''' || example:  Python 2.3.3 ||
## || '''Server Setup and Version''' || example: Twisted 1.1.1 or Apache 2.0.48 with mod_python 3.1.2b ||
## || '''Server Details''' || example: using SSL and Authentication ||


== Workaround ==
## How to deal with the bug until it is fixed


= Discussion =

HelpOnSearching says:
 "Double or single quotes may be used to include white space into search terms"

This might be viewed as a feature. But common sense says that when you search for "something" or "this and that", you are looking for whole words. As this is the default, I think it should use this behavior.

If you want to look for part of a word, you can use r:something or r:this\sand\sthat.
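
A minimal sketch of the difference between the two behaviors, using plain Python `re` rather than MoinMoin's actual search code. The sample text and both helper names are made up for illustration:

```python
import re

# Sample page text taken from the bug description above.
TEXT = "Pour obtenir une liste de toutes les pages"

def substring_match(term, text):
    """Case-insensitive substring search: also matches inside longer words."""
    return term.lower() in text.lower()

def whole_word_match(term, text):
    """Match only when the term starts and ends at a word boundary."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

print(substring_match("nir une", TEXT))     # hit inside "obtenir une"
print(whole_word_match("nir une", TEXT))    # no hit: "nir" is only part of a word
print(whole_word_match("une liste", TEXT))  # hit: both words are whole words
```

The first helper corresponds to what the bug describes, the second to what this comment argues a quoted term should do by default.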

Google search works like this:
 "Phrase Searches

 Search for complete phrases by enclosing them in quotation marks. Words enclosed in double quotes ("like this") will appear together in all results exactly as you have entered them. Phrase searches are especially useful when searching for famous sayings or proper names."

It does not say anything about whole words; I wonder how they treat this issue.

In Hebrew it does make sense NOT to look for whole words, as the same word in Hebrew can appear with extra letters. For example, "to a wiki" is written as "lewiki" - the word "wiki" gets a "le" prefix. "to the wiki" is "lawiki" and "the wiki" is "hawiki" (Hebrew is like perl - short and hard to read). So when you look for "Wiki" you want to find the same word with the "le" or "la" or other prefixes or suffixes.

But even in Hebrew, one does not expect to get something which is not inside the "search term".

It seems that it does not make sense in English though. Does it make sense to make the search language-aware? How do you define the language of a search? I think the best would be one default for all languages. If we have behavior that works badly for some language, we can make this a configuration option, as each wiki usually has one language, and the wiki admin can choose what works better.

The question is what the common case is that most people expect; we should make this the default. We also must add many more examples of common search tasks to HelpOnSearching.

Related links:
 * http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html - rather old article from 1997, pre-Google.

"Real" search engines use indices. They tweak the words before entering them into the index. If you add "\b" to the beginning and the end of the word, you will no longer find things you want to have, like words with a plural s or grammatical endings, which are more common in most other European languages than in English. In German, words are often qualified by appending other words. `battle ship` is `Schlachtschiff` in German, while `Schlacht` is `battle` (the literal translation is slaughter, btw) and `Schiff` is `ship`. If you are searching for `ship` you want to find `battle ship`, too.
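
To illustrate the point about "\b", here is a small sketch in plain Python `re` (not MoinMoin's search code; the helper names and sample sentences are invented for the example). It shows a strict whole-word pattern skipping plurals and compound words that a substring search still finds:

```python
import re

def find_whole_word(term, text):
    """Return matches where the term is delimited by word boundaries."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.findall(pattern, text, re.IGNORECASE)

def find_substring(term, text):
    """Return every occurrence of the term, even inside longer words."""
    return re.findall(re.escape(term), text, re.IGNORECASE)

english = "Two ships left; one ship stayed."
german = "Das Schlachtschiff liegt im Hafen."

# The plural "ships" is skipped by the whole-word pattern.
print(find_whole_word("ship", english))
print(find_substring("ship", english))

# The compound "Schlachtschiff" is skipped as well.
print(find_whole_word("Schiff", german))
print(find_substring("Schiff", german))
```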

No, this is not the issue. The issue is what "quoted words", or even one quoted "word", should mean:
 * a way to add white space into the search term
 * a way to define exact search terms. I want exactly this "ship" and not "battleship". If I want to find both, I just search for ship.

I see no difference between "ship" and ship. The quotes are intended for quoting characters that have special meaning.

We can add ranking to the system, and this can solve the cases that are not clear. For example, if I get a relevance bar for each search result, let's say a system of stars `[*****]` for very relevant and empty `[     ]` for not relevant, I can show both results, but sort them in a useful way.

Example - search for "ship"

|| '''No''' || '''Rank''' || '''Found''' ||
|| 1 || `*****` || '''Ship''' ||
|| 2 || `***  ` || battle'''ship''' ||
|| 3 || `***  ` || other'''ship''' ||
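
A rough sketch of how such a ranking could work, in plain Python. This is not MoinMoin's existing ranking code; the function names and the scores (5 for an exact word, 3 for a partial match) are assumptions chosen only to reproduce the table above:

```python
def rank(term, word):
    """Score a candidate word: 5 for an exact match, 3 for a partial match, 0 for none."""
    t, w = term.lower(), word.lower()
    if t == w:
        return 5
    if t in w:
        return 3
    return 0

def stars(score, width=5):
    """Render a score as a fixed-width relevance bar, e.g. [***  ]."""
    return "[" + "*" * score + " " * (width - score) + "]"

def search(term, words):
    """Return (score, word) pairs for all hits, best score first."""
    hits = [(rank(term, w), w) for w in words]
    hits = [hit for hit in hits if hit[0] > 0]
    # sorted() is stable, so equally ranked hits keep their original order.
    return sorted(hits, key=lambda hit: -hit[0])

for number, (score, word) in enumerate(
        search("ship", ["battleship", "Ship", "othership", "boat"]), start=1):
    print(number, stars(score), word)
```

With ranking in place, both the exact word and the partial matches are shown, and the exact word simply comes first.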

We already have a kind of ranking system. For now it is quite simple. We can improve it in the 1.4 branch. I would prefer not to add further features to 1.3.

I would like to see this feature here (let's call it Google semantics, for a parser that stole 80% of its ideas from Google): The parser tries to replace `"ham"` by `regex:\bham\b` if it makes sense.
''Why?'' quite easy - you do not really expect that someone tries to learn regex if he wants to do a phrase-search. ''It is not documented?'' Oh, easy to change.
''It is difficult to implement?'' Not really.
''It breaks the old search semantics!'' Yes, but I wonder if someone really uses the `""`-for-space-escaping-mode. If you really need `battle ship[s]`, then write a regex (as regexes should not be transformed as described above). -- AlexanderSchremmer <<DateTime(2004-11-29T22:27:09Z)>>
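
A minimal sketch of the proposed rewrite, assuming the query has already been split into single terms. `rewrite_quoted` is a hypothetical helper, not part of the MoinMoin parser, and it rewrites every quoted term instead of deciding whether the rewrite "makes sense":

```python
import re

def rewrite_quoted(term):
    # Turn a quoted term such as "ham" into a regex: term with word
    # boundaries on both sides. Unquoted terms and explicit regex: terms
    # are returned unchanged.
    is_quoted = len(term) >= 2 and term[0] == term[-1] and term[0] in "\"'"
    if not is_quoted:
        return term
    inner = term[1:-1]
    return "regex:" + r"\b" + re.escape(inner) + r"\b"

print(rewrite_quoted('"ham"'))
print(rewrite_quoted("ham"))
print(rewrite_quoted(r"regex:battle ship[s]"))
```

Because the user's text is passed through `re.escape`, a quoted phrase containing regex metacharacters is still matched literally. A term that begins or ends with a non-word character would need extra care, since `\b` only matches next to a word character.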

= Plan =

Add an example that shows how "quoted words" also finds part of another word.

 * Priority: document this behavior before 1.3 is released
 * Assigned to:
 * Status: closed, it's a design that most developers see as correct. Updated docs to make it clear.

----

## When the bug is fixed, replace the category with Category MoinMoinBugFixed
## If this is not a bug, replace with Category MoinMoinNoBug
CategoryMoinMoinNoBug