Improving Xapian search

The aim of the project is to improve how search results are presented. This includes designing an output layout (how the results are shown) and grouping search results (showing top-level pages before subpages and attachments).
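The grouping idea can be expressed as a sort key. The following is a minimal sketch that assumes each hit is a hypothetical `(pagename, attachment)` tuple, with `attachment` set to `None` for ordinary pages, and that `/` in a page name marks a subpage (the MoinMoin convention). It is not the project's actual implementation.

```python
# Hypothetical result record: (pagename, attachment_name_or_None).

def group_rank(result):
    """Return 0 for top-level pages, 1 for subpages, 2 for attachments."""
    pagename, attachment = result
    if attachment is not None:
        return 2  # attachments are shown last
    if "/" in pagename:
        return 1  # subpages come after their top-level pages
    return 0      # top-level pages come first


def group_results(results):
    # sorted() is stable, so the original relevance order is kept
    # inside each group.
    return sorted(results, key=group_rank)
```

Because the sort is stable, the ranking Xapian returns is still respected within each group. Only the groups themselves are reordered.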

Repository

http://hg.moinmo.in/moin/1.9-xapian-dmilajevs/


(!) This text contains links to the source code. The links point to the latest revision available at the time of writing, 4b2ef153ad4f.

xapwrap to xappy migration

Xapian searching used to be done with the xapwrap library. Because xapwrap is no longer maintained, it was replaced by xappy.

Indexing

Every indexed document has a field structure, which is defined in the MoinIndexerConnection class. The STORE_CONTENT field action is required for regexp searching and for deleting documents from the index.
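The role of STORE_CONTENT can be illustrated without Xapian. The following is a minimal, stdlib-only model that assumes a schema mapping each field to a set of action names. The field names and the plain-string action constants are hypothetical stand-ins for the real MoinIndexerConnection definitions and xappy's field actions. The model shows why a regexp needs stored content: indexed terms keep only lowercase words, while stored values keep the original text.

```python
# Hypothetical action names, loosely modelled on xappy field actions.
INDEX_FREETEXT = "INDEX_FREETEXT"
STORE_CONTENT = "STORE_CONTENT"

# Hypothetical schema: field name -> set of actions applied to it.
SCHEMA = {
    "title": {INDEX_FREETEXT, STORE_CONTENT},
    "content": {INDEX_FREETEXT, STORE_CONTENT},
    "mimetype": {INDEX_FREETEXT},
}


def process(fields, schema=SCHEMA):
    """Split a document into searchable terms and stored values."""
    terms = set()
    stored = {}
    for name, value in fields.items():
        actions = schema.get(name, set())
        if INDEX_FREETEXT in actions:
            # Only lowercase words reach the term list; case and word order are lost.
            terms.update(f"{name}:{word}" for word in value.lower().split())
        if STORE_CONTENT in actions:
            # The original text is kept verbatim, so a regexp can run over it later.
            stored[name] = value
    return terms, stored
```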

Moin indexes three kinds of documents: files, attachments and pages. Every document has a unique id value, and each kind of document uses a different id structure.
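This page does not spell out the actual id formats. The following sketch only illustrates the general idea of kind-specific ids. The prefixes, the separator and the helper names `make_id`/`parse_id` are hypothetical and are not MoinMoin's scheme. A page id carries the wiki and page name, an attachment id adds the attachment file name, and a file id carries a filesystem path.

```python
# Hypothetical id layout for illustration only; the real scheme lives in the indexer.
KIND_PREFIX = {"page": "P", "attachment": "A", "file": "F"}
PREFIX_KIND = {prefix: kind for kind, prefix in KIND_PREFIX.items()}

# Assumed never to occur inside wiki names, page names or paths.
SEP = "\x1f"


def make_id(kind, *parts):
    """Build a unique id such as P:<wiki><SEP><page>."""
    return KIND_PREFIX[kind] + ":" + SEP.join(parts)


def parse_id(doc_id):
    """Recover (kind, parts) from an id built by make_id."""
    # Split only on the first ':', so later colons (e.g. in paths) survive.
    prefix, _, rest = doc_id.partition(":")
    return PREFIX_KIND[prefix], rest.split(SEP)
```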

Title and content fields are tokenized and, optionally, stemmed. Tokenization is needed for queries like HelpOn, where a user expects to get the help pages (HelpOnSearching, HelpOnEditing, HelpOnAcl, etc.). In the search index, WikiWords is transformed to "wikiwords wiki words", or to "wikiwords wikiword wiki words word" if stemming is enabled. Analyzed strings are lowercased because Xapian expects lowercase input for stemming. The index stores both the original and the tokenized values. Tokenization and stemming are done by the WikiAnalyzer.
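The WikiWords example can be reproduced with a few lines of Python. The following is a minimal sketch, not the WikiAnalyzer itself. `toy_stem` is a deliberately crude stand-in for a real stemmer such as Snowball, and it only strips a plural "s". The CamelCase regexp is an assumption that also keeps all-caps runs like "HTML" together.

```python
import re

# CamelCase splitter: all-caps runs (HTML), Capitalised words, or lowercase/digit runs.
WIKIWORD_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*|[a-z0-9]+")


def toy_stem(word):
    # Stand-in for a real stemmer; only strips a trailing plural "s".
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def analyze(word, stem=None):
    """Return lowercase index terms: the whole word, then its parts, each followed by its stem."""
    candidates = [word.lower()] + [p.lower() for p in WIKIWORD_PARTS.findall(word)]
    terms = []
    for token in candidates:
        for term in (token, stem(token) if stem else token):
            if term not in terms:  # skip duplicates such as "wiki" -> "wiki"
                terms.append(term)
    return terms
```

With these definitions, `analyze("WikiWords")` gives `["wikiwords", "wiki", "words"]`, and `analyze("WikiWords", stem=toy_stem)` gives `["wikiwords", "wikiword", "wiki", "words", "word"]`. These match the two term lists quoted above.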

Documents are removed from the index either by their document id or by their fulltitle value.
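The two deletion paths can be modelled with a small in-memory index. The following is a sketch under stated assumptions. `ToyIndex` and its methods are hypothetical and are not the MoinMoin or xappy API. Deleting by id is a direct lookup. Deleting by fulltitle has to consult the stored field value, which is one reason stored content matters for deletion.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for the Xapian index."""

    def __init__(self):
        self.docs = {}  # document id -> stored field values

    def add(self, doc_id, fulltitle, **fields):
        self.docs[doc_id] = dict(fields, fulltitle=fulltitle)

    def delete_by_id(self, doc_id):
        # The unique id identifies exactly one document; missing ids are ignored.
        self.docs.pop(doc_id, None)

    def delete_by_fulltitle(self, fulltitle):
        # Needs the stored fulltitle value to find every matching document.
        doomed = [i for i, d in self.docs.items() if d["fulltitle"] == fulltitle]
        for doc_id in doomed:
            del self.docs[doc_id]
        return len(doomed)
```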

Searching

Xapian queries are built by xapian_term() in MoinMoin.search.queryparser.expressions. For regexp-based queries, it checks every document in the index and then queries only those in which the regexp found a match. For other queries, the appropriate fields are queried directly.
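The two branches can be sketched as follows. This is a simplified model, not xapian_term() itself. `docs` is a hypothetical mapping from document id to stored field values, and nested tuples stand in for xapian.Query objects. The case-insensitive match is also an assumption. The sketch shows why regexp queries are expensive: every stored document is scanned before the real query is built.

```python
import re


def build_query(kind, field, pattern, docs):
    """Return a tuple standing in for a Xapian query (hypothetical structure)."""
    if kind == "regex":
        # Case-insensitive here by assumption.
        rx = re.compile(pattern, re.IGNORECASE)
        # Scan the stored text of every document in the index.
        hits = sorted(i for i, d in docs.items() if rx.search(d.get(field, "")))
        # Restrict the real query to the documents the regexp matched.
        return ("OR", [("id", i) for i in hits])
    # Other queries go straight to the indexed, lowercased field terms.
    return ("term", f"{field}:{pattern.lower()}")
```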

Plan

Testing

Problems

Xapian2009/Problems

Diary


CategoryGsocProject

MoinMoin: Xapian2009 (last edited 2012-06-01 08:03:48 by EugeneSyromyatnikov)