Project Idea:

MoinMoin 1.9 already has search, but it is rather slow. It can also use Xapian (written in C++) via Xappy to improve search performance, but that has two disadvantages:

- Binary dependencies

- Not easy to configure

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. It is not as fast as Xapian, but it has some useful advantages:

- Whoosh can be shipped with moin2 without any dependencies

- Using Whoosh makes the code much simpler

- Behaviour of the search won't depend on the configuration chosen by the admin who installed moin

My goal is to add a search function like the one in moin 1.9, supporting queries as described at http://moinmo.in/HelpOnSearching, that ships with moin2 and needs no configuration.

Indexing:

Use the whoosh.fields.Schema class to design two index schemas like these:

from whoosh.fields import Schema, TEXT, ID, KEYWORD, NUMERIC, DATETIME

# Current revision of every item. TEXT fields take no unique argument,
# so uuid is the only unique key here.
item_schema = Schema(item_name=TEXT(stored=True),
                     uuid=ID(unique=True), datetime=DATETIME,
                     content=TEXT(stored=False), mimetype=TEXT(stored=True),
                     tags=KEYWORD(stored=True), acl=TEXT(stored=True),
                     metadata=TEXT(stored=True))

# Every revision of every item.
revisions_schema = Schema(item_name=TEXT(stored=True),
                          uuid=ID, rev_no=NUMERIC(stored=True), datetime=DATETIME,
                          content=TEXT(stored=False), mimetype=TEXT(stored=True),
                          tags=KEYWORD(stored=True), metadata=TEXT(stored=True))

Here item_schema contains only the current revision of each document, and revisions_schema contains all revisions of all documents. The content of documents will be stored via SQLAlchemy, so it can easily be looked up by name and rev_no.
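
The split between the two indexes can be pictured with a small plain-Python model. This is only a sketch of the intended behaviour, not Whoosh code: the ToyIndexes class and its save_revision method are hypothetical names, and a dictionary and a list stand in for the real indexes. It also assumes revision numbers start at 0, which the proposal does not specify.

```python
class ToyIndexes:
    """Plain-Python stand-in for the two planned indexes (not Whoosh)."""

    def __init__(self):
        self.items = {}       # uuid -> document of the current revision only
        self.revisions = []   # one document per stored revision

    def save_revision(self, uuid, item_name, content):
        """Record a new revision and return its revision number."""
        # Assumption: revision numbers count up from 0 per item.
        rev_no = sum(1 for doc in self.revisions if doc["uuid"] == uuid)
        doc = {"uuid": uuid, "item_name": item_name,
               "rev_no": rev_no, "content": content}
        self.revisions.append(doc)   # the all-revisions index only grows
        self.items[uuid] = doc       # the current-revision entry is replaced
        return rev_no


indexes = ToyIndexes()
indexes.save_revision("uuid-1", "FrontPage", "first draft")
indexes.save_revision("uuid-1", "FrontPage", "second draft")
indexes.save_revision("uuid-2", "HelpOnSearching", "search help")
```

After these three saves the model holds two current documents and three revision documents. A search over current content would use the first structure, and a history search the second.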

Searching:

To parse search queries I will use whoosh.qparser.QueryParser (for simple requests like "this is a search query") and whoosh.qparser.MultifieldParser (for queries that search several fields, like "mimetype:text\html NO *xapian help").

To customize the operator syntax I will use qparser.CompoundsPlugin, which allows typographic symbols, given as regular expression patterns, instead of words for the AND, OR, ANDNOT, ANDMAYBE, and NOT functions.

For example:

qparser.CompoundsPlugin(And="&", Or="\\|", AndNot="&!", AndMaybe="&~")

http://packages.python.org/Whoosh/parsing.html#changing-the-and-or-andnot-andmaybe-and-not-tokens
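
The keyword values in the CompoundsPlugin call are regular expression patterns, which is why the "|" character is written as "\\|". The sketch below uses only the standard re module to show how patterns like these could split a query into operators and terms. It is not Whoosh's implementation; OPERATORS, TOKEN_RE and tokenize are hypothetical names chosen for illustration.

```python
import re

# Operator name -> regular expression, mirroring the CompoundsPlugin arguments.
# "&!" and "&~" come before "&" so that the longer tokens are tried first.
OPERATORS = [
    ("ANDNOT", r"&!"),
    ("ANDMAYBE", r"&~"),
    ("AND", r"&"),
    ("OR", r"\|"),   # "|" is special in a regular expression, so it is escaped
]

# One alternation of named groups, plus a catch-all group for plain terms.
TOKEN_RE = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in OPERATORS)
    + r"|(?P<TERM>[^\s&|]+)"
)


def tokenize(query):
    """Return (kind, text) pairs for every operator or term in the query."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(query)]


tokens = tokenize("moin & whoosh | xapian &! slow")
# tokens[1] == ("AND", "&") and tokens[5] == ("ANDNOT", "&!")
```

Listing "&" before "&!" would make the shorter pattern match first and leave a stray "!" attached to the next term, so the order of the alternatives matters.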

There will also be two ways to search:

1) Via a simple search box at the top of the page

2) Via a query constructor template for extended searching


CategoryGsocProject

MoinMoin: MichaelMayorov/GSoC2011 (last edited 2012-06-01 08:08:00 by EugeneSyromyatnikov)