Project Idea:

MoinMoin 1.9 already has search, but it is rather slow. It can also use Xapian (written in C++) via Xappy to improve search performance, but that has two disadvantages:

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. It is not as fast as Xapian, but it has some useful features:

My goal is to add a search function like the one in moin 1.9, supporting queries like those described at http://moinmo.in/HelpOnSearching, that ships with moin2 and doesn't need any configuration.

Indexing:

Use the whoosh.fields.Schema class to design two index schemas, along these lines:

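from whoosh.fields import Schema, ID, TEXT, KEYWORD, NUMERIC, DATETIME

# Current revision of every item.
# item_name is an ID field here because TEXT takes no unique argument.
item_schema = Schema(item_name=ID(unique=True, stored=True),
                     uuid=ID(unique=True), datetime=DATETIME,
                     content=TEXT(stored=False), mimetype=TEXT(stored=True),
                     tags=KEYWORD(stored=True), acl=TEXT(stored=True),
                     metadata=TEXT(stored=True))

# Every revision of every item, so neither item_name nor uuid is unique.
revisions_schema = Schema(item_name=TEXT(stored=True),
                          uuid=ID, rev_no=NUMERIC(stored=True),
                          datetime=DATETIME, content=TEXT(stored=False),
                          mimetype=TEXT(stored=True),
                          tags=KEYWORD(stored=True),
                          metadata=TEXT(stored=True))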

Here item_schema contains only the current revision of each document, and revisions_schema contains all revisions of all documents. The content of documents will be stored via SQLAlchemy, so we can easily look it up by name/rev_no.

Searching:

For parsing search queries I will use whoosh.qparser.QueryParser (for simple requests like "this is a search query") and whoosh.qparser.MultifieldParser (for queries that need to search in multiple fields, like "mimetype:text/html NOT *xapian help").

To change the operator syntax I will use qparser.CompoundsPlugin, which allows typographic symbols to be used instead of words for the AND, OR, ANDNOT, ANDMAYBE, and NOT functions. The tokens are given as regular expressions, which is why | is escaped in the example.

For example:

qparser.CompoundsPlugin(And="&", Or="\\|", AndNot="&!", AndMaybe="&~")

http://packages.python.org/Whoosh/parsing.html#changing-the-and-or-andnot-andmaybe-and-not-tokens

There will also be two ways to search:

1) Via a simple search box at the top of the page

2) Via a query constructor template for extended searching

Ideas for Whoosh Indexing / Searching Project

See: http://etherpad.osuosl.org/whoosh-moin


CategoryGsocProject

MoinMoin: WhooshSearch2011 (last edited 2012-06-01 08:11:28 by EugeneSyromyatnikov)