Xapian and MimeType Search with FindPage
I'm using Xapian search with PyStemmer enabled, and I'd like to search for mimetypes with FindPage. However, there are a few "problems".
Tested on moin/1.6/rev/1829d890e862, but I think it also applies to 1.7dev.
Grouped MimeType Search from the FindPage form
Searching for only the major mimetype "application" doesn't work (with Xapian/PyStemmer), but it should, because the FindPage form offers this option (this is also filed as a bug report: MoinMoinBugs/XapianFindPageFormWorksOnlyForFullMimeType). Fixed!
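The matching rule the form implies can be sketched in plain Python. `mimetype_matches` is a hypothetical helper, not a MoinMoin function, and the sketch says nothing about how the Xapian query is actually built; it only shows the intended semantics: a term containing a slash must match the full mimetype, while a bare term such as "application" matches the whole major-type group.

```python
def mimetype_matches(wanted, actual):
    """Hypothetical helper: does the stored mimetype `actual` satisfy the search term `wanted`?

    "application/pdf" -> exact match on the full mimetype
    "application"     -> match on the major type (the group)
    """
    wanted = wanted.strip().lower()
    actual = actual.strip().lower()
    if "/" in wanted:
        return actual == wanted
    # No slash in the term: compare only the part before the slash of the stored type.
    return actual.split("/", 1)[0] == wanted
```

With this rule, "application" matches application/pdf and application/zip, but not text/plain.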
Now for my wishes:
The labels should be language-specific (translated), not English only.
The mimetype groups should be configurable, or at least better organized: a normal user will not understand why a PDF document falls under "application". It should look more like: documents [doc, odt, odf, odp, docx, xls, pdf, ppt, ...], applications [exe, sh, ...], archives [tar, zip, bz2, ...], images [png, jpg, gif, ...], programming [py, c, h, ...], etc.
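A configurable grouping could be as simple as a mapping from user-facing group names to file extensions. This is a sketch under assumptions: `MIMETYPE_GROUPS` and `group_for_filename` are hypothetical (no such setting or helper exists in MoinMoin), and the extension lists are just the examples from the wish above. The group keys are plain English; the labels shown in the form would still need to go through the wiki's translation mechanism to satisfy the first wish.

```python
import os

# Hypothetical wiki configuration: user-facing group names -> file extensions.
# The keys are plain English; the labels shown in the form would be translated.
MIMETYPE_GROUPS = {
    "documents": {"doc", "docx", "odt", "odf", "odp", "xls", "pdf", "ppt"},
    "applications": {"exe", "sh"},
    "archives": {"tar", "zip", "bz2"},
    "images": {"png", "jpg", "gif"},
    "programming": {"py", "c", "h"},
}


def group_for_filename(filename, groups=MIMETYPE_GROUPS):
    """Return the configured group for a file name, or "other" if none matches."""
    # Only the last extension counts, so "backup.tar.bz2" is looked up as "bz2".
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    for group, extensions in groups.items():
        if ext in extensions:
            return group
    return "other"
```

For example, `group_for_filename("report.PDF")` gives "documents", so a user looking for PDFs would pick "documents" rather than having to know that PDF is an "application" mimetype.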
Finding the right MimeType is long-winded with the FindPage form
If you want to find the correct mimetype for, e.g., Adobe PDF files, what do you look for: "adobe"? "pdf"? Neither works! You can't even type the first few letters of the word you want (like a-d-o or p-d-f). Instead you have to scroll through ~400 entries in a field only three lines tall, looking for *.pdf - application/pdf. This is not very user-friendly.
The list should be a drop-down in which you can type the first letters (e.g. to find pdf you would type p, d and land on the right entry). Each entry could look like this:
<option value="application/pdf">pdf - application/pdf</option>
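Entries in that format can be generated from Python's standard `mimetypes` module, sorted by extension so that each option's visible text starts with the extension (most browsers jump to the first option whose text begins with the typed letters). `mimetype_options` is a hypothetical helper, not part of MoinMoin; its output would go inside a `<select name="mimetype">` element, matching the `mimetype` parameter FindPage already sends.

```python
import html
import mimetypes


def mimetype_options(types_map=None):
    """Return <option> lines labelled "<ext> - <mimetype>", sorted by extension.

    By default the stdlib table (extension -> mimetype, possibly extended
    by the system's mime.types files) is used.
    """
    if types_map is None:
        mimetypes.init()
        types_map = mimetypes.types_map
    options = []
    for ext, mtype in sorted(types_map.items(),
                             key=lambda item: item[0].lstrip(".").lower()):
        label = "%s - %s" % (ext.lstrip("."), mtype)
        options.append('<option value="%s">%s</option>'
                       % (html.escape(mtype), html.escape(label)))
    return options
```

Extensions that share a mimetype (e.g. jpg and jpeg) each get their own entry, which is what makes typing the extension work.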
Finding only wiki pages with PageList or FindPage is complicated
Since Thomas Waldmann's update, it's possible to search only for wiki pages (pages using the "wiki" formatter), like this:
<<PageList(python mimetype:text/wiki)>>
Alternatively, you can choose the corresponding mimetype in FindPage. The resulting link then looks something like:
http://moinmo.in/FindPage?action=fullsearch&advancedsearch=1&and_terms=moin&or_terms=&not_terms=&mtime=&categories=&language=&mimetype=text/wiki
For end users, though, it would be friendlier to offer a checkbox such as "no attachments" or "only wiki pages" that restricts the search to text/wiki.
I've just implemented an ugly workaround that automatically searches with "mimetype=text/wiki" only; see ActionMarket/FullSearchMimeTypeSupport.
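An "only wiki pages" checkbox would not need a new search backend; it only has to add `mimetype=text/wiki` to the query FindPage already sends. A minimal sketch, assuming a hypothetical `findpage_url` helper and a boolean taken from such a checkbox (this is not the ActionMarket implementation); the parameter names are the ones in the FindPage link above.

```python
from urllib.parse import urlencode


def findpage_url(base, and_terms, only_wiki_pages=False):
    """Build a FindPage advanced-search URL; the flag stands in for the checkbox."""
    params = [
        ("action", "fullsearch"),
        ("advancedsearch", "1"),
        ("and_terms", and_terms),
        ("or_terms", ""),
        ("not_terms", ""),
        ("mtime", ""),
        ("categories", ""),
        ("language", ""),
        # Checked box -> restrict to wiki pages; unchecked -> any mimetype.
        ("mimetype", "text/wiki" if only_wiki_pages else ""),
    ]
    # safe="/" keeps "text/wiki" readable instead of encoding it as "text%2Fwiki".
    return base + "?" + urlencode(params, safe="/")
```

`findpage_url("http://moinmo.in/FindPage", "moin", only_wiki_pages=True)` reproduces the example link exactly.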
Mimetype text/wiki vs. text/plain
There may be some problems in the future:
- searching only for text/wiki doesn't work if a user chose another formatter for a wiki page, e.g. xml = text/xml.
- if you search only for the "group" mimetype "text", plain text files (text/plain, *.txt) are included as well. So text/wiki is perhaps not the best choice of mimetype; for pages it should maybe look more like wiki/<formatter> or moin/<formatter>.
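Both problems, and the proposed alternative, can be illustrated with a few example items. The `moin/<formatter>` naming is the suggestion above, not current MoinMoin behaviour; the item names and the `names_in_group` helper are made up for illustration.

```python
def names_in_group(items, group):
    """Return the sorted names whose mimetype's major type equals `group`."""
    return sorted(name for name, mtype in items.items()
                  if mtype.split("/", 1)[0] == group)


# Current scheme: pages are text/<formatter>, attachments keep their real type.
current = {
    "FrontPage": "text/wiki",
    "XmlPage": "text/xml",
    "notes.txt": "text/plain",
}

# Proposed (hypothetical) scheme: pages are moin/<formatter>.
proposed = {
    "FrontPage": "moin/wiki",
    "XmlPage": "moin/xml",
    "notes.txt": "text/plain",
}
```

Under the current scheme, the group "text" mixes pages and attachments, and an exact text/wiki search misses XmlPage. Under the proposed scheme, "moin" finds all pages regardless of formatter, and "text" finds only real text files.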
These are just my 5 cents; someone with better knowledge of the future attachment and storage framework may have other or better ideas and solutions.
-- MarcelHäfner 2008-05-26 20:33:10