Indexing Filters for Xapian-based search

I just wanted to do a quick poll about what moin users out there use to index their file attachments.

1. Filters used

Do you use only the filter modules we provide (see MoinMoin/filter/*.py), or do you also use additional Python filtering / filter adaptor code (if so, which)?
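This page does not describe how a filter module gets picked for an attachment. The following is a minimal Python 3 sketch, assuming that filter modules are named after the MIME type with non-alphanumeric characters replaced by underscores (e.g. a hypothetical application_pdf for application/pdf), that a major-type module such as text covers text/*, and that a binary filter is the last resort. The function names are made up for illustration and are not MoinMoin's API.

```python
import re


def candidate_module_names(mimetype):
    """Return filter module names to try, most specific first.

    Hypothetical naming scheme: "major/minor" -> "major_minor", then
    "major", then a generic binary fallback.
    """
    major, _, minor = mimetype.lower().partition("/")
    names = []
    if minor:
        names.append(re.sub(r"[^a-z0-9]", "_", "%s/%s" % (major, minor)))
    names.append(re.sub(r"[^a-z0-9]", "_", major))
    fallback = "application_octet_stream"
    if fallback not in names:
        names.append(fallback)
    return names


def pick_filter(mimetype, available):
    """Return the first candidate found in `available` (a set of names), or None."""
    for name in candidate_module_names(mimetype):
        if name in available:
            return name
    return None
```

For example, with only a text module and the binary fallback installed, a text/x-python attachment would be handled by text, while an image/png attachment with no image_png module would end up in the binary filter.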

2. Filter programs quality / stability

With the questions below, I am looking for practical experience with the quality and stability of these filter programs, and also anything else you might want to tell us about them.

(!) If you do not use a specific filter because you have no use case for it or do not have the required software installed, you don't need to tell us about it. But if you do not use it because it always caused trouble for you, please DO tell us.

2.1. PDF (using pdftotext from poppler-utils)

Is our PDF filter plugin that calls pdftotext (from poppler-utils) working for you? If you use xpdf-utils instead of poppler-utils, see below.

2.2. PDF (using pdftotext from xpdf-utils)

Is our PDF filter plugin that calls pdftotext (from xpdf-utils) working for you? If you use poppler-utils instead of xpdf-utils, see above.

2.3. RTF

Is our RTF filter plugin that calls catdoc working for you?

2.4. MS Word

Is our MS Word filter plugin that calls antiword working for you?

2.5. MS Excel

Is our MS Excel filter plugin that calls xls2csv (from the catdoc package) working for you?

2.6. MS PowerPoint

Is our MS PowerPoint filter plugin that calls catppt (from the catdoc package) working for you?

(!) This is very new and was committed to the 1.8 repo after the 1.8.5 release. Feedback about catppt in general is also welcome.
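Sections 2.1 to 2.6 all describe plugins that hand the attachment to an external program and index whatever it prints. A minimal Python 3 sketch of that pattern follows; the COMMANDS table, run_filter and extract_text are hypothetical names, not MoinMoin's filter API, and the MIME type keys are assumptions. The command lines use only basic invocations: pdftotext with its -enc option and "-" as the output file (meaning standard output), and the other tools with just a file argument.

```python
import subprocess

# Hypothetical mapping from MIME type to the external program named in the
# sections above. Each entry builds an argv list for a file path; every
# program is expected to write plain text to stdout.
COMMANDS = {
    "application/pdf": lambda path: ["pdftotext", "-enc", "UTF-8", path, "-"],
    "application/rtf": lambda path: ["catdoc", path],
    "application/msword": lambda path: ["antiword", path],
    "application/vnd.ms-excel": lambda path: ["xls2csv", path],
    "application/vnd.ms-powerpoint": lambda path: ["catppt", path],
}


def run_filter(argv, timeout=60):
    """Run an external filter and return its stdout decoded as text.

    Returns "" if the program is missing, exits with an error, or runs
    longer than `timeout` seconds (the child is killed in that case).
    """
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    if result.returncode != 0:
        return ""
    return result.stdout.decode("utf-8", errors="replace")


def extract_text(mimetype, path):
    """Return indexable text for `path`, or "" if no external filter applies."""
    build = COMMANDS.get(mimetype)
    if build is None:
        return ""
    return run_filter(build(path))
```

Returning an empty string on a missing program, a non-zero exit status or a timeout is one way to keep a crashing or hanging converter from stalling the whole indexing run, which is the kind of stability problem this poll is asking about.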

2.7. OpenOffice.org / Open Document Format

Is our built-in OOo / ODF filter plugin working for you?
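Unlike the filters above, this one needs no external program, because an ODF document is a ZIP archive whose main body is stored in content.xml. The Python 3 sketch below shows that general technique using only the standard library; it is not the built-in plugin's code, and the function name odf_text is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def odf_text(data):
    """Extract plain text from an ODF document given as bytes.

    Reads content.xml from the ZIP container and joins all non-empty
    text nodes with single spaces.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return " ".join(t.strip() for t in root.itertext() if t.strip())
```

Joining every text node with a space is a simplification: a word split across two formatting spans would come out with a gap in it.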

2.8. text/html, text/xml

Is our built-in HTML and XML filter plugin working for you?
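A markup filter essentially has to throw away tags and keep character data. The Python 3 sketch below illustrates that with the standard html.parser module, which also tolerates simple XML; it skips script and style contents and decodes entities such as &amp;amp;. It is an illustration of the technique, not the built-in plugin, and markup_text is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: entities are decoded
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def markup_text(markup):
    """Return the visible text of an HTML or simple XML string."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.parts)
```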

2.9. text/*

Is our built-in text filter plugin working for you?
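The main practical issue with text/* attachments is usually the character encoding, since the file does not say which one it uses. A minimal Python 3 sketch of a decode-with-fallback step follows; the encoding order (UTF-8, then cp1252, then latin-1) is an assumption for illustration, not what the built-in plugin is known to do.

```python
def decode_text(data, encodings=("utf-8", "cp1252")):
    """Decode the raw bytes of a text/* attachment.

    Tries each encoding in order and falls back to latin-1, which maps
    every byte to a character and therefore never raises.
    """
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1")
```

Trying UTF-8 first works well because bytes in legacy 8-bit encodings rarely happen to form valid UTF-8 sequences.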

2.10. JPEG images

Is our built-in image/jpeg filter plugin working for you?

2.11. Binary

Is our built-in binary file filter plugin working for you?
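This page does not say what the binary filter extracts. One common approach, shown here as a Python 3 sketch and not necessarily what MoinMoin does, is to keep runs of printable ASCII characters, similar to the Unix strings tool. The minimum run length of 4 and the name binary_words are assumptions.

```python
import re

# Runs of at least MIN_LEN printable ASCII bytes (space through tilde).
MIN_LEN = 4
_PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)


def binary_words(data):
    """Return the printable ASCII runs found in `data`, decoded to str."""
    return [m.group().decode("ascii") for m in _PRINTABLE_RUN.finditer(data)]
```

The minimum length trades recall for noise: shorter runs pick up more real words but also many random byte sequences that happen to be printable.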

MoinMoin: PollAboutXapianSearchIndexingFilters (last edited 2012-05-11 10:28:45 by ThomasWaldmann)