Description
The extended search based on Xapian does not work on Windows in moin-1.6.0.
Various Windows-specific problems:
- Xapian indexing imports symlink, which is not implemented on win32, so indexing fails.
- It later calls os.utime on a directory, which is also not supported on Windows (see Discussion below).
- Finally, the indexer fails with IOError: [Errno 24] Too many open files.
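A minimal sketch of how the symlink import could be guarded, assuming (as the Discussion suggests) that symlink is only needed as an atomic locking primitive. The names acquire_lock and release_lock are hypothetical and are not the xapwrap API; the fallback uses exclusive file creation, which is a swapped-in technique and not what the module currently does.

```python
import os
import tempfile

try:
    from os import symlink  # missing on win32 under Python 2.5
except ImportError:
    symlink = None


def acquire_lock(lock_path, owner):
    """Try to take the lock atomically; return True on success (hypothetical helper)."""
    if symlink is not None:
        try:
            # Creating a symlink fails if lock_path already exists.
            symlink(owner, lock_path)
            return True
        except OSError:
            pass  # already locked, or symlinks not permitted: try a plain file
    try:
        # O_EXCL makes creation fail if the path exists (file or symlink).
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError:
        return False
    try:
        os.write(fd, owner.encode("utf-8"))
    finally:
        os.close(fd)
    return True


def release_lock(lock_path):
    """Remove the lock, whether it is a symlink or a plain file."""
    os.remove(lock_path)


# Demo in a temporary directory: the second attempt must fail while locked.
lock_path = os.path.join(tempfile.mkdtemp(), "index.lock")
first = acquire_lock(lock_path, "pid-1")
second = acquire_lock(lock_path, "pid-2")
release_lock(lock_path)
third = acquire_lock(lock_path, "pid-2")
```

Whether this is enough for xapwrap depends on how it reads the lock back (for example via readlink), which is not covered here.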
Steps to reproduce
- Build and install Xapian core and its Python bindings
- Run the Xapian index build with the moin script
- "Cannot import symlink" error
- os.utime(self.dir, None) exception
- IOError: [Errno 24] Too many open files
Example
Component selection
- Xapian Enhanced Search (more specifically: moinmoin/support/xapwrap)
Details
MoinMoin Version | 1.6dev
OS and Version | XP
Python Version | 2.5
Server Setup | n/a
Server Details | n/a
Language you are using the wiki in (set in the browser/UserPreferences) | English
Workaround
- ? simulate symlink like cygwin
  - Does not work, see below.
- ? remove dependency for locking in Xapwrap index module
  - Does not work, see below.
- Don't use utime on a directory - implement workaround
  - Where is "utime" used? I (DavidLinke) did not run into this.
- Close files which don't need to be open
- Move to posix os
Discussion
Moin IRC discussion today suggested filing a bug at divmod for XapWrap, but XapWrap is no longer maintained, and one developer said that they are now looking at PyLucene and fts2 (for SQLite).
It is not ideal for a new Moin indexer to depend on obsolete code, particularly when that constrains the implementation to POSIX only.
Faster indexing is definitely required for larger wikis (mine has 17,000+ pages).
I have implemented a cygwin-like symlink and readlink to test further, and I now find it falls over in the Index method of Xapian.py on self.touch(), where it tries to touch the index directory with os.utime(self.dir, None).
Unfortunately, os.utime does not seem to be implemented for directories on Windows either.
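One possible shape for a workaround, as a rough sketch only: try os.utime first and, if that fails on a directory, update a marker file inside the directory instead. The function name touch and the marker name ".touched" are assumptions for illustration; any code that checks the directory's timestamp would then have to look at the marker as well.

```python
import os
import tempfile


def touch(path, marker_name=".touched"):
    """Update a timestamp for path and return the path that was stamped.

    Hypothetical helper: if os.utime fails on a directory (as reported
    here for win32), a marker file inside the directory is stamped instead.
    """
    try:
        os.utime(path, None)
        return path
    except OSError:
        if not os.path.isdir(path):
            raise  # a real error on a regular file: do not hide it
    marker = os.path.join(path, marker_name)
    # Opening in append mode creates the marker without truncating it.
    with open(marker, "ab"):
        pass
    os.utime(marker, None)
    return marker


# Demo on a temporary directory standing in for the index directory.
index_dir = tempfile.mkdtemp()
stamped = touch(index_dir)
```

On platforms where os.utime works for directories, stamped is simply index_dir and no marker file is created.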
I commented out self.touch() to test further, and the indexer then fell over with IOError: [Errno 24]:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\python25\lib\site-packages\MoinMoin\script\moin.py", line 16, in run
  File "..\..\MoinMoin\script\__init__.py", line 138, in run
  File "..\..\MoinMoin\script\__init__.py", line 245, in mainloop
  File "..\..\MoinMoin\script\__init__.py", line 138, in run
  File "C:\python25\Lib\site-packages\MoinMoin\script\index\build.py", line 35, in mainloop
  File "C:\python25\Lib\site-packages\MoinMoin\script\index\build.py", line 42, in command
  File "C:\python25\Lib\site-packages\MoinMoin\search\builtin.py", line 262, in indexPages
  File "..\..\MoinMoin\search\Xapian.py", line 650, in _index_pages
  File "..\..\MoinMoin\search\Xapian.py", line 466, in _index_page
  File "..\..\MoinMoin\search\Xapian.py", line 395, in _get_languages
  File "C:\python25\Lib\site-packages\MoinMoin\Page.py", line 248, in get_pi
  File "C:\python25\Lib\site-packages\MoinMoin\Page.py", line 892, in parse_processing_instructions
  File "C:\python25\Lib\site-packages\MoinMoin\Page.py", line 209, in get_body
  File "C:\Python25\lib\codecs.py", line 817, in open
IOError: [Errno 24] Too many open files: 'd:\\DOCUME~1\\tijo2\\MYDOCU~1\\moin\\data\\pages\\datareports_entityreport_130567(2e)htm\\revisions\\00000001'
Will run again under debug to see if I can locate the files which are not being closed.
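For reference, the usual way to avoid this kind of error is to close each file explicitly as soon as it has been read, not to rely on garbage collection. The sketch below is not the actual Page.get_body code; read_revision is a hypothetical helper, and it uses the built-in open where the traceback shows codecs.open, but a handle from codecs.open can be closed in the same way.

```python
import os
import tempfile


def read_revision(path, encoding="utf-8"):
    """Return the decoded text of one revision file, always closing the handle."""
    f = open(path, "rb")
    try:
        return f.read().decode(encoding)
    finally:
        # Runs even if read() or decode() raises, so no handle is leaked.
        f.close()


# Demo: read a temporary revision file many times in a row. With explicit
# closing, at most one handle is open at any moment.
demo_dir = tempfile.mkdtemp()
rev_path = os.path.join(demo_dir, "00000001")
with open(rev_path, "wb") as out:
    out.write(u"#language en\nHello".encode("utf-8"))

bodies = [read_revision(rev_path) for _ in range(5000)]
```

This only helps if the leak is in code that opens page revisions; the traceback shows where the limit was hit, not necessarily where the unclosed files were opened.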
I (DavidLinke) have also tried to use Xapian-1.0.4 with the moin-1.6.0 release and Python 2.5.1 on Windows XP (server: Apache/CGI). Out of the box it does not work, due to the problem with locking in xapwrap on Win32 mentioned above. I applied http://divmod.org/trac/ticket/504 but it did not help much; it just produced a different error:
C:\Home\wikis\TestWikiCGI>c:/sources/moin-1.6.0/moinmoin/script/moin.py --config-dir=c:/home/wikis/testwikicgi --wiki-url=paule/TestWikiCGI index build --mode add
Traceback (most recent call last):
  File "c:\sources\moin-1.6.0\moinmoin\script\moin.py", line 24, in <module>
    run()
  File "c:\sources\moin-1.6.0\moinmoin\script\moin.py", line 15, in run
    MoinScript().run(showtime=0)
  File "C:\Sources\moin-1.6.0\MoinMoin\script\..\..\MoinMoin\script\__init__.py", line 138, in run
    self.mainloop()
  File "C:\Sources\moin-1.6.0\MoinMoin\script\..\..\MoinMoin\script\__init__.py", line 245, in mainloop
    plugin_class(args[2:], self.options).run() # all starts again there
  File "C:\Sources\moin-1.6.0\MoinMoin\script\..\..\MoinMoin\script\__init__.py", line 138, in run
    self.mainloop()
  File "c:\sources\moin-1.6.0\moinmoin\script\..\..\MoinMoin\script\index\build.py", line 35, in mainloop
    self.command()
  File "c:\sources\moin-1.6.0\moinmoin\script\..\..\MoinMoin\script\index\build.py", line 42, in command
    Index(self.request).indexPages(self.files, self.options.mode)
  File "c:\sources\moin-1.6.0\MoinMoin\search\builtin.py", line 262, in indexPages
    self._index_pages(request, files, mode)
  File "c:\sources\moin-1.6.0\moinmoin\script\..\..\MoinMoin\search\Xapian.py", line 657, in _index_pages
    writer.__del__()
UnboundLocalError: local variable 'writer' referenced before assignment
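The UnboundLocalError most likely hides the real failure: if creating the writer raises, cleanup code that refers to writer fails in turn and replaces the original exception. A sketch of the usual pattern follows, written for current Python; DemoWriter, index_pages and locked_index are hypothetical stand-ins and not the code in Xapian.py.

```python
class DemoWriter(object):
    """Hypothetical stand-in for an index writer; not the xapwrap API."""

    def __init__(self):
        self.added = []
        self.closed = False

    def add(self, page):
        self.added.append(page)

    def close(self):
        self.closed = True


def index_pages(pages, open_writer):
    """Index pages, closing the writer only if it was actually created."""
    writer = None  # bound before the try block, so cleanup can test it
    try:
        writer = open_writer()
        for page in pages:
            writer.add(page)
    finally:
        if writer is not None:
            writer.close()
    return writer


def locked_index():
    """Simulates a writer that cannot be opened, e.g. because locking failed."""
    raise IOError("index is locked")


# With the guard, the original error surfaces instead of UnboundLocalError.
try:
    index_pages(["FrontPage"], locked_index)
    outcome = "no error"
except IOError as err:
    outcome = "IOError: %s" % err

ok_writer = index_pages(["FrontPage", "HelpContents"], DemoWriter)
```

With this pattern the traceback above would have shown whatever went wrong while opening the index, which is the error that actually needs fixing.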
BTW, Xapian search in Roundup works nicely on Windows. However, Roundup does not use xapwrap on top of the Xapian bindings.
Well, I needed to get this working for an Intranet site, did some hacking, and it now works. I did not run into the "too many open files" error, but I did hit the others. Thanks to DavidLinke for the pointer to fix the symlink problem. The missing support for utime in Windows was another matter, but I was able to fix that as well. Not being familiar with how to submit patches here, I've attached a patch file to this page.
The patch has been committed to 1.6 by http://hg.moinmo.in/moin/1.6/rev/8864ee484084 and I moved the platform dependent filesystem code for touch to MoinMoin.util.filesys in http://hg.moinmo.in/moin/1.6/rev/e449b2ae18ba . I can't do any testing on win32, but the changes at least do not change behaviour on non-win32 systems.
Plan
- Priority:
- Assigned to:
- Status: hopefully fixed, see Discussion