Description

Moin still relies on the PyXML software, which saw its latest release years ago. Please replace this dependency with the standard Python xml library, or with a maintained library such as lxml or 4suite.

Component selection

general

Details

MoinMoin Version
OS and Version
Python Version
Server Setup
Server Details
Language you are using the wiki in (set in the browser/UserPreferences)

Workaround

Discussion

The DocBook formatter requires python-xml, even under Python 2.5 (see MoinMoinDependencies and MoinMoin/formatter/text_docbook.py).

Here is an early patch: no-pythonxml.patch, based on Stefano Zacchiroli's patch. Your feedback is welcome -- FranklinPiat 2009-10-08 21:36:58 :

Thanks for the patch, but can you please tell what the state of affairs is after your patch? What works, what not? -- ThomasWaldmann 2009-10-08 22:17:00

Those are tracked below. -- FranklinPiat 2009-10-11 15:21:05

Known bugs/problems

XML parsers refuse to parse plain text

Bug: Some macros, like <<Hits>>, produce plain text (i.e. without any markup). This bug is fixed in the patch below.
The DOM tools choke when they have to merge the DOM generated from the macro's output into the main DOM of the page.
```
import Ft.Xml.Domlette
st = "foo"
Ft.Xml.Domlette.Print(Ft.Xml.Domlette.NonvalidatingReader.parseString(st))
```
fails with:
```
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/lib/python2.5/site-packages/Ft/Xml/Domlette.py", line 167, in parseString
 return self.parse(isrc)
 File "/usr/lib/python2.5/site-packages/Ft/Xml/Domlette.py", line 151, in parse
 return self.parseMethod(inputSource, *self.args, **self.kwargs)
Ft.Xml.ReaderException: In urn:uuid:4e04d22c-8cc4-46ab-b9c6-4e22c40f9e30, line 1, column 0: syntax error
```
but it works if "foo" is wrapped in an element, so that st is well-formed XML.
The same happens with xml.dom.minidom:
```
import xml.dom.minidom
st = "foo"
xml.dom.minidom.parseString(st)
```
which fails with:
```
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/lib/python2.5/site-packages/_xmlplus/dom/minidom.py", line 1925, in parseString
 return expatbuilder.parseString(string)
 File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 942, in parseString
 return builder.parseString(string)
 File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 223, in parseString
 parser.Parse(string, True)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0
```
but it works if "foo" is wrapped in an element, so that st is well-formed XML.
As a workaround, it is possible to detect when a simple string is returned and embed it inside ..., so that minidom/Domlette are happy.
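A minimal sketch of that workaround, using only the standard library. The helper name `parse_macro_output` and the wrapper element `para` are assumptions made for illustration and are not taken from the patch. "Plain text" is detected here simply by the parser rejecting the input, so malformed markup would also end up escaped and wrapped.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape


def parse_macro_output(text, wrapper="para"):
    """Parse macro output, wrapping it in a `wrapper` element if needed.

    Both this helper and the default wrapper name are hypothetical.
    """
    try:
        return xml.dom.minidom.parseString(text)
    except ExpatError:
        # Not well-formed XML (e.g. a bare string): escape it and wrap it.
        wrapped = "<%s>%s</%s>" % (wrapper, escape(text), wrapper)
        return xml.dom.minidom.parseString(wrapped)


doc = parse_macro_output("foo")
print(doc.documentElement.toxml())  # <para>foo</para>
```

Escaping before wrapping matters: macro output such as "a < b" would otherwise still be rejected after wrapping.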

Dropping 4suite ?

Since Python "now" has a proper XML implementation, we might want to drop 4suite's Ft.Xml.Domlette and use xml.dom.minidom instead.

However, minidom requires well-formed XML as input, whereas MoinMoin currently produces HTML4 rather than XHTML (some macros don't close paragraph tags, ...). As a workaround, we can sanitize the generated code with BeautifulSoup.

See the *no-4suite*.patch patches above.

The DocBook files generated by both parsers are very similar, see diff

Entities

At some stage I had the impression that <<RandomQuote>> returned html entities that weren't handled properly. This needs to be tested (confirmed).
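One plausible cause, not confirmed here: HTML named entities such as `&nbsp;` are not predefined in XML, so expat rejects them unless a DTD declares them. The sketch below reproduces this with the standard library and shows one possible fix-up that rewrites named entities as numeric character references. `html_entities_to_numeric` is a hypothetical helper, and the `html.entities` module name assumes Python 3 (on Python 2 the same table lives in `htmlentitydefs`).

```python
import re
import xml.dom.minidom
from html.entities import name2codepoint
from xml.parsers.expat import ExpatError

# The five entities that XML itself predefines must be left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def html_entities_to_numeric(markup):
    """Rewrite HTML named entities (e.g. &nbsp;) as numeric references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep predefined or unknown entities as-is
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, markup)


snippet = "<tt>#language&nbsp;en</tt>"

try:
    xml.dom.minidom.parseString(snippet)
except ExpatError as err:
    print("rejected:", err)

fixed = html_entities_to_numeric(snippet)
print(fixed)  # <tt>#language&#160;en</tt>
xml.dom.minidom.parseString(fixed)  # now parses without error
```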

Patches

There are two series of patches, exploring two alternatives:

1. Using 4suite

This patch attempts to use the current dependencies (python 4suite).

v1.1a -- no-pythonxml_with_4suite_v1.1a.patch

Bug: This patch doesn't work for pages that contain any macro.

This is because macros generate a chunk of HTML that is later merged into the main DOM, and I can't get 4suite to merge the two. It typically dies with:

  File "/usr/lib/pymodules/python2.5/MoinMoin/parser/text_moin_wiki.py", line 1329, in _macro_repl
    return self.formatter.macro(self.macro, macro_name, macro_args, markup=groups.get('macro'))
  File "/usr/lib/pymodules/python2.5/MoinMoin/formatter/text_docbook.py", line 638, in macro
    self._copyExternalNodes(xml_dom, exclude=excludes)
  File "/usr/lib/pymodules/python2.5/MoinMoin/formatter/text_docbook.py", line 657, in _copyExternalNodes
    target.appendChild(self.doc.importNode(node, deep))
  File "/usr/lib/python2.5/xml/dom/minidom.py", line 1737, in importNode
    return _clone_node(node, deep, self)
  File "/usr/lib/python2.5/xml/dom/minidom.py", line 1814, in _clone_node
    if node.ownerDocument.isSameNode(newOwnerDocument):
TypeError: isSameNode() argument 1 must be Ft.Xml.cDomlette.Node, not instance
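The traceback suggests that two DOM implementations are being mixed: `importNode` here is minidom's, while the node being imported belongs to a cDomlette document. A minimal sketch of the same merge step when both the page document and the macro fragment are built with `xml.dom.minidom`; the function name `copy_external_nodes` and the `article` root are illustrative assumptions, not the formatter's actual code.

```python
import xml.dom.minidom


def copy_external_nodes(target_doc, target, fragment_doc):
    """Import every child of the fragment's root element into `target`."""
    for node in fragment_doc.documentElement.childNodes:
        # importNode clones the node into target_doc; the source is untouched.
        target.appendChild(target_doc.importNode(node, True))


impl = xml.dom.minidom.getDOMImplementation()
page = impl.createDocument(None, "article", None)

fragment = xml.dom.minidom.parseString("<div><para>macro output</para></div>")
copy_external_nodes(page, page.documentElement, fragment)

print(page.documentElement.toxml())
# <article><para>macro output</para></article>
```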

2. Using minidom + BeautifulSoup

Python >= 2.4 (?) has a built-in XML/DOM implementation, so it should be possible to rely on it instead of PyXML.

no-pythonxml_with_minidom+BeautifulSoup_v1.patch

Bug: the macro <<RandomQuote>> sometimes causes an error.

The RandomQuote macro needs to be fixed or blacklisted. It can produce the following XML/error:

<!--The macro RandomQuote caused an error and should be blacklisted. It returned the data '<div dir="ltr" id="FortuneCookies-1.RandomQuote" lang="en"><span class="anchor" id="FortuneCookies-1.top"></span>
<span class="anchor" id="FortuneCookies-1.line-1"></span><p class="line862">Hint: Set your pages language with <tt>#language&nbsp;en</tt> processing instruction. See also <a href="/HelpOnLanguages">HelpOnLanguages</a>. <span class="anchor" id="FortuneCookies-1.bottom"></span></div>' which caused the docbook-formatter to choke. Please file a bug.-->

whereas other quotes are ok:

<para> end  <span class="anchor" id="FortuneCookies-1.top"/>
<span class="anchor" id="FortuneCookies-1.line-1"/><p class="line862">Hint: Search for multiple words, just like Google. See also <a href="/HelpOnSearching">HelpOnSearching</a>. <span class="anchor" id="FortuneCookies-1.bottom"/></p> </para>

Bug: the parser fails if one plays badly with .

Bug: some <<Include(...)>> calls fail.

Example:

<<Include(^HelpOnMacros/.*,, items=7, titlesonly)>>

and:

<<Include(^HelpOn*)>>

With error:

2009-10-11 17:09:51,211 ERROR MoinMoin.macro:130 Macro Include raised an exception:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.5/MoinMoin/macro/__init__.py", line 124, in execute
    return execute(self, args)
  File "/usr/lib/pymodules/python2.5/MoinMoin/macro/Include.py", line 209, in execute
    count_hit=False)
  File "/usr/lib/pymodules/python2.5/MoinMoin/Page.py", line 1197, in send_page
    start_line=pi['lines'])
  File "/usr/lib/pymodules/python2.5/MoinMoin/Page.py", line 1281, in send_page_content
    self.format(parser)
  File "/usr/lib/pymodules/python2.5/MoinMoin/Page.py", line 1302, in format
    parser.format(self.formatter)
  File "/usr/lib/pymodules/python2.5/MoinMoin/parser/text_moin_wiki.py", line 1548, in format
    formatted_line = self.scan(line, inhibit_p=inhibit_p)
  File "/usr/lib/pymodules/python2.5/MoinMoin/parser/text_moin_wiki.py", line 1362, in scan
    result.append(self.replace(match, inhibit_p))
  File "/usr/lib/pymodules/python2.5/MoinMoin/parser/text_moin_wiki.py", line 1406, in replace
    result.append(replace_func(hit, match.groupdict()))
  File "/usr/lib/pymodules/python2.5/MoinMoin/parser/text_moin_wiki.py", line 972, in _dl_repl
    self._close_item(result)
  File "/usr/lib/pymodules/python2.5/MoinMoin/parser/text_moin_wiki.py", line 451, in _close_item
    result.append(self.formatter.definition_desc(0))
  File "/usr/lib/pymodules/python2.5/MoinMoin/formatter/text_docbook.py", line 329, in definition_desc
    while self.cur.nodeName != "glosslist":
AttributeError: 'NoneType' object has no attribute 'nodeName'
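The failing line walks up from `self.cur` looking for a `glosslist` ancestor, and the error is consistent with the walk running past the top of the tree, where `parentNode` is `None`. A minimal sketch of a guarded walk, written against `xml.dom.minidom`; `find_ancestor` is a hypothetical helper. It only avoids the AttributeError and does not explain why no `glosslist` is open at that point.

```python
import xml.dom.minidom


def find_ancestor(node, name):
    """Return the nearest ancestor-or-self named `name`, or None."""
    while node is not None and node.nodeName != name:
        node = node.parentNode
    return node


doc = xml.dom.minidom.parseString(
    "<article><glosslist><glossentry><glossdef/></glossentry></glosslist>"
    "<para/></article>"
)
glossdef = doc.getElementsByTagName("glossdef")[0]
para = doc.getElementsByTagName("para")[0]

print(find_ancestor(glossdef, "glosslist").nodeName)  # glosslist
print(find_ancestor(para, "glosslist"))  # None
```

A caller can then decide what to do when `None` comes back, for example by leaving the current node unchanged.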

3. Using minidom only

The above patch uses BeautifulSoup to clean up the XML generated by the script itself (!).

No patch at the moment

Plan

Priority:
Assigned to:
Status: You can use a recent version of Python (e.g. 2.5.2) with the built-in xml libs, and it should work. For older Pythons (2.3, 2.4), you need PyXML. (DocBook export needs PyXML, especially if you use macros.)
Franklin: As of 2009-11, I gave up working on this issue (mostly because the DOM tree will be refactored in moin-2.0, so the work would have to be redone).

CategoryMoinMoinNoBug

MoinMoin: MoinMoinBugs/DependencyOnOrphanedPythonXML (last edited 2010-02-04 23:25:37 by 241)