Description
The function getPageText in Page.py fails when a line beginning with # appears inside preformatted text ({{{ ... }}}).
Such a line is recognized as a header. See lines 14 and 16 of the following code:
 1 def getPageText(self, start=0, length=None):
 2     """ Convenience function to get the page text, skipping the header
 3
 4     @rtype: unicode
 5     @return: page text, excluding the header
 6     """
 7
 8     # Lazy compile regex on first use. All instances share the
 9     # same regex, compiled once when the first call in an instance is done.
10     if isinstance(self.__class__.header_re, (str, unicode)):
11         self.__class__.header_re = re.compile(self.__class__.header_re, re.MULTILINE | re.UNICODE)
12
13     body = self.get_raw_body() or ''
14     header = self.header_re.search(body) ##<--- IT RECOGNISED A HEADER INSIDE A {{{
15     if header:
16         start += header.end() ##<--- THIS COUNT IS WRONG
17
18     # Return length characters from start of text
19     if length is None:
20         return body[start:]
21     else:
22         return body[start:start + length]
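The following standalone sketch tries to reproduce the effect outside of MoinMoin. It is not the code from Page.py: the pattern in HEADER_RE is only an assumption about what header_re looks like, and the names get_page_text_search, get_page_text_match and BODY are made up for this illustration. It shows that with re.MULTILINE, search() can find a # line anywhere in the body, while match() only accepts one at the very start. Using match() is just one possible direction, not necessarily the fix that was applied.

```python
import re

# ASSUMPTION: illustrative header pattern (one or more lines starting with '#').
# The real header_re in Page.py may differ.
HEADER_RE = re.compile(r'(^#+.*(?:\n\s*)+)+', re.MULTILINE | re.UNICODE)


def get_page_text_search(body, start=0, length=None):
    """Hypothetical copy of the reported logic: search() scans the whole body,
    so with MULTILINE a '#' line inside {{{ ... }}} also counts as a header."""
    header = HEADER_RE.search(body)
    if header:
        start += header.end()
    return body[start:] if length is None else body[start:start + length]


def get_page_text_match(body, start=0, length=None):
    """Hypothetical variant: match() only matches at offset 0 of the string,
    even in MULTILINE mode, so only a leading header block is skipped."""
    header = HEADER_RE.match(body)
    if header:
        start += header.end()
    return body[start:] if length is None else body[start:start + length]


# A page without processing instructions, but with a '#' line in a pre block.
BODY = "= Title =\n{{{\n# not a header\n}}}\nafter\n"

if __name__ == "__main__":
    # The search() variant drops everything up to and including the '#' line.
    print(repr(get_page_text_search(BODY)))
    # The match() variant leaves this page untouched.
    print(repr(get_page_text_match(BODY)))
```

With the assumed pattern, the search() variant cuts off the title and the opening {{{, which would fit the missing title seen in the slideshow example.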
Steps to reproduce
Example
http://moinmoin.wikiwikiweb.de/MoinMoinBugs/getPageTextFailsOnSomePres/Test?action=raw
http://moinmoin.wikiwikiweb.de/MoinMoinBugs/getPageTextFailsOnSomePres/Test?action=Slideshow
Details
This wiki.
Workaround
Discussion
I don't see anything wrong with the getPageText code; maybe the regular expression is wrong?
Reopen this bug; currently I don't see what should be wrong. -- ReimarBauer 2007-05-20 18:11:50
That example page with slideshow shows a different problem: title1 is missing. -- ReimarBauer 2007-05-20 18:17:28
Plan
- Priority:
- Assigned to:
Status: the getPageText code is now quite different in 1.6 and 1.7 branches, so this is likely fixed. -- ThomasWaldmann 2008-05-01 23:24:32