Description

The function getPageText in Page.py fails when a line beginning with # appears inside preformatted text.

Such a line is recognized as a header. Observe lines 14 and 16 of the following code:

   1     def getPageText(self, start=0, length=None):
   2         """ Convenience function to get the page text, skipping the header
   3 
   4         @rtype: unicode
   5         @return: page text, excluding the header
   6         """
   7 
   8         # Lazy compile regex on first use. All instances share the
   9         # same regex, compiled once when the first call in an instance is done.
  10         if isinstance(self.__class__.header_re, (str, unicode)):
  11             self.__class__.header_re = re.compile(self.__class__.header_re, re.MULTILINE | re.UNICODE)
  12 
  13         body = self.get_raw_body() or ''
  14         header = self.header_re.search(body) ##<--- IT RECOGNISED A HEADER INSIDE A {{{ 
  15         if header:
  16             start += header.end() ##<--- THIS COUNT IS WRONG
  17 
  18         # Return length characters from start of text
  19         if length is None:
  20             return body[start:]
  21         else:
  22             return body[start:start + length]
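
The following is a minimal sketch of the reported behaviour, not the actual MoinMoin code. It assumes a header pattern of the form `(^#+.*(?:\n\s*)+)+` (the real `header_re` in Page.py may differ), and the function names `get_page_text_search` and `get_page_text_match` are hypothetical. It suggests that, with `re.MULTILINE`, `search()` can pick up a `#` line anywhere in the body when the page has no header at the top, whereas `match()` only accepts a header at offset 0.

```python
import re

# Assumed pattern for illustration only; the real header_re may differ.
HEADER_RE = re.compile(r'(^#+.*(?:\n\s*)+)+', re.MULTILINE | re.UNICODE)


def get_page_text_search(body, start=0, length=None):
    """Mirror the reported logic: search() scans the whole body."""
    header = HEADER_RE.search(body)
    if header:
        start += header.end()
    return body[start:] if length is None else body[start:start + length]


def get_page_text_match(body, start=0, length=None):
    """Hypothetical variant: match() only finds a header at the very start."""
    header = HEADER_RE.match(body)
    if header:
        start += header.end()
    return body[start:] if length is None else body[start:start + length]


# A page without processing instructions, but with a '#' line inside {{{ }}}.
page = "Intro text\n{{{\n# a comment inside preformatted text\nx = 1\n}}}\nOutro\n"

# search() treats the '#' line as a header and skips everything before it.
truncated = get_page_text_search(page)

# match() finds no header at offset 0 and returns the whole body.
full = get_page_text_match(page)

# A page with a genuine header is handled the same way by both variants.
with_header = "#format wiki\n#language en\nBody\n"
```

Under these assumptions, anchoring the header at the start of the body (or dropping `re.MULTILINE`) would be one possible workaround; whether it suits the real code depends on how `header_re` is used elsewhere.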

Steps to reproduce

Example

Details

This wiki.

Workaround

Discussion

I don't see anything wrong with the getPageText code. Maybe the regular expression is wrong?

Reopen this bug if needed; currently I don't see what should be wrong. -- ReimarBauer 2007-05-20 18:11:50

That example page with the slideshow shows a different problem: title1 is missing. -- ReimarBauer 2007-05-20 18:17:28

Plan

Priority:
Assigned to:
Status: the getPageText code is now quite different in 1.6 and 1.7 branches, so this is likely fixed. -- ThomasWaldmann 2008-05-01 23:24:32

CategoryMoinMoinBugFixed

MoinMoin: MoinMoinBugs/getPageTextFailsOnSomePres (last edited 2008-05-01 23:24:33 by ThomasWaldmann)