HtmlConverter/Typo3-2Moin

Typo3 2 MoinMoin Wiki

Because the Typo3 CMS (Content Management System) is becoming more and more difficult to maintain, I wanted to move my most important model aeroplane web pages to my MoinMoin wiki (https://moinmo.in/). Admittedly, they were written between 2005 and 2012, but some of the information is still valid today.

Fortunately, the documentation of Reimar Bauer's Coconuts software helped me. Thank you very much.

Coconuts is an open source software library in Python which interacts with a MoinMoin wiki. It can be used to create snapshots of given URLs. This makes refactoring of existing web pages much easier.

What Typo3 web page content is transferred?

the body text of the Typo3 web page with markup
the full-size pictures, stored as moin wiki attachments (recommended max. width: 800 pixels)
- from the Typo3 path uploads/pics/
- the pictures get links in the moin text body with a downscaling parameter (default width: 320 pixels)

/!\ A mouse click on a source program will show the listing in a new window with colour coding.

Installation

See Coconuts install.

Install MoinMoin wiki version 1.9.9 if not already done.
Download the newest Coconuts .gz archive and extract it.

# Terminal commands for Ubuntu 16.04
$ mkdir Install  # if not yet made
$ cd Install

# download the archive
$ wget https://coconuts.icg.kfa-juelich.de/hg/coconuts-0.5/archive/tip.tar.gz

# extract the archive
$ tar -xzf tip.tar.gz

$ cd coconuts-0-5-66bb61002551

# For easier access, I do not install Coconuts as a Python dist-package.

Python version 2.7 must be installed.
Install the Python Imaging Library (as the Pillow package):

$ sudo pip2 install pillow

Install parser package Arnica from Reimar Bauer
Install web browser Firefox (version 57 upwards).
- Install Pearl Crescent Page Save extension for Firefox (version 57 upwards) pagesaver.
  - In parameters setup in field File name pattern: %t_%5
Comment out the 'xmlrpc' entry in file MoinMoin/config/multiconfig.py in order to enable "xmlrpc" (Remote Procedure Call)
- Edit line 796: [#'xmlrpc' ....
- Restart the MoinMoin wiki (e.g. touch moin-1.9.9/moin.wsgi)
- Otherwise you get an error message: Fault: <Fault 1: 'No such method: getAuthToken_repr.'>
Install HTML formatter library:

$ sudo apt-get install libtidy-0.99-0
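
To check that the XML-RPC interface enabled in the list above answers before running a transfer, a small test along the following lines may help. This is only a sketch: the helper names xmlrpc_endpoint and check_xmlrpc are made up for illustration, the endpoint "?action=xmlrpc2" is an assumption about the wiki setup, and the code uses the Python 3 module name xmlrpc.client (in Python 2.7 the same module is called xmlrpclib).

```python
import xmlrpc.client  # named xmlrpclib in Python 2.7


def xmlrpc_endpoint(wikiurl):
    """Build the XML-RPC URL (assumed: wiki URL plus ?action=xmlrpc2)."""
    return wikiurl.rstrip("/") + "/?action=xmlrpc2"


def check_xmlrpc(wikiurl, username, password):
    """Return (True, token) if getAuthToken answers, else (False, error text)."""
    proxy = xmlrpc.client.ServerProxy(xmlrpc_endpoint(wikiurl), allow_none=True)
    try:
        token = proxy.getAuthToken(username, password)
    except xmlrpc.client.Fault as err:
        # A "No such method" fault lands here when xmlrpc is still disabled.
        return False, str(err)
    return True, token


# Usage (needs a running wiki, so it is not executed here):
# print(check_xmlrpc("http://localhost:8080", "<admin name>", "<password>"))
```

If the call returns False together with a "No such method" text, the multiconfig.py change has probably not taken effect yet.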

Setup

The file settings_local.py (excerpt) is used so that the file settings.py itself does not need to be edited:

# declare the path to the MoinMoin installation
import re
import sys
#sys.path.insert(1, 'path_where_you_have/moin-1.9')
sys.path.insert(1, '/home/rudi/moin-1.9.9') # for example

# a list of web pages to transfer, e.g.:
urls = ["http://www.rudiswiki.de/cms/index.php?id=unsere-flugfelder",
#"http://www.rudiswiki.de/cms/index.php?id=dieakkus",
]

# regex for extracting TYPO3 body
extract_regex = re.compile(ur'<!--TYPO3SEARCH_begin(?P<value>.*)TYPO3SEARCH_end-->',
                    re.MULTILINE | re.UNICODE | re.IGNORECASE | re.DOTALL)
# regex for extracting pictures
img_regex = re.compile(ur'pics%2F(.+?)(?=&)', re.MULTILINE | re.UNICODE)
# the picture links will be appended at the end of the text

# login credentials for the target MoinMoin wiki
wikiurl = "http://localhost:8080"
username = "<admin name>"
password = "<password>"
pagename = u'RudisFlugis' # should be defined in file html2wiki.py
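
To see what the two regular expressions from settings_local.py pick out, they can be tried on a small piece of HTML. The following sketch uses the same patterns, written without the Python 2 "ur" prefix so that it also runs under Python 3. The sample page source, including the form of the picture link, is invented for illustration and is not taken from a real Typo3 page.

```python
import re

# Same patterns as in settings_local.py, without the Python 2 "ur" prefix.
extract_regex = re.compile(r'<!--TYPO3SEARCH_begin(?P<value>.*)TYPO3SEARCH_end-->',
                           re.MULTILINE | re.UNICODE | re.IGNORECASE | re.DOTALL)
img_regex = re.compile(r'pics%2F(.+?)(?=&)', re.MULTILINE | re.UNICODE)

# Hypothetical page source, for illustration only.
sample_html = (
    '<html><body><div id="menu">navigation</div>\n'
    '<!--TYPO3SEARCH_begin-->\n'
    '<h1>Unsere Flugfelder</h1>\n'
    '<a href="index.php?file=uploads%2Fpics%2Ffeld1.jpg&width=800">'
    '<img src="typo3temp/pics/abc.jpg" /></a>\n'
    '<!--TYPO3SEARCH_end-->\n'
    '<div id="footer">footer</div></body></html>'
)

# Everything between the two TYPO3SEARCH markers is the page body.
body = extract_regex.search(sample_html).group('value')

# Picture file names are taken from URL-encoded "pics%2F<name>&" links.
pictures = img_regex.findall(body)
print(pictures)  # ['feld1.jpg']
```

Note that the lookahead (?=&) means a picture name is only found when another URL parameter follows it, and that the menu and footer outside the markers are dropped.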

The file html2wiki.py needs to be adapted to the MoinMoin wiki layout (excerpt):

            ...
            pagename = '/'.join(names)
            # adapt wiki page name prefix
            pos = pagename.find("=")                    # RR
            pagename = pagename[pos+1:]                 # RR, extract real pagename
            pagename = "RudisFlugis" + pagename         # RR, add prefix
            # structure your wiki page names with the page prefix
            #pagename = "RudisFlugisModell" + pagename  # RR, add prefix
            ...
            try:
                successful, wiki_text = moinmoin.convert_in_moin_markup(parser.title, html_fragment)
            except (expat.ExpatError, TidyLibError) as err:
                logging.debug("%s %s" % (link, err))
                continue
            # Add moin wiki page header to body text
            t1 = '#format wiki\n'
            t2 = '#language  en\n'
            t3 = '||<tablestyle="float: right; margin: 0px;"style="padding: 0.5em; '
            t4 = 'border: 0px none; font-size: 100%;"><<TableOfContents>> ||\n'
            wiki_text = t1 + t2 + t3 + t4 + wiki_text + '\n\n'

            # get picture file names list
            lstPics = parser.image_urls

            # add picture attachment link list to wiki text
            t1 = '||<tablestyle="float: right;">[[attachment:'
            t2 = '|{{attachment:'
            t3 = '|attachment:'
            t4 = '|width="320"}}]] ||\n'
            for pic in lstPics:  # RR build attachment refs
                wiki_text = wiki_text + t1 + pic + t2 + pic + t3 + pic + t4

            # add wiki footer to wiki text
            t1 = '=== Contact Email ===\n'
            t2 = 'Please enter your Email address, if you expect an answer:\n\n'
            t3 = "/!\ '''The entered Email address will NOT be published, or distributed.'''\n"
            t4 = '<<AddComment>>\n\n'  # See at the Links for the macro software
            t5 = '=== List of pages in this Category ===\n'
            t6 = '<<FullSearch(category:CategoryRudisFlugis)>>\n\n'
            t7 = '----\n'
            t8 = 'Go back to CategoryRudisFlugis or FrontPage\n'
            wiki_text = wiki_text + t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8

            # send wiki text to moin wiki via XMLRPC (Remote Procedure Call)
            self.xmlrpc.send_page(pagename, wiki_text)

            # prepare list of picture URLs
            for i in range(len(lstPics)):  # build URL
                lstPics[i] = "http://192.168.17.72/cms/uploads/pics/" + lstPics[i]

            # Send pictures to moin wiki as an attachment
            self.images_send(lstPics, parser.encoding, pagename, "")
            ...

See the complete file, including debugging aids: html2wiki.py
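
The page name handling and the picture links from the excerpt can be tried in isolation. The sketch below repeats the same string operations in two small helper functions. The function names wiki_pagename and attachment_row are hypothetical and do not exist in Coconuts, and the example path is only an assumption about what the joined name looks like.

```python
def wiki_pagename(path, prefix="RudisFlugis"):
    """Keep the part of path after the first '=' and put the prefix in front."""
    pos = path.find("=")
    # find() returns -1 when there is no '=', so the whole path is kept then.
    return prefix + path[pos + 1:]


def attachment_row(pic, width=320):
    """Build one right-floating table row that links a scaled picture
    to its full-size attachment, as in the excerpt above."""
    return ('||<tablestyle="float: right;">[[attachment:%s|{{attachment:%s'
            '|attachment:%s|width="%d"}}]] ||\n' % (pic, pic, pic, width))


print(wiki_pagename("cms/index.php?id=unsere-flugfelder"))
print(attachment_row("feld1.jpg"))
```

Keeping the width as a parameter makes it easy to try a downscaling other than the default of 320 pixels.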

Lower the heading size by one: The main work of converting HTML to moin wiki markup is done in the file text_html_text_moin_wiki.py in the moin wiki. To me, converting an HTML h1 heading to a moin = heading gives headings that are too big. So I lowered every heading by one size step with the following source code change:

# Edit file: MoinMoin/converter/text_html_text_moin_wiki.py
# Line 580, change
hstr = "=" * depth
# to
hstr = "=" * (depth+1) # RR, lower heading size
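
The effect of this change can be shown with a simplified stand-in for the converter's heading code. The function moin_heading and its shift parameter are hypothetical and only mimic the hstr line; the real converter does more than this.

```python
def moin_heading(text, depth, shift=1):
    """Return a moin heading line for an HTML heading of the given depth.

    With shift=0 an h1 becomes "= text =" (the original behaviour),
    with shift=1 it becomes "== text ==" (one size step smaller).
    """
    hstr = "=" * (depth + shift)
    return "%s %s %s" % (hstr, text, hstr)


print(moin_heading("Unsere Flugfelder", 1, shift=0))  # = Unsere Flugfelder =
print(moin_heading("Unsere Flugfelder", 1))           # == Unsere Flugfelder ==
```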

Web page transfer

Once all the software is installed and set up, you can start transferring web pages from the Typo3 CMS to the moin wiki.

To begin, put only one web page in the URL list:

$ cd <path to coconuts>

$ python html2wiki.py
...
2018-03-04 05:30:58,014 INFO html2wiki.py:116 start
2018-03-04 05:31:03,795 INFO html2wiki.py:122 done

Now you have a new wiki page in your moin wiki, and the pictures are all copied to the attachment folder of this wiki page.

Next, you should restore your page layout. The picture links are appended at the end of the wiki page and should be moved to the matching place in the text.

See an original Typo3 sample web page: Unsere-Flugfelder or Unsere-Flugfelder.

Then the raw transferred web page: Unsere-Flugfelder.

Finally, the version with restored layout: Unsere-Flugfelder.
