HTML2Moin

Some people want to transfer HTML files into a wiki not merely as attachments but as real wiki pages. In these cases a converter that turns HTML into wiki markup makes life easier. OpenOffice.org and Word files can be saved as HTML and simplified with HtmlTidy (which can even "strip surplus tags in Word 2000 pages").

Python has two different HTML parsers that could be useful:

HTMLParser can be used like this:

#!/usr/bin/python2.3
# Sketch of an HTML to MoinMoin markup converter (Python 2).

import urllib
from HTMLParser import HTMLParser

url = "http://jurawiki.org/StartSeite?action=print"  # sample url
verbose = 0  # set to 1 to report tags that have no handler

class MyHTMLParser(HTMLParser):
    # Handlers are looked up by name: do_<tag>_start and do_<tag>_end.

    def do_a_start(self, attrs, tag):
        # Print the link target; anchors without an href are skipped.
        href = dict(attrs).get("href")
        if href:
            print " %s " % href

    def do_b_start(self, attrs, tag):
        print "'''",

    def do_b_end(self, tag):
        print "'''",

    def do_em_start(self, attrs, tag):
        print "''",

    def do_em_end(self, tag):
        print "''",

    def handle_starttag(self, tag, attrs):
        # getattr also finds handlers that are defined in subclasses.
        func = getattr(self, "do_%s_start" % tag, self.do_default_start)
        return func(attrs, tag)

    def handle_endtag(self, tag):
        func = getattr(self, "do_%s_end" % tag, self.do_default_end)
        return func(tag)

    def handle_data(self, data):
        print data,

    def do_default_start(self, attrs, tag):
        if verbose:
            print "Encountered the beginning of a %s tag" % tag
            print "Attribs: %s" % attrs

    def do_default_end(self, tag):
        if verbose:
            print "Encountered the end of a %s tag" % tag

htmldata = urllib.urlopen(url).read()

p = MyHTMLParser()
p.feed(htmldata)
p.close()
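
The listing above targets Python 2. For a current interpreter, the same name-based dispatch could look roughly like the following sketch. It assumes Python 3's html.parser module; the names MoinMarkupParser and html_to_moin are made up for this example, and the output is collected into a string instead of being printed.

```python
from html.parser import HTMLParser


class MoinMarkupParser(HTMLParser):
    """Hypothetical Python 3 variant of the parser above; collects its output."""

    def __init__(self):
        super().__init__()
        self.parts = []  # markup fragments in document order

    def do_a_start(self, attrs):
        # Emit the link target; anchors without an href are skipped.
        href = dict(attrs).get("href")
        if href:
            self.parts.append(" %s " % href)

    def do_b_start(self, attrs):
        self.parts.append("'''")

    def do_b_end(self):
        self.parts.append("'''")

    def do_em_start(self, attrs):
        self.parts.append("''")

    def do_em_end(self):
        self.parts.append("''")

    def handle_starttag(self, tag, attrs):
        # Tags without a do_<tag>_start method are silently ignored.
        handler = getattr(self, "do_%s_start" % tag, None)
        if handler is not None:
            handler(attrs)

    def handle_endtag(self, tag):
        handler = getattr(self, "do_%s_end" % tag, None)
        if handler is not None:
            handler()

    def handle_data(self, data):
        self.parts.append(data)


def html_to_moin(html):
    """Return the collected markup for an HTML string."""
    parser = MoinMarkupParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(html_to_moin("<p>A <b>bold</b> and <em>emphasised</em> word.</p>"))
# prints: A '''bold''' and ''emphasised'' word.
```

Returning a string rather than printing makes the converter easier to call from other code, for example from a wiki action.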

FlorianFesti extended ThomasWaldmann's parser a bit (see ActionMarket/HTML2MoinMoin.py) and built a nice web interface for it (see ActionMarket/ImportHtml.py). A sample installation can be seen in the ZDI-Wiki.

This doesn't work well, and on errors it shows the full file paths to Python and your Moin installation. Even the simple Google page isn't converted correctly.
- Which one are you talking about? The one above (which is more of an "it could be done like that" example) or the action? Clearly neither works well on pages with forms, as forms are not supported by MoinMoin.
The ZDI-Wiki says "Unknown Action". ImportHtml.py shows error messages on my installation (that doesn't mean much, as I've only been playing around for a few hours on a WinXP / Apache 2.0.53 / Python 2.4 / MoinMoin 1.3.3 test installation; maybe other things are wrong).
Is there a difference between this nice web interface and attachment:example? I don't see one.

Other existing HTML2Wiki Converters

http://savannah.nongnu.org/projects/html2wiki/ - this project seems to have been abandoned: there are no downloads to be found, the last news is from 2003, and the links are broken
http://tools.waglo.com/html2moinmoin/ - another converter, written in PHP
Convert::Wiki - Convert HTML/POD/txt from/to Wiki code (Perl)
HTML::WikiConverter - An HTML to wiki markup converter (Perl)
HtmlConverter/Typo3-2Moin - A Typo3 (HTML) to wiki markup converter with picture transfer (Python)

Using XSLT

Convert your HTML to XHTML using http://tidy.sf.net
Apply the following stylesheet:

<?xml version="1.0" encoding="Windows-1251"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" >
<xsl:output method="text" encoding="Windows-1251"/>
<xsl:template match="/">
        <xsl:apply-templates/>
</xsl:template>

<xsl:template match="h4">
  === <xsl:apply-templates /> ===
</xsl:template>

<xsl:template match="p|pre">
        <xsl:text xml:space="preserve">
</xsl:text>
        <xsl:apply-templates/>
        <xsl:text xml:space="preserve">
</xsl:text>
</xsl:template>

<xsl:template match="u"> __<xsl:apply-templates/>__</xsl:template>

<xsl:template match="em"> ''<xsl:apply-templates/>''</xsl:template>

<xsl:template match="li[parent::ul]">
<xsl:text disable-output-escaping="yes">
    * </xsl:text> <xsl:apply-templates/>
</xsl:template>

<xsl:template match="li[parent::ol]">
<xsl:text xml:space="preserve">
    </xsl:text>
<xsl:number/>.<xsl:text xml:space="preserve"> </xsl:text>
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="*">
<xsl:text xml:space="preserve"> </xsl:text><xsl:apply-templates/>
</xsl:template>

</xsl:stylesheet>

The stylesheet can be extended with templates for other tags.
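
The two steps (tidy, then the stylesheet) could be scripted along the following lines. This is only a sketch: it assumes the tidy and xsltproc command-line tools are installed, xsltproc is just one possible XSLT 1.0 processor and is not named by this page, and the function names and the default file name are invented for the example.

```python
import subprocess


def tidy_command(html_path, xhtml_path):
    """Command line that asks tidy to write html_path as XHTML to xhtml_path."""
    # -q: quiet, -asxhtml: emit XHTML, -numeric: numeric instead of named entities
    return ["tidy", "-q", "-asxhtml", "-numeric", "-o", xhtml_path, html_path]


def xslt_command(stylesheet_path, xhtml_path, wiki_path):
    """Command line that applies the stylesheet with xsltproc (an assumed tool)."""
    # --novalid: do not load the DTD named in the XHTML doctype
    return ["xsltproc", "--novalid", "-o", wiki_path, stylesheet_path, xhtml_path]


def convert(html_path, stylesheet_path, wiki_path, xhtml_path="tidied.xhtml"):
    """Run both steps; the intermediate XHTML file is left on disk."""
    tidy = subprocess.run(tidy_command(html_path, xhtml_path))
    # tidy exits with 1 when it only issued warnings; 2 means errors
    if tidy.returncode > 1:
        raise RuntimeError("tidy could not repair %s" % html_path)
    subprocess.run(xslt_command(stylesheet_path, xhtml_path, wiki_path), check=True)
```

A call such as convert("page.html", "html2moin.xsl", "page.txt") would then write the wiki markup to page.txt (all three file names are placeholders). Depending on the tidy version, the XHTML output may declare the XHTML namespace; in that case the un-prefixed match patterns in the stylesheet would need a namespace prefix before they match anything.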

coconuts uses text_html_text_moin_wiki to mass-convert HTML fragments of our old CMS pages into wiki markup and sends the result by XML-RPC to a wiki server. It is still in development. -- ReimarBauer 2010-03-02 19:29:06
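
Sending converted pages over XML-RPC might look roughly like the sketch below. It is not the coconuts code: the function name send_pages is invented, the server URL in the comment is a placeholder, and it assumes the server exposes a putPage(pagename, text) call in the style of the Wiki XML-RPC interface and accepts the request without further authentication.

```python
import xmlrpc.client


def send_pages(wiki, pages):
    """Send {pagename: markup} through an XML-RPC proxy object.

    Returns the names of the pages the server refused.
    """
    failed = []
    for name, markup in pages.items():
        try:
            # putPage is assumed to exist on the server (Wiki XML-RPC style).
            wiki.putPage(name, markup)
        except xmlrpc.client.Fault:
            failed.append(name)
    return failed


# Hypothetical usage; the URL is a placeholder, not a real server:
# wiki = xmlrpc.client.ServerProxy("http://wiki.example.org/?action=xmlrpc2")
# print(send_pages(wiki, {"ImportedPage": "'''converted''' text"}))
```

Passing the proxy in as an argument keeps the function independent of any particular server and lets it be tried out with a stand-in object first.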

MoinMoin: HtmlConverter
