Word2Moin
About
This is a Visual Basic script that converts Microsoft Word 2000 documents to MoinMoin markup. It is based on swythan's WordToWiki macro for TikiWiki (http://tikiwiki.org/tiki-index.php?page=WordToWiki_swythan).
I (JohnWhitlock) created this code because I had a dozen Word documents, with tables, lists, headings, etc., that I wanted to convert to Moin pages for our intranet Wiki. Manual conversion took about 8 hours for a complex, 50-page document. With this script, it took about 2 hours, mostly to fix tables and lists and to extract images. I'm now too busy doing my job to make the script any better, but I hope it is useful to someone else in its current ugly form.
Installation
- Download the macro (see below).
- Start Microsoft Word.
- Open the Visual Basic Editor (Tools > Macro > Visual Basic Editor).
- Click the Normal heading, so that the macro is installed and available for all Word documents.
- Install the macro (Visual Basic Editor: File > Import File...) into the Normal template.
How To Use
- Open the document you want to convert.
- Within Word: from the 'Tools' menu, select Macro > Macros...
- Select Word2Moin, then click 'Run'.
- The document will be converted in place, and copied to the clipboard.
- Paste the results in the wiki editor window (Ctrl-V).
- If you have many images in your Word doc, see section below for tips.
Exporting images from Word Docs
If you have documentation in Word that has many images or diagrams, a faster way to re-size and convert all your images at once (to .png, .gif, or .jpg files) is to leverage Word's "Save as Web Page..." capability.
- Open the document you want to convert (you kept a backup, right?!).
- From the 'File' menu, select "Save as Web Page..."
- Important: In the save dialog, set "Save as type" to "Web page (*.htm; *.html)". The default, *.mht, is a single file, so you can't pull the images out of it.
- Exit Word and go to the location where you saved the file. You will see an example-document.html file and a folder named example-document_files (where example-document is the name of your document). You can delete the example-document.html file; the images you need are in the folder.
- Upload the graphic files to your wiki page, and use the attachment: tag to place them on your wiki page.
Tip: To enable exporting to PNG:
Word: Tools > Options... > 'General' tab > 'Web Options...' button > check "Allow PNG for graphics format"
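Once the images are uploaded, typing one attachment: tag per image gets tedious for a long document. The following is a minimal Python sketch, not part of the Word2Moin macro, that lists the image files in the exported folder and prints one line of markup for each. The function name attachment_lines and the moin_16_or_newer flag are made up for this example. It assumes that Moin 1.6 and newer embed an attached image as {{attachment:name}}, while 1.5 and prior use the bare attachment:name form; check this against your own wiki before relying on it.

```python
from pathlib import Path

# Image types mentioned above for Word's web page export.
IMAGE_SUFFIXES = {".png", ".gif", ".jpg", ".jpeg"}


def attachment_lines(folder, moin_16_or_newer=True):
    """Return one line of Moin markup per image file found in `folder`."""
    images = sorted(
        p.name for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if moin_16_or_newer:
        # Assumption: Moin 1.6+ embeds an attached image with double braces.
        return ["{{attachment:%s}}" % name for name in images]
    # Assumption: Moin 1.5 and prior use the bare attachment: prefix.
    return ["attachment:%s" % name for name in images]


if __name__ == "__main__":
    # "example-document_files" is the folder name used in the steps above.
    folder = Path("example-document_files")
    if folder.is_dir():
        print("\n".join(attachment_lines(folder)))
```

Paste the printed lines into the wiki editor and move each one to the spot where the image belongs. Non-image files that Word writes into the same folder are skipped.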
Downloads
Download | Moin Version
 | 1.6 and newer
 | 1.5 and prior
 | 1.5 and prior
What Works
- v2.0: Multi-level lists are converted properly.
- v2.0: Letter lists (a,b,c) and roman numeral lists are converted properly.
- v2.0: Empty table cells are converted properly.
- Converts the Word Table Of Contents (field "TOC") into a Moin table with inter-document links
- Converts Word Headings into Moin Headlines
  - Inserts [[Anchor()]] macro and section number, if TOC was found
- Converts Bold, Italic, Underlined, Superscript, and Subscript to Moin equivalents
- Converts Lists to Moin lists
- Converts Tabs to Moin tables
- Converts Tables to Moin Tables, including background color and justification
- Replaces page breaks with Moin line rules
- Separates paragraphs with extra line breaks (Moin paragraph format)
- Copies the results to the clipboard
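When checking the converted text by hand, it helps to know what the Moin equivalents look like. The Python sketch below is only an illustration of the target markup, not the macro's own Visual Basic code. The names INLINE_MARKUP, moin_inline, and moin_headline are invented for this example, and the delimiters are the standard Moin ones as far as I know.

```python
# Moin delimiters for the inline styles listed above (opening, closing).
INLINE_MARKUP = {
    "bold": ("'''", "'''"),
    "italic": ("''", "''"),
    "underline": ("__", "__"),
    "superscript": ("^", "^"),
    "subscript": (",,", ",,"),
}

# A Moin line rule, which stands in for a Word page break.
PAGE_BREAK_RULE = "----"


def moin_inline(text, style):
    """Wrap `text` in the Moin delimiters for `style`."""
    opening, closing = INLINE_MARKUP[style]
    return opening + text + closing


def moin_headline(text, level):
    """Level 1 gives '= text =', level 2 gives '== text ==', and so on."""
    if not 1 <= level <= 5:
        raise ValueError("Moin headlines use levels 1 through 5")
    marks = "=" * level
    return "%s %s %s" % (marks, text, marks)


if __name__ == "__main__":
    print(moin_headline("Overview", 2))
    print(moin_inline("important", "bold"))
    print(PAGE_BREAK_RULE)
```

For example, a Word "Heading 2" paragraph titled Overview should come out as a line reading == Overview ==, and bold text should be wrapped in three single quotes on each side.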
What Doesn't Work
- Sometimes the Word justification doesn't make sense. The user has to fix some cells or rows to make the Moin table look like the Word table.
- There is no attempt to turn Word merged table cells into Moin spanned cells. The user has to do this manually, if desired.
- Heading numbers - Sometimes, the algorithm misses a section, so the Table of Contents doesn't work for that section. The user has to manually add the number and the [[Anchor()]] macro.
- Section names - A Word section break sometimes appears in the Table Of Contents, but no Moin heading is created. The user has to manually add the Moin heading and the [[Anchor()]] macro.
- There is no support for text colors in Moin, which may make some tables look bad, with black text on a dark background.
- Word uses special characters for dashes, ellipses, and left/right quote marks. Many browsers can display these correctly, but the page might not pass an HTML validation test. To get these special characters converted, save the script-converted document as a plain text file, and make Word do the work.
- Images, diagrams, etc., are not automatically exported - See Exporting images from Word Docs section above for a simple work-around.
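If you would rather not round-trip through a plain text file, the special characters can also be replaced after pasting. The following Python sketch is an alternative to the Word route described above, not something the macro does. The table WORD_TO_ASCII and the function asciify are names chosen for this example, and the replacements (two hyphens for an em dash, three dots for an ellipsis) are one possible choice.

```python
# Map Word's typographic characters to plain ASCII replacements.
WORD_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote, also used as apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
})


def asciify(text):
    """Return `text` with Word's special characters replaced by ASCII."""
    return text.translate(WORD_TO_ASCII)


if __name__ == "__main__":
    print(asciify("\u201cQuoted\u201d text \u2013 with an ellipsis\u2026"))
```

Characters that are not in the table, including legitimate accented letters, pass through unchanged.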
Issues/Workarounds
Comment Lines Between Table Rows
Version 2 generates comment lines (lines beginning with ##) between table rows to make tables visually easier to edit in text mode. However, some Wiki flavors may have issues with these comment lines. If you experience problems when the macro generates table markup like this:
## #######################################
||<v>column 1||<v>column 2||<v>column 3 ||
## #######################################
||<v>a ||<v>d ||<v>g ||
## #######################################
||<v>b ||<v>e ||<v>h ||
## #######################################
||<v>c ||<v>f ||<v>i ||
## #######################################
simply take out the comment lines (the lines beginning with ##) like so:
||<v>column 1||<v>column 2||<v>column 3 ||
||<v>a ||<v>d ||<v>g ||
||<v>b ||<v>e ||<v>h ||
||<v>c ||<v>f ||<v>i ||
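For a document with many tables, removing the comment lines by hand is slow. The Python sketch below does the same clean-up on the pasted text. It is not part of the macro, and the function name strip_row_comments is made up for this example. It assumes that every line starting with ## is a comment you want to drop, which also removes any other ## comments on the page.

```python
def strip_row_comments(markup):
    """Drop lines that start with '##' and keep all other lines unchanged."""
    kept = [
        line for line in markup.splitlines()
        if not line.lstrip().startswith("##")
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "\n".join([
        "## #######################################",
        "||<v>column 1||<v>column 2||<v>column 3 ||",
        "## #######################################",
        "||<v>a ||<v>d ||<v>g ||",
        "## #######################################",
    ])
    print(strip_row_comments(sample))
```

Lines that start with a single # (Moin processing instructions such as #format) are left alone, because only the double ## prefix is matched.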
To Do
There are bugs, some significant, and no error checking to speak of. The converted version is seldom ready to post directly, and requires stepping through the whole document, often with a printed copy of the Word document, to fix the differences. On the other hand, if the document is important enough to go on your Wiki, then you probably planned to read through it once anyway.
Contact
Feel free to contact me (JohnWhitlock) with comments and questions, but I don't have much time to work on any problems you might be having. If you know enough Visual Basic to improve the code, you are welcome to change it and play with it.
History
10/17/08
Updated macro to write <<BR>> tags, not [[BR]].
6/28/07 Thu
Softintheheadware completes version 2.0, with support for multi-level/alpha/roman lists and easier-on-the-eye wiki ML for tables.
- Converts tables to Wiki ML that is easier to read & edit
  - Comment lines between rows.
  - Space padded columns (edit these in INSERT mode using a fixed-width font).
- Extended support for lists
  - Multi-level lists
  - Alpha lowercase
  - Alpha uppercase
  - Roman lowercase
  - Roman uppercase
06/16/07
This script has gone about as far as it can go without a more formal approach. If I get a week to come back to it (and it makes sense for my job), then I'll start over with a ProgrammerTest model, with generated Word documents as test cases.
A potential improvement might be to generalize it, so that it could target several Wiki engines. That way, the net could be cast as far as possible for developers. However, at that point we're talking about a SourceForge project, something I don't have the time for at present.
See related
(crosslinked here for convenience)
Word2WikiPlus - a Word to Moin converter using a standalone program. Works with MS Office 2002-2010.
StandaloneMicrosoftWordConverter - a Word to Moin converter based on Visual Basic script.