Standalone MS Word 2 Moin-Moin Wiki converter
Preface
I started half a year ago with a simple VBA script (similar to MicrosoftWordConverter), but some of my users found this method too scary and complicated, so I migrated it to a more traditional standalone application written in MS VB Express.
The program is not intended to be a universal converter. It processes only the syntax that I need and, what was extremely important for me, it extracts images and inserts the appropriate links to them. It will be improved only if my company needs it, but if you're interested in further development, just drop me an e-mail and I'll send you the source code.
How to use
- start the application, press Start and select a .doc file
- wait (large documents may take a long time)
- copy the text to the Wiki text editor
- move the images:
  - in Windows XP and newer: take the doc_img.zip file, which is created next to your document (if the disk is write-protected, it is placed in x:\Documents and Settings\user\local settings\temp\word2wiki instead); upload it to the Wiki and press the UNZIP link
  - in older Windows versions: find the images in x:\Documents and Settings\user\local settings\temp\word2wiki and upload them either one by one or in a zip file as above
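The rule for where doc_img.zip ends up can be sketched as below. This is only an illustration in Python, not the converter's actual VB code: the function name choose_image_zip_path and the temp_root parameter are made up for the example, and the writability check is an assumption about how "write-protected" might be detected.

```python
import os
import tempfile


def choose_image_zip_path(doc_path, temp_root=None):
    """Pick a location for doc_img.zip (hypothetical helper).

    Prefer the folder that holds the Word document; if that folder is
    missing or not writable, fall back to a 'word2wiki' folder under
    the temp directory.
    """
    doc_dir = os.path.dirname(os.path.abspath(doc_path))
    if os.access(doc_dir, os.W_OK):
        return os.path.join(doc_dir, "doc_img.zip")

    # Fallback: <temp>/word2wiki/doc_img.zip, created on demand.
    fallback_dir = os.path.join(temp_root or tempfile.gettempdir(), "word2wiki")
    os.makedirs(fallback_dir, exist_ok=True)
    return os.path.join(fallback_dir, "doc_img.zip")
```

On some systems os.access is not a reliable test for folders, so a real implementation might simply try to create the file and fall back on failure.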
Download
version for 1.7 syntax: word2wiki_0.5.zip
version for 1.5 syntax: word2wiki_standalone_eng.zip
Tested with WinXP + MS Office 2003 only.
Note: MS .NET Framework 2 is required. Otherwise you'll get a weird error message when trying to launch the program.
sources: word2wiki_src.zip
What it can and cannot do
The following elements are processed:
- tables
- lists
- bold and italic text
- headers 1 - 4
- external links and internal anchors
- images
- if there's a TOC, it is replaced with a reference to the TableOfContents macro
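For readers who want to see what the processed elements look like on the Wiki side, here is a small Python sketch of the target markup. The helper names are hypothetical and this is not the converter's source; it only shows standard Moin syntax for headings, bold, italic and the TableOfContents macro, assuming the macro is written as [[TableOfContents]] in 1.5 syntax and <<TableOfContents>> in 1.7 syntax.

```python
def moin_heading(text, level):
    """Return a Moin heading line, e.g. level 2 gives '== text =='."""
    if not 1 <= level <= 4:
        raise ValueError("only heading levels 1-4 are handled in this sketch")
    marks = "=" * level
    return f"{marks} {text.strip()} {marks}"


def moin_bold(text):
    """Wrap text in Moin bold markers (three apostrophes)."""
    return f"'''{text}'''"


def moin_italic(text):
    """Wrap text in Moin italic markers (two apostrophes)."""
    return f"''{text}''"


def moin_toc(syntax="1.7"):
    """Return the TableOfContents macro call for the given syntax version."""
    # Moin 1.6 and later write macros as <<Name>>; 1.5 used [[Name]].
    return "[[TableOfContents]]" if syntax == "1.5" else "<<TableOfContents>>"
```

For example, moin_heading("Installation", 2) produces a line that Moin renders as a second-level heading.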
Limitations:
- bullets and numbering in table cells are lost
- merged cells are not processed (you'll have to fix such tables manually later)
- nested lists will not have any multi-level numbering (like 3.2.6)
- underlined text becomes plain text
- large documents are processed VERY SLOWLY
- footnotes, headers and footers are ignored
- if there are headers in table cells (it's very wrong, but occurs quite often), you'll see their tags in the text (just remove the extra = symbols)
- if there are | symbols in a table, they are replaced with !
- only headers whose style is named Header N or Заголовок N (the Russian-localised style name) are recognised
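Two of the limitations above are easy to picture in code. The Python sketch below is an illustration under assumptions, not the converter's implementation: heading_level and moin_table_row are hypothetical names, the style names are matched exactly as written in the list, and levels are restricted to 1-4 because those are the ones the converter processes.

```python
import re

# Style names as given in the limitations list; N is restricted to 1-4.
_HEADING_STYLE = re.compile(r"^(?:Header|Заголовок) ([1-4])$")


def heading_level(style_name):
    """Return the heading level for a recognised style name, else None."""
    match = _HEADING_STYLE.match(style_name.strip())
    return int(match.group(1)) if match else None


def moin_table_row(cells):
    """Build one Moin table row from a list of cell texts.

    Moin separates cells with '||', so a literal '|' inside a cell is
    replaced with '!' to keep the row from breaking.
    """
    safe_cells = [cell.replace("|", "!") for cell in cells]
    return "||" + "||".join(safe_cells) + "||"
```

A paragraph with any other style name returns None from heading_level and would be treated as ordinary text.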
Known bugs:
- if you make a list in Word and then remove the bullets or numbers with Backspace (in Word it will look like indented text), you'll still see these bullets in the Wiki
- sometimes you will have to remove some garbage near the headers, for example extra apostrophes: '''== my header ==. When deleting these apostrophes, you also have to find and delete the preceding opening group of apostrophes; otherwise the remaining text will become bold. The cause of this garbage is empty lines in Word that have the "bold" attribute.
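The apostrophe garbage can also be cleaned up after conversion with a small script. The Python sketch below is a possible post-processing step, not part of the converter: strip_empty_bold is a hypothetical name, and it assumes the text uses only plain ''' bold markers (it does not handle the five-apostrophe bold-italic form). It pairs the markers in order and drops any pair that encloses nothing but whitespace, which removes the opening and the closing group together.

```python
def strip_empty_bold(text):
    """Remove bold marker pairs that enclose only whitespace."""
    parts = text.split("'''")
    out = []
    for i, segment in enumerate(parts):
        inside_markers = i % 2 == 1
        closed = i + 1 < len(parts)
        if inside_markers and closed and segment.strip():
            # A real bold run: keep it with both markers.
            out.append("'''" + segment + "'''")
        elif inside_markers and not closed:
            # Unbalanced trailing marker: leave it untouched.
            out.append("'''" + segment)
        else:
            # Plain text, or an empty bold run whose markers are dropped.
            out.append(segment)
    return "".join(out)
```

Run it on the converted text before pasting it into the Wiki editor, and check the result in the preview, since unusual marker combinations may still need manual cleanup.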
Bug reports
When I convert a document of about 60 pages, with images and so on, I get an error (error.txt). Maybe it can help you to fix it? For more information, just contact me.
What version of MS Word do you use? It seems the problem is related to Shapes in your document (or to how I process them). In order to extract images I need them to be InlineShapes, not just Shapes. Shapes are "floating": when you select one, you see white circles and the green rotation point. InlineShapes are inline with the text: when you select one, you see a black rectangle around it. Do you have a lot of Shapes? If there are not too many, you can try converting them to InlineShapes manually (I may be a bit wrong with the names because I'm using a localized interface): right-click on a shape -> properties (or format) -> position -> the left box ("inline with text" or something like that). Or, if your document is not confidential, you can send it to me for analysis... or at least a part of it which has some "suspicious" floating images.
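If there are too many floating Shapes to convert by hand, the conversion could in principle be scripted through Word's object model, which offers a ConvertToInlineShape method on Shape objects. The Python sketch below is an untested suggestion, not something the converter does: convert_floating_shapes is a hypothetical helper, and doc is assumed to be a Word Document object obtained through COM automation (for example with pywin32). Word can convert only some kinds of shapes (such as pictures), so failures are counted rather than treated as fatal.

```python
def convert_floating_shapes(doc):
    """Try to turn every floating Shape in doc into an InlineShape.

    Returns a (converted, skipped) pair of counts.
    """
    converted = 0
    skipped = 0
    # Walk backwards over the 1-based collection, because a converted
    # shape disappears from doc.Shapes and shifts the later indexes.
    for index in range(doc.Shapes.Count, 0, -1):
        shape = doc.Shapes.Item(index)
        try:
            shape.ConvertToInlineShape()
            converted += 1
        except Exception:
            # Shape types that Word cannot move into the text layer.
            skipped += 1
    return converted, skipped
```

Work on a copy of the document, because the conversion changes the layout and cannot easily be undone in bulk.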
bullets in tables lost
Would it be possible to create subpages and include them in tables?
- well, if you mean creating subpages for table cells that have bullets, then I don't see any easy solution right now (at least not one that I have time to implement): the converter cannot create subpages in the Wiki remotely, which means that you would have to create every subpage manually anyway and copy-paste its contents one by one
See Related
(crosslinked here for convenience)
Word2WikiPlus - a Word to Moin converter using a standalone program. Works with Office 2002-2010.
MicrosoftWordConverter - a Word to Moin converter using a Word Macro script.