MicrosoftWordConverter

Table of contents
  1. To Install
  2. To Use
  3. What Works
  4. What Doesn't Work
  5. To Do

This is a Visual Basic script that converts Microsoft Word 2000 documents to MoinMoin markup. It is based on swythan's WordToWiki macro for TikiWiki (http://tikiwiki.org/tiki-index.php?page=WordToWiki_swythan), but has been heavily modified.

I (JohnWhitlock) created this code because I had a dozen Word documents, with tables, lists, headings, etc., that I wanted to convert to Moin pages for our intranet Wiki. Manual conversion took about 8 hours for a complex, 50-page document. With this script, it took about 2 hours, mostly to fix tables and lists and to extract images. Now I'm too busy doing my job to make the script any better, but I hope it is useful for someone else in its current ugly form.

(see attached file below)

To Install

  1. Download the code
  2. Start Word 2000
  3. Start the Visual Basic editor (Tools->Macro->Visual Basic Editor, or Alt-F11)
  4. Import the code (File->Import File), preferably into the Normal template

To Use

  1. Load the document (or, preferably, a copy of the document)
  2. Select Tools->Macro->Macros
  3. Select Word2Moin, then Run
  4. Wait a minute or so. The document will be converted in place and copied to the clipboard
  5. Paste the results into the MoinMoin editor window (Ctrl-V), OR
  6. Select File->Save As, and choose "Text Only (*.txt)" for the format

What Works

  • Converts the Word Table Of Contents (field "TOC") into a Moin table with inter-document links
  • Converts Word Headings into Moin Headlines
    • Inserts Anchor() macro and section number, if TOC was found
  • Converts Bold, Italic, Underlined, Superscript, and Subscript to Moin equivalents
  • Converts Lists to Moin lists
  • Converts Tabs to Moin tables
  • Converts Tables to Moin Tables, including background color and justification
  • Replaces page breaks with Moin line rules
  • Separates paragraphs with extra line breaks (Moin paragraph format)
  • Copies the results to the clipboard
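
As a rough illustration of the target markup for the items above, here is a minimal Python sketch. It is not taken from WordToMoin.bas; the helper names (inline, headline, table_row) are hypothetical, and it assumes the older MoinMoin 1.x syntax, where macros are written as [[Anchor(name)]]. The exact output of the real script may differ.

```python
# Illustrative only: hypothetical helpers that emit MoinMoin 1.x markup.
# They are NOT the logic of WordToMoin.bas, just examples of the target syntax.

# Moin inline styles use the same marker before and after the text.
INLINE_MARKS = {
    "bold": "'''",
    "italic": "''",
    "underline": "__",
    "superscript": "^",
    "subscript": ",,",
}

# A Word page break is replaced by a Moin horizontal rule.
PAGE_BREAK = "----"


def inline(text, style):
    """Wrap text in the Moin marker for the given character style."""
    mark = INLINE_MARKS[style]
    return mark + text + mark


def headline(text, level, number=None, anchor=None):
    """Build a Moin headline, optionally numbered and preceded by an anchor."""
    marks = "=" * level
    title = text if number is None else number + " " + text
    lines = []
    if anchor is not None:
        # Assumed Moin 1.x macro syntax; newer versions use <<Anchor(...)>>.
        lines.append("[[Anchor(" + anchor + ")]]")
    lines.append(marks + " " + title + " " + marks)
    return "\n".join(lines)


def table_row(cells, bgcolor=None):
    """Build one Moin table row; bgcolor (if given) is applied to every cell."""
    attr = "" if bgcolor is None else '<bgcolor="' + bgcolor + '">'
    return "||" + "||".join(attr + " " + cell + " " for cell in cells) + "||"


if __name__ == "__main__":
    print(headline("To Install", 2, number="1.", anchor="sec1"))
    print(inline("important", "bold"))
    print(table_row(["Name", "Value"], bgcolor="#CCCCCC"))
    print(PAGE_BREAK)
```

Knowing the target markup makes the manual clean-up pass easier, since most fixes amount to adjusting these markers by hand.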

What Doesn't Work

  • Multi-level lists are converted to flat lists. The user has to manually correct list depth.
  • Letter lists (a,b,c) are converted to numbered lists. The user has to manually change them to letter lists (1. to a.), in addition to any list depth issues.
  • Sometimes the Word justification doesn't make sense. The user has to fix some cells or rows to make the Moin table look like the Word table.
  • Empty table cells are converted poorly. The user has to manually fix these tables. [NB - use a 'space' with 'typewriter' formatting for an empty cell]
  • There is no attempt to turn Word merged table cells into Moin spanned cells. The user has to do this manually, if desired.
  • Heading numbers - Sometimes, the algorithm misses a section, so the Table of Contents doesn't work for that section. The user has to manually add the number and the Anchor() macro.
  • Section names - A Word section break sometimes appears in the Table Of Contents, but no Moin heading is created. The user has to manually add the Moin heading and the Anchor() macro.
  • There is no support for text colors in Moin, which may make some tables look bad, with black text on a dark background.
  • Word uses special characters for dashes, ellipses, and left/right quote marks. Many browsers can display these correctly, but the page might not pass an HTML validation test. To get these special characters converted, save the script-converted document as a plain text file, and make Word do the work.
  • Pictures, diagrams, etc., are not automatically exported. The best method appears to be to size the Word document so that the picture is the right dimensions, copy it to the clipboard, paste it into Microsoft Paint, and save as a .JPG. Then, manually enter the attachment: markup into the Moin document.

    -- RobertSeeger [[DateTime(2005-01-21T14:43:59Z)]] If you have many images, then it might be easier to save a copy of the document to "Website(HTML)". This will create separate JPG, GIF, or PNG files along with the awful proprietary XML and HTML files (just throw those away).
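
For the special characters mentioned in the list above (dashes, ellipses, and curly quote marks), an alternative to letting Word do the work is a small post-processing step on the saved text. The following is a minimal Python sketch of that idea, not part of the script; the function name asciify and the choice of ASCII replacements are assumptions.

```python
# Illustrative only: replace Word's typographic characters with plain ASCII.
# The mapping below is an assumption; adjust it to taste.

SMART_CHARS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
}

# str.maketrans accepts a dict of single characters mapped to strings.
_TABLE = str.maketrans(SMART_CHARS)


def asciify(text):
    """Return text with the typographic characters above replaced."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(asciify("\u201cWait\u2026\u201d it\u2019s done"))
```

This assumes the converted text has been read in as Unicode; characters outside the mapping are left unchanged.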

To Do

There are bugs, some significant, and no error checking to speak of. The converted version is seldom ready to post directly, and requires stepping through the whole document, often with a printed copy of the Word document, to fix the differences. On the other hand, if the document is important enough to go on your Wiki, then you probably planned to read through it once anyway.

This script has gone about as far as it can go without a more formal approach. If I get a week to come back to it (and it makes sense for my job), then I'll start over with a ProgrammerTest model, with generated Word documents as test cases.

A potential improvement might be to generalize it, so that it could target several Wiki engines. That way, the net could be cast as far as possible for developers. However, at that point we're talking about a SourceForge project, something I don't have the time for at present.

Feel free to contact me (JohnWhitlock) with comments and questions, but I don't have much time to work with any problems you might be having. However, feel free to change and play with the code, if you know enough Visual Basic to improve it.

Attached file
  WordToMoin.bas (24.14 kB; attached 23:34, 26 Nov 2007 by wiki-upgrade; no description)