Description
N.B. v1.5beta4 .. patch-298
Pasting text from Word often produces ConvertErrors or strange markup.
Steps to reproduce
- Copy fairly heavily formatted text from Word into the "paste from Word" window. Save the resulting changes.
Details
Discussion
- Strange junk appears just before tables, e.g.
||<^width="79px">Version||<^width="84px">Revision||<^width="96px">Edited By||<^width="309px">Comment||
- What exactly do you mean by "junk"?
Pasting straight into the GUI editor (i.e. autodetect paste from word) produces another crash (traceback3.html).
Pasting into the "paste from word" pop-up no longer crashes, i.e. whichever checkboxes you check or leave unchecked, it successfully saves the page.
The text is still pretty mangled, however. I'll attach the original document test.doc, the resulting page.mht (sorry about the IE-specific format) and wikitext.txt, so that it's easier to see what I mean.
The main issue is the almost-table like wiki markup that gets stuck at the start of tables, like the example I gave before.
It's not junk. It's MoinMoin's markup to style the tables. It's there to make the tables look like those you copied. -- RadomirDopieralski 2005-12-05 11:20:16
I realise it's supposed to be style markup. The problem appears to be that the first line of a table will often render as a line of text in its own right rather than as part of the table. I suspect this is because a spurious line-end character gets stuck in there somewhere. OwenJones 2005-12-05 14:34:00
N.B. I realise that this document is far too complex to realistically expect the GUI editor to handle completely and accurately. Hopefully, however, you can get it to handle things like this relatively gracefully, which would be wonderful. Given that aim, I'm pretty happy with it now (i.e. with the patches, pasting from Word using the paste from word function generally works and doesn't crash), but it would be better if the auto-detect paste didn't crash, and very impressive if Moin could get to the position where it produces fairly clean text from almost anything pasted in from Word, supporting what formatting it can and stripping what it can't.
With 1.5.0 patch-298 I don't get the convert error, but the table isn't rendered when I cut and paste the 1-by-2 table from CausesConvertError.doc.
I see the bolded wikitext, including the pipes that form the table. Hitting Edit(Text) shows the markup below (cut and pasted). It looks like there may be some extra linefeeds. If, from text mode, I delete the two after Fred and the two after Barney, it renders correctly.
||<tablewidth="98%" tablestyle="border: medium none ; width: 98.78%; border-collapse: collapse;"rowstyle=""^34%style="border: 1pt solid windowtext; padding: 0in 5.4pt; width: 34.14%;">'''Fred ''' ||<^65%style="border-style: solid solid solid none; border-color: windowtext windowtext windowtext -moz-use-text-color; border-width: 1pt 1pt 1pt medium; padding: 0in 5.4pt; width: 65.86%;">'''Barney ''' ||
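The manual fix described above (deleting the extra linefeeds so each row sits on one line) could be sketched as a post-processing step. This is only an illustration in plain Python, not MoinMoin's converter code; the function name `join_broken_table_rows` is hypothetical, and it assumes that a table row always starts with `||` and is complete once it ends with `||`.

```python
def join_broken_table_rows(wikitext):
    """Merge lines so that every row starting with '||' also ends with '||'.

    Assumption (not taken from MoinMoin): a row is "broken" when it starts
    with '||' but does not end with '||'; the following lines, blank ones
    included, are folded into it until the closing '||' is seen.
    """
    out = []
    pending = None  # a row that has been opened but not yet closed
    for line in wikitext.splitlines():
        stripped = line.strip()
        if pending is not None:
            if stripped:  # blank lines inside a broken row are dropped
                pending += " " + stripped
            if pending.endswith("||") and len(pending) > 2:
                out.append(pending)
                pending = None
            continue
        complete = stripped.endswith("||") and len(stripped) > 2
        if stripped.startswith("||") and not complete:
            pending = stripped
        else:
            out.append(line)
    if pending is not None:  # unterminated row at end of text: keep it as is
        out.append(pending)
    return "\n".join(out)


broken = "||'''Fred '''\n\n||'''Barney '''\n\n||"
print(join_broken_table_rows(broken))
```

A row that is never closed would swallow the text after it, so a real fix would more likely belong in the converter that emits the stray linefeeds in the first place.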
Another similar problem: Convert Error "process_inline: Don't support place element." I get this when pasting directly into the editor window, but not when cutting and pasting into the "paste from word" clipboard pop-up. If you use a place name, Word apparently puts some hidden (smart tag?) stuff in there that gets cut and pasted. Cutting and pasting from NewYork.doc directly into the editor window using ctrl-v gives tracebackPlaceElement.html.
- wtf is a "place" element? Is it in HTML standard?
It appears that MS Word is automatically applying a "Smart Tag" to New York, because it recognizes it as a place. This is embedded in Word's conversion to HTML. If I cut and paste into the GUI editor and select "source", this is what I see.
<p> </p><p class="MsoNormal"><st1:place w:st="on"><st1:city w:st="on">New York</st1:city>, <st1:state w:st="on">New York</st1:state></st1:place></p> <p> </p>
- Which is valid XHTML and should easily be handled by filtering out tags of unknown namespaces ...
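A minimal sketch of that filtering idea, using only the Python 3 standard library rather than MoinMoin's own converter: tags whose name carries a namespace prefix (such as `st1:place`) are dropped while their text content is kept. The names `NamespaceStripper` and `strip_namespaced_tags` are hypothetical, and treating every prefixed tag as "unknown" is an assumption made for illustration. Comments and declarations are also dropped by this sketch.

```python
from html.parser import HTMLParser


class NamespaceStripper(HTMLParser):
    """Re-emit HTML, skipping any tag whose name contains a ':' prefix."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if ":" not in tag:
            # Keep known tags exactly as they appeared in the source.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if ":" not in tag:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if ":" not in tag:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always kept, including text inside dropped tags.
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def strip_namespaced_tags(html):
    parser = NamespaceStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


src = ('<p class="MsoNormal"><st1:place w:st="on"><st1:city w:st="on">New York'
       '</st1:city>, <st1:state w:st="on">New York</st1:state></st1:place></p>')
print(strip_namespaced_tags(src))
```

Keeping the text and dropping only the wrapper tags matches the goal stated earlier in the discussion: support what formatting can be supported and strip what can't.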
Plan
- Priority:
Assigned to: ThomasWaldmann
- Status: partly fixed in:
- moin 1.5 patch-261 (just ignore font elements)
- moin 1.5 patch-279
- moin 1.5 patch-298 (handle thead and tfoot)