In Spanish (1) there are a few very frequent one-letter words like "y" (meaning "and") and "o" (meaning "or"). WikiWords containing such one-letter words are not recognized, as in MaterialesYMétodos (MaterialsAndMethods).
The same is true for Russian, which has many one-letter words, for instance: "и" ("and"), "у" ("by", "at"), "о" ("about"), "к" ("to"). -- KonstantinVeretennicov 2005-10-26 13:21:15
See below for a regex that probably works. If no one with a need for this feature tries it out, nothing will happen. -- JürgenHermann 2005-10-26 16:39:16
It works. -- EduardoMercovich 2006-06-28 14:12:17
This little problem makes wiki words more difficult to use, and lowers the usefulness of the automagical linking that is an essential feature of any wiki engine.
My request is to add to the parser the ability to recognize these words as WikiWords, maybe by adding an optional configuration variable.
A feature like this would boost Moin's presence and would surely be very welcome in the Spanish-writing community.
Current solutions
I am using this patch right now and it seems OK. Of course, more tests are needed, so any Spanish-speaking Moin user could try this and report its implications.
In parser/wiki.py, line 40 (any Moin dev, please express this the way you do in diffs; I don't know how to do it):
where it says [%(u)s][%(l)s]+
replace it with [%(u)s][%(l)s]{0,}
It seems that this is the only change needed. Please report here any problem that may arise. Thanks a lot to Nir for the pointers and Augusto Rückert for this solution.
Side effect
Serendipity... with this regexp, acronyms are now WikiWords (like AFAIK). At least in my case, that is OK. -- EduardoMercovich 2005-06-02 18:41:46
This is NOT a good solution, unless you WANT every upper-case word to become a link.
As I said, in my case it is. Acronyms often refer to institutions or to things that are not widely known, so it is good to have a page for each one. That's why I started with serendipity... -- EduardoMercovich 2005-06-03 13:43:18
Yes, but it's not usable as a general solution. You can easily use ["ABCD"] for acronyms.
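The change described under "Current solutions" and its side effect can be reproduced outside Moin with a small sketch. It is not Moin code: UPPER and LOWER are simplified stand-ins for config.chars_upper and config.chars_lower, the subpage and parent prefixes are left out, and build_word_rule and is_wiki_word are hypothetical helper names used only for this illustration.

```python
import re

# Assumption: simplified stand-ins for config.chars_upper / config.chars_lower.
UPPER = "A-ZÁÉÍÓÚÑ"
LOWER = "a-záéíóúñ"
CHARS = {"u": UPPER, "l": LOWER}


def build_word_rule(word_template):
    """Wrap a word pattern like word_rule does, minus subpage/parent prefixes."""
    prefix = r"(?:(?<![%(l)s])|^)" % CHARS
    suffix = r"(?![%(u)s%(l)s]+)" % CHARS
    word = word_template % CHARS
    return re.compile(prefix + "(?:" + word + ")+" + suffix)


def is_wiki_word(rule, token):
    """True if the whole token is matched by the rule."""
    return rule.fullmatch(token) is not None


# Stock definition: one upper-case letter, then one or more lower-case letters.
stock = build_word_rule(r"(?:[%(u)s][%(l)s]+){2,}")
# Proposed change: the lower-case run becomes optional ({0,}).
relaxed = build_word_rule(r"(?:[%(u)s][%(l)s]{0,}){2,}")

for token in ("WikiWord", "MaterialesYMétodos", "AFAIK", "Word"):
    print(token, is_wiki_word(stock, token), is_wiki_word(relaxed, token))
```

With these simplified classes, the relaxed rule accepts MaterialesYMétodos, but it also accepts an all-caps token such as AFAIK, which is the side effect discussed above.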
Discussion
I think it makes sense. The only problem is more false positives: AWikiWord you did not expect to become a link. But this should be rare, since in natural language capitalized words are never joined together. This can be a problem for people who write about code, but the current parser makes their life hard anyway, and they can use the #nocamelcase parser.
The best thing to do would be to have a plugin parser with less strict wiki word rules, and let people use it. If we get good feedback, we can use it as the default parser, and keep the old strict parser as a plugin. -- NirSoffer
That would work OK, thanks Nir. -- EduardoMercovich
Nir, I am talking with a knowledgeable Python programmer and planning to include this less strict parser as part of a project and release it to the community. Can you point him to the most adequate references/documentation so he knows where to read to make the required changes? He knows a lot of Python but nothing about Moin. Thanks a lot... -- EduardoMercovich
I think the change should be in parser/wiki.py. There are regular expressions at the top of the file, one of them is for wiki words. I hope that a different re will be the only fix needed. Maybe you can just subclass the wiki parser and override word_re (or whatever it's called).
I'm not sure if a parser would be enough; there are more places where wiki words are checked. To find those places, grep for word_re.
If your programmer needs help, send him to #moin on irc.freenode.net.
IIRC the definition of WikiWords in WikiWiki and UseMod needs at least one lower-case letter in a WikiWord. I know for sure that WhatIsAWiki is a WikiWord in UseMod. I do not know enough regex, but the following should be WikiWords: WOrd, WoRD, WORd. The following, however, should not: wORd, woRD, WORD. -- DonRedman
The current solution is a first step. Ideally, if acronyms are not desired wikiwords, we should search for at least one lower case letter. Again, I don't know how to make a regex for this. --EM
In this code:
word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
    'u': config.chars_upper,
    'l': config.chars_lower,
    'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
    'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
}
The (?:[%(u)s][%(l)s]+){2,} part is what defines a word. Try (?:[%(u)s]+[%(l)s]+)+[%(u)s]* instead and write a unit test with your above examples to check whether this one fits them.
-- JürgenHermann 2005-08-31 19:50:39
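A unit test of the kind suggested here might look like the following sketch. It runs outside Moin and rests on assumptions: UPPER and LOWER are simplified stand-ins for config.chars_upper and config.chars_lower, the subpage and parent prefixes are dropped, and suggested_rule and is_wiki_word are hypothetical names. Each token is checked in isolation, not inside running text.

```python
import re

# Assumption: simplified stand-ins for config.chars_upper / config.chars_lower.
UPPER = "A-ZÁÉÍÓÚÑ"
LOWER = "a-záéíóúñ"
CHARS = {"u": UPPER, "l": LOWER}

# word_rule with the suggested word definition, minus subpage/parent prefixes.
suggested_rule = re.compile(
    (r"(?:(?<![%(l)s])|^)"
     r"(?:(?:[%(u)s]+[%(l)s]+)+[%(u)s]*)+"
     r"(?![%(u)s%(l)s]+)") % CHARS
)


def is_wiki_word(token):
    """True if the whole token is matched by the suggested rule."""
    return suggested_rule.fullmatch(token) is not None


# Expectations taken from the examples given earlier in this discussion.
EXPECTED = {
    "WOrd": True,
    "WoRD": True,
    "WORd": True,
    "WhatIsAWiki": True,
    "wORd": False,
    "woRD": False,
    "WORD": False,
}

for token, expected in EXPECTED.items():
    assert is_wiki_word(token) == expected, token
```

Under these assumptions the suggested expression fits all of the listed examples; the test says nothing about tokens that are not in the list.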
Or please also add numbers, the way TWiki defines a WikiWord.
This one failed on capitalized words, that is:
- TraditionalWikiWords work OK (i.e. generates a link)
- WikiWordsWithAOneLetterWord also work OK (i.e. generates a link)
- ALLCAPS work OK (i.e. doesn't generate a link)
- camelWithoutFirstCapital work OK (i.e. doesn't generate a link)
- Capitalized DOESN'T WORK (i.e. it does generate a link for every capitalized word)
I'm not a regex wizard at all, but standing on Jürgen's shoulders, it seems that a tiny modification is working. I'm using (?:[%(u)s]+[%(l)s]+){2,}, that is, I simply added the first + after [%(u)s] and left the rest as in the original word definition.
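The five cases listed above can be checked against this modified definition with a short sketch. As before, this is an illustration rather than Moin code: UPPER and LOWER are simplified stand-ins for config.chars_upper and config.chars_lower, the subpage and parent prefixes are omitted, and modified_rule and makes_link are hypothetical names.

```python
import re

# Assumption: simplified stand-ins for config.chars_upper / config.chars_lower.
UPPER = "A-ZÁÉÍÓÚÑ"
LOWER = "a-záéíóúñ"
CHARS = {"u": UPPER, "l": LOWER}

# word_rule with "+" added after [%(u)s], minus subpage/parent prefixes.
modified_rule = re.compile(
    (r"(?:(?<![%(l)s])|^)"
     r"(?:(?:[%(u)s]+[%(l)s]+){2,})+"
     r"(?![%(u)s%(l)s]+)") % CHARS
)


def makes_link(token):
    """True if the rule finds a WikiWord anywhere in the token."""
    return modified_rule.search(token) is not None


CASES = [
    "TraditionalWikiWords",
    "WikiWordsWithAOneLetterWord",
    "MaterialesYMétodos",
    "ALLCAPS",
    "camelWithoutFirstCapital",
    "Capitalized",
]

for token in CASES:
    print("%-28s link=%s" % (token, makes_link(token)))
```

With these simplified character classes, the first three tokens produce a link and the last three do not, which agrees with the behaviour reported for the modified definition.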
I added a patch (wiki.py-OneLetterWikiWords-1.5.5a.patch) that does the trick for me. As the patch name implies, it is applied to the stock 1.5.5a release. I don't follow the development of MoinMoin, but I don't think there are wild modifications in parser/wiki.py, so this should work on reasonably recent versions. -- MarianoAbsatz 2006-09-29 20:34:39
FWIW, the patch wiki.py-OneLetterWikiWords-1.5.5a.patch still works with MoinMoin 1.5.8, albeit with a small offset: you'll get a message like Hunk #1 succeeded at 35 with fuzz 1 (offset -2 lines), but it works just fine. -- MarianoAbsatz 2007-06-30 03:40:33
The old patch doesn't work in 1.6... I just uploaded a version for 1.6.2 that should work OK from 1.6.0 on (though I only tried it with 1.6.2): text_moin_wiki.py-OneLetterWikiWords-1.6.2.patch -- MarianoAbsatz 2008-03-28 19:13:34
(1) I don't know if the same is the case with other Romance (Latin-based) languages like French.