This is not a discussion; it is an article. If you want to change the article, please do so, but keep it a flowing, structured article. If you just want to discuss it or say your opinion, add your comments to the last section, Discussion. But please don't enter comments inside the article.

Introduction

In this text, the term wiki refers to any wiki web, not to the software used to implement such a web; the software is referred to as the wiki engine.

What is a wiki, and why would we want it to be multilingual? A given mailing list normally works in a given language. Why would a wiki be different?

The answer seems to be that as a community grows, it spawns sub-communities that may work in different languages. For mailing lists, this is handled simply by creating new lists. For a wiki, however, things are more complicated: unlike mailing list messages, which are transient, wikis contain permanent articles, and there may be a strong correspondence between articles in different languages. Thus, when a wiki community spawns a sub-community, it is normally insufficient to merely create a new, independent wiki. Instead, a certain amount of interlinking must happen, and it must be supported by the wiki engine.

However, if multilinguality is the result of the creation of sub-communities, it is important to recognize that these sub-communities will generally work independently. Attempting to impose rules that blur language barriers, force translations to exist, or ensure a one-to-one correspondence between articles in different languages may be counter-productive. Multilingual users are the glue that joins the sub-communities together into one large, loose community, and if it weren't for them, we wouldn't have anything to discuss now. Thus, we must care about the needs of multilingual users and make sure they can navigate and easily jump from an article to its alternative language versions, but also recognize that a large part of the work occurs independently in the language sub-communities.

Principles

Alternative versions don't contain exactly the same content

If page B is the French version of the English page A, do both pages contain different language versions of exactly the same content? This is only true if one of them is a translation of the other. But B may be a translation of an old version of A; or A and B may have been written independently, for example if the author of B decided to rewrite from scratch and only consulted A, or if he didn't know that A existed when he wrote B.

It seems that pages will have the same content only if there exists a formal policy that dictates it to be so. For example, a company can have a multilingual web site, and a policy that a page be translated into all designated languages before being made public. However, such policies don't scale, and they cause publishing delays which are considered more harmful than multilingual non-parity. As a result, most companies seem to abandon such strict policies and only try, but not too hard, to make content available in all languages of their web site.

Thus, identical content is the norm, at least in theory, only in cases such as multilingual law, for example Swiss law or EU law. Laws are timestamped decisions that are approved and enforced only after they have been translated into all designated languages. This need for accurate translation is part of the formality that makes law-making extremely slow and hardly comparable to a wiki. Incidentally, despite this slowness, the quality of translation is a major problem in EU law-making.

Bijection, commutativity and transitivity

If page F is the French version of the English page E, is E the English version of F? If F is the French of E, and G is the German of F, is G the German of E? Is it possible for two different French pages to map to the same English page? Does one page have only one alternative version in a given language? In other words, are language mappings bijective, commutative, and transitive?

At first sight these properties seem to hold; but there are exceptions. A hypothetical example is that in Eskimo there may be many different words for the English word snow, each one having a slight difference in meaning, which might be important for Eskimo culture. It can be argued that if there were many different Eskimo snow articles, there could be a separate English version for each one, whose title or WikiName could be a periphrasis. However, what matters here is that it is no longer obvious that bijection holds. Another example is when English users decide that page A (whose French version is B) is too long, and break it into two different articles, A1 and A2.

A real demonstration of the problem is the list of Star Trek races in Wikipedia. The list contains short comments on each race, but for a number of races it only refers the reader to a separate article about the race, such as Borg or Vulcan. The German Wikipedians, however, decided that a single article is better; they have no separate article for Borgs or Vulcans, every race being treated in the single Völker im Star-Trek-Universum article. The English list of races and the German Völker link to each other, but what about the English articles Borg and Vulcan? The community is divided as to whether they should link to Völker or not link at all, and the winning opinion is to link. Thus, the English Borg links to the Borg section in Völker im Star-Trek-Universum; no reciprocal link exists, of course.

Another example is the English article Degree (angle), which links to the French Degré, which in turn links to the English Degree (disambiguation). The French Degré (homonymie) also links to Degree (disambiguation), which, surprisingly, does not link back to Degré (homonymie) but to Degré. Although this looks like an error, a related discussion shows that there is some disagreement about it.

It is possible that if the knowledge collected in a wiki were finite and perfectly organised, language mappings would be commutative, transitive, and bijective; let us call these three properties together perfect. This suggests that, as a wiki evolves, language mappings tend to become perfect (and pages in different languages tend towards the same content). During the process, however, they are not. English page A and French page B cannot both be broken up into A1 and A2, and B1 and B2, at exactly the same time; one of them will necessarily be processed some time before the other, and articles will occasionally be messed up, as seems to be the case with Degré above. Since a wiki has no terminal state and is continuously evolving, language mappings will never be perfect. Not that it matters much, but in addition, cases similar to the Star Trek example above could exist even in an ideal terminal state, because of cultural differences.
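Whether a set of interlanguage links is perfect in this sense can be checked mechanically. The following Python sketch is a hypothetical helper, not part of any wiki engine: it assumes the links are available as a dictionary mapping each (language, title) page to the set of pages it lists as alternative versions, and reports asymmetric links (what this article calls non-commutative ones), missing transitive links, and pages with several versions in one language. The sample data encodes the Degree example as described above.

```python
from collections import defaultdict

# A page is identified by (language, title); the mapping gives, for each page,
# the set of pages it declares as its alternative-language versions.
Page = tuple[str, str]


def check_perfect(links: dict[Page, set[Page]]) -> list[str]:
    """List every place where the mapping is not 'perfect' in the sense above."""
    problems = []
    for page, targets in links.items():
        per_lang = defaultdict(list)
        for target in targets:
            per_lang[target[0]].append(target)
            back = links.get(target, set())
            # "Commutativity" (symmetry): the target must link back to this page.
            if page not in back:
                problems.append(f"asymmetric: {page} -> {target}")
            # Transitivity: every other-language version of the target must
            # also be linked directly from this page.
            for further in back:
                if further[0] != page[0] and further not in targets:
                    problems.append(f"not transitive: {page} -> {target} -> {further}")
        # Bijection: at most one version per language.
        for lang, versions in per_lang.items():
            if len(versions) > 1:
                problems.append(f"several {lang} versions of {page}")
    return problems


# The Degree example: two links are not reciprocated.
degree_links = {
    ("en", "Degree (angle)"): {("fr", "Degré")},
    ("fr", "Degré"): {("en", "Degree (disambiguation)")},
    ("fr", "Degré (homonymie)"): {("en", "Degree (disambiguation)")},
    ("en", "Degree (disambiguation)"): {("fr", "Degré")},
}
for problem in check_perfect(degree_links):
    print(problem)
```

A checker like this does not decide which link is wrong; it only shows where a multilingual user has to look.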

Nevertheless, it is important to note that the vast majority of cases, and by vast majority we mean something like 99% or more, are commutative, transitive, and bijective. It might therefore be preferable to impose these properties, on the grounds that they are conceptually simpler. In fact, both Völker im Star-Trek-Universum and Degré have caused discussion and disagreement, and it might be that everyone would be better off if users had had no option but the simplest and most intuitive one.

There is an additional problem. With nonperfect interlanguage links like Wikipedia's, each language community is free to do what it wants with the links from its language to other languages. Perfect links, instead, force a certain degree of intercommunity co-operation: if, in an ambiguous case, a French user decides that page E1 and not E2 is the English equivalent of the French page F, he imposes this opinion on English users as well. However, whenever such disagreements exist, all disagreeing users will be multilingual, which means that they should be able to work out a solution together.

Different name spaces

Whereas it is clear that "Battle of Normandy" is English and "Bataille de Normandie" is French, many articles have the same title in many languages; examples include "Ella Fitzgerald", "Smalltalk", and generally titles that are names. As a result, different languages must be assigned different name spaces. Name space here is meant in a general sense, not in the sense of a MediaWiki namespace.

One way of implementing language name spaces is with MoinMoin categories, where all English pages would begin with "En/". Another alternative that has been proposed and used, putting the language in the page name (for example BattleOfNormandyEn), results in ugly names littered with meta-information. A third alternative, used by MediaWiki, is to use one wiki per language.
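The prefix approach can be sketched in a few lines of Python. The helper names and the set of prefixes are hypothetical; the point is only that the language becomes part of the stored name, so "Ella Fitzgerald" can exist once per language without clashing, while titles that themselves contain "/" still survive because only the first "/" separates the prefix.

```python
# Hypothetical helpers: a language prefix such as "En/" acts as the name space.
LANGUAGE_PREFIXES = {"En", "Fr", "De"}  # assumed configuration


def page_name(lang: str, title: str) -> str:
    """Build the stored page name for a title in a given language."""
    if lang not in LANGUAGE_PREFIXES:
        raise ValueError(f"unknown language prefix: {lang!r}")
    return f"{lang}/{title}"


def split_page_name(name: str) -> tuple[str, str]:
    """Recover (language, title); only the first '/' separates the prefix."""
    lang, sep, title = name.partition("/")
    if not sep or lang not in LANGUAGE_PREFIXES:
        raise ValueError(f"page is not in a language name space: {name!r}")
    return lang, title


# The same title now exists once per language without clashing.
print(page_name("En", "Ella Fitzgerald"), page_name("Fr", "Ella Fitzgerald"))
```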

CamelCase

Historically, wikis have been using CamelCase for hyperlinks. This practice, however, causes several problems:

  • CamelCased terms are treated by search engines as single words, so pages are ranked incorrectly.
  • CamelCase reduces link readability.
  • In several languages, such as Japanese, Chinese, Hebrew, and Arabic, CamelCase is not possible, because their scripts have no upper and lower case.
  • Words that happen to be valid CamelCase but are not meant as links have to be escaped in the wiki source.
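The last two points are easy to see with a typical WikiName rule. The pattern below is an assumed, ASCII-only simplification (real engines use their own, often broader, patterns): a name like "McDonald" becomes an accidental link that must be escaped, while text in a script without letter case can never form a link at all.

```python
import re

# One simple, ASCII-only definition of a WikiName: two or more capitalised
# words run together. Real engines use their own (often broader) patterns.
WIKI_NAME = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")


def wiki_names(text: str) -> list[str]:
    """Return the words an engine with this rule would turn into links."""
    return WIKI_NAME.findall(text)


print(wiki_names("See BattleOfNormandy and Ella Fitzgerald."))  # ['BattleOfNormandy']
print(wiki_names("Ronald McDonald"))  # ['McDonald'], an accidental link
print(wiki_names("水"))  # [], a script without case never matches
```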

People with a background in programming may find the use of CamelCase natural, and they might even prefer it for the same reason CamelCase is often used in programming naming conventions: it indicates that the entity belongs to a different class. In wikis, however, this is much less important than in programming, as web browsers render hyperlinks in a different color. As wikis become more available to nontechnical users, CamelCase becomes less appealing. Sometimes organisations use wikis as their official web sites, where unregistered users only have permission to view, whereas logged-in users can modify; in such wikis, CamelCase is clearly undesirable.

Although CamelCase is harder on readers, it has some advantages for writers:

  • The wiki source can be more readable and closer to the rendered result with CamelCase than with markup.
  • CamelCase can be faster to type than markup.

The second advantage is, however, disputed. Overall, the disadvantages of CamelCase seem to outweigh the advantages, and there is a tendency to abandon it. MediaWiki has dropped CamelCase support altogether.

In multilingual wikis, there isn't much to dispute about CamelCase. If the wiki is really expected to be used only in languages that distinguish between upper and lower case letters, CamelCase can be used. Otherwise, it is better discouraged, so that hyperlinking habits are uniform across languages.

See also: http://c2.com/cgi/wiki?WikiWordsConsideredHarmful

Manual language selection

It is occasionally proposed that the wiki server look at the Accept-Language HTTP header, or at user preferences stored in cookies, and automatically serve the preferred language version of the requested page. Such automation is unwelcome. First, users expect a given URL to point to a page with given content; cookies may affect the skin or other details, but not the main content of the page. For this reason, the Accept-Language header is normally used only to redirect the top-level page of a site, such as http://www.foobar.com/, to the top-level page in a specific language, such as http://www.foobar.com/en/.
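The one acceptable use can be sketched as follows, assuming a site laid out as /<lang>/.... This is a hypothetical Python sketch: the function names, the available languages and the fallback "en" are assumptions, and the header parser is simplified (q-values are honoured; wildcards and unknown tags simply never match). Every URL other than the root is left alone.

```python
def parse_accept_language(header: str) -> list[str]:
    """Language tags from an Accept-Language header, best first (simplified)."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, *params = [part.strip() for part in item.split(";")]
        if not tag:
            continue
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, keeping header order for ties.
            weighted.append((-quality, position, tag.lower()))
    return [tag for _, _, tag in sorted(weighted)]


def redirect_target(path, header, available=("en", "fr", "de"), default="en"):
    """Redirect only the site root; every other URL already names its language."""
    if path != "/":
        return None  # never override the language chosen by an explicit URL
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "fr-ch" -> "fr"
        if primary in available:
            return f"/{primary}/"
    return f"/{default}/"


print(redirect_target("/", "de;q=0.5, fr"))  # /fr/
print(redirect_target("/en/Water", "fr"))  # None
```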

Second, users will be confused by such automation. Multilingual users cannot be expected to change their user preferences each time they want to view a page in another language. Even if a means of manually selecting a language is provided, as long as wiki links are language-neutral, multilingual users will be frustrated whenever the wrong language version of a requested page pops up, whether by an error on their part, on the wiki engine's part, or on the page author's part.

There must thus be no such automation; the language must be clear from the URL, and it must be clear which language version of a page each link points to.

Multilingual indexes

Single-language users want RecentChanges to show only the changes in their language. Multilingual users will want either to view RecentChanges in one given language at a time, or to view the combined changes of a given set of languages. Besides RecentChanges, there may be other indexes, such as a site map or a text search. If it is not possible to provide such an option, or until such functionality is developed, providing single-language indexes seems preferable to providing indexes of all languages combined, given the general independence of language sub-communities.
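Both views can come from the same filter if each change records its language. In this hypothetical Python sketch (the Change record and recent_changes are illustrative names, not an existing API), a single-language user passes one language and a multilingual user passes the set of languages they read.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    lang: str  # language of the wiki or name space the page belongs to
    page: str
    time: datetime


def recent_changes(changes: list[Change], languages: set[str]) -> list[Change]:
    """Newest first; {"fr"} gives a single-language view, {"en", "de"} a combined one."""
    selected = [c for c in changes if c.lang in languages]
    return sorted(selected, key=lambda c: c.time, reverse=True)


log = [
    Change("en", "Water", datetime(2005, 5, 1, 10)),
    Change("fr", "Eau", datetime(2005, 5, 1, 11)),
    Change("de", "Wasser", datetime(2005, 5, 1, 12)),
]
for change in recent_changes(log, {"en", "de"}):
    print(change.time, change.lang, change.page)
```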

Existing implementations

MediaWiki

The most prominent multilingual wiki on the web is Wikipedia, whose wiki engine is MediaWiki. Multilinguality is achieved by assigning one wiki per language. The English Wikipedia is almost independent of the French Wikipedia, the two being connected only by manually specified links to alternative languages. The article on the Battle of Normandy contains the markup [[fr:Bataille de Normandie]] in its wiki source. This is not rendered in the article body; the skin uses it to add a link to the French version of the page, Bataille de Normandie. The French version, accordingly, contains [[en:Battle of Normandy]].
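The mechanism can be illustrated with a small Python sketch that pulls [[xx:Title]] links out of the wiki source so that a skin can display them separately. It is not MediaWiki's parser: split_interlanguage_links is a hypothetical name, and the fixed LANGUAGE_CODES set stands in for the configuration MediaWiki uses to decide which prefixes are languages. Other prefixed links are left in the text untouched.

```python
import re

# Assumption: a fixed set of language prefixes stands in for the wiki's
# configuration of which prefixes denote languages.
LANGUAGE_CODES = {"en", "fr", "de", "it", "es"}

LINK = re.compile(r"\[\[([a-z-]+):([^\]|]+)\]\]")


def split_interlanguage_links(wikitext):
    """Return (text without language links, {language code: title})."""
    links = {}

    def take(match):
        code, title = match.group(1), match.group(2).strip()
        if code in LANGUAGE_CODES:
            links[code] = title  # a later link for the same code wins
            return ""  # not rendered in the body; the skin shows it elsewhere
        return match.group(0)  # any other prefixed link stays in the text

    return LINK.sub(take, wikitext), links


source = "Article text.\n[[fr:Bataille de Normandie]]\n[[wikt:battle]]\n[[Category:Example]]"
body, language_links = split_interlanguage_links(source)
print(language_links)  # {'fr': 'Bataille de Normandie'}
```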

This system has many advantages. It is simple to develop, easy for users to understand, free of the questionable assumptions of bijection, transitivity and commutativity, and provides manual selection of language through clearly defined namespaces.

MediaWiki's main disadvantage is that managing Wikipedia's multilingual links manually can be very tedious and error-prone. When a new language version is added, not only must links to all existing language versions be included in the new version, but links to the new version must also be added to all existing versions. In articles that exist in more than 50 languages, such as Water or the article on Wikipedia itself, this can be extremely hard. As a workaround, Wikipedia has bots, like the German ZwoBot, that periodically visit all articles and fix the links. In fact, there is a separate bot for each language; ZwoBot, for example, only fixes German pages: it takes a German page, follows (recursively) all interlanguage links from it, and then fixes only the links of the initial German page (it normally makes the unambiguous corrections and notifies the operator about the ambiguous ones). Obviously this causes much more traffic than if one bot fixed the pages of all languages at the same time, but it appears that a more conservative approach has been taken: each language community decides independently how it wants its bot to operate on the links of the articles in that language.
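The scale of the problem is simple arithmetic: a complete set of links among n language versions contains n(n-1) links, so 50 languages already mean 50 × 49 = 2450 links. The bot procedure described above can be sketched in Python as follows. This is a hypothetical reconstruction of the described behaviour, not ZwoBot's code: it assumes the links are available as a dictionary from (language, title) to the set of linked pages, follows them breadth-first, and proposes links only for the starting page, leaving ambiguous languages to the operator.

```python
from collections import deque

Page = tuple[str, str]  # (language, title)


def reachable_versions(start: Page, links: dict[Page, set[Page]]) -> set[Page]:
    """Follow interlanguage links recursively (breadth-first) from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        for target in links.get(queue.popleft(), set()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def proposed_links(start: Page, links: dict[Page, set[Page]]):
    """Links the start page should carry, plus conflicts left to the operator."""
    by_lang: dict[str, set[Page]] = {}
    for page in reachable_versions(start, links) - {start}:
        by_lang.setdefault(page[0], set()).add(page)
    fixed, conflicts = set(), {}
    for lang, pages in by_lang.items():
        if lang != start[0] and len(pages) == 1:
            fixed |= pages  # unambiguous: the bot adds the link itself
        else:
            conflicts[lang] = pages  # ambiguous: report instead of guessing
    return fixed, conflicts


sample = {
    ("de", "Wasser"): {("en", "Water")},
    ("en", "Water"): {("fr", "Eau"), ("de", "Wasser")},
    ("fr", "Eau"): {("en", "Water"), ("it", "Acqua")},
    ("it", "Acqua"): {("fr", "Eau")},
}
print(proposed_links(("de", "Wasser"), sample))
```

Only the starting page's links are returned, which mirrors the per-language, conservative behaviour described above and explains the extra traffic: every language's bot walks the same graph.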

Some more disadvantages of MediaWiki are consequences of the independence of the wikis for different languages. The most inconvenient for multilingual users is that different user accounts are required for different languages; another problem is that the validity of the multilingual links is not checked, as they are actually external links, generated from information found in the configuration.

Other implementations

There are at least two other wiki engines offering support for multilingual content and its synchronization:

  1. Tikiwiki -> http://wiki-translation.com/CLWE+Demo+Screencast
  2. oddmuse -> http://socialsynergyweb.com/cgi-bin/wiki2/Multilingual/FrontPage

Recommendations

Either one wiki for all languages can be used, or a wiki farm (a set of wikis operated by the same wiki engine installation). With a farm, it is more difficult to provide a single account and combined language indexes. With a single wiki, it is more difficult to provide separate language indexes. Since separate indexes have priority over combined indexes, starting the implementation with a farm seems preferable and easier. Having to configure many wikis is more an advantage than a disadvantage, as the sub-communities may want different logos, different default skins, and generally different settings. In addition, serving many wikis from the same account server, and creating meta-indexes to show combined recent changes or to search all wikis, seem to be cleaner development solutions than trying to hack such functionality into one wiki.
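A combined RecentChanges meta-index over a farm does not need a shared database: if each wiki already produces its own list newest first, the lists can simply be merged. The sketch below is hypothetical (combined_recent_changes and the input layout are assumptions) and relies on Python's heapq.merge, which accepts key and reverse arguments and expects each input to be sorted in the same direction.

```python
import heapq
from datetime import datetime


def combined_recent_changes(per_wiki):
    """Merge per-wiki RecentChanges, each already sorted newest first.

    per_wiki maps a wiki (language) name to a list of (time, page) pairs.
    Returns (time, wiki, page) triples, newest first, without re-sorting.
    """
    streams = [
        [(time, wiki, page) for time, page in changes]
        for wiki, changes in per_wiki.items()
    ]
    return list(heapq.merge(*streams, key=lambda entry: entry[0], reverse=True))


farm = {
    "en": [(datetime(2005, 5, 1, 12), "Water"), (datetime(2005, 5, 1, 9), "Snow")],
    "fr": [(datetime(2005, 5, 1, 10), "Eau")],
}
for time, wiki, page in combined_recent_changes(farm):
    print(time, wiki, page)
```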

The main problem, then, is to decide whether to force bijection, commutativity and transitivity. The problem has been analyzed in detail above, and it is hard to make a choice. Not forcing these properties results in a simpler implementation, like MediaWiki's, which however causes problems that may need bots to fix. Forcing them is harder, as it seems to require a language-independent subsystem that keeps track of the correspondence between articles, but it may be cleaner in the end.
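Such a language-independent subsystem could be as small as a registry that groups pages into concepts. In this hypothetical Python sketch (LanguageRegistry is an illustrative name, not an existing engine feature), each concept holds at most one page per language and each page belongs to at most one concept; symmetry and transitivity then hold by construction, because a page's versions are simply the other members of its group.

```python
class LanguageRegistry:
    """Hypothetical store: each concept groups at most one page per language,
    and each (language, title) page belongs to at most one concept."""

    def __init__(self):
        self._concept_of = {}  # (lang, title) -> concept id
        self._pages = {}  # concept id -> {lang: title}
        self._next_id = 0

    def add(self, page, concept=None):
        """Register page, in a new concept or an existing one; return the id."""
        lang, title = page
        if page in self._concept_of:
            raise ValueError(f"{page} already belongs to concept {self._concept_of[page]}")
        if concept is None:
            concept = self._next_id
            self._next_id += 1
            self._pages[concept] = {}
        members = self._pages[concept]  # KeyError for an unknown concept id
        if lang in members:
            raise ValueError(f"concept {concept} already has a {lang} page: {members[lang]}")
        members[lang] = title
        self._concept_of[page] = concept
        return concept

    def versions(self, page):
        """All other-language versions: symmetric and transitive by construction."""
        concept = self._concept_of[page]
        return {(lang, title) for lang, title in self._pages[concept].items()
                if (lang, title) != page}


registry = LanguageRegistry()
water = registry.add(("en", "Water"))
registry.add(("fr", "Eau"), water)
registry.add(("de", "Wasser"), water)
print(registry.versions(("fr", "Eau")))
```

The price is exactly the one discussed above: a split like the German Völker im Star-Trek-Universum article cannot be expressed, and a decision by one community constrains the others.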

I think I'd go for bijection, commutativity and transitivity.

See also

MultilingualWiki in Meatball wiki is a collection of ideas from which this document has wildly and shamelessly stolen.

The discussion on Wikipedia's interlanguage links presents a number of problems which developers should read before attempting to implement multilinguality in a wiki engine.

Multilingual communication, an article on Wikimedia Meta-Wiki, describes an idea with which we obviously disagree, because it violates all the assumptions presented above, but it is linked to from here for completeness.

The discussion on ZwoBot, which took place during the revision of this article, contains some more links.

Meta

Thanks to MoinMoin developers ThomasWaldmann, AlexanderSchremmer and HeatherStern for discussing the subject with me, and to German Wikipedian "Head" for answering my questions on ZwoBot.

Copyright (C) 2005 Antonios Christofides

Permission is granted to copy, distribute and/or modify this document under the terms of either:

  • the Creative Commons Attribution-ShareAlike License 2.5; or
  • the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Discussion

I consider the MediaWiki style an easy, but in fact TOO lightweight, implementation. Using a bot that digs through all pages fixing language links is ugly. A bot makes lots of traffic for a problem that should be solved better. A wiki farm is better than one wiki for all languages (own namespace, own RC). The account problem can be solved in other ways; we already have a page about that. --p54A1FCD7

One wiki means you will get tons of irrelevant search results. It's better to have multiple wikis. User accounts should be per site, not per wiki. Synchronization of pages in different languages can be done when a page is saved. -- NirSoffer

I have implemented functionality that enables an Other languages box in the rightsidebar theme. As it currently works, the system is not bijective, commutative or transitive; that is, it doesn't reverse-lookup the page name in the language dictionaries. -- TheAnarcat

The discussion seems to point in this direction: separating languages by using different sites goes too far, while not separating them at all keeps them too close. Some layer in between seems to be necessary: something that allows the wiki engine to cut the different language spaces (think of layers) apart, but that also keeps the stacks of pages (one stack per object, each page in one language) together.

I'd suggest a solution based on a "default language". For each object, there must be a page in the default language (configurable; for this example, let's say English). This page contains the processing instruction ##lang en. Since the default is configured to en, this page automatically becomes the default page for this object. Let's take the page about Water:

##lang en
##lang de Wasser
##lang fr Eau

Water is absolutely necessary. It's often displayed in ["Fountain"]s.

Below the first lang line are the names of the pages for this object in the other languages. This way, the wiki knows which pages belong to the same stack (to allow the "Other languages" button). When users want to link to an object, they should use the name of the default page instead of the "real" name. So the Water page in German would look like this:

##lang de

Wasser ist lebenswichtig. Es wird häufig in ["Fountain"] präsentiert.

During page generation, the engine should consider the stacks and replace the links with the correct ones if a suitable page exists. For example, the German page "Wasser" contains a link to "Fountain" (en), whose header contains the reference to "Brunnen" (de). When this page is rendered, the markup ["Fountain"] will be replaced with a link to Brunnen (with the title "Brunnen"). In the rendered version of the page, no trace of the default page remains. When you translate a page, you will know which link to use, because you can simply copy the text verbatim from the default language page.

If there is no page for Fountain in the desired language German, a copy of the default page for Fountain should be created. In this case, the wiki engine should try to replace all links in that page with the ones to the German content but with the English titles (so the titles match the text but when the user clicks on them, he'll be back in his preferred language).

On the Wasser page, the link to Fountain should read "Fountain (nur in Englisch)" ("only in English") in this case, that is, an English link title and a German notice. The text after the link is supplied by the configuration.

This way, you can have untranslated "holes" in your wiki without disrupting navigation completely. -- Aaron Digulla
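The default-language scheme proposed above could be sketched roughly as follows. This is a hypothetical Python sketch, not MoinMoin code: parse_stack, render_links, the stacks dictionary and the /<lang>/<page> URL layout are all assumptions. It reads the ##lang lines of a page and rewrites ["..."] links for a reader of a given language, falling back to the untranslated copy plus the configured notice when the stack has no page in that language.

```python
import re

# MoinMoin 1.x free-link markup as used in the examples above: ["Page"]
FREE_LINK = re.compile(r'\["([^"]+)"\]')


def parse_stack(page_text):
    """Return (own language, {other language: page name}) from ##lang lines.

    The first ##lang line names the page's own language; further lines, found
    only on default-language pages, name the other pages of the same stack.
    """
    own_lang, others = None, {}
    for line in page_text.splitlines():
        if not line.startswith("##lang "):
            continue
        parts = line[len("##lang "):].split(None, 1)
        if not parts:
            continue
        if own_lang is None:
            own_lang = parts[0]
        elif len(parts) == 2:
            others[parts[0]] = parts[1].strip()
    return own_lang, others


def render_links(text, target_lang, stacks, missing_note):
    """Rewrite ["DefaultName"] links for a reader of target_lang.

    stacks maps a default page name to {language: page name}. A missing
    translation links to the copy of the default page in target_lang (assumed
    URL layout) and appends the configured notice.
    """
    def replace(match):
        default_name = match.group(1)
        translated = stacks.get(default_name, {}).get(target_lang)
        if translated:
            return f'<a href="/{target_lang}/{translated}">{translated}</a>'
        return f'<a href="/{target_lang}/{default_name}">{default_name}</a> {missing_note}'

    return FREE_LINK.sub(replace, text)


water_text = "##lang en\n##lang de Wasser\n##lang fr Eau\n\nWater is absolutely necessary."
print(parse_stack(water_text))
stacks = {"Fountain": {"en": "Fountain", "de": "Brunnen"}}
print(render_links('Es wird häufig in ["Fountain"] präsentiert.', "de", stacks, "(nur in Englisch)"))
```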

On the FSFE Wiki, translated pages are handled either by finding a built-in translation (such as for FrontPage) or by looking for pages whose common names are suffixed with a language code. This permits a list of languages to be built and offers the user the opportunity to locate a suitable translation of a page. See the guide to translated pages for more information and the advocacy FAQ for an example, noting the languages in the left menu. -- PaulBoddie <<DateTime(2010-03-21T02:58:21+0200)>>
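The suffix lookup just described can be sketched as follows. The separator "." and the list of language codes are assumptions made for illustration (the FSFE wiki's actual naming convention may differ), and find_translations is a hypothetical name.

```python
# Assumed list of recognised language codes and an assumed "." separator;
# the FSFE wiki's actual naming convention may differ.
LANGUAGE_CODES = {"de", "en", "es", "fr", "it", "nl"}


def find_translations(base_name, all_pages, separator="."):
    """Map language code -> page name for pages called base_name + separator + code."""
    found = {}
    for name in all_pages:
        stem, sep, code = name.rpartition(separator)
        if sep and stem == base_name and code in LANGUAGE_CODES:
            found[code] = name
    return found


pages = ["ExamplePage", "ExamplePage.de", "ExamplePage.fr", "ExamplePage.Notes", "OtherPage.de"]
print(find_translations("ExamplePage", pages))  # {'de': 'ExamplePage.de', 'fr': 'ExamplePage.fr'}
```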

It is possible to use a farm with SisterSites to achieve a multilingual wiki. The pages must have the same name in all languages, which is great for consistency but not appropriate for all wikis, especially not encyclopedias ("#redirect" pages are still possible, but probably a pain to maintain). See HowTo/MultilingualWiki(sistersites) -- FranklinPiat <<Date(2010-03-21T13:48:48+0100)>>

MoinMoin: Creating multilingual wikis and wiki engines (last edited 2012-02-23 13:12:36 by ThomasWaldmann)