Description
Moin (like Mediawiki) uses _ to replace a space char to make URLs nicer (so it is not Thomas%20Waldmann but Thomas_Waldmann).
Of course this is a problem if you really need an underscore somewhere, e.g. if you want to link to files on disk (I played with some "filesystem virtual page" plugin for doing that). The underscore must not be replaced by a blank by unquoting functions in that case.
So the first idea is of course to use %5f to encode. But that doesn't work because we first do url_unquote (makes an underscore char out of it) and then we replace all underscores by blanks... 8(
Usually this should be easily fixed by first doing replace("_", "%20"), then doing the unquoting, and then NOT doing a replace("_", " "). But the unquote stuff in moin is a (partly even platform dependent) mess...
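The ordering problem can be shown with a minimal sketch. It is not moin's actual code: the function names unquote_buggy and unquote_fixed are made up for illustration, and Python 3's urllib.parse.unquote stands in for moin's own url_unquote.

```python
from urllib.parse import unquote


def unquote_buggy(quoted_name):
    """Illustrative only: unquote first, then map underscores to blanks.

    By the time the replace runs, %5f has already been decoded to "_",
    so an explicitly quoted underscore is lost as well.
    """
    return unquote(quoted_name).replace("_", " ")


def unquote_fixed(quoted_name):
    """Illustrative only: the ordering proposed in the description.

    Literal underscores are turned into %20 first, then everything is
    unquoted, and no underscore replacement happens afterwards.
    """
    return unquote(quoted_name.replace("_", "%20"))


for name in ("Test_Page", "Test%5fPage"):
    print(name, "->", repr(unquote_buggy(name)), "vs", repr(unquote_fixed(name)))
```

With the second ordering, Test_Page still becomes "Test Page", while Test%5fPage keeps its underscore.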
Steps to reproduce
- compare resulting page name from those:
http:/MoinMoinBugs/UnderScoreQuotingProblem/Test_Page - should be and is "Test Page"
http:/MoinMoinBugs/UnderScoreQuotingProblem/Test%5fPage - should be "Test_Page", but is "Test Page"
Details
This Wiki.
Workaround
None.
One may use one of the Unicode characters similar to the underscore (U+FF3F _, fullwidth low line). However, the current version (1.5.2) doesn't handle the conversion well in some cases. But for basic uses this might suffice. The other disadvantage is that this character is not present in all fonts, so some visitors may see a square instead, and it makes entering the URL manually complicated. -- hajma 27.2.2006
Think about the poor user trying to link to the page foo_bar. He will try "foo bar", and "foo_bar", but none will work, and then he will file a bug here.
For those who know about the problem, you can type "page name" instead of page_name as a search term. This seems to work perfectly under Windows/IIS, others? This is not a real workaround, though, because there will always be someone who does not get the memo, and it's counterintuitive.
Discussion
Shall we do a refactor branch for the quote / unquote mess?
- Another related thing to consider refactoring is the escape mess. The solution is simple: implement a URL class that gets used everywhere and is compatible with strings.
This should be solved before we unify pages and attachments.
The problem is trying to be smart and making _ == ' '. This does create nicer URLs, but makes problems for files, or with names like __foo__, which should be legal names, for example in a wiki about Python. The simplest and most robust solution is to remove the extra magic: don't replace _ with ' '. If a user wants a nice URL, he will use this_name; if he wants spaces, he will use this name, or he may use this-name or ThisName. Who cares? Let them use what they want. The page name is the URL; the page title can be anything the user likes to use.
Since changing this might break existing pages and links, let's make it a configuration option, disable the replacement by default in the next version, and check the user response.
I have a big wiki with lots of pages with spaces. Having SearchAndReplace across the wiki might help people like me to migrate to the new system.
Some browsers (Safari) automatically unquote URLs, so every URL is already displayed as a nice URL, e.g. this%20name is displayed as this name. -- NirSoffer 2006-02-11 21:36:40
I know this is marked as "fixed" in 1.6 branch (since over a year ago), but we're still suffering with the problem in 1.5.x. Any chance there's a hack or a work-around we can apply until 1.6 is implemented? In the mean time I'll see if I can create my own fix, but I'm still pretty slow with python/moinmoin. Thanks. -- SteveDavison 2007-08-10 02:17:01
This stuff (although it looks like a minor thing) has quite some consequences and, in total, is a bigger change. 1.5.x won't get bigger changes any more. The good news is that 1.6 converter has made quite some progress recently (the reason a converter is needed is because of this change and all of its small, but potentially widespread consequences). -- ThomasWaldmann 2007-08-10 06:44:19
I was thinking more along the lines of modifying the FullSearch action to recognize search elements of the form xxx_xxx_xxx, and converting them to "xxx xxx xxx", basically automating the known work-around. Do you think this is plausible, or would I be wasting my time? Thanks! -- Steve
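A rough sketch of that idea follows. It is not tied to the real FullSearch action: rewrite_query is a hypothetical helper that only rewrites the query string before it would be handed to the search.

```python
import re

# A term made of two or more non-empty parts joined by single underscores,
# e.g. xxx_xxx_xxx. Names such as __foo__ deliberately do not match.
_UNDERSCORED = re.compile(r'^[^\s_"]+(?:_[^\s_"]+)+$')


def rewrite_query(query):
    """Hypothetical helper: turn xxx_xxx_xxx terms into "xxx xxx xxx"."""
    terms = []
    for term in query.split():
        if _UNDERSCORED.match(term):
            # Automate the known workaround: quote the phrase with blanks.
            term = '"%s"' % term.replace("_", " ")
        terms.append(term)
    return " ".join(terms)


print(rewrite_query("foo page_name bar"))
```

Terms that already contain blanks inside quotes pass through unchanged, since they contain no underscores.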
Personally, I think that turning spaces into underscores is just fine. Maybe it's not for everyone, but I'd prefer to have it as an option. The thing is... if you are going to equate "_" with " " in one case, you have to do it in every case. Searching for "some_word" and "some word" should be treated as exactly the same thing. -- SteveDavison 2007-08-11 05:03:28
The underscore magic is already removed in 1.6 (did more harm than good). For stuff containing blanks, we need quoting. BTW, google also does it with quoting. -- ThomasWaldmann 2007-08-11 11:04:18
I understand about "phrase searching" requiring quotes, because otherwise the unquoted words would be treated as separate search terms, not adjacent words. My only concern is that pages_with_underscores should be found if you search for pages_with_underscores, the exact page name. That's what's messing us up. (Wondering, if removal of _ magic had a negative benefit, shouldn't it be reworked somehow?) I'll try my workaround and post the code if it works. Is 1.6 release imminent? Thanks -- SteveDavison 2007-08-11 11:35:39
After removing the underscore magic, it will of course not confuse them with blanks any more. Looking at my current big linking-related parser cleanups, 1.6 will need quite some testing. -- ThomasWaldmann 2007-08-11 22:16:28
Plan
- Priority:
- Assigned to:
- Status: _ magic removed in 1.6 branch