Description
Note: this was fixed in moin-2.0, see: https://bitbucket.org/thomaswaldmann/moin-2.0/issue/41/non-ascii-download-filenames-dont-work
Attachments with non-ASCII names are saved under the wrong name when downloaded.
Browser support:
Browser | Correct Name | Comments
IE 6.0 | No |
Firefox | Yes |
Firefox3 | Yes |
IE 7.0 | No | When opening a Word document inside the browser, the file name is displayed URL-encoded (%xy) in the tab, but the window shows the correct name.
Safari | No |
Opera | Yes |
Steps to reproduce
- upload any file with non ASCII filename
- download file
Example
Examples attachments:
Downloading in IE7:
Downloading in Safari:
Downloading in Firefox:
Component selection
Details
This Wiki.
Workaround
Use Firefox.
And/or use ASCII filenames.
Discussion
MoinMoin sends an invalid Content-Disposition header containing non-ASCII characters:
Content-Disposition: inline; filename="test עברית.txt"
Firefox somehow decodes the filename as UTF-8 (maybe that is its default). Safari and IE try to decode the filename differently, which leads to a wrong name, but their behavior is not incorrect: the standard does not allow non-ASCII characters in header parameters.
RFC 2231 describes how to use non-ASCII characters. The header should look like this:
Content-Disposition: inline; filename*=utf-8'en'test%20%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt
email.Utils.encode_rfc2231 can be used to create correct non ASCII headers:
>>> from email.Utils import encode_rfc2231
>>> encode_rfc2231('עברית', 'utf-8', 'en')
"utf-8'en'%D7%A2%D7%91%D7%A8%D7%99%D7%AA"
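As a minimal sketch of how the whole header could be assembled, the following uses the Python 3 spelling of the same module (email.utils rather than email.Utils). The helper name rfc2231_content_disposition and its default arguments are assumptions made for illustration and are not part of MoinMoin.

```python
from email.utils import encode_rfc2231


def rfc2231_content_disposition(filename, disposition="inline", language="en"):
    """Build a Content-Disposition header line with an RFC 2231 filename*.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # encode_rfc2231 percent-encodes the UTF-8 bytes and prefixes
    # the charset and language, e.g. utf-8'en'...
    encoded = encode_rfc2231(filename, "utf-8", language)
    return "Content-Disposition: %s; filename*=%s" % (disposition, encoded)


print(rfc2231_content_disposition("test עברית.txt"))
```

Note that the extended parameter is written as filename* and is not wrapped in double quotes.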
Browser support:
Browser | Supports RFC 2231 | Comments
IE 6.0 | No |
Firefox | Yes | Both the incorrect and the correct test cases work
IE 7.0 | No | Tested by AlexanderAgibalov
Safari | No | Same for other WebKit-based browsers (OmniWeb, Shira). See WebKit bug 15287. Both the incorrect and the correct test cases fail
Opera | No |
Opera 9.5 | Yes | Tested by julian.reschke@gmx.de (as far as I recall, this also worked in earlier releases)
Note: The test case posted here before was not correct. Please try again with the new test case, posted on 2008-08-16.
Here is a simple CGI script I used to test this.
At this time, no solution works on all browsers, whether the Content-Disposition header is valid or invalid.
As a short term solution, we can move the filename into the path:
http://example.com/pagename/%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt?action=AttachFile&do=get
The action should assume that the last url component is the filename, and the one before is the page of the attachment.
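The splitting rule described here could be sketched roughly as follows. This is only an illustration: split_attachment_url is a hypothetical helper, and it assumes that the path is percent-encoded UTF-8 and that everything before the last component is the page name.

```python
from urllib.parse import unquote, urlsplit


def split_attachment_url(url):
    """Return (page name, attachment filename) taken from the URL path.

    Hypothetical helper: the last path component is treated as the
    filename and everything before it as the page name.
    """
    path = urlsplit(url).path.strip("/")
    page_part, _, file_part = path.rpartition("/")
    # Decode only after splitting, so an encoded slash inside a name
    # cannot be mistaken for a path separator.
    return unquote(page_part), unquote(file_part)


url = ("http://example.com/pagename/"
       "%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt?action=AttachFile&do=get")
print(split_attachment_url(url))
```

Because the filename travels in the path and not in a header parameter, the browser can derive the "save as" name from the URL itself.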
A long term solution will be to treat attachments as pages, so each attachment is accessible as a sub page of its parent page. For example, the attachment "עברית.txt" will be accessible as:
http://example.com/pagename/%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt
An alternative long-term solution is to change the URL format to:
http://example.com/files/page/filename
The files action will expect "filename" to be an attachment of page.
http://example.com/files can give a list of all attachments, and http://example.com/files/page a list of all attachments on a page. This can also work for other actions.
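A rough sketch of how such a files action might tell the three URL shapes apart is given below. The function resolve_files_path and the attachments mapping (page name to a set of attachment names) are hypothetical stand-ins for the wiki's storage, not MoinMoin APIs.

```python
from urllib.parse import unquote


def resolve_files_path(path, attachments):
    """Map a /files/... path to a listing or to a single attachment.

    attachments is a hypothetical dict: page name -> set of file names.
    Returns a list of (page, name) pairs for listings, or one
    (page, name) tuple for a single attachment.
    """
    parts = [unquote(part) for part in path.strip("/").split("/")]
    if parts[0] != "files":
        raise ValueError("not a files URL: %r" % path)
    rest = parts[1:]
    if not rest:
        # /files: every attachment in the wiki
        return [(page, name) for page in sorted(attachments)
                for name in sorted(attachments[page])]
    page = "/".join(rest)
    if page in attachments:
        # /files/page: all attachments of one page
        return [(page, name) for name in sorted(attachments[page])]
    # /files/page/filename: the last component is the attachment name
    page, name = "/".join(rest[:-1]), rest[-1]
    if name in attachments.get(page, ()):
        return (page, name)
    raise KeyError(path)
```

Checking for a known page first is one possible way to resolve the ambiguity between a sub page and an attachment that share the same path. Other orderings are equally plausible.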
I wonder if anyone has addressed this problem since, or whether it has been fixed in v.1.6? I'm getting more and more IE users on my wiki, so the issue is becoming more and more irritating.
- What solution that works for everyone do you suggest?
- Well, the second solution offered by Nir (if I'm not mistaken), referencing attachments as sub-pages, looks most logical to me. But certainly some experiments need to be carried out. I was thinking of trying it myself (although I'm far from a professional developer), but unfortunately I don't have any time at all at the moment. So I just thought that someone else could have tried...
We will have "attachments as subpages" (or rather: items with sub-items) in a future version of moin. But don't hold your breath, that's still a long way to go... -- ThomasWaldmann 2008-02-14 17:58:46
OK, now we need to know for which browsers this "sub-item" method works:
Please help testing
Just try to go there and do a "save as" in the browser - does it give the correct filename?
Link: http://example.com/pagename/%D7%A2%D7%91%D7%A8%D7%99%D7%AA.txt
Expected filename: test עברית.txt (don't be confused if it is rendered right-to-left)
Browser | works with getting filename out of the sub-item name in the path
FF2 | yes
FF3 | yes
IE6 | No
IE7 |
IE8 | yes
Opera | yes
Konqueror (3.5.5) | yes
Safari | yes
links | no
lynx | yes 'p'-key
Plan
TODO: test with firefox 3 beta
- Priority:
- Assigned to:
- Status:
Patch on this
I also ran into this problem when using attachments with Chinese filenames. It is obvious when attachment direct serving mode is turned off in Moin 1.6.
(I am running a patched MoinMoin 1.8 with attachment direct serving at http://www.ossxp.com, so I did not notice it until my client complained.)
I analysed the packets exchanged between various browsers and the web server, and finally noticed that the response from the web server contains an incorrect 'Content-Disposition:' header, which causes the trouble.
Below is my patch.
Download file may corrupt if filename not encode correctly in Content-Disposition header for some web browser;

diff -r 6278b366fb32 MoinMoin/Page.py
--- a/MoinMoin/Page.py Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/Page.py Wed Nov 19 10:25:28 2008 +0800
@@ -1047,6 +1047,7 @@
             # TODO: fix the encoding here, plain 8 bit is not allowed according to the RFCs
             # There is no solution that is compatible to IE except stripping non-ascii chars
             filename_enc = "%s.txt" % self.page_name.encode(config.charset)
+            filename_enc = wikiutil.content_disposition_encode(filename_enc, request)
             request.setHttpHeader('Content-Disposition: %s; filename="%s"' % (
                 content_disposition, filename_enc))
         else:
diff -r 6278b366fb32 MoinMoin/action/AttachFile.py
--- a/MoinMoin/action/AttachFile.py Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/action/AttachFile.py Wed Nov 19 10:25:28 2008 +0800
@@ -872,7 +872,7 @@
             'Content-Type: %s' % content_type,
             'Last-Modified: %s' % timestamp,
             'Content-Length: %d' % os.path.getsize(fpath),
-            'Content-Disposition: %s; filename="%s"' % (content_dispo, filename_enc),
+            'Content-Disposition: %s; filename="%s"' % (content_dispo, wikiutil.content_disposition_encode(filename_enc, request)),
         ])

         # send data
diff -r 6278b366fb32 MoinMoin/action/backup.py
--- a/MoinMoin/action/backup.py Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/action/backup.py Wed Nov 19 10:25:28 2008 +0800
@@ -39,7 +39,7 @@
     filename = "%s-%s.tar.%s" % (request.cfg.siteid, dateStamp, request.cfg.backup_compression)
     request.emit_http_headers([
         'Content-Type: application/octet-stream',
-        'Content-Disposition: inline; filename="%s"' % filename, ])
+        'Content-Disposition: inline; filename="%s"' % wikiutil.content_disposition_encode(filename, request), ])

     tar = tarfile.open(fileobj=request, mode="w|%s" % request.cfg.backup_compression)
     # allow GNU tar's longer file/pathnames
diff -r 6278b366fb32 MoinMoin/action/cache.py
--- a/MoinMoin/action/cache.py Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/action/cache.py Wed Nov 19 10:25:28 2008 +0800
@@ -154,7 +154,7 @@
         # TODO: fix the encoding here, plain 8 bit is not allowed according to the RFCs
         # There is no solution that is compatible to IE except stripping non-ascii chars
         filename = filename.encode(config.charset)
-        headers.append('Content-Disposition: %s; filename="%s"' % (content_disposition, filename))
+        headers.append('Content-Disposition: %s; filename="%s"' % (content_disposition, wikiutil.content_disposition_encode(filename, request)))

     meta_cache = caching.CacheEntry(request, cache_arena, key+'.meta', cache_scope, do_locking=do_locking, use_pickle=True)
     meta_cache.update({
diff -r 6278b366fb32 MoinMoin/wikiutil.py
--- a/MoinMoin/wikiutil.py Wed Nov 19 10:25:26 2008 +0800
+++ b/MoinMoin/wikiutil.py Wed Nov 19 10:25:28 2008 +0800
@@ -2624,3 +2624,46 @@
                 ( authtype == 'w' and user.may.write(pagename) ) ) ):
             return "( " + _("Permission denied for macro: %s")% macro_name + " )";
     return None
+
+def content_disposition_encode(text,request=None):
+    """
+    UTF filename in Content-Disposition:
+        IE: failed to download
+        Chrome: wront filename
+        FF: works. (Firefox,Epiphany,Iceweasel,Iceape,Galeon)
+        Opera: works.
+        Safari: wrong filename
+    URL encode filename in Content-Disposition:
+        IE: works.
+        Chrome: works.
+        FF: wrong filename. (Firefox,Epiphany,Iceweasel,Iceape,Galeon)
+        Opera: wrong filename
+        Safari: wrong filename
+    """
+    if isinstance(text, unicode):
+        text = text.encode('utf-8')
+    do_url_encode = None
+    if request:
+        ua = request.http_user_agent
+        ## browsers shoud url encode: MSIE, Chrome
+        for browser in ["MSIE",
+                        "Chrome"]:
+            if browser in ua:
+                do_url_encode = True
+        ## should NOT url encode: Firefox, Opera
+        if do_url_encode is None:
+            for browser in ["Opera",
+                            "Firefox",
+                            "Epiphany",
+                            "Iceweasel",
+                            "Iceape",
+                            "Galeon",]:
+                if browser in ua:
+                    do_url_encode = False
+        # should convert to OS's charset.
+        if do_url_encode is None and "Safari" in ua:
+            do_url_encode = False
+
+    if do_url_encode:
+        text = urllib.quote(text)
+    return text
-- JiangXin 2008-10-16 14:40:30
Hi JiangXin,
thanks for your patch!
Could you please:
- fix the typos
- explain why Safari is not handled like the other browsers?
- maybe extract the browser lists into 2 variables defined in MoinMoin.config (not sure whether request.cfg can work and is a good idea)?
- rename "text" argument to "filename"
- use "request" as first argument (a convention we usually follow)
Another thing I am thinking about is whether the Content-Disposition encoding has to depend on the browser's version.
Assuming that there is one correct way to do it, the other way must be wrong and thus be a bug in those browsers. Assuming they fix that bug some day, we have another problem, because the code won't be able to handle that change and then the fixed browsers will fail due to the code in moin.
Also, someone suggested above that the officially correct method is defined in rfc2231. But your code does not use this at all, right?
-- ThomasWaldmann 2008-11-29 11:09:56
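One possible shape for a function that follows these review points is sketched below in Python 3. It is not the patch author's code: the name content_disposition_param, the module-level URL_ENCODE_BROWSERS tuple and the language default are assumptions, and the browser list is copied from the patch docstring without independent testing. It uses the RFC 2231 form by default and falls back to a percent-encoded quoted filename only for the listed user agents.

```python
from email.utils import encode_rfc2231
from urllib.parse import quote

# Hypothetical configuration value. The entries come from the docstring
# of the patch above and have not been verified here.
URL_ENCODE_BROWSERS = ("MSIE", "Chrome")


def content_disposition_param(request, filename, language="en"):
    """Return the filename parameter for a Content-Disposition header.

    request is any object with an http_user_agent attribute (or None).
    """
    user_agent = getattr(request, "http_user_agent", "") or ""
    if any(token in user_agent for token in URL_ENCODE_BROWSERS):
        # Workaround form: percent-encoded UTF-8 in a quoted string.
        return 'filename="%s"' % quote(filename, safe="")
    # Default form: RFC 2231 extended parameter.
    return "filename*=%s" % encode_rfc2231(filename, "utf-8", language)
```

Keeping the workaround list in one place means that an entry can be removed once a browser release handles the RFC 2231 form, without touching the function body.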
Notes for EasyToDo
The task for this EasyToDo is to fix the problem described above in a way that is RFC-compliant yet compatible with existing browsers.
Requires: access to Windows, Linux, Mac OS X machines where you can install and test lots of different browsers
Task includes:
- reading RFCs and implementing an RFC-compliant solution
- lots of testing with lots of browsers, old and new versions of each browser, on different platforms (Windows, Linux, Mac OS X):
  - IE
  - Firefox
  - Opera
  - Safari
  - Konqueror ...
  - Google Chrome ...
  - Links, Lynx, w3m
  - ...
- creating documentation in the form of a table with your test results:
  - H: compliance to RFC, workaround1, workaround2, ...
  - V: browser name, version, OS
  - should be part of the docstring of the code you implement for easier maintenance
- if a browser does not support the RFC way, you have to implement a working workaround for exactly that browser in that version
  - note that newer versions of that browser might be fixed and RFC compatible
  - if you find different RFC compliance in old and new versions of some browser, you must test with intermediate versions also to exactly identify the versions that need non-RFC workarounds
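The per-version workaround rule could be driven by a small lookup table, as in the sketch below. All names here are hypothetical, and the single Opera entry is placeholder data that mirrors the support table earlier on this page and is not a measured result. It also assumes that the browser reports a two-part version such as Opera/9.50 in its user agent string.

```python
import re

# Placeholder data: user-agent token -> first (major, minor) version
# assumed to handle the RFC 2231 form. Real entries must come from the
# test results this task asks for.
RFC2231_SINCE = {"Opera": (9, 50)}


def browser_version(user_agent, token):
    """Return (major, minor) for token in user_agent, or None if absent."""
    match = re.search(re.escape(token) + r"[/ ](\d+)\.(\d+)", user_agent)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_workaround(user_agent):
    """True if a listed browser is older than its first compliant version."""
    for token, first_good in RFC2231_SINCE.items():
        version = browser_version(user_agent, token)
        if version is not None and version < first_good:
            return True
    return False
```

Browsers that are not in the table get the RFC-compliant form, so a newly fixed browser only needs its table entry adjusted.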
Time estimates:
- 1h RFC research and coding
- 3h coding workarounds / fine tuning
- 20h testing and documenting
Note: maybe task can be split into multiple tasks on different platforms.