Message251271
| Author | martin.panter |
| Recipients | Arfrever, martin.panter, orsenthil, serhiy.storchaka, vstinner |
| Date | 2015-09-21 22:31:22 |
| Message-id | <1442874682.77.0.0262967864815.issue25184@psf.upfronthosting.co.za> |
Serhiy’s patch essentially encodes the name with the local filesystem encoding and then percent-encodes the bytes. The current behaviour is strict UTF-8 encoding followed by percent encoding. This is similar to what the "pathlib" make_uri() methods (used by PurePath.as_uri()) do, so maybe we could let "pathlib" do the work instead.
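To make the difference concrete, here is a rough sketch. It is not Serhiy’s actual patch, and the function names are made up for illustration. It builds a name the way os.listdir() would return it on a UTF-8 POSIX system when one byte cannot be decoded. It then compares quote() on the str (strict UTF-8) with quoting the recovered bytes. The encoding is passed explicitly so the result does not depend on the machine. On POSIX, os.fsencode(name) does the same thing with the real filesystem encoding.

```python
import sys
import urllib.parse

# A filename as os.listdir() would return it on a UTF-8 POSIX system when the
# raw bytes are not valid UTF-8: the stray byte 0xE9 (Latin-1 "é") comes back
# as the lone surrogate U+DCE9 via the "surrogateescape" error handler.
name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")


def quote_strict_utf8(name):
    """Current behaviour: quote() encodes the str as strict UTF-8 first,
    so a surrogate-escaped name raises UnicodeEncodeError."""
    return urllib.parse.quote(name)


def quote_fs_encoding(name, encoding=None):
    """Hypothetical helper: recover the original bytes with the filesystem
    encoding ("surrogateescape" undoes the decoding step), then
    percent-encode those bytes."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return urllib.parse.quote(name.encode(encoding, "surrogateescape"))


if __name__ == "__main__":
    try:
        print(quote_strict_utf8(name))
    except UnicodeEncodeError as exc:
        print("strict UTF-8 fails:", exc.reason)
    print(quote_fs_encoding(name, "utf-8"))          # caf%E9.txt
    print(quote_fs_encoding("café.txt", "utf-8"))    # caf%C3%A9.txt
    print(quote_fs_encoding("café.txt", "latin-1"))  # caf%E9.txt
```

The last two lines show the catch with the filesystem-encoding approach. The same decodable name produces different URLs depending on the system's encoding. That is the portability question the draft RFC raises.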
This draft RFC discusses encoding "file:" URLs:
https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-03#section-4
It suggests leaving Unicode characters as they are (i.e. producing an IRI) where possible. Otherwise, it suggests using UTF-8 and percent encoding, even if the filesystem uses a non-UTF-8 encoding. Perhaps we could leave the filename in the HTML as Unicode characters without percent encoding, and only percent-encode the undecodable (surrogate-escaped) bytes.
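A rough sketch of that hybrid idea follows, under two assumptions of mine. First, the name came from a "surrogateescape" decode, so undecodable bytes sit in U+DC80..U+DCFF. Second, ASCII characters are still quoted the way quote() quotes them, since leaving "#", "?" or "%" literal would break the link. iri_quote() is a made-up name, not an existing API.

```python
import urllib.parse


def iri_quote(name, safe="/"):
    """Hypothetical helper: build an IRI-style href for a filename.

    * lone surrogates U+DC80..U+DCFF (bytes that "surrogateescape" could not
      decode) are turned back into %XX of the original byte;
    * ASCII characters are quoted exactly as urllib.parse.quote() would;
    * any other non-ASCII character is kept as a literal Unicode character.
    """
    out = []
    for ch in name:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # surrogateescape maps an undecodable byte b to U+DC00 + b
            out.append("%{:02X}".format(cp - 0xDC00))
        elif 0xD800 <= cp <= 0xDFFF:
            # Any other lone surrogate did not come from surrogateescape.
            raise ValueError("unexpected lone surrogate U+{:04X}".format(cp))
        elif cp < 0x80:
            out.append(urllib.parse.quote(ch, safe=safe))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Raw bytes b"na\xc3\xafve \xff.txt" decoded with UTF-8 + surrogateescape
    name = b"na\xc3\xafve \xff.txt".decode("utf-8", "surrogateescape")
    print(iri_quote(name))                   # naïve%20%FF.txt
    print(urllib.parse.quote("naïve.txt"))   # na%C3%AFve.txt (UTF-8 fallback)
```

Because quote() already percent-encodes "&", '"' and "<", the href should not need an extra html.escape(). The page itself still has to be written out in an encoding that can represent the literal characters, such as UTF-8.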
This "IRI" scheme is also recommended by <http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx>. That post says that on Windows, "in file URIs, percent-encoded octets are interpreted as a byte in the user’s current codepage". This contradicts the draft RFC and the Windows "pathlib" implementation, which both use UTF-8.
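The following sketch shows why the two readings conflict. It percent-decodes the same file: URI path twice, once as UTF-8 and once with a Windows codepage. I assume cp1252 (Western European) as the "user’s current codepage". decode_uri_path() is just a thin illustrative wrapper around urllib.parse.unquote().

```python
import urllib.parse


def decode_uri_path(uri_path, encoding):
    """Illustrative wrapper: percent-decode a file: URI path, reading the
    octets in `encoding` (undecodable octets become U+FFFD)."""
    return urllib.parse.unquote(uri_path, encoding=encoding, errors="replace")


if __name__ == "__main__":
    utf8_path = "caf%C3%A9.txt"  # "café.txt" percent-encoded from UTF-8
    cp_path = "caf%E9.txt"       # "café.txt" percent-encoded from cp1252

    print(decode_uri_path(utf8_path, "utf-8"))       # café.txt
    print(decode_uri_path(utf8_path, "cp1252"))      # cafÃ©.txt (mojibake)
    print(decode_uri_path(cp_path, "cp1252"))        # café.txt
    print(ascii(decode_uri_path(cp_path, "utf-8")))  # 'caf\ufffd.txt'
```

Each producer/consumer pair only round-trips when both sides agree on the encoding of the octets. A consumer following the blog post misreads UTF-8 URLs. A UTF-8 consumer cannot decode codepage URLs.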