Message223238
| Author |
eryksun |
| Recipients |
eryksun, ezio.melotti, jaraco, lemburg, loewis, r.david.murray, serhiy.storchaka, vstinner |
| Date |
2014-07-16.17:43:07 |
| Message-id |
<1405532588.04.0.277423657798.issue21927@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
> PS C:\Users\jaraco> echo £ | py -3 -c "import sys; print(repr(sys.stdin.buffer.read()))"
> b'?\r\n'
> Curiously, it appears as if powershell is actually receiving
> a question mark from the pipe.
PowerShell calls ReadConsoleW to read the console input buffer, so it reads "£" as a wide character from the command line. When it writes to the pipe, the default encoding should be ASCII [*]. If so, that explains the question mark that Python reads from stdin: "?" is the default replacement character (WC_DEFAULTCHAR) that WideCharToMultiByte substitutes for a character it cannot encode.
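The substitution can be reproduced from Python alone. This is only a rough sketch: it uses Python's own ASCII codec with errors="replace" as a stand-in for the conversion PowerShell performs, and it assumes the trailing "\r\n" is the line ending that echo appends.

```python
# Mimic what the pipe seems to receive: "£" has no ASCII encoding,
# so a replacing encoder substitutes "?" for it.
text = "£"

# Python's ASCII codec stands in here for the conversion done by PowerShell.
ascii_bytes = text.encode("ascii", errors="replace")

# Assumed line ending appended by echo.
piped = ascii_bytes + b"\r\n"

print(repr(piped))  # b'?\r\n'
```

This matches the bytes reported in the quoted session.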
[*] http://blogs.msdn.com/b/powershell/archive/2006/12/11/outputencoding-to-the-rescue.aspx
You can change PowerShell's output encoding to match the console:
$OutputEncoding = [Console]::OutputEncoding
If the console codepage is 65001, the above is equivalent to setting
$OutputEncoding = [System.Text.Encoding]::UTF8
http://msdn.microsoft.com/en-us/library/system.text.encoding.utf8
As Victor mentioned, this setting always writes a BOM, and under codepage 65001 it actually writes two BOMs (at least in PowerShell 2). Victor also mentioned that you can avoid the BOM by passing $False to the UTF8Encoding constructor:
$OutputEncoding = New-Object System.Text.UTF8Encoding($False)
http://msdn.microsoft.com/en-us/library/system.text.utf8encoding
There's still a BOM under codepage 65001, but maybe that's fixed in PowerShell 3.
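If a BOM (or two) still arrives on the pipe, the reading side can drop it. The following is a minimal sketch, not something from this issue: strip_utf8_boms is a hypothetical helper name, and the sample input is constructed by hand rather than captured from PowerShell.

```python
import codecs


def strip_utf8_boms(data: bytes) -> bytes:
    """Remove every leading UTF-8 BOM from data (hypothetical helper)."""
    while data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data


# Hand-built sample: two BOMs followed by "£" and a CRLF line ending.
raw = codecs.BOM_UTF8 * 2 + "£\r\n".encode("utf-8")

clean = strip_utf8_boms(raw)
print(repr(clean))  # b'\xc2\xa3\r\n'

# For comparison, the "utf-8-sig" codec strips only the first BOM;
# the second one survives as U+FEFF at the start of the decoded text.
decoded = raw.decode("utf-8-sig")
print(ascii(decoded))  # '\ufeff\xa3\r\n'
```

In a real script the bytes would come from sys.stdin.buffer.read() instead of the hand-built sample.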
I avoid setting the console to codepage 65001 anyway. ReadFile/WriteFile incorrectly return the number of characters read/written instead of the number of bytes because the call is actually handled by ReadConsoleA/WriteConsoleA. Maybe that's finally fixed in Windows 8. |
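To illustrate why a character count in place of a byte count matters, here is a small simulation. It does not call the Windows API: buggy_write is a hypothetical stand-in that writes all the bytes it is given but reports the number of characters, and write_all is an ordinary retry loop that trusts the reported count.

```python
def buggy_write(sink: bytearray, data: bytes) -> int:
    """Hypothetical stand-in for a write that reports characters, not bytes."""
    sink.extend(data)  # every byte is actually written
    # Report the character count; errors="replace" keeps the count defined
    # even if a retry starts in the middle of a multi-byte sequence.
    return len(data.decode("utf-8", errors="replace"))


def write_all(sink: bytearray, data: bytes) -> None:
    """Typical loop: keep writing until the reported total reaches len(data)."""
    offset = 0
    while offset < len(data):
        offset += buggy_write(sink, data[offset:])


sink = bytearray()
write_all(sink, "£1".encode("utf-8"))  # 2 characters, 3 bytes

print(bytes(sink))  # b'\xc2\xa311'
```

Because the first call reports 2 instead of 3, the loop assumes a short write and sends the last byte again, so the output ends with a duplicated "1".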
|