Not sure if it matters, but the OS is Windows 11 24H2.
This is a freshly started admin PowerShell window. "Current language for non-Unicode programs" is set to "English (United Kingdom)", and the UTF-8 option is turned off.
So the following results are not surprising:
(text description: the screenshot shows ACP and OEMCP registry values, showing respectively code pages 1252 and 850, output of [Console]::OutputEncoding.CodePage showing 850 and finally output of chcp command showing 850 as well)
When going to window properties, the following shows up:
(text description: the properties window shows "Current code page" set to 65001)
I know this means UTF-8, but where does the PowerShell console window take this value from?
(To make it even more interesting, the properties for an admin CMD window show 850.)
Additional screenshot showing:
HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor
1 Answer
It took a bit of digging, and more than a few hours, but I've found the answer. There is a chance this question should be moved to Stack Overflow instead (or some other Stack Exchange site)...
I suspected the properties dialog comes from conhost rather than from PowerShell itself. I stumbled upon a post titled "Understanding Windows Console Host Settings", which hinted that I should look at console.dll instead.
I threw console.dll into Ida^W Ghidra and found ConsolePropertySheet there (note: this path isn't really helpful, but I'll describe it anyway).
Inside ConsolePropertySheet() there's a call to GetRegistryValues(). After a bit of googling I found that the Microsoft Terminal repository contains the OpenConsole code (basically the open-source version of conhost). Inside src/propsheet/registry.cpp there's this piece:
Status = RegistrySerialization::s_QueryValue(hTitleKey,
                                             CONSOLE_REGISTRY_CODEPAGE,
                                             sizeof(dwValue),
                                             REG_DWORD,
                                             (PBYTE)&dwValue,
                                             nullptr);
if (SUCCEEDED_NTSTATUS(Status))
{
    if (IsValidCodePage(dwValue))
    {
        pStateInfo->CodePage = (UINT)dwValue;
    }
}
The hTitleKey points either to HKCU\Console or to one of the subkeys under HKCU\Console (e.g. HKCU\Console\%SystemRoot%_System32_WindowsPowerShell_v1.0_powershell.exe).
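As a rough illustration of how such a per-application subkey name relates to the executable path: the subkey shown above is the path with each backslash replaced by an underscore (a registry key name cannot contain a backslash). The helper name below is hypothetical and this is only a sketch in Python, not conhost's actual code.

```python
def console_subkey_name(title_or_path: str) -> str:
    """Turn a console title/path into an HKCU\\Console subkey name (sketch)."""
    # Backslashes separate registry keys, so they are swapped for underscores.
    return title_or_path.replace("\\", "_")


path = r"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe"
print("HKCU\\Console\\" + console_subkey_name(path))
```

If that subkey has no CodePage value, there is nothing application-specific for the snippet above to read from it.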
If you're curious, before the value is shown in the property sheet, there's a call to LanguageDisplay(), which gets a nice-looking string for it.
Anyway, that part looks pretty straightforward, so I started thinking there must be some other place where the code page is set. And indeed there are ApiRoutines::GetConsoleOutputCodePageImpl and ApiRoutines::SetConsoleOutputCodePageImpl, which are invoked when the GetConsoleOutputCP() and SetConsoleOutputCP() WinAPI functions are called.
Now, there's a slight but important difference between those two functions, so I need to paste their content here:
void ApiRoutines::GetConsoleOutputCodePageImpl(ULONG& codepage) noexcept
{
    try
    {
        LockConsole();
        auto Unlock = wil::scope_exit([&] { UnlockConsole(); });
        const auto& gci = ServiceLocator::LocateGlobals().getConsoleInformation();
        codepage = gci.OutputCP;
    }
    CATCH_LOG();
}
...
[[nodiscard]] HRESULT ApiRoutines::SetConsoleOutputCodePageImpl(const ULONG codepage) noexcept
{
    try
    {
        auto& gci = ServiceLocator::LocateGlobals().getConsoleInformation();
        LockConsole();
        auto Unlock = wil::scope_exit([&] { UnlockConsole(); });
        RETURN_IF_FAILED(DoSrvSetConsoleOutputCodePage(codepage));
        // Setting the code page via the API also updates the default value.
        // This is how the initial code page is set to UTF-8 in a WSL shell.
        gci.DefaultOutputCP = codepage;
        return S_OK;
    }
    CATCH_RETURN();
}
Notice the comment.
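To make the asymmetry explicit, here is a toy Python model of the two routines. The class and function names are made up for illustration, and it assumes that DoSrvSetConsoleOutputCodePage() is what updates the current OutputCP (that function's body isn't quoted above).

```python
from dataclasses import dataclass


@dataclass
class ConsoleInfo:
    """Toy stand-in for the console state; only two fields are modelled."""
    output_cp: int
    default_output_cp: int


def get_console_output_cp(gci: ConsoleInfo) -> int:
    # The getter only reads the current output code page.
    return gci.output_cp


def set_console_output_cp(gci: ConsoleInfo, codepage: int) -> None:
    # Assumption: this mirrors what DoSrvSetConsoleOutputCodePage() does.
    gci.output_cp = codepage
    # Per the comment in the quoted source, the default is overwritten too.
    gci.default_output_cp = codepage


gci = ConsoleInfo(output_cp=850, default_output_cp=850)
set_console_output_cp(gci, 65001)
print(get_console_output_cp(gci), gci.default_output_cp)
```

So in this model a single API call from any process attached to the console is enough to change both the current and the default code page.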
At this point I was pretty much positive that something was changing the code page. I loaded PowerShell into WinDbg and set a breakpoint on kernel32!SetConsoleOutputCP.
It gets triggered right after the initial banner, and it is invoked with the value 0xFDE9, which is 65001. The call stack looks like this:
[0x0] KERNEL32!SetConsoleOutputCP 0xcbeff0e328 0x7ffe9c8466ce
[0x1] mscorlib_ni+0xc566ce 0xcbeff0e330 0x7ffe9c9597fa
[0x2] mscorlib_ni!System.Console.set_OutputEncoding+0x12a 0xcbeff0e3e0 0x7ffe3dcc4226
[0x3] Microsoft_PowerShell_PSReadLine_268e80b0000!Microsoft.PowerShell.Internal.VirtualTerminal.set_OutputEncoding+0x16 0xcbeff0e440 0x7ffe3dcc8ad6
[0x4] Microsoft_PowerShell_PSReadLine_268e80b0000!Microsoft.PowerShell.PSConsoleReadLine.Initialize+0x246 0xcbeff0e480 0x7ffe3dcc9054
[0x5] Microsoft_PowerShell_PSReadLine_268e80b0000!Microsoft.PowerShell.PSConsoleReadLine.ReadLine+0xd4 0xcbeff0e4c0 0x7ffe3dd81887
...
Luckily, Microsoft has published the source of PSReadLine along with PowerShell, and as the name suggests, it's basically a readline library for PowerShell. Looking at private void Initialize(Runspace runspace, EngineIntrinsics engineIntrinsics), there's this piece:
// Don't change the OutputEncoding if already UTF8, no console, or using raster font on Windows
_skipOutputEncodingChange = _initialOutputEncoding == Encoding.UTF8
    || (RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
        && PlatformWindows.IsConsoleInput()
        && PlatformWindows.IsUsingRasterFont());
if (!_skipOutputEncodingChange) {
    _console.OutputEncoding = Encoding.UTF8;
}
So I tried calling [Console]::OutputEncoding.CodePage, and to my surprise that triggered the breakpoint as well. Twice! The first time with 0x352 (so 850), the second time with 65001.
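For anyone who wants to double-check the hex values the debugger reports against the decimal code page numbers shown by chcp, a quick Python conversion does it:

```python
# Code pages as seen in the debugger (hex) versus as chcp reports them (decimal).
for hex_value in ("0x352", "0xFDE9"):
    print(hex_value, "=", int(hex_value, 16))
```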
This can also be confirmed in the source, as it seems that before every command PSReadLine does this:
T CallPossibleExternalApplication<T>(Func<T> func)
{
    try
    {
        _console.OutputEncoding = _initialOutputEncoding;
        return RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            ? PlatformWindows.CallPossibleExternalApplication(func)
            : func();
...
_initialOutputEncoding is set inside Initialize(), right before it changes encoding to UTF-8.
Fun part: if you happen to open Properties between the two calls, it will show code page 850, as expected.
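The whole back-and-forth can be sketched as a toy model in Python. All class names here are invented for illustration. The switch back to UTF-8 after the command is modelled with a finally block, which is an assumption based on the second breakpoint hit, not something the excerpt above shows. The _skipOutputEncodingChange check is left out for brevity.

```python
UTF8 = 65001


class ToyConsole:
    """Stand-in for the console; records every code page change."""

    def __init__(self, output_cp: int) -> None:
        self.output_cp = output_cp
        self.history = []

    def set_output_cp(self, codepage: int) -> None:
        self.output_cp = codepage
        self.history.append(codepage)


class ToyReadLine:
    """Mimics the two PSReadLine behaviours described above (simplified)."""

    def __init__(self, console: ToyConsole) -> None:
        self.console = console
        self.initial_cp = console.output_cp  # remembered before switching
        console.set_output_cp(UTF8)          # the prompt runs in UTF-8

    def run_command(self, command):
        self.console.set_output_cp(self.initial_cp)  # restore for the command
        try:
            return command()
        finally:
            self.console.set_output_cp(UTF8)         # back to UTF-8 afterwards


console = ToyConsole(output_cp=850)
readline = ToyReadLine(console)
at_prompt = console.output_cp  # what the Properties dialog would report
seen_by_command = readline.run_command(lambda: console.output_cp)
print(at_prompt, seen_by_command, console.history)
```

In this model the idle prompt sits at 65001 while any command, including [Console]::OutputEncoding.CodePage, observes 850, which matches the mismatch described in the question.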
Hope you've enjoyed the ride. I hope no one will mind if I accept this answer :P
[Console]::OutputEncoding.CodePage. If .NET caches the console output encoding, I assume [Console]::OutputEncoding.CodePage should show the proper thing.