Here is what I want to do:
Properly redirect the output of console applications (I use the term "command" in the rest of this message) into a file encoded in code page 1252, so that it is readable in any Notepad-like editor in its default configuration.
What I’ve observed:
CHCP is effective with internal commands and with some external commands (the recent ones).
First of all, it’s worth noting that CHCP behaves differently under Win7 and Win10.
If the following batch is run from a cmd prompt, the command outputs are displayed properly in the Win10 console, whereas the Win7 console renders non-ASCII characters badly.
for /f "tokens=2 delims=:" %%G in ('chcp') do Set _cp_=%%G
chcp 1252
@echo test an internal command
dir
@echo test an external (recent) command: Robocopy
robocopy .\ .\ /L
@echo test an external (legacy) command: Xcopy
xcopy test.txt 2>&1
chcp %_cp_%
echo end of test.cmd batch
Incidentally, I would be interested to know what causes this difference, although it’s not really the purpose of this message, since it’s easily fixed by adding a PowerShell call, "powershell [console]::outputencoding=[system.text.encoding]::getencoding(850)", to the batch after the first chcp command.
Anyway, the real issue occurs when the batch output is redirected into a file: test.cmd > test.txt.
In that case the result is the same on both OSes. The output of internal commands and of newer external commands (Robocopy, Bcdedit, etc.) is properly encoded in 1252. The output of legacy commands (xcopy, chcp, etc.) is not: it stays in the OEM code page. In brief, most commands are not affected by CHCP or by the equivalent [console] change through PowerShell.
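To make the symptom concrete, here is a minimal Python sketch (not part of the original batch) of what happens to a single accented character when OEM 850 bytes end up in a file that is later read as 1252. It assumes a Western European setup (OEM 850, ANSI 1252), as in the rest of this post.

```python
# Illustration only: the same character has different byte values in
# OEM 850 and ANSI 1252, so a file written in one and read in the other
# shows the wrong glyph.
text = "\u00e9"                       # LATIN SMALL LETTER E WITH ACUTE

oem_bytes = text.encode("cp850")      # what a legacy command writes
ansi_bytes = text.encode("cp1252")    # what Notepad expects by default

# Reading the OEM byte as if it were 1252 yields a different character.
misread = oem_bytes.decode("cp1252")

print(oem_bytes.hex(), ansi_bytes.hex(), repr(misread))
```

The two encodings agree on plain ASCII, which is why only the non-ASCII characters look wrong in the redirected file.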
Various speculations about that mess:
1. The legacy command code is based on the CRT, whereas internal commands and most recent external ones use the Win32 API. This is based on the last section, about console application development, of MSDN’s Globalization Step-by-Step.
2. Since at least Win10, what is displayed in the console (same encoding for all command outputs) differs from what is stored in a file (the output encoding changes depending on the command), so output/input streams may be handled differently depending on the type of handle they point to. Console functions may be used for display and file I/O functions in case of redirection. This speculation is based on High-Level Console Input and Output Functions.
3. MS recommends that console application code force OEM encoding on the output stream (ref. Console Application Issues). If that suggestion is applied in the code of external commands, it may explain why their output, once redirected into a file, is always encoded in the OEM code page whatever console code page is active. Oddly, ReadFile and WriteFile are not mentioned among the functions affected by SetFileApisToOEM.
4. Finally, I don’t know whether the difference between legacy commands and recently introduced ones is because their code follows the MS suggestion or just because their string literals are coded in OEM vs ANSI.
Possible solutions/workaround
If 3 is correct, there are certainly very few. It’s possible to change the registry value OEMCP to 1252 under HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage. It’s not safe (do not try to set Unicode 65001, your system may refuse to boot) and inconvenient (a reboot is necessary). Alternatively, fill the file with OEM-encoded content only and transcode it with a PS script at the end of the batch. Simple, but not very elegant if the file has to be accessed and checked periodically.
If 2 is correct, there may be a function that controls the encoding used by the file I/O functions ReadFile and WriteFile.
If 1 is correct, it should be possible to control the international settings or culture of the current user session and so control the code page of CRT applications. Since Win8, this is possible through PowerShell (Configure International Settings in Windows). Command-line applications are also able to do such things. In any case, the difficulty here is creating a "culture" with the OEM code page set to 1252, as that doesn’t exist in the predefined set.
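The "transcode the file at the end of the batch" workaround mentioned for case 3 can be sketched as follows. This is an illustration in Python rather than the PS script referred to above; the function name transcode_file and the code page defaults are assumptions for the example.

```python
import os
import tempfile
from pathlib import Path


def transcode_file(path, src_cp="cp850", dst_cp="cp1252"):
    """Rewrite the file at `path` in place, converting src_cp bytes to dst_cp."""
    raw = Path(path).read_bytes()          # bytes, so CR/LF pairs are preserved
    text = raw.decode(src_cp)
    # Characters that do not exist in the target code page become '?'.
    Path(path).write_bytes(text.encode(dst_cp, errors="replace"))


# Small demonstration on a temporary file holding OEM 850 bytes.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
Path(demo_path).write_bytes(b"\x82t\x82\r\n")
transcode_file(demo_path)
demo_result = Path(demo_path).read_bytes()
os.remove(demo_path)
print(demo_result)
```

This only works if the whole file is OEM encoded. A log that mixes 1252 output from internal commands with OEM output from legacy ones would be damaged by a blanket conversion.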
Even if there is no effective solution to this issue, do not hesitate to share your knowledge of the topic. I am just curious to understand how MS has implemented this.
1 Answer
I realized 4 years later :-p that I never posted the workaround I use routinely in my scripts to overcome this issue. It’s a short batch into which the problematic command (i.e. almost all the external commands) is piped.
:: Description : Transcode the OEM output stream of external commands to ANSI chars in order to get notepad readable files with redirections
:: Usage: prog.exe | output1252
:: 2>&1 prog.exe | output1252 > log.txt
@echo off
setlocal
:: CodePage of the command and console programs (OEM code page)
set OEM_CP=850
chcp %OEM_CP% >NUL
:: The default local Windows codepage (ANSI code page), for Western Europe: 1252
set ANSI_CP=1252
>NUL chcp %ANSI_CP% & for /F delims^=^ eol^= %%A in ('more') do (
call :WRITEOUT %%A)
echo:
goto :eof
:WRITEOUT
echo %*
goto :eof
- Empty lines are dropped (by the FOR /F instruction); an extra line is inserted at the very end of the script to make the log files more readable
- Can’t process binary data because of how ‘more’ behaves: TABs are converted to spaces, etc.
- Redirect STDERR to STDOUT to transcode it as well
EDIT 28/03/2025: It seems those scripts still get some points, so here's an update I used to use, although I haven't tested it for a long time. The main improvement is that empty lines are no longer dropped.
@echo off
setlocal
:: CodePage of the command and console programs (OEM code page)
set OEM_CP=850
chcp %OEM_CP% >NUL
:: The default local Windows codepage (ANSI code page)
set ANSI_CP=1252
>NUL chcp %ANSI_CP% & FOR /F "tokens=1* delims=]" %%A IN ('FIND /N /V ""') DO (
call :WRITEOUT %%B)
goto :eof
:WRITEOUT
echo:%*
goto :eof
Another one that can accept a piped command or a redirected file:
:: Description : Transcode OEM 850 input data to ANSI-1252 output
:: Usage: ThatBatch <850.txt >1252.txt
:: prog.exe | ThatBatch
:: 2>&1 prog.exe | ThatBatch > log.txt
@echo off
0<NUL chcp 850 >NUL
clip
0<NUL chcp 1252 >NUL
powershell Get-Clipboard
- The data stream is left untouched (it is even possible to use the raw option for binary data)
- Slightly slower than the previous one for short content, but probably faster for large chunks of data or big files.
- Requires PowerShell 5
Those samples transcode OEM 850 to ANSI-1252, but they can easily be adapted to transcode between any code pages (probably Unicode as well). Feel free to improve them (e.g. by taking the input/output code pages as arguments) and adapt them to your needs.
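As one possible take on the "code pages as arguments" idea, here is a hedged Python sketch of the same filter. It is not a drop-in replacement for the batch files above; transcode_stream and its parameters are names made up for this example. It works on binary streams in chunks, so line endings and empty lines pass through unchanged.

```python
import codecs
import io


def transcode_stream(src, dst, in_cp="cp850", out_cp="cp1252", chunk_size=4096):
    """Copy binary stream `src` to `dst`, converting in_cp bytes to out_cp."""
    decoder = codecs.getincrementaldecoder(in_cp)()
    # 'replace' turns characters missing from out_cp (e.g. box drawing) into '?'.
    encoder = codecs.getincrementalencoder(out_cp)(errors="replace")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush anything the codecs may still be holding.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))


# Demonstration with in-memory streams: OEM 850 bytes in, 1252 bytes out.
demo_in = io.BytesIO(b"\x82t\x82 \xc4\r\n\r\n")
demo_out = io.BytesIO()
transcode_stream(demo_in, demo_out)
print(demo_out.getvalue())
```

To use it as a pipe filter, the function would be called with sys.stdin.buffer and sys.stdout.buffer, taking the two code page names from sys.argv, for instance as prog.exe 2>&1 | python transcode.py cp850 cp1252 > log.txt (transcode.py being a hypothetical script name).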