I would like to deal with filenames containing unusual characters, such as the French é.
Everything is working fine in the shell:
C:\somedir\>ren -h
I created the following block, which I put at the beginning of my batch files:
set Filename=%0
IF "%Filename:~-8%" == "-850.bat" GOTO CONVERT_CODEPAGE_END
rem Converting code page from 1252 to 850.
rem My editors use 1252, my batch uses 850.
rem We create a converted -850.bat file, and then launch it.
set File850=%~n0-850.bat
PowerShell.exe -Command "get-content %0 | out-file -encoding oem -filepath %File850%"
call %File850%
del %File850%
EXIT /b 0
:CONVERT_CODEPAGE_END
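For readers who would rather do the conversion outside the batch file, the same 1252-to-850 transcoding can be sketched in Python. This is only an illustration, not the method used above: the names transcode_bytes and convert_batch_file are made up, and the sketch assumes the source file really is Windows-1252 and that every character in it exists in CP850.

```python
from pathlib import Path


def transcode_bytes(data: bytes, src: str = "cp1252", dst: str = "cp850") -> bytes:
    """Reinterpret bytes written in `src` and re-encode them in `dst`."""
    # Decoding raises if the bytes are not valid in `src`;
    # encoding raises if a character has no slot in `dst`.
    return data.decode(src).encode(dst)


def convert_batch_file(source: str, target: str) -> None:
    """Hypothetical helper: write a CP850 copy of a CP1252 batch file."""
    Path(target).write_bytes(transcode_bytes(Path(source).read_bytes()))


if __name__ == "__main__":
    # 'é' is byte 0xE9 in Windows-1252 and byte 0x82 in CP850.
    print(transcode_bytes("ren -hélice hélice".encode("cp1252")))
```

Working on raw bytes rather than text-mode file handles avoids any newline translation, so the CRLF line endings of the batch file are preserved.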
I had Polish characters (e.g. ą, ę, ź, ż) in my R code and ran into problems when running the R script from a .bat file: in the .Rout output file those characters were replaced by symbols such as %, &, and #, and the script did not run to the end.
My solution:
It worked for me, but if the problem persists, try other encodings.
You have to save the batch file with OEM encoding. How to do this varies depending on your text editor. The encoding used in that case varies as well. For Western cultures it's usually CP850.
Batch files and encoding are really two things that don't particularly like each other. You'll notice that Unicode is also impossible to use there, unfortunately (even though environment variables handle it fine).
Alternatively, you can set the console to use another codepage:
chcp 1252
should do the trick. At least it worked for me here.
When you do output redirection, such as with dir, the same rules apply: the console window's codepage is used. You can use the /u switch to cmd.exe to force Unicode output redirection, which causes the resulting files to be in UTF-16.
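A file produced this way has to be decoded as UTF-16 rather than as a console code page when it is post-processed. A minimal Python sketch, assuming the output is little-endian UTF-16 and that a byte order mark may or may not be present; decode_cmd_unicode is a hypothetical helper name.

```python
import codecs


def decode_cmd_unicode(data: bytes) -> str:
    """Decode bytes captured from `cmd /u` redirection as UTF-16 LE."""
    # Assumption: little-endian UTF-16; drop a byte order mark if one is there.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le")


if __name__ == "__main__":
    sample = "hélice.txt\r\n".encode("utf-16-le")  # stand-in for real output
    print(decode_cmd_unicode(sample), end="")
```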
As for encodings and code pages in cmd.exe in general, also see this question:
EDIT: As for your edit: No, cmd always assumes the batch file to be written in the console default codepage. However, you can easily include a chcp at the start of the batch:
chcp 1252>NUL
ren -hélice hélice
To make this more robust when used directly from the command line, you may want to save the old code page and restore it afterwards:
@echo off
for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
chcp 1252>nul
ren -hélice hélice
chcp %cp%>nul
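The for /f line works by splitting the output of chcp on ":" and "." and taking the second token. The same parsing logic is sketched below in Python, for illustration only: parse_chcp is a hypothetical name, and the sketch assumes the usual "Active code page: N" wording. Some localized versions end the line with a period, which is presumably why "." is among the delimiters.

```python
import re


def parse_chcp(output: str) -> int:
    """Extract the code page number from the text printed by `chcp`."""
    # Mirror `tokens=2 delims=:.`: split on ':' and '.', skip empty pieces
    # (for /f treats consecutive delimiters as one), take the second token.
    tokens = [t for t in re.split(r"[:.]", output) if t]
    return int(tokens[1])  # int() tolerates the surrounding whitespace


if __name__ == "__main__":
    print(parse_chcp("Active code page: 850"))
```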
I was having trouble with this, and here is the solution I found. Find the decimal number for the character you are looking for in your current code page.
For example, I'm in codepage 437 (chcp tells you), and I want a degree sign, °. http://en.wikipedia.org/wiki/Code_page_437 tells me that the degree sign is number 248.
Then you find the Unicode character with the same number.
The Unicode character at 248 (U+00F8) is ø.
If you insert the Unicode character in your batch script, it will display to the console as the character you desire.
So my batch file
echo ø
prints
°
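The lookup can be automated. A small Python sketch, assuming the batch file is saved in a single-byte encoding such as Windows-1252 (where the character at U+00F8 is stored as byte 248) and that the console uses code page 437; char_to_type is a hypothetical helper name.

```python
def char_to_type(wanted: str, console_cp: str = "cp437", file_cp: str = "cp1252") -> str:
    """Return the character to type in the editor so the console shows `wanted`."""
    # The console reads raw bytes, so find the byte for `wanted` in the
    # console code page, then see which character the editor maps to that byte.
    return wanted.encode(console_cp).decode(file_cp)


if __name__ == "__main__":
    # The degree sign is byte 248 (0xF8) in CP437.
    print(char_to_type("°"))
```

The trick depends on the file being stored one byte per character. If the editor saves the script as UTF-8, the same character becomes two bytes and the console shows something else.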
I care about three concepts:
Output Console Encoding
Command line internal encoding (the one changed with chcp)
.bat Text Encoding
The easiest scenario for me: the first two use the same encoding, say CP850, and I store my .bat in that same encoding (in Notepad++, menu Encoding → Character sets → Western European → OEM 850).
But suppose someone hands me a .bat in another encoding, say CP1252 (in Notepad++, menu Encoding → Character sets → Western European → Windows-1252).
Then I would change the command line internal encoding with chcp 1252.
This changes the encoding it uses to talk with other processes, but not that of the input device or the output console.
So my command-line instance effectively sends characters in 1252 through its STDOUT file descriptor, but garbled text appears when the console decodes them as 850 (é shows up as Ú).
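The mismatch is easy to reproduce away from the console. A minimal Python sketch, using the two code pages from this example; as_seen_on_console is a hypothetical helper name.

```python
def as_seen_on_console(text: str, sent_as: str = "cp1252", shown_as: str = "cp850") -> str:
    """Show what a console decoding with `shown_as` displays for text sent in `sent_as`."""
    # errors="replace" keeps the sketch from failing on bytes with no mapping.
    return text.encode(sent_as).decode(shown_as, errors="replace")


if __name__ == "__main__":
    # 'é' is byte 0xE9 in Windows-1252; CP850 displays that byte as 'Ú'.
    print(as_seen_on_console("ren -hélice hélice"))
```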
Then I modify the file as follows:
@echo off
perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
ren -hélice hélice
First I turn echo off so that commands produce no output unless I explicitly use either echo... or perl -e "print...".
Then I put this boilerplate wherever I need to output something:
perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
I replace ren -hélice hélice with the actual text I want to show.
I may also need to replace cp850 with my console encoding and cp1252 with the encoding of the file.
Just below it I put the desired command.
In effect, I broke the problematic line into an output half and a real-command half.
The first half I handle explicitly: the "é" is displayed as an "é" by means of transcoding. This is necessary for every output statement, since the console and the file use different encodings.
For the second half, the real command (muted by @echo off), having the same encoding for both chcp and the .bat text is enough to ensure the characters are interpreted correctly.