Batch file encoding

小鲜肉 2020-12-02 10:38

I would like to handle filenames containing unusual characters, like the French é.

Everything is working fine in the shell:

C:\somedir\>ren -h


        
5 answers
  • 2020-12-02 10:50

    I created the following block, which I put at the beginning of my batch files:

    set "Filename=%~nx0"
    IF /I "%Filename:~-8%" == "-850.bat" GOTO CONVERT_CODEPAGE_END
        rem Converting code page from 1252 to 850.
        rem My editors use 1252; cmd reads my batch files as 850.
        rem We create a converted -850.bat file next to this one, launch it, then delete it.
        set "File850=%~dpn0-850.bat"
        PowerShell.exe -NoProfile -Command "Get-Content -LiteralPath '%~f0' | Out-File -Encoding oem -FilePath '%File850%'"
        call "%File850%" %*
        del "%File850%"
        EXIT /b 0
    :CONVERT_CODEPAGE_END
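    The PowerShell line re-saves the script text in the OEM code page. A minimal Python sketch of the same byte-level conversion, assuming the source file is Windows-1252 and the OEM code page is 850 (the function name to_oem is hypothetical, not part of the answer):

```python
def to_oem(data: bytes, src: str = "cp1252", dst: str = "cp850") -> bytes:
    """Re-encode raw file bytes from the editor's code page to the OEM code page.

    Characters with no equivalent in the target code page become '?'.
    """
    text = data.decode(src)  # interpret the bytes the way the editor wrote them
    return text.encode(dst, errors="replace")  # write them the way cmd will read them


if __name__ == "__main__":
    line_1252 = "ren -hélice hélice\r\n".encode("cp1252")
    print(line_1252)          # b'ren -h\xe9lice h\xe9lice\r\n'
    print(to_oem(line_1252))  # b'ren -h\x82lice h\x82lice\r\n'
```

    Note that CP850 has no euro sign, so a € in the Windows-1252 source cannot survive this conversion; this sketch replaces it with '?'.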
    
  • 2020-12-02 10:51

    I had Polish characters (e.g. ą, ę, ź, ż) in my R code and ran into a problem when running the R script from a .bat file: in the .Rout output file those characters were replaced by symbols such as %, & and #, and the script did not run to completion.

    My solution:

    1. Save the R script with CP1250 encoding: File > Save with encoding > CP1250
    2. Run the .bat file

    It worked for me, but if the problem persists, try other encodings.
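    The save encoding matters because the same letter is stored as different bytes in different encodings, and reading bytes with the wrong code page produces garbage. A small Python illustration, assuming (this is not stated in the answer) that R on a Polish Windows system reads scripts as CP1250 and that the script had previously been saved as UTF-8:

```python
# Hypothetical scenario: script saved as UTF-8, but read back as CP1250.
polish = "ąęźż"

utf8_bytes = polish.encode("utf-8")      # 2 bytes per letter
cp1250_bytes = polish.encode("cp1250")   # 1 byte per letter

misread = utf8_bytes.decode("cp1250")      # mismatched code page: mojibake
roundtrip = cp1250_bytes.decode("cp1250")  # matching code page: correct text

print(len(utf8_bytes), len(cp1250_bytes))  # 8 4
print(misread)
print(roundtrip)  # ąęźż
```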

  • 2020-12-02 10:55

    You have to save the batch file with OEM encoding. How to do this depends on your text editor. Which OEM code page that is also depends on the system locale; for Western European locales it's usually CP850.
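    To see what goes wrong when the file is not saved in the OEM code page, here is a Python check of the byte-level mismatch, assuming the editor saved the file as Windows-1252 and the console uses CP850:

```python
name = "hélice"

saved = name.encode("cp1252")           # what a Windows-1252 editor writes: é -> 0xE9
as_seen_by_cmd = saved.decode("cp850")  # how cmd reads those bytes under code page 850

print(saved)           # b'h\xe9lice'
print(as_seen_by_cmd)  # hÚlice

oem_saved = name.encode("cp850")  # saving in OEM 850 instead: é -> 0x82
print(oem_saved.decode("cp850"))  # hélice
```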

    Batch files and encoding are really two things that don't particularly like each other. You'll notice that Unicode is also impossible to use there, unfortunately (even though environment variables handle it fine).

    Alternatively, you can set the console to use another codepage:

    chcp 1252
    

    should do the trick. At least it worked for me here.

    When you do output redirection, such as with dir, the same rules apply. The console window's codepage is used. You can use the /u switch to cmd.exe to force Unicode output redirection, which causes the resulting files to be in UTF-16.
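    The difference between a file redirected in the console code page and one written with cmd /u can be illustrated in Python. This sketch assumes CP850 as the console code page and, for /u, little-endian UTF-16 without a byte-order mark, which is what Windows typically produces; the file contents are simulated, not captured from cmd.exe:

```python
line = "hélice\r\n"

# Simulated contents of a redirected file (not captured from cmd.exe).
cp850_file = line.encode("cp850")      # code-page mode: one byte per character
utf16_file = line.encode("utf-16-le")  # cmd /u mode: two bytes per character, no BOM

print(cp850_file)                        # b'h\x82lice\r\n'
print(len(cp850_file), len(utf16_file))  # 8 16

# A single-byte code page cannot hold every character; UTF-16 can.
print("ż".encode("cp850", errors="replace"))  # b'?'
print("ż".encode("utf-16-le"))                # b'|\x01'
```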

    As for encodings and code pages in cmd.exe in general, also see this question:

    • What encoding/code page is cmd.exe using

    EDIT: As for your edit: No, cmd always reads the batch file using the console's current code page. However, you can easily include a chcp at the start of the batch file:

    chcp 1252>NUL
    ren -hélice hélice
    

    To make this more robust when the batch file is run directly from the command line, you may want to save the old code page and restore it afterwards:

    @echo off
    for /f "tokens=2 delims=:." %%x in ('chcp') do set cp=%%x
    chcp 1252>nul
    ren -hélice hélice
    chcp %cp%>nul
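    The for /f line splits the output of chcp on ':' and '.' and keeps the second token. A rough Python model of that parsing, assuming that, as in cmd, runs of delimiters count as one and that spaces stop being delimiters once delims= is given (the helper for_f_token is hypothetical and ignores eol, skip and usebackq). The second sample is an assumed localized output ending in a period, which is presumably why '.' is in the delimiter set:

```python
import re


def for_f_token(line: str, delims: str, token: int) -> str:
    """Rough model of: for /f "tokens=N delims=..." applied to one line."""
    # Runs of delimiter characters act as a single separator; empty pieces are dropped.
    parts = [p for p in re.split("[" + re.escape(delims) + "]+", line) if p]
    return parts[token - 1] if len(parts) >= token else ""


if __name__ == "__main__":
    for out in ("Active code page: 850", "Aktive Codepage: 850."):
        cp = for_f_token(out, ":.", 2)
        print(repr(cp))  # ' 850' both times: leading space kept, trailing '.' dropped
```

    The leading space is harmless, because chcp accepts " 850" as an argument.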
    
  • 2020-12-02 11:14

    I was having trouble with this, and here is the solution I found. Find the decimal number for the character you are looking for in your current code page.

    For example, I'm in code page 437 (chcp tells you), and I want a degree sign, °. http://en.wikipedia.org/wiki/Code_page_437 tells me that the degree sign is number 248.

    Then you find the Unicode character with the same number.

    The Unicode character at 248 (U+00F8) is ø.

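    If you insert that Unicode character in your batch script, it will be displayed on the console as the character you want. This relies on the script being saved in a single-byte encoding such as Windows-1252, where ø is stored as the byte 248; code page 437 displays that byte as °.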

    So my batch file

    echo ø
    

    prints

    °
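    The lookup described above amounts to one conversion: encode the desired character in the console's code page, then decode that byte in the editor's encoding. A Python sketch, assuming the console uses CP437 and the batch file is saved as Windows-1252 (the helper name char_to_type is hypothetical):

```python
def char_to_type(wanted: str, console_cp: str = "cp437", editor_enc: str = "cp1252") -> str:
    """Return the character to type in the editor so the console shows `wanted`."""
    byte = wanted.encode(console_cp)  # '°' -> b'\xf8' (248) in CP437
    return byte.decode(editor_enc)    # b'\xf8' -> 'ø' (U+00F8) in Windows-1252


if __name__ == "__main__":
    typed = char_to_type("°")
    print(typed, hex(ord(typed)))  # ø 0xf8
```

    For byte values that Windows-1252 leaves undefined (such as 0x81, which is ü in CP437), Python's decode raises UnicodeDecodeError, meaning there is no character to type for that case.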
    
  • 2020-12-02 11:16

    I keep three concepts in mind:

    1. Output console encoding

    2. Command line internal encoding (the one changed with chcp)

    3. .bat file text encoding

    The easiest scenario for me: I keep the first two in the same encoding, say CP850, and store my .bat in that same encoding (in Notepad++, menu Encoding → Character sets → Western European → OEM 850).

    But suppose someone hands me a .bat in another encoding, say CP1252 (in Notepad++, menu Encoding → Character sets → Western European → Windows-1252).

    Then I change the command line's internal encoding with chcp 1252.

    This changes the encoding it uses to talk to other processes, but affects neither the input device nor the output console.

    So my command line instance effectively sends characters in 1252 through its STDOUT file descriptor, but garbled text appears when the console decodes them as 850 (é shows up as Ú).

    Then I modify the file as follows:

    @echo off
    
    perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"
    ren -hélice hélice
    

    First I turn echo off so that commands are not echoed; output appears only when I explicitly use either echo... or perl -e "print...".

    Then I put this boilerplate wherever I need to output something:

    perl -e "use Encode qw/encode decode/;" -e "print encode('cp850', decode('cp1252', \"ren -hélice hélice\n\"));"

    I replace ren -hélice hélice with the actual text I want to show.

    I may also need to replace cp850 with my console encoding and cp1252 with the encoding on the other side (the .bat file).

    Right below it, I put the actual command.

    In other words, I split the problematic line into an output half and a real-command half.

    • The first half makes sure, by transcoding, that "é" is displayed as "é". This is necessary for every output line, because the console and the file use different encodings.

    • For the second half, the real command (silenced by @echo off), having the same encoding in chcp and in the .bat text is enough to ensure the characters are interpreted correctly.
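    For comparison with the Perl boilerplate, here is a Python sketch of the same decode-as-CP1252, encode-as-CP850 step, plus a check of which characters the transcoding actually changes. It assumes the same two encodings and only shows the bytes produced, not how a particular console renders them; both function names are hypothetical:

```python
from typing import List


def transcode(raw: bytes, file_enc: str = "cp1252", console_enc: str = "cp850") -> bytes:
    """Counterpart of encode('cp850', decode('cp1252', ...)) in the Perl line."""
    return raw.decode(file_enc).encode(console_enc)


def chars_needing_transcoding(text: str, file_enc: str = "cp1252",
                              console_enc: str = "cp850") -> List[str]:
    """Characters stored as different bytes in the two code pages (ASCII never is)."""
    return [ch for ch in dict.fromkeys(text)
            if ch.encode(file_enc) != ch.encode(console_enc, errors="replace")]


if __name__ == "__main__":
    line = "ren -hélice hélice\n"
    print(chars_needing_transcoding(line))   # ['é']
    print(transcode(line.encode("cp1252")))  # b'ren -h\x82lice h\x82lice\n'
```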
