Batch: how to add a Unicode header or write hex values, or is there another way around this?

暖寄归人 2021-01-22 05:14

I have a batch script that uses drag and drop and creates some html code based on the filenames of the dropped files/folders. With

chcp 65001

3 Answers
  • 2021-01-22 05:23

    You can embed a base64-encoded section in the batch script that creates a 2-byte file, and then use copy /b "my_header_file.bin" + "myfile.html" "newfile.html" to prepend it to the target file.

    It uses certutil -decode to decode the section (certutil -encode was used to create it), so it requires Windows Vista or later.

    Here is the script to create the header file containing hex: FF FE

    @echo off
    (
    echo -----BEGIN CERTIFICATE-----
    echo //4=
    echo -----END CERTIFICATE-----
    )>header.tmp
    certutil -decode -f header.tmp "my_header_file.bin" >nul
    del header.tmp
    
    copy /b "my_header_file.bin" + "myfile.html" "newfile.html"
    move /y "newfile.html" "myfile.html" >nul
    del "my_header_file.bin"
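
The base64 payload `//4=` can be checked, or a payload for a different header generated, outside of batch. A minimal sketch in Python, which is used here only for verification and is not part of the original answer; `header_to_base64` is a hypothetical helper name:

```python
import base64


def header_to_base64(header: bytes) -> str:
    """Return the base64 text that decodes back to the given header bytes."""
    return base64.b64encode(header).decode("ascii")


# FF FE is the two-byte header that the batch script above writes.
utf16le_bom = bytes([0xFF, 0xFE])
payload = header_to_base64(utf16le_bom)
print(payload)  # //4=

# Round trip: decoding the payload gives the two header bytes back.
assert base64.b64decode(payload) == utf16le_bom
```

The resulting text is what goes between the BEGIN/END CERTIFICATE lines in the script.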
    
  • 2021-01-22 05:26

    Include them inside your batch file.

    @echo off
    
        for /f "tokens=2 delims=:" %%f in ('findstr /b /c:"BOFM:" "%~dpnx0"') do echo %%f
    
    exit /b
    rem Here starts the special characters part
    BOFM:ÿþ:
    

    The line that starts with BOFM: is typed using ALT+character code to get the desired characters.

    EDITED -

    I give up. I can't make this work consistently across the different code pages used by the batch file, the data files, and the editors; there is no way to guarantee what will be generated. So I took @foxidrive's answer (awesome!) to generate the file prefix and experimented with it.

    What I found is that if we use FF FE as the prefix for a file generated from a cmd that is not in Unicode mode (no /u parameter) but uses a Unicode code page (65001), the file is marked as Unicode (UTF-16 LE, which is what the FF FE prefix signals) but the content is not: only one byte per character is written. So we get the "Chinese" characters, which are just a single-byte character stream misread as two-byte characters.
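
The misreading described above can be reproduced outside cmd. This is only a sketch in Python (not part of the original answer), assuming the body written by the non-Unicode cmd is plain ASCII:

```python
# FF FE prefix, as written by the header-generation step.
bom = b"\xff\xfe"

# What a non-Unicode cmd writes: one byte per character.
single_byte_body = "<br>".encode("ascii")

# A reader that trusts the prefix pairs the bytes up into 16-bit units,
# so "<b" and "r>" each become one unrelated CJK character.
misread = (bom + single_byte_body).decode("utf-16")
print([hex(ord(ch)) for ch in misread])  # ['0x623c', '0x3e72']

# What a Unicode cmd (/u) writes: two bytes per character.
two_byte_body = "<br>".encode("utf-16-le")
print((bom + two_byte_body).decode("utf-16"))  # <br>
```

Both code points in the first output fall in CJK ranges, which matches the "Chinese" characters seen in the generated file.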

    If we use the same prefix from a Unicode cmd (started with the /u parameter) with a Unicode code page (65001), a real Unicode file is generated, and the content displays correctly on the command line, in Notepad, and in browsers (tested in IE and Firefox). But this is a real Unicode file, so two bytes per character are written.

    Instead of FF FE, we can write the UTF-8 BOM, EF BB BF, from a non-Unicode cmd with the Unicode code page. This generates a UTF-8 file with a BOM, using one or more bytes per character (depending on each character's UTF-8 encoding), which displays correctly in editors and browsers but not on the command line.
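
The "one or more bytes per character" behaviour of the UTF-8 variant can be illustrated with a short Python sketch (again only a check, not part of the original answer; the sample text is an arbitrary choice):

```python
import codecs

# The UTF-8 byte order mark: EF BB BF.
bom = codecs.BOM_UTF8

# 'a', 'é' and '日' need 1, 2 and 3 bytes in UTF-8 respectively.
text = "a\u00e9\u65e5"
sizes = [len(ch.encode("utf-8")) for ch in text]
print(sizes)  # [1, 2, 3]

# A file body as it would look with the BOM in front.
data = bom + text.encode("utf-8")

# The "utf-8-sig" codec strips a leading BOM when decoding.
decoded = data.decode("utf-8-sig")
assert decoded == text
```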

    The code (adapted from the OP's attached files) I've been trying is below; run it from a non-Unicode cmd:

    @echo off
    
        if ["%~1"]==[""] goto :EOF
    
        setlocal enableextensions enabledelayedexpansion
    
        rem File to generate
        set "myFile=aText.txt"
    
        rem save current pagecode
        for /f "tokens=2 delims=:" %%f in ('chcp') do set "cp=%%f"
    
        rem Generate BOM
        call :generateBOM "%myFile%"
    
        rem change to unicode 
        chcp 65001 > nul 
    
    :loop
        echo %1 >> "%myFile%"
        for %%a in ("%~1") do (
            echo %%~nxa 
            echo   ^<br^>^<img src='%%~nxa'^>^<br^> 
        ) >> "%myFile%"
    
        shift
        if ["%~1"]==[""] goto showData
        goto loop   
    
    :showData
    
        "%myFile%"
    
    :endProcess
        rem Cleanup and restore pagecode
        endlocal & chcp %cp% > nul 
    
        exit /b 
    
    :generateBOM file
        rem [ EF BB BF ] utf8 bom     encoded value = 77u/
        rem [ FF FE ]    unicode bom  encoded value = //4=
        echo 77u/>"%~1"
    
        rem Yes, certutil allows decode inplace, so no temporary file needed
        certutil -f -decode "%~1" "%~1" >nul
    
        endlocal
        goto :EOF
    
  • 2021-01-22 05:47

    You could create the Unicode header (0xFF 0xFE) by CertUtil -decodehex:

    rem // Create hexadecimal-encoded file:
    > "header.tmp" (echo FF FE)
    rem // Decode file to binary header file:
    > nul CertUtil -f -decodehex "header.tmp" "header.tmp"
    
    rem // Combine binary header file and Unicode text file:
    copy /B "header.tmp" + "U-file.txt" "header.tmp"
    rem // Move combined file over original Unicode text file:
    move /Y "header.tmp" "U-file.txt"
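
For comparison, the same two steps (decode a hex string to bytes, then put those bytes in front of an existing file) can be sketched in Python. This is an illustration only and not part of the original answer; `prepend_header` and the temporary demo file are hypothetical:

```python
import os
import tempfile


def prepend_header(path: str, hex_header: str) -> None:
    """Decode a hex string such as "FF FE" and write it in front of the file."""
    header = bytes.fromhex(hex_header)
    with open(path, "rb") as f:
        body = f.read()
    with open(path, "wb") as f:
        f.write(header + body)


# Demo on a throwaway file holding "hi" as two-byte (UTF-16 LE) text.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("hi".encode("utf-16-le"))

prepend_header(demo_path, "FF FE")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

print(result.hex())  # fffe68006900
```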
    

    A method using forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo(0xFF0xFE" is problematic because echo appends a trailing line break. An alternative to echo(0xFF0xFE is < nul set /P ="0xFF0xFE", but that does not work either: set /P strips leading whitespace from the prompt text, and 0xFF unfortunately counts as whitespace (it is a non-breaking space).
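
Both obstacles can be seen at the byte level. The following Python sketch is only an illustration and assumes the console uses an OEM code page such as 437 or 850, which the answer does not state explicitly:

```python
# In the OEM code pages 437 and 850, byte 0xFF maps to U+00A0 (no-break space).
decoded = {cp: b"\xff".decode(cp) for cp in ("cp437", "cp850")}
for codepage, ch in decoded.items():
    print(codepage, hex(ord(ch)), ch.isspace())

# echo appends a line break (CR LF on Windows), so the header would be
# four bytes long instead of two.
echoed = b"\xff\xfe" + b"\r\n"
print(len(echoed))  # 4
```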
