Powershell and UTF-8

Submitted by ↘锁芯ラ on 2019-12-12 09:23:06

Question


I have an HTML file, test.html, created with Atom, which contains:

Testé encoding utf-8

When I read it in the PowerShell console (I'm using French Windows) with:

Get-Content -Raw test.html

I get back this:

TestÃ© encoding utf-8

Why is the accent character not printing correctly?


Answer 1:


  • The Atom editor creates UTF-8 files without a pseudo-BOM by default (which is the right thing to do, from a cross-platform perspective).

    • Other popular cross-platform editors, such as Visual Studio Code and Sublime Text, behave the same way.
  • Windows PowerShell[1] only recognizes UTF-8 files with a pseudo-BOM.

    • In the absence of the pseudo-BOM, PowerShell interprets files as being encoded in the system's legacy code page, such as Windows-1252 on US systems.
      (This is also the default encoding used by Notepad, which it calls "ANSI", not just when reading files, but also when creating them. By contrast, PowerShell creates UTF-16LE-encoded files by default.)
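
The misreading described above can be reproduced without any file. The sketch below encodes the sample text as UTF-8 and then decodes the same bytes as Windows-1252. It assumes code page 1252 is available through [System.Text.Encoding]::GetEncoding, and it builds the accented character from its code point so the snippet does not depend on how the script itself is saved.

```powershell
# Build the sample text; [char]0xE9 is e-acute (U+00E9), written as a code
# point so the snippet itself stays pure ASCII.
$text = "Test$([char]0xE9) encoding utf-8"

# UTF-8 encodes U+00E9 as the two bytes 0xC3 0xA9.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($text)

# Decoding those bytes as Windows-1252 (assumed here to be the legacy code
# page) turns each byte into its own character:
# 0xC3 -> U+00C3 (A with tilde), 0xA9 -> U+00A9 (copyright sign).
$misread = [System.Text.Encoding]::GetEncoding(1252).GetString($bytes)

# Decoding the same bytes as UTF-8 restores the original text.
$correct = [System.Text.Encoding]::UTF8.GetString($bytes)

$misread
$correct
```

The first output line shows the two-character garbling seen in the question; the second shows the text as written.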

Therefore, in order for Get-Content to recognize a BOM-less UTF-8 file correctly in Windows PowerShell, you must use -Encoding utf8.
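
As a minimal sketch of that fix, the following writes a BOM-less UTF-8 file and reads it back with -Encoding utf8. The temp-file name is hypothetical, and the file is written through .NET only to guarantee that no BOM is present.

```powershell
# Hypothetical demo file in the temp directory, standing in for test.html.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo.html'

# Write UTF-8 without a BOM, as Atom does by default.
# [char]0xE9 is e-acute, written as a code point to keep this script ASCII.
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
$text = "Test$([char]0xE9) encoding utf-8"
[System.IO.File]::WriteAllText($path, $text, $utf8NoBom)

# Tell Get-Content explicitly that the file is UTF-8.
$content = Get-Content -Raw -Encoding utf8 $path
$content

# Clean up the demo file.
Remove-Item $path
```

In Windows PowerShell, dropping -Encoding utf8 from the Get-Content call is what produces the garbled output shown in the question.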


[1] By contrast, the cross-platform PowerShell Core edition commendably defaults to UTF-8, both on reading and writing, so it does interpret UTF-8-encoded files correctly even without a BOM and by default also creates files without a BOM.




Answer 2:


# Created a UTF-8 Sig File 
notepad .\test.html

# Get File contents with/without -raw
cat .\test.html;Get-Content -Raw .\test.html
Testé encoding utf-8
Testé encoding utf-8

# Check Encoding to make sure
Get-FileEncoding .\test.html
utf8

As you can see, it definitely works in PowerShell v5 on Windows 10 for a file saved with a UTF-8 signature (BOM). I'd double-check the encoding and the contents of the file you created, as it may contain characters that your editor does not display.

If you do not have Get-FileEncoding as a cmdlet in your PowerShell, here is an implementation you can run:

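function Get-FileEncoding([Parameter(Mandatory=$True)]$Path) {
    # Read up to the first 4 bytes, where a BOM (if any) is stored.
    # -Encoding byte is Windows PowerShell syntax; PowerShell Core uses -AsByteStream.
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)

    # Empty file: nothing to detect, so fall back to 'utf8'.
    if(!$bytes) { return 'utf8' }

    # Match the leading bytes, as a hex string, against known BOMs.
    # The 4-byte UTF-32 patterns must precede the UTF-16 ones, because the
    # UTF-32LE BOM (ff fe 00 00) begins with the UTF-16LE BOM (ff fe).
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf'   {return 'utf8'}
        '^2b2f76'   {return 'utf7'}
        '^fffe0000' {return 'utf32'}
        '^0000feff' {return 'bigendianutf32'}
        '^fffe'     {return 'unicode'}
        '^feff'     {return 'bigendianunicode'}
        # No BOM: the file may be ASCII, ANSI, or BOM-less UTF-8.
        default     {return 'ascii'}
    }
}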
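
Note that -Encoding byte is specific to Windows PowerShell; PowerShell Core replaced it with -AsByteStream. As an edition-independent sketch (not the answer's own method), the hypothetical helper Test-Utf8Bom below reads the bytes through .NET and checks only for the UTF-8 signature EF BB BF. The demo file names are likewise made up for illustration.

```powershell
# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM.
function Test-Utf8Bom([Parameter(Mandatory=$True)][string]$Path) {
    # Resolve to a full path, because .NET does not follow PowerShell's
    # current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Reads the whole file; acceptable for a sketch, wasteful for large files.
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Usage: create one file with a BOM and one without, then test both.
$dir     = [System.IO.Path]::GetTempPath()
$withBom = Join-Path $dir 'bom-demo-with.txt'
$noBom   = Join-Path $dir 'bom-demo-without.txt'
[System.IO.File]::WriteAllText($withBom, 'abc', (New-Object System.Text.UTF8Encoding($true)))
[System.IO.File]::WriteAllText($noBom,   'abc', (New-Object System.Text.UTF8Encoding($false)))

$hasBom  = Test-Utf8Bom $withBom
$lacksIt = Test-Utf8Bom $noBom
$hasBom
$lacksIt

# Clean up the demo files.
Remove-Item $withBom, $noBom
```

A result of False only says that no UTF-8 signature is present; it cannot distinguish BOM-less UTF-8 from a legacy code page.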


Source: https://stackoverflow.com/questions/42542560/powershell-and-utf-8
