How to cat a UTF-8 (no BOM) file properly/globally in PowerShell?

Submitted by 老子叫甜甜 on 2019-12-03 17:21:02

Question


Create a file utf8.txt. Ensure the encoding is UTF-8 (no BOM). Set its content to €

In cmd.exe:

type utf8.txt > out.txt

Content of out.txt is €

In PowerShell (v4):

cat .\utf8.txt > out.txt

or

type .\utf8.txt > out.txt

The content of out.txt is garbled: â‚¬ instead of €

How do I globally make PowerShell work correctly?


Answer 1:


Windows PowerShell, unlike the underlying .NET Framework[1], uses the following defaults:

  • on input: files without a BOM (byte-order mark) are assumed to be in the system's default encoding, which is the legacy Windows code page ("ANSI" code page: the active, culture-specific single-byte encoding, as configured via Control Panel).

  • on output: the > and >> redirection operators produce UTF-16 LE files by default (which do have - and need - a BOM).
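The input default above explains the garbled text in the question. A minimal sketch, assuming the system's "ANSI" code page is Windows-1252 (yours may differ): it takes the UTF-8 bytes of the euro sign and decodes them the way Windows PowerShell would decode a BOM-less file.

```powershell
# The euro sign (U+20AC) is stored as three bytes in UTF-8: E2 82 AC.
$euro  = [string][char]0x20AC
$bytes = [System.Text.Encoding]::UTF8.GetBytes($euro)

# Decode those same bytes as Windows-1252. This code page is an assumption;
# the actual "ANSI" code page depends on the system's configuration.
$ansi    = [System.Text.Encoding]::GetEncoding(1252)
$garbled = $ansi.GetString($bytes)

# Show the raw bytes, then the misdecoded text (three characters instead of one).
($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
$garbled
```

Each of the three bytes is mapped to a separate single-byte character, which is why one € turns into three unrelated characters.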

File-consuming and -producing cmdlets do usually support an -Encoding parameter that lets you specify the encoding explicitly.
Prior to PowerShell v5.1, using the underlying Out-File cmdlet explicitly was the only way to change the encoding.
In PowerShell v5.1+, > and >> became effective aliases of Out-File, allowing you to change the encoding behavior of > and >> via the $PSDefaultParameterValues preference variable; e.g.:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'.
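As a rough sketch of a session-wide setup, the same mechanism can also cover the input side. It assumes PowerShell v5.1+ (so that > is affected); the scratch directory name utf8-demo is hypothetical and only keeps the example self-contained.

```powershell
# Hypothetical scratch directory so the sketch does not touch real files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-demo'
New-Item -ItemType Directory -Force -Path $dir | Out-Null
$in  = Join-Path $dir 'utf8.txt'
$out = Join-Path $dir 'out.txt'

# Create the BOM-less UTF-8 input file via .NET, whose UTF-8 default omits the BOM.
[System.IO.File]::WriteAllText($in, [string][char]0x20AC)

# Session-wide defaults: read BOM-less files as UTF-8, write UTF-8 on output.
$PSDefaultParameterValues['Get-Content:Encoding'] = 'utf8'
$PSDefaultParameterValues['Out-File:Encoding']    = 'utf8'

# With the defaults in place, a plain redirection keeps the text intact.
Get-Content $in > $out

# Read the result back (the Get-Content default above applies here too).
Get-Content $out
```

Placing the two $PSDefaultParameterValues lines in your $PROFILE file would apply them to every new session rather than only the current one.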

For Windows PowerShell to handle UTF-8 properly, you must specify it as both the input and the output encoding[2]. Note, however, that on output Windows PowerShell invariably adds a BOM to UTF-8 files.

Applied to your example:

Get-Content -Encoding utf8 .\utf8.txt | Out-File -Encoding utf8 out.txt

To create a UTF-8 file without a BOM in PowerShell, see this answer of mine.
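One common way to do this, not necessarily the approach of the linked answer, is to bypass the cmdlets and call .NET directly. A minimal sketch, assuming a hypothetical scratch directory named utf8-nobom-demo:

```powershell
# Hypothetical scratch directory so the sketch does not touch real files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo'
New-Item -ItemType Directory -Force -Path $dir | Out-Null
$out = Join-Path $dir 'out.txt'

# Sample text: the euro sign from the question.
$text = [string][char]0x20AC

# Passing $false to the constructor asks for an encoder that emits no BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# .NET methods do not follow PowerShell's current location, so pass a full path.
[System.IO.File]::WriteAllText($out, $text, $utf8NoBom)

# Inspect the file's raw bytes: only the three bytes of the euro sign, no BOM prefix.
$fileBytes = [System.IO.File]::ReadAllBytes($out)
($fileBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
```

A UTF-8 BOM would show up as the leading bytes EF BB BF; their absence confirms the file is BOM-less.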


By contrast, PowerShell Core, the cross-platform edition of PowerShell, fortunately defaults to BOM-less UTF-8 for both input and output.


[1] The .NET Framework uses UTF-8 by default, for both input and output.
This intentional difference in behavior between PowerShell and the .NET Framework it is built on is unusual.

[2] Get-Content does, however, automatically recognize UTF-8 files with a BOM.



Source: https://stackoverflow.com/questions/37767067/how-to-cat-a-utf-8-no-bom-file-properly-globally-in-powershell
