How to expand file content with PowerShell

Anonymous (unverified), submitted on 2019-12-03 09:06:55

Question:

I want to do this:

    $content = get-content "test.html"
    $template = get-content "template.html"
    $template | out-file "out.html"

where template.html contains:

    <html>
      <head>
      </head>
      <body>
        $content
      </body>
    </html>

and test.html contains:

    <h1>Test Expand</h1>
    <div>Hello</div>

I get weird characters as the first 2 characters of out.html, and the content is not expanded.

How can I fix this?

Answer 1:

To complement Mathias R. Jessen's helpful answer with a solution that:

  • is more efficient.
  • ensures that the input files are read as UTF-8, even if they don't have a (pseudo-)BOM (byte-order mark).
  • avoids the "weird character" problem altogether by writing a UTF-8-encoded output file without that pseudo-BOM.

    # Explicitly read the input files as UTF-8, as a whole.
    $content = get-content -raw -encoding utf8 test.html
    $template = get-content -raw -encoding utf8 template.html

    # Write to output file using UTF-8 encoding *without a BOM*.
    [IO.File]::WriteAllText(
      "$PWD/out.html",
      $ExecutionContext.InvokeCommand.ExpandString($template)
    )

  • get-content -raw (PSv3+) reads the files in as a whole, into a single string (instead of an array of strings, line by line), which, while more memory-intensive, is faster. With HTML files, memory usage shouldn't be a concern.

    • An additional advantage of reading the files in full is that if the template were to contain multi-line subexpressions ($(...)), the expansion would still function correctly.
  • get-content -encoding utf8 ensures that the input files are interpreted as using character encoding UTF-8, as is typical in the web world nowadays.

    • This is crucial, given that UTF-8-encoded HTML files normally do not have the 3-byte pseudo-BOM that PowerShell needs in order to correctly identify a file as UTF-8-encoded (see below).
  • A single $ExecutionContext.InvokeCommand.ExpandString() call is then sufficient to perform the template expansion.

  • Out-File -Encoding utf8 would invariably create a file with the pseudo-BOM, which is undesired.
    Instead, [IO.File]::WriteAllText() is used, taking advantage of the fact that the .NET Framework by default creates UTF-8-encoded files without the BOM.

    • Note the use of $PWD/ before out.html, which is needed to ensure that the file gets written in PowerShell's current location (directory); unfortunately, what the .NET Framework considers the current directory is not in sync with PowerShell.
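
The two points about [IO.File]::WriteAllText() can also be made explicit in code. The following is only a sketch under stated assumptions: the $html value is a hypothetical stand-in for the expanded template, the current location is assumed to be a file-system directory, and passing an encoding object is a variation on the answer's two-argument call, not the answer's own code.

```powershell
# Hypothetical output text; in the answer above this would be the expanded template.
$html = '<html><body><h1>Test Expand</h1></body></html>'

# Build the full path from PowerShell's current location, because .NET's notion
# of the current directory may differ from it.
# Assumption: the current location is a file-system directory.
$outPath = Join-Path $PWD.ProviderPath 'out.html'

# The $false constructor argument means "do not emit a BOM".
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# Same call as in the answer, with the encoding spelled out.
[IO.File]::WriteAllText($outPath, $html, $utf8NoBom)
```

Spelling out the encoding does not change the result compared with the two-argument overload; it only documents the intent for later readers.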

Finally, the obligatory security warning: use this expansion technique only on input that you trust, given that arbitrary embedded commands may get executed.
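
To illustrate the warning, here is a small sketch with a hypothetical template string that contains an embedded subexpression. It contrasts ExpandString() with a plain literal replacement of the $content placeholder; the literal replacement is an alternative I am suggesting for untrusted templates, not something proposed in the answers here.

```powershell
# Hypothetical template text with an embedded subexpression.
$template = '<body>$content $(1 + 1)</body>'
$content  = '<h1>Test</h1>'

# ExpandString() expands $content AND evaluates $(1 + 1) as code.
$expanded = $ExecutionContext.InvokeCommand.ExpandString($template)

# String.Replace() treats both arguments literally: only the placeholder is
# substituted, and the subexpression is left in the text unevaluated.
$replaced = $template.Replace('$content', $content)

$expanded   # <body><h1>Test</h1> 2</body>
$replaced   # <body><h1>Test</h1> $(1 + 1)</body>
```

With a harmless expression such as 1 + 1 the difference is cosmetic, but the same mechanism would run any command placed inside $(...).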


Optional background information

In Windows PowerShell, Out-File, > and >> use UTF-16 LE character encoding with a BOM (byte-order mark) by default; that BOM is the source of the "weird characters" mentioned in the question.

While Out-File -Encoding utf8 allows creating UTF-8 output files instead, Windows PowerShell invariably prepends a 3-byte pseudo-BOM to the output file, which some utilities, notably those with Unix heritage, have problems with. You would therefore still get "weird characters" (albeit different ones).
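
To see the pseudo-BOM for yourself, you can inspect the first bytes of a file. The sketch below writes two hypothetical demo files to the temp directory through .NET, so that the result does not depend on the PowerShell version; the file names and the Get-FirstBytesHex helper are made up for illustration.

```powershell
# Hypothetical demo files in the temp directory.
$dir     = [IO.Path]::GetTempPath()
$withBom = Join-Path $dir 'bom-demo-with.html'
$noBom   = Join-Path $dir 'bom-demo-without.html'

# UTF8Encoding($true) emits the 3-byte pseudo-BOM; UTF8Encoding($false) does not.
[IO.File]::WriteAllText($withBom, '<p>hi</p>', (New-Object System.Text.UTF8Encoding $true))
[IO.File]::WriteAllText($noBom,   '<p>hi</p>', (New-Object System.Text.UTF8Encoding $false))

# Illustrative helper: returns the first bytes of a file as a hex string.
function Get-FirstBytesHex([string] $Path, [int] $Count = 3) {
  $bytes = [IO.File]::ReadAllBytes($Path)
  ($bytes | Select-Object -First $Count | ForEach-Object { $_.ToString('X2') }) -join ' '
}

$hexWithBom = Get-FirstBytesHex $withBom   # EF BB BF  (the UTF-8 pseudo-BOM)
$hexNoBom   = Get-FirstBytesHex $noBom     # 3C 70 3E  (the characters "<p>")
```

The same helper can be pointed at out.html to check what a given write command actually produced.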

If you want a more PowerShell-like way of creating BOM-less UTF-8 files, see this answer of mine, which defines an Out-FileUtf8NoBom function that otherwise emulates the core functionality of Out-File.

Conversely, on reading files, you must use Get-Content -Encoding utf8 to ensure that BOM-less UTF-8 files are recognized as such.
In the absence of the UTF-8 pseudo-BOM, Get-Content assumes that the file uses the single-byte, extended-ASCII encoding specified by the system's legacy codepage (e.g., Windows-1252 on English-language systems, an encoding that PowerShell calls Default).
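
The effect of such a misinterpretation can be reproduced in memory. This is a rough sketch: it uses ISO-8859-1 (code page 28591) as a stand-in for the legacy code page, on the assumption that it is available in every PowerShell edition; for the sample character used here, Windows-1252 would decode the bytes the same way.

```powershell
# "café", built without non-ASCII characters in the script source itself.
$original  = 'caf' + [char]0xE9
$utf8Bytes = [Text.Encoding]::UTF8.GetBytes($original)

# Stand-in for the system's legacy single-byte code page (assumption: ISO-8859-1).
$legacy = [Text.Encoding]::GetEncoding(28591)

# Decoding UTF-8 bytes with a single-byte encoding turns the 2-byte sequence
# for U+00E9 into two separate characters (U+00C3 and U+00A9).
$misread = $legacy.GetString($utf8Bytes)

# Decoding with the correct encoding restores the original string.
$correct = [Text.Encoding]::UTF8.GetString($utf8Bytes)
```

This is the kind of garbling that Get-Content -Encoding utf8 prevents for BOM-less UTF-8 input.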

Note that while Windows-only editors such as Notepad create UTF-8 files with the pseudo-BOM (if you explicitly choose to save as UTF-8; default is the legacy codepage encoding, "ANSI"), increasingly popular cross-platform editors such as Visual Studio Code, Atom, and Sublime Text by default do not use the pseudo-BOM when they create files.



Answer 2:

The "weird characters" are probably a BOM (byte-order mark). Specify the output encoding explicitly with the -Encoding parameter when using Out-File, for example:

$Template |Out-File out.html -Encoding UTF8 

For the string expansion, you need to explicitly tell PowerShell to do so:

    $Template = $Template |ForEach-Object {
        $ExecutionContext.InvokeCommand.ExpandString($_)
    }
    $Template | Out-File out.html -Encoding UTF8

