Using PowerShell to write a file in UTF-8 without the BOM

名媛妹妹 2020-11-22 06:25

Out-File seems to force the BOM when using UTF-8:

$MyFile = Get-Content $MyPath
$MyFile | Out-File -Encoding "UTF8" $MyPath

13 answers
  • 2020-11-22 07:17

    If you want to use [System.IO.File]::WriteAllLines(), cast the second parameter to String[] (in case $MyFile is of type Object[]), and pass an absolute path, which you can obtain with $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath). For example:

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Set-Variable MyFile
    [System.IO.File]::WriteAllLines($ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($MyPath), [String[]]$MyFile, $Utf8NoBomEncoding)
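
To confirm that a file written this way really has no BOM, one option is to look at its first three bytes. The following is only a sketch: the helper name Test-Utf8Bom and the sample file name are made up for illustration and are not part of the original answer.

```powershell
# Hypothetical helper (an assumption, not from the original answer): returns
# $true if the file at $Path starts with the UTF-8 BOM bytes EF BB BF.
function Test-Utf8Bom {
  param([Parameter(Mandatory)] [string] $Path)
  # Resolve to a full path, since .NET's working directory may differ from PowerShell's.
  $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
  # Reads the whole file for simplicity; fine for a sketch, wasteful for huge files.
  $bytes = [System.IO.File]::ReadAllBytes($fullPath)
  return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Write a small sample file without a BOM, then check it.
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-check-sample.txt'
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($samplePath, [String[]]@('plain', "caf$([char]0xE9)"), $Utf8NoBomEncoding)
$hasBom = Test-Utf8Bom $samplePath
```

Here $hasBom should be $False; writing the same file with New-Object System.Text.UTF8Encoding $True instead should make the check return $True.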
    

    If you want to use [System.IO.File]::WriteAllText(), you sometimes need to pipe the input through Out-String first so that a newline (CRLF on Windows) is explicitly added to the end of each line (especially when the input comes from ConvertTo-Csv):

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Out-String | Set-Variable tmp
    [System.IO.File]::WriteAllText("/absolute/path/to/foobar.csv", $tmp, $Utf8NoBomEncoding)
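
The reason Out-String matters here is that PowerShell converts an array to a single string by joining its elements with a space (the default value of $OFS), not with newlines. A minimal sketch with made-up sample data, assuming $OFS has not been changed:

```powershell
# Sample lines standing in for ConvertTo-Csv output (illustrative data only).
$lines = 'a,b', '1,2'

# Converting the array to a single string joins the elements with a space,
# which is what happens when an array is passed where a [string] is expected.
$joinedBySpace = [string] $lines

# Out-String instead emits one string with a newline after each line.
$withNewlines = $lines | Out-String
```

$joinedBySpace ends up as a single line, whereas $withNewlines keeps one line per element.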
    

    Or you can use [Text.Encoding]::UTF8.GetBytes(), which never emits a BOM, with Set-Content -Encoding Byte (Windows PowerShell only; PowerShell 6+ uses -AsByteStream instead):

    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    Get-ChildItem | ConvertTo-Csv | Out-String | % { [Text.Encoding]::UTF8.GetBytes($_) } | Set-Content -Encoding Byte -Path "/absolute/path/to/foobar.csv"
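
A version-aware sketch of the same byte-based idea is shown below. The sample text and the temp-file name are assumptions for illustration; the only difference between the two branches is the parameter used to request raw byte output.

```powershell
# Illustrative target path and content (not from the original answer).
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'foobar-bytes.csv'
$text = "a,b`r`n1,2`r`n"

# GetBytes() returns only the encoded characters, never a BOM.
$bytes = [Text.Encoding]::UTF8.GetBytes($text)

if ($PSVersionTable.PSVersion.Major -ge 6) {
  # PowerShell 6+: raw bytes are written with -AsByteStream.
  Set-Content -AsByteStream -LiteralPath $outPath -Value $bytes
} else {
  # Windows PowerShell: raw bytes are written with -Encoding Byte.
  Set-Content -Encoding Byte -LiteralPath $outPath -Value $bytes
}
```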
    

    See: How to write result of ConvertTo-Csv to a file in UTF-8 without BOM

  • 2020-11-22 07:19

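    If the content is pure ASCII, you can use the following to get UTF-8 without a BOM (ASCII text is byte-for-byte valid UTF-8, but any non-ASCII characters are replaced with ?):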

    $MyFile | Out-File -Encoding ASCII $MyPath
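
A minimal sketch of what the ASCII encoding does to text outside its range, using a made-up sample string:

```powershell
# "café": the last character (U+00E9) cannot be represented in ASCII.
$text = "caf$([char]0xE9)"

# Encode to ASCII bytes and decode again to see what survives.
$asciiBytes = [Text.Encoding]::ASCII.GetBytes($text)
$roundTripped = [Text.Encoding]::ASCII.GetString($asciiBytes)
```

$roundTripped comes back as caf? because the unrepresentable character is replaced by a question mark, so this approach is only safe when the data is known to be ASCII-only.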
    
  • 2020-11-22 07:21

    Note: This answer applies to Windows PowerShell. By contrast, in the cross-platform PowerShell Core edition (v6+), UTF-8 without BOM is the default encoding across all cmdlets.
    In other words: if you're using PowerShell [Core] version 6 or higher, you get BOM-less UTF-8 files by default (which you can also explicitly request with -Encoding utf8 / -Encoding utf8NoBOM, whereas -Encoding utf8BOM gives you UTF-8 with a BOM).


    To complement M. Dudley's own simple and pragmatic answer (and ForNeVeR's more concise reformulation):

    For convenience, here's advanced function Out-FileUtf8NoBom, a pipeline-based alternative that mimics Out-File, which means:

    • you can use it just like Out-File in a pipeline.
    • input objects that aren't strings are formatted as they would be if you sent them to the console, just like with Out-File.
    • an additional -UseLF switch allows you to transform Windows-style CRLF newlines to Unix-style LF-only newlines.

    Example:

    (Get-Content $MyPath) | Out-FileUtf8NoBom $MyPath # Add -UseLF for Unix newlines
    

    Note how (Get-Content $MyPath) is enclosed in (...), which ensures that the entire file is opened, read in full, and closed before sending the result through the pipeline. This is necessary in order to be able to write back to the same file (update it in place).
    Generally, though, this technique is not advisable for 2 reasons: (a) the whole file must fit into memory and (b) if the command is interrupted, data will be lost.
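
One way to reduce the risk of data loss on interruption is to write to a temporary file first and replace the original only once writing has succeeded. The sketch below is an assumption-laden illustration, not part of the function that follows: the sample file and the .tmp naming are made up, and it still reads the whole file into memory.

```powershell
# Create a sample input file *with* a BOM (illustrative data only).
$MyPath = Join-Path ([System.IO.Path]::GetTempPath()) 'in-place-sample.txt'
[System.IO.File]::WriteAllText($MyPath, "first`r`nsecond`r`n", (New-Object System.Text.UTF8Encoding $True))

# Write the BOM-less copy to a temporary file next to the original.
$tempPath = "$MyPath.tmp"   # hypothetical temp-file naming
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($tempPath, [string[]] (Get-Content -LiteralPath $MyPath), $Utf8NoBomEncoding)

# Replace the original only after the temporary file was written completely.
Move-Item -LiteralPath $tempPath -Destination $MyPath -Force
```

If the write step fails or is interrupted, the original file is left untouched and only the temporary file is incomplete.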

    A note on memory use:

    • M. Dudley's own answer requires that the entire file contents be built up in memory first, which can be problematic with large files.
    • The function below improves on this only slightly: all input objects are still buffered first, but their string representations are then generated and written to the output file one by one.
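
If memory use is the main concern and the input is an existing text file, a line-by-line copy with .NET stream classes avoids buffering the whole file. This is a rough sketch under stated assumptions: the file names are invented, the output goes to a different file than the input, and original newline styles are not preserved (WriteLine uses the platform newline).

```powershell
# Sample input file with a BOM (illustrative data only).
$sourcePath = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-source.txt'
$targetPath = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-target.txt'
[System.IO.File]::WriteAllText($sourcePath, "alpha`nbeta`n", (New-Object System.Text.UTF8Encoding $True))

# StreamReader(path) decodes UTF-8 and consumes a leading BOM, if present.
$reader = New-Object System.IO.StreamReader $sourcePath
# StreamWriter(path, append) defaults to UTF-8 without a BOM.
$writer = New-Object System.IO.StreamWriter $targetPath, $false
try {
  # Copy one line at a time, so only a single line is held in memory.
  while ($null -ne ($line = $reader.ReadLine())) {
    $writer.WriteLine($line)
  }
} finally {
  $writer.Dispose()
  $reader.Dispose()
}
```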

    Source code of function Out-FileUtf8NoBom:

    Note: The function is also available as an MIT-licensed Gist, and only it will be maintained going forward.

    You can install it directly with the following command (while I can personally assure you that doing so is safe, you should always check the content of a script before directly executing it this way):

    # Download and define the function.
    irm https://gist.github.com/mklement0/8689b9b5123a9ba11df7214f82a673be/raw/Out-FileUtf8NoBom.ps1 | iex
    
    function Out-FileUtf8NoBom {
    <#
    .SYNOPSIS
      Outputs to a UTF-8-encoded file *without a BOM* (byte-order mark).
    .DESCRIPTION
      Mimics the most important aspects of Out-File:
        * Input objects are sent to Out-String first.
        * -Append allows you to append to an existing file, -NoClobber prevents
          overwriting of an existing file.
        * -Width allows you to specify the line width for the text representations
           of input objects that aren't strings.
      However, it is not a complete implementation of all Out-File parameters:
        * Only a literal output path is supported, and only as a parameter.
        * -Force is not supported.
        * Conversely, an extra -UseLF switch is supported for using LF-only newlines.
      Caveat: *All* pipeline input is buffered before writing output starts,
              but the string representations are generated and written to the target
              file one by one.
    .NOTES
      The raison d'être for this advanced function is that Windows PowerShell
      lacks the ability to write UTF-8 files without a BOM: using -Encoding UTF8 
      invariably prepends a BOM.
      Copyright (c) 2017, 2020 Michael Klement <mklement0@gmail.com> (http://same2u.net), 
      released under the [MIT license](https://spdx.org/licenses/MIT#licenseText).
    #>
    
      [CmdletBinding()]
      param(
        [Parameter(Mandatory, Position=0)] [string] $LiteralPath,
        [switch] $Append,
        [switch] $NoClobber,
        [AllowNull()] [int] $Width,
        [switch] $UseLF,
        [Parameter(ValueFromPipeline)] $InputObject
      )
    
      #requires -version 3
    
      # Convert the input path to a full one, since .NET's working dir. usually
      # differs from PowerShell's.
      $dir = Split-Path -LiteralPath $LiteralPath
      if ($dir) { $dir = Convert-Path -ErrorAction Stop -LiteralPath $dir } else { $dir = $pwd.ProviderPath}
      $LiteralPath = [IO.Path]::Combine($dir, [IO.Path]::GetFileName($LiteralPath))
    
      # If -NoClobber was specified, throw an exception if the target file already
      # exists.
      if ($NoClobber -and (Test-Path -LiteralPath $LiteralPath)) {
        Throw [IO.IOException] "The file '$LiteralPath' already exists."
      }
    
      # Create a StreamWriter object.
      # Note that we take advantage of the fact that the StreamWriter class by default:
      # - uses UTF-8 encoding
      # - without a BOM.
      $sw = New-Object System.IO.StreamWriter $LiteralPath, $Append
    
      $htOutStringArgs = @{}
      if ($Width) {
        $htOutStringArgs += @{ Width = $Width }
      }
    
      # Note: By not using begin / process / end blocks, we're effectively running
      #       in the end block, which means that all pipeline input has already
      #       been collected in automatic variable $Input.
      #       We must use this approach, because using | Out-String individually
      #       in each iteration of a process block would format each input object
      #       with an individual header.
      try {
        $Input | Out-String -Stream @htOutStringArgs | % { 
          if ($UseLf) {
            $sw.Write($_ + "`n") 
          }
          else {
            $sw.WriteLine($_) 
          }
        }
      } finally {
        $sw.Dispose()
      }
    
    }
    
  • 2020-11-22 07:21

    Starting with version 6, PowerShell supports the UTF8NoBOM encoding for both Set-Content and Out-File, and even uses it as the default encoding.

    So in the above example it should simply be like this:

    $MyFile | Out-File -Encoding UTF8NoBOM $MyPath
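
For a script that has to run on both editions, one possible approach is to branch on the major version. The sketch below assumes a sample path and sample content that are not part of the original answer, and falls back to the .NET call used in other answers when UTF8NoBOM is not available.

```powershell
# Illustrative path and content (assumptions for this sketch).
$MyPath = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-sample.txt'
$MyFile = 'line 1', "caf$([char]0xE9)"

if ($PSVersionTable.PSVersion.Major -ge 6) {
  # PowerShell 6+: Out-File understands UTF8NoBOM (and uses it by default).
  $MyFile | Out-File -Encoding UTF8NoBOM $MyPath
} else {
  # Windows PowerShell: fall back to .NET with an explicitly BOM-less encoding.
  [System.IO.File]::WriteAllLines($MyPath, [string[]] $MyFile, (New-Object System.Text.UTF8Encoding $False))
}
```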
    
  • 2020-11-22 07:21

    One technique I utilize is to redirect output to an ASCII file using the Out-File cmdlet.

    For example, I often run SQL scripts that create another SQL script to execute in Oracle. With simple redirection (">"), the output will be in UTF-16 which is not recognized by SQLPlus. To work around this:

    sqlplus -s / as sysdba "@create_sql_script.sql" |
    Out-File -FilePath new_script.sql -Encoding ASCII -Force
    

    The generated script can then be executed via another SQLPlus session without any Unicode worries:

    sqlplus / as sysdba "@new_script.sql" |
    tee new_script.log
    
  • 2020-11-22 07:22

    Using .NET's UTF8Encoding class and passing $False to the constructor seems to work:

    $MyRawString = Get-Content -Raw $MyPath
    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    [System.IO.File]::WriteAllLines($MyPath, $MyRawString, $Utf8NoBomEncoding)
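
One detail worth knowing: when a single raw string is passed, WriteAllLines() treats it as one line and appends a newline after it, whereas WriteAllText() writes the string exactly as given. A small comparison sketch with an invented sample string and temp-file name:

```powershell
# Illustrative path and content (assumptions for this sketch).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'raw-sample.txt'
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
$MyRawString = "one`ntwo`n"

# WriteAllLines() adds a line terminator after the (single) string.
[System.IO.File]::WriteAllLines($path, $MyRawString, $Utf8NoBomEncoding)
$viaLines = [System.IO.File]::ReadAllText($path)

# WriteAllText() writes the string unchanged.
[System.IO.File]::WriteAllText($path, $MyRawString, $Utf8NoBomEncoding)
$viaText = [System.IO.File]::ReadAllText($path)
```

If the file already ends with a newline, WriteAllText() is therefore the closer match for a byte-faithful rewrite.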
    