I am trying to put together a script that will convert several Excel files into PDFs. This is my first time doing something like this in PowerShell. I found a link to one on
To clarify:
It is perfectly fine to use Unicode (non-ASCII-range) quotation marks such as “ in PowerShell - see the bottom section.
However, in order to use such characters in script files, these files must use a Unicode character encoding such as UTF-8 or UTF-16LE ("Unicode").
Your problem was that your script file was saved as UTF-8 without a BOM, which causes Windows PowerShell (but not PowerShell Core) to misinterpret it, because it defaults to "ANSI" encoding, i.e., the single-byte legacy encoding associated with the legacy system locale (e.g., Windows-1252 in the US and Western Europe), which PowerShell calls Default.
While replacing the Unicode quotation marks with their ASCII counterparts solves the immediate problem, any other non-ASCII-range characters in the script would continue to be misinterpreted.
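One way to address the root cause rather than the symptom is to re-save the script as UTF-8 with a BOM, so that Windows PowerShell detects the encoding. The following is a minimal sketch, not something from the original script: the file name demo-script.ps1 is hypothetical, and a temporary file stands in for your real script. The quotation marks are built from code points so that the sketch itself stays ASCII-only.

```powershell
# Hypothetical file for illustration; point $path at your own script instead.
$path = Join-Path ([IO.Path]::GetTempPath()) 'demo-script.ps1'

# Simulate the problem: a script containing U+201C / U+201D, saved as BOM-less UTF-8.
$utf8NoBom = [Text.UTF8Encoding]::new($false)
$content   = 'Write-Output {0}hi{1}' -f [char]0x201C, [char]0x201D
[IO.File]::WriteAllText($path, $content, $utf8NoBom)

# Fix: read the file explicitly as UTF-8, then write it back with a BOM.
$utf8WithBom = [Text.UTF8Encoding]::new($true)
$text = [IO.File]::ReadAllText($path, [Text.Encoding]::UTF8)
[IO.File]::WriteAllText($path, $text, $utf8WithBom)

# Check: the file should now start with the UTF-8 BOM bytes EF BB BF.
$bomHex = ([IO.File]::ReadAllBytes($path)[0..2] | ForEach-Object { $_.ToString('X2') }) -join ' '
$bomHex
```

In Windows PowerShell, Set-Content -Encoding UTF8 also writes a BOM; in PowerShell Core the corresponding value is utf8BOM.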
To demonstrate the specific problem:
“, the LEFT DOUBLE QUOTATION MARK (U+201C) Unicode character, is encoded as 3 bytes in UTF-8 format: 0xE2 0x80 0x9C. You can verify this with:
'“' | Format-Hex -Encoding Utf8
(Only the byte sequence matters here; the printed chars. on the right are not representative in this case.)
When Windows PowerShell reads this sequence as "ANSI"-encoded, it considers each byte a character in its own right, which is why you saw 3 characters for the single “ in your output, namely â€œ. You can verify this with:
[Text.Encoding]::Default.GetString([byte[]] (0xE2, 0x80, 0x9C))
(From PowerShell Core, use [Text.Encoding]::GetEncoding([cultureinfo]::CurrentCulture.TextInfo.ANSICodePage).GetString([byte[]] (0xE2, 0x80, 0x9C)) instead.)
In a properly encoded input file, PowerShell allows interchangeable use of the following quotation and punctuation characters; e.g., "hi", ”hi” and even "hi„ are equivalent.
Double quotes:
" - (ASCII-range) QUOTATION MARK (U+0022)
“ - LEFT DOUBLE QUOTATION MARK (U+201C)
” - RIGHT DOUBLE QUOTATION MARK (U+201D)
„ - DOUBLE LOW-9 QUOTATION MARK (U+201E)
But not: ‟ - DOUBLE HIGH-REVERSED-9 QUOTATION MARK (U+201F), even though its single-quote counterpart is recognized - see this GitHub issue.
Single quotes:
' - (ASCII-range) APOSTROPHE (U+0027)
‘ - LEFT SINGLE QUOTATION MARK (U+2018)
’ - RIGHT SINGLE QUOTATION MARK (U+2019)
‚ - SINGLE LOW-9 QUOTATION MARK (U+201A)
‛ - SINGLE HIGH-REVERSED-9 QUOTATION MARK (U+201B)
Dashes (strictly speaking, the ASCII-range "dash" is a hyphen):
- - (ASCII-range) HYPHEN-MINUS (U+002D)
– - EN DASH (U+2013)
— - EM DASH (U+2014)
― - HORIZONTAL BAR (U+2015)
Whitespace:
Note: The source-code location linked to below doesn't define equivalent whitespace characters explicitly (unlike quotation marks and dashes). The following was gleaned from experiments based on Unicode character descriptions and may be incomplete. Characters outside the Unicode BMP (Basic Multilingual Plane), i.e., those whose code points don't fit into the 16-bit code units .NET uses to represent characters, were excluded.
Intra-line whitespace:
Note: Space-character and tab-character variations can interchangeably serve as syntactic word separators. Among the space-character variations, only the U+200B (ZERO WIDTH SPACE) char. is not considered a space syntactically.
Spaces:
(ASCII-range space char.) U+0020 (SPACE)
U+00A0 (NO-BREAK SPACE)
U+2002 (EN SPACE)
U+2003 (EM SPACE)
U+2004 (THREE-PER-EM SPACE)
U+2005 (FOUR-PER-EM SPACE)
U+2006 (SIX-PER-EM SPACE)
U+2007 (FIGURE SPACE)
U+2008 (PUNCTUATION SPACE)
U+2009 (THIN SPACE)
U+200A (HAIR SPACE)
U+202F (NARROW NO-BREAK SPACE)
U+205F (MEDIUM MATHEMATICAL SPACE)
U+3000 (IDEOGRAPHIC SPACE)
Tabulators (shown as escape sequences because they are not directly printable here):
"`t" (ASCII-range tab char.) - U+0009 (CHARACTER TABULATION)
"`v" (ASCII-range vertical-tab char.) - U+000B (LINE TABULATION)
Line-separating whitespace:
(ASCII-range LF) U+000A (LINE FEED)
(ASCII-range CR) U+000D (CARRIAGE RETURN)
Note:
Important: The above describes interchangeable syntactic use of these characters; if you use such characters in identifiers (which you shouldn't) or in strings[1], they are not treated the same.
The above was in part gleaned from the source code on GitHub (class SpecialCharacters in file parserutils.cs).
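To see the interchangeable syntactic use of these characters in practice, the following sketch builds small source snippets from code points (so that the example itself stays ASCII-only) and evaluates them with Invoke-Expression. The particular snippets and variable names are illustrative choices, not taken from the PowerShell source code; based on the lists above, each snippet should evaluate to the string hi.

```powershell
# Build source text from code points so this example contains only ASCII characters.
$quoted = @(
    '"hi"'                                        # ASCII quotation marks
    ('{0}hi{1}' -f [char]0x201C, [char]0x201D)    # left / right double quotation marks
    ('"hi{0}' -f [char]0x201E)                    # ASCII opening quote, low-9 closing quote
)
$quotedResults = $quoted | ForEach-Object { Invoke-Expression $_ }

# An en dash (U+2013) in place of the ASCII hyphen in front of a parameter name.
$dashResult = Invoke-Expression ('Write-Output {0}InputObject hi' -f [char]0x2013)

# A no-break space (U+00A0) in place of the ASCII space between command and argument.
$spaceResult = Invoke-Expression ('Write-Output{0}hi' -f [char]0xA0)

$quotedResults
$dashResult
$spaceResult
```

Invoke-Expression is used here only because the input is a fixed, known string; it is not a recommendation for handling untrusted text.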
[1] There are limited exceptions: given that PowerShell's -eq operator compares strings using the invariant culture rather than performing ordinal comparison, space-character variations may be treated the same in string comparisons, depending on the host platform; e.g., "foo bar" -eq "foo`u{a0}bar" yields $true on macOS and Linux (but not Windows!), because the regular ASCII-range space is considered equal to the no-break space (U+00A0) there.
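The difference between culture-aware and ordinal comparison can be checked with the sketch below. It uses [char]0xA0 rather than the `u{a0} escape so that it also runs in Windows PowerShell; the variable names are illustrative. No output is shown for -eq because, as described above, it depends on the host platform.

```powershell
$ascii = 'foo bar'                      # regular space, U+0020
$nbsp  = 'foo{0}bar' -f [char]0xA0      # no-break space, U+00A0

# -eq is culture-aware (invariant culture); its result here is platform-dependent.
$cultureAware = $ascii -eq $nbsp

# Ordinal comparison looks at the code units, so the two strings always differ.
$ordinal = [string]::Equals($ascii, $nbsp, [StringComparison]::Ordinal)

"-eq:     $cultureAware"
"ordinal: $ordinal"
```

If a script needs to treat such strings as strictly different (or strictly equal) on every platform, an explicit ordinal comparison like the one above avoids relying on the platform's culture-aware behavior.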