Question
I am trying to pipe the content of a file to a simple ASCII symmetrical encryption program I made. It reads input from STDIN and adds or subtracts a fixed value (224) to or from each byte of the input. For example, if the first byte is 4 and we want to encrypt, it becomes 228. If the result exceeds 255, the program wraps it around with a modulo operation.
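The source of Crypt.exe is not shown, so the following is only a minimal Python sketch of the byte shift as described; the names `KEY` and `crypt` are hypothetical and the real program may differ in detail.

```python
KEY = 224  # shift value stated in the question


def crypt(data: bytes, encrypt: bool) -> bytes:
    """Add (encrypt) or subtract (decrypt) KEY from every byte, modulo 256."""
    shift = KEY if encrypt else -KEY
    return bytes((b + shift) % 256 for b in data)


plain = b"this is a test"

# Byte 4 becomes 228 when encrypting, as in the example above.
print(crypt(bytes([4]), encrypt=True)[0])

# The shift is reversible in either order, which is why both cmd pipelines
# print the original text.
print(crypt(crypt(plain, encrypt=True), encrypt=False))
print(crypt(crypt(plain, encrypt=False), encrypt=True))
```

Because every byte is shifted, any extra byte that a shell inserts between the two stages is shifted as well.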
This is the output I get with cmd (test.txt contains "this is a test"):
type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt
this is a test
It also works the other way around, since it is a symmetrical encryption algorithm:
type .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt
this is a test
But the behaviour in PowerShell is different. When encrypting first, I get:
type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt
this is a test_*
And that is what I get when decrypting first:
Maybe it is an encoding problem. Thanks in advance.
Answer 1:
tl;dr:
If you need raw byte handling and/or need to prevent PowerShell from situationally adding a trailing newline to your text data, avoid the PowerShell pipeline altogether.
Instead, shell out to cmd with /c:
cmd /c 'type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt'
Note that if you want to capture the output in a PowerShell variable, you need to make sure that [Console]::OutputEncoding matches your .\Crypt.exe program's (effective) output encoding (the active OEM code page), which should be true by default in this case; see the next section for details.
Generally, however, byte manipulation of text data is best avoided.
There are two separate problems, only one of which has a simple solution:
Problem 1: There is indeed an encoding problem, as you suspected:
PowerShell invisibly inserts itself as an intermediary in pipelines, even when sending data to and receiving data from external programs: It converts data from and to .NET strings (System.String), which are sequences of UTF-16 code units.
In order to send data to and receive data from external programs, you need to match their character encoding; in your case, with a Windows console application that uses raw byte handling, the implied encoding is the system's active OEM code page.
On sending data, PowerShell uses the encoding of the $OutputEncoding preference variable to encode data (which is invariably treated as text); it defaults to ASCII(!) in Windows PowerShell, and UTF-8 in PowerShell [Core].
The receiving end is covered by default: PowerShell uses [Console]::OutputEncoding (which itself reflects the code page reported by chcp) for decoding data received, and on Windows this by default reflects the active OEM code page, both in Windows PowerShell and PowerShell [Core][1].
To fix your primary problem, you therefore need to set $OutputEncoding to the active OEM code page:
# Make sure that PowerShell uses the OEM code page when sending
# data to `.\Crypt.exe`
$OutputEncoding = [Console]::OutputEncoding
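Why the sending encoding matters can be modeled outside PowerShell. The Python sketch below is only an illustration, assuming code page 437 as the active OEM code page and treating ASCII encoding as replacing unrepresentable characters with `?`; the variable names are hypothetical.

```python
# Two non-ASCII bytes, as an external program might emit them.
from_program = bytes([237, 234])

# Receiving side: the bytes are decoded into a string using the OEM code page
# (code page 437 is an assumption here).
text = from_program.decode("cp437")

# Sending side with an ASCII encoding: characters outside ASCII are lost.
sent_as_ascii = text.encode("ascii", errors="replace")

# Sending side with the same OEM code page: the original bytes survive.
sent_as_oem = text.encode("cp437")

print(text, sent_as_ascii, sent_as_oem)
```

With matching encodings on both sides, the string stage is lossless for these bytes; with ASCII on the sending side, the next program in the pipeline only ever sees question marks.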
Problem 2: PowerShell invariably appends a trailing newline to data that doesn't already have one when piping data to external programs:
That is, "foo" | .\Crypt.exe doesn't send (the $OutputEncoding-encoded bytes representing) "foo" to .\Crypt.exe's stdin, it sends "foo`r`n" on Windows; i.e., a (platform-appropriate) newline sequence (CRLF on Windows) is automatically and invariably appended (unless the string already happens to have a trailing newline).
This problematic behavior is discussed in this GitHub issue and also in this answer.
In your specific case, the implicitly appended "`r`n" is also subject to the byte-value shifting, which means that the 1st Crypt.exe call transforms it to -*, causing another "`r`n" to be appended when the data is sent to the 2nd Crypt.exe call.
The net result is an extra newline that is round-tripped (the intermediate -*), plus an encrypted newline that results in φΩ.
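The arithmetic behind -* and φΩ can be checked with a small Python model of the decrypt-then-encrypt pipeline. It is only a sketch: `crypt` and `append_newline` are hypothetical stand-ins for Crypt.exe and for the newline-appending step, the string decoding and encoding stages are assumed to be lossless, and code page 437 is assumed for display.

```python
KEY = 224
NEWLINE = b"\r\n"


def crypt(data: bytes, encrypt: bool) -> bytes:
    """Hypothetical stand-in for Crypt.exe: shift every byte by KEY modulo 256."""
    shift = KEY if encrypt else -KEY
    return bytes((b + shift) % 256 for b in data)


def append_newline(data: bytes) -> bytes:
    """Model of the pipeline step: append CRLF unless the data already ends with it."""
    return data if data.endswith(NEWLINE) else data + NEWLINE


# 1st call (--decrypt): the appended CRLF (13, 10) is shifted to 45, 42, i.e. "-*".
stage1 = crypt(append_newline(b"this is a test"), encrypt=False)

# 2nd call (--encrypt): "-*" shifts back to CRLF, and the newly appended
# CRLF is shifted to bytes 237, 234.
stage2 = crypt(append_newline(stage1), encrypt=True)

print(stage1[-2:])             # b'-*'
print(stage2.decode("cp437"))  # original text, a newline, then φΩ
```

The last four bytes of `stage2` are the round-tripped CRLF and the encrypted CRLF, which is where the count of 4 extra characters comes from.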
In short: If your input data had no trailing newline, you'll have to cut off the last 4 characters from the result (representing the round-tripped and the inadvertently encrypted newline sequences):
# Make sure that data sent to .\Crypt.exe is encoded with the OEM code page.
$OutputEncoding = [Console]::OutputEncoding
# Invoke the command and capture its output in variable $result.
# Note the use of the `Get-Content` cmdlet; in PowerShell, `type`
# is simply a built-in *alias* for it.
# Output captured from an external program arrives as an array of lines,
# so the lines are re-joined into a single string first.
$result = (Get-Content .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt) -join "`r`n"
# Remove the last 4 chars. and print the result.
$result.Substring(0, $result.Length - 4)
Given that calling cmd /c as shown at the top of the answer works too, that hardly seems worth it.
How PowerShell handles pipeline data with external programs:
Unlike cmd (or POSIX-like shells such as bash):
- PowerShell doesn't support raw byte data in pipelines.[2]
- When talking to external programs, it only knows text (whereas it passes .NET objects when talking to PowerShell's own commands, which is where much of its power comes from).
Specifically, this works as follows:
When you send data to an external program via the pipeline (to its stdin stream):
- It is converted to text (strings) using the character encoding specified in the $OutputEncoding preference variable, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell [Core].
- If the data is not captured or redirected by PowerShell, encoding problems may not always become apparent, namely if an external program is implemented in a way that uses the Windows Unicode console API to print to the display.
- Something that isn't already text (a string) is stringified using PowerShell's default output formatting (the same format you see when you print to the console), with an important caveat:
  - If the (last) input object already is a string that doesn't itself have a trailing newline, one is invariably appended (and even an existing trailing newline is replaced with the platform-native one, if different).
  - This behavior can cause problems, as discussed in this GitHub issue and also in this answer.
When you capture / redirect data from an external program (from its stdout stream), it is invariably decoded as lines of text (strings), based on the encoding specified in [Console]::OutputEncoding, which defaults to the active OEM code page on Windows (surprisingly, in both PowerShell editions, as of v7.0-preview6[1]).
PowerShell-internally, text is represented using the .NET System.String type, which is based on UTF-16 code units (often loosely, but incorrectly, called "Unicode"[3]).
The above also applies:
- when piping data between external programs,
- when data is redirected to a file; that is, irrespective of the source of the data and its original character encoding, PowerShell uses its default encoding(s) when sending data to files; in Windows PowerShell, > produces UTF-16LE-encoded files (with BOM), whereas PowerShell [Core] sensibly defaults to BOM-less UTF-8 (consistently, across file-writing cmdlets).
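To make the difference between the two file encodings concrete, the following Python sketch builds the bytes that each encoding would produce for the same short text. It only illustrates the encodings themselves and does not run PowerShell; the sample text and variable names are assumptions.

```python
import codecs

text = "foo\r\n"

# UTF-16LE with a byte order mark: 2 BOM bytes, then 2 bytes per character.
utf16_with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# BOM-less UTF-8: 1 byte per character for ASCII-range text.
utf8_no_bom = text.encode("utf-8")

print(utf16_with_bom)
print(utf8_no_bom)
print(len(utf16_with_bom), len(utf8_no_bom))
```

A byte-oriented program that later reads the UTF-16LE file sees the BOM and interleaved zero bytes rather than the bytes the original program wrote.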
Adding support for raw data passing between external programs and to-file redirections is the subject of this GitHub issue.
[1] In PowerShell [Core], given that $OutputEncoding commendably already defaults to UTF-8, it would make sense to have [Console]::OutputEncoding be the same - i.e., for the active code page to be effectively 65001 on Windows, as suggested in this GitHub issue.
[2] With input from a file, the closest you can get to raw byte handling is to read the file as a .NET System.Byte array with Get-Content -AsByteStream (PowerShell [Core]) / Get-Content -Encoding Byte (Windows PowerShell), but the only way you can further process such an array is to pipe it to a PowerShell command that is designed to handle a byte array, or to pass it to a .NET type's method that expects a byte array. If you tried to send such an array to an external program via the pipeline, each byte would be sent as its decimal string representation on its own line.
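As an illustration of that last point, this Python sketch reconstructs what an external program would receive on stdin if the four bytes of "this" were sent that way. It models the described behavior only, assuming CRLF line endings as on Windows; the variable names are hypothetical.

```python
data = b"this"

# Each byte is stringified as a decimal number on its own line.
received = "".join(f"{byte}\r\n" for byte in data).encode("ascii")

print(received)  # b'116\r\n104\r\n105\r\n115\r\n'
```

The program therefore receives digits and newlines rather than the original byte values.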
[3] Unicode is the name of the abstract standard describing a "global alphabet". In concrete use, it has various standard encodings, UTF-8 and UTF-16 being the most widely used.
Answer 2:
Cmd uses 8-bit OEM. PowerShell uses Unicode.
Standard (and automatic) conversion would be from locale specific OEM to locale specific ANSI, then ANSI to Unicode.
See https://docs.microsoft.com/en-us/windows/console/console-code-pages
In Unicode, characters 0 - 31 and 127 - 159 don't have glyphs. They are control characters.
I got tired of there not being a Unicode character table (only ANSI) so I wrote one.
Dec Hex Name OEM Type Range (Unicode conversion of OEM Character)
0 0x0 ␀ Control Control Codes
1 0x1 ␁ ☺ Control Control Codes
2 0x2 ␂ ☻ Control Control Codes
3 0x3 ␃ ♥ Control Control Codes
4 0x4 ␄ ♦ Control Control Codes
5 0x5 ␅ ♣ Control Control Codes
6 0x6 ␆ ♠ Control Control Codes
7 0x7 ␇ • Control Control Codes
8 0x8 ␈ ◘ Control Control Codes
9 0x9 ␉ ○ Blank Control Space Control Codes
10 0xA ␊ ◙ Control Space Control Codes
11 0xB ␋ ♂ Control Space Control Codes
12 0xC ␌ ♀ Control Space Control Codes
13 0xD ␍ ♪ Control Space Control Codes
14 0xE ␎ ♫ Control Control Codes
15 0xF ␏ ☼ Control Control Codes
16 0x10 ␐ ► Control Control Codes
17 0x11 ␑ ◄ Control Control Codes
18 0x12 ␒ ↕ Control Control Codes
19 0x13 ␓ ‼ Control Control Codes
20 0x14 ␔ ¶ Control Control Codes
21 0x15 ␕ § Control Control Codes
22 0x16 ␖ ▬ Control Control Codes
23 0x17 ␗ ↨ Control Control Codes
24 0x18 ␘ ↑ Control Control Codes
25 0x19 ␙ ↓ Control Control Codes
26 0x1A ␚ → Control Control Codes
27 0x1B ␛ ← Control Control Codes
28 0x1C ␜ ∟ Control Control Codes
29 0x1D ␝ ↔ Control Control Codes
30 0x1E ␞ ▲ Control Control Codes
31 0x1F ␟ ▼ Control Control Codes
Dec Hex Char Type Range
32 0x20 Blank Space Basic Latin
33 0x21 ! Punct Basic Latin
34 0x22 " Punct Basic Latin
35 0x23 # Punct Basic Latin
36 0x24 $ Punct Basic Latin
37 0x25 % Punct Basic Latin
38 0x26 & Punct Basic Latin
39 0x27 ' Punct Basic Latin
40 0x28 ( Punct Basic Latin
41 0x29 ) Punct Basic Latin
42 0x2A * Punct Basic Latin
43 0x2B + Punct Basic Latin
44 0x2C , Punct Basic Latin
45 0x2D - Punct Basic Latin
46 0x2E . Punct Basic Latin
47 0x2F / Punct Basic Latin
48 0x30 0 Number Hex Basic Latin
49 0x31 1 Number Hex Basic Latin
50 0x32 2 Number Hex Basic Latin
51 0x33 3 Number Hex Basic Latin
52 0x34 4 Number Hex Basic Latin
53 0x35 5 Number Hex Basic Latin
54 0x36 6 Number Hex Basic Latin
55 0x37 7 Number Hex Basic Latin
56 0x38 8 Number Hex Basic Latin
57 0x39 9 Number Hex Basic Latin
58 0x3A : Punct Basic Latin
59 0x3B ; Punct Basic Latin
60 0x3C < Punct Basic Latin
61 0x3D = Punct Basic Latin
62 0x3E > Punct Basic Latin
63 0x3F ? Punct Basic Latin
64 0x40 @ Punct Basic Latin
65 0x41 A Alpha Upper Hex Basic Latin
66 0x42 B Alpha Upper Hex Basic Latin
67 0x43 C Alpha Upper Hex Basic Latin
68 0x44 D Alpha Upper Hex Basic Latin
69 0x45 E Alpha Upper Hex Basic Latin
70 0x46 F Alpha Upper Hex Basic Latin
71 0x47 G Alpha Upper Basic Latin
72 0x48 H Alpha Upper Basic Latin
73 0x49 I Alpha Upper Basic Latin
74 0x4A J Alpha Upper Basic Latin
75 0x4B K Alpha Upper Basic Latin
76 0x4C L Alpha Upper Basic Latin
77 0x4D M Alpha Upper Basic Latin
78 0x4E N Alpha Upper Basic Latin
79 0x4F O Alpha Upper Basic Latin
80 0x50 P Alpha Upper Basic Latin
81 0x51 Q Alpha Upper Basic Latin
82 0x52 R Alpha Upper Basic Latin
83 0x53 S Alpha Upper Basic Latin
84 0x54 T Alpha Upper Basic Latin
85 0x55 U Alpha Upper Basic Latin
86 0x56 V Alpha Upper Basic Latin
87 0x57 W Alpha Upper Basic Latin
88 0x58 X Alpha Upper Basic Latin
89 0x59 Y Alpha Upper Basic Latin
90 0x5A Z Alpha Upper Basic Latin
91 0x5B [ Punct Basic Latin
92 0x5C \ Punct Basic Latin
93 0x5D ] Punct Basic Latin
94 0x5E ^ Punct Basic Latin
95 0x5F _ Punct Basic Latin
96 0x60 ` Punct Basic Latin
97 0x61 a Alpha Lower Hex Basic Latin
98 0x62 b Alpha Lower Hex Basic Latin
99 0x63 c Alpha Lower Hex Basic Latin
100 0x64 d Alpha Lower Hex Basic Latin
101 0x65 e Alpha Lower Hex Basic Latin
102 0x66 f Alpha Lower Hex Basic Latin
103 0x67 g Alpha Lower Basic Latin
104 0x68 h Alpha Lower Basic Latin
105 0x69 i Alpha Lower Basic Latin
106 0x6A j Alpha Lower Basic Latin
107 0x6B k Alpha Lower Basic Latin
108 0x6C l Alpha Lower Basic Latin
109 0x6D m Alpha Lower Basic Latin
110 0x6E n Alpha Lower Basic Latin
111 0x6F o Alpha Lower Basic Latin
112 0x70 p Alpha Lower Basic Latin
113 0x71 q Alpha Lower Basic Latin
114 0x72 r Alpha Lower Basic Latin
115 0x73 s Alpha Lower Basic Latin
116 0x74 t Alpha Lower Basic Latin
117 0x75 u Alpha Lower Basic Latin
118 0x76 v Alpha Lower Basic Latin
119 0x77 w Alpha Lower Basic Latin
120 0x78 x Alpha Lower Basic Latin
121 0x79 y Alpha Lower Basic Latin
122 0x7A z Alpha Lower Basic Latin
123 0x7B { Punct Basic Latin
124 0x7C | Punct Basic Latin
125 0x7D } Punct Basic Latin
126 0x7E ~ Punct Basic Latin
127 0x7F Control Basic Latin
Dec Hex UTF ANSI OEM Type Range (ANSI conversion of OEM Character eg ® replaced by R)
128 0x80 € ¼ Control Control Codes
129 0x81 ü Control Control Codes
130 0x82 ‚ → Control Control Codes
131 0x83 ƒ Æ Control Control Codes
132 0x84 „ ▲ Control Control Codes
133 0x85 … & Control Space Control Codes
134 0x86 † Control Control Codes
135 0x87 ‡ ! Control Control Codes
136 0x88 ˆ ã Control Control Codes
137 0x89 ‰ 0 Control Control Codes
138 0x8A Š ` Control Control Codes
139 0x8B ‹ 9 Control Control Codes
140 0x8C Œ R Control Control Codes
141 0x8D ì Control Control Codes
142 0x8E Ž } Control Control Codes
143 0x8F Å Control Control Codes
144 0x90 É Control Control Codes
145 0x91 ‘ ↑ Control Control Codes
146 0x92 ’ ↓ Control Control Codes
147 0x93 “ ∟ Control Control Codes
148 0x94 ” ↔ Control Control Codes
149 0x95 • " Control Control Codes
150 0x96 – ‼ Control Control Codes
151 0x97 — ¶ Control Control Codes
152 0x98 ˜ ▄ Control Control Codes
153 0x99 ™ " Control Control Codes
154 0x9A š a Control Control Codes
155 0x9B › : Control Control Codes
156 0x9C œ S Control Control Codes
157 0x9D Ø Control Control Codes
158 0x9E ž ~ Control Control Codes
159 0x9F Ÿ x Control Control Codes
160 0xA0 á Blank Space Latin-1 Supplement
161 0xA1 ¡ ¡ í Punct Latin-1 Supplement
162 0xA2 ¢ ¢ ó Punct Latin-1 Supplement
163 0xA3 £ £ ú Punct Latin-1 Supplement
164 0xA4 ¤ ¤ ñ Punct Latin-1 Supplement
165 0xA5 ¥ ¥ Ñ Punct Latin-1 Supplement
166 0xA6 ¦ ¦ ª Punct Latin-1 Supplement
167 0xA7 § § º Punct Latin-1 Supplement
168 0xA8 ¨ ¨ ¿ Punct Latin-1 Supplement
169 0xA9 © © ® Punct Latin-1 Supplement
170 0xAA ª ª ¬ Alpha Lower Punct Latin-1 Supplement
171 0xAB « « ½ Punct Latin-1 Supplement
172 0xAC ¬ ¬ ¼ Punct Latin-1 Supplement
173 0xAD ¡ Control Punct Latin-1 Supplement
174 0xAE ® ® « Punct Latin-1 Supplement
175 0xAF ¯ ¯ » Punct Latin-1 Supplement
176 0xB0 ° ° ░ Punct Latin-1 Supplement
177 0xB1 ± ± ▒ Punct Latin-1 Supplement
178 0xB2 ² ² ▓ Number Punct Latin-1 Supplement
179 0xB3 ³ ³ │ Number Punct Latin-1 Supplement
180 0xB4 ´ ´ ┤ Punct Latin-1 Supplement
181 0xB5 µ µ Á Alpha Lower Punct Latin-1 Supplement
182 0xB6 ¶ ¶ Â Punct Latin-1 Supplement
183 0xB7 · · À Punct Latin-1 Supplement
184 0xB8 ¸ ¸ © Punct Latin-1 Supplement
185 0xB9 ¹ ¹ ╣ Number Punct Latin-1 Supplement
186 0xBA º º ║ Alpha Lower Punct Latin-1 Supplement
187 0xBB » » ╗ Punct Latin-1 Supplement
188 0xBC ¼ ¼ ╝ Punct Latin-1 Supplement
189 0xBD ½ ½ ¢ Punct Latin-1 Supplement
190 0xBE ¾ ¾ ¥ Punct Latin-1 Supplement
191 0xBF ¿ ¿ ┐ Punct Latin-1 Supplement
192 0xC0 À À └ Alpha Upper Latin-1 Supplement
193 0xC1 Á Á ┴ Alpha Upper Latin-1 Supplement
194 0xC2 Â Â ┬ Alpha Upper Latin-1 Supplement
195 0xC3 Ã Ã ├ Alpha Upper Latin-1 Supplement
196 0xC4 Ä Ä ─ Alpha Upper Latin-1 Supplement
197 0xC5 Å Å ┼ Alpha Upper Latin-1 Supplement
198 0xC6 Æ Æ ã Alpha Upper Latin-1 Supplement
199 0xC7 Ç Ç Ã Alpha Upper Latin-1 Supplement
200 0xC8 È È ╚ Alpha Upper Latin-1 Supplement
201 0xC9 É É ╔ Alpha Upper Latin-1 Supplement
202 0xCA Ê Ê ╩ Alpha Upper Latin-1 Supplement
203 0xCB Ë Ë ╦ Alpha Upper Latin-1 Supplement
204 0xCC Ì Ì ╠ Alpha Upper Latin-1 Supplement
205 0xCD Í Í ═ Alpha Upper Latin-1 Supplement
206 0xCE Î Î ╬ Alpha Upper Latin-1 Supplement
207 0xCF Ï Ï ¤ Alpha Upper Latin-1 Supplement
208 0xD0 Ð Ð ð Alpha Upper Latin-1 Supplement
209 0xD1 Ñ Ñ Ð Alpha Upper Latin-1 Supplement
210 0xD2 Ò Ò Ê Alpha Upper Latin-1 Supplement
211 0xD3 Ó Ó Ë Alpha Upper Latin-1 Supplement
212 0xD4 Ô Ô È Alpha Upper Latin-1 Supplement
213 0xD5 Õ Õ ı Alpha Upper Latin-1 Supplement
214 0xD6 Ö Ö Í Alpha Upper Latin-1 Supplement
215 0xD7 × × Î Punct Latin-1 Supplement
216 0xD8 Ø Ø Ï Alpha Upper Latin-1 Supplement
217 0xD9 Ù Ù ┘ Alpha Upper Latin-1 Supplement
218 0xDA Ú Ú ┌ Alpha Upper Latin-1 Supplement
219 0xDB Û Û █ Alpha Upper Latin-1 Supplement
220 0xDC Ü Ü ▄ Alpha Upper Latin-1 Supplement
221 0xDD Ý Ý ¦ Alpha Upper Latin-1 Supplement
222 0xDE Þ Þ Ì Alpha Upper Latin-1 Supplement
223 0xDF ß ß ▀ Alpha Lower Latin-1 Supplement
224 0xE0 à à Ó Alpha Lower Latin-1 Supplement
225 0xE1 á á ß Alpha Lower Latin-1 Supplement
226 0xE2 â â Ô Alpha Lower Latin-1 Supplement
227 0xE3 ã ã Ò Alpha Lower Latin-1 Supplement
228 0xE4 ä ä õ Alpha Lower Latin-1 Supplement
229 0xE5 å å Õ Alpha Lower Latin-1 Supplement
230 0xE6 æ æ µ Alpha Lower Latin-1 Supplement
231 0xE7 ç ç þ Alpha Lower Latin-1 Supplement
232 0xE8 è è Þ Alpha Lower Latin-1 Supplement
233 0xE9 é é Ú Alpha Lower Latin-1 Supplement
234 0xEA ê ê Û Alpha Lower Latin-1 Supplement
235 0xEB ë ë Ù Alpha Lower Latin-1 Supplement
236 0xEC ì ì ý Alpha Lower Latin-1 Supplement
237 0xED í í Ý Alpha Lower Latin-1 Supplement
238 0xEE î î ¯ Alpha Lower Latin-1 Supplement
239 0xEF ï ï ´ Alpha Lower Latin-1 Supplement
240 0xF0 ð ð Alpha Lower Latin-1 Supplement
241 0xF1 ñ ñ ± Alpha Lower Latin-1 Supplement
242 0xF2 ò ò ‗ Alpha Lower Latin-1 Supplement
243 0xF3 ó ó ¾ Alpha Lower Latin-1 Supplement
244 0xF4 ô ô ¶ Alpha Lower Latin-1 Supplement
245 0xF5 õ õ § Alpha Lower Latin-1 Supplement
246 0xF6 ö ö ÷ Alpha Lower Latin-1 Supplement
247 0xF7 ÷ ÷ ¸ Punct Latin-1 Supplement
248 0xF8 ø ø ° Alpha Lower Latin-1 Supplement
249 0xF9 ù ù ¨ Alpha Lower Latin-1 Supplement
250 0xFA ú ú · Alpha Lower Latin-1 Supplement
251 0xFB û û ¹ Alpha Lower Latin-1 Supplement
252 0xFC ü ü ³ Alpha Lower Latin-1 Supplement
253 0xFD ý ý ² Alpha Lower Latin-1 Supplement
254 0xFE þ þ ■ Alpha Lower Latin-1 Supplement
255 0xFF ÿ ÿ Alpha Lower Latin-1 Supplement
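The character columns for rows 160 to 255 of a table like this can be regenerated with standard codecs. The Python sketch below is an approximation under stated assumptions: Windows-1252 for the ANSI column and code page 850 for the OEM column (which appears to match the glyphs in those rows); `table_row` is a hypothetical helper, and the Type and Range columns are not reproduced.

```python
def table_row(code: int) -> tuple:
    """Return (decimal, hex, Unicode char, ANSI char, OEM char) for one byte value.

    Intended for 160-255 only: a few byte values below 160 are undefined in
    Python's cp1252 codec and would raise UnicodeDecodeError.
    """
    raw = bytes([code])
    return (
        code,
        f"0x{code:X}",
        chr(code),             # Unicode code point with the same number
        raw.decode("cp1252"),  # ANSI column (Windows-1252 assumed)
        raw.decode("cp850"),   # OEM column (code page 850 assumed)
    )


for code in (0xB0, 0xE0, 0xE1):
    print(*table_row(code))
```

The same byte value maps to a different character in each column, which is the mismatch that surfaces when cmd and PowerShell interpret a program's output differently.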
Source: https://stackoverflow.com/questions/59110563/different-behaviour-and-output-when-piping-through-cmd-and-powershell