Question
How can I find extended ASCII characters in a file using Perl? Can anyone share a script?
Thanks in advance.
Answer 1:
Since the extended ASCII characters have value 128 and higher, you can just call ord on individual characters and handle those with a value >= 128. The following code reads from stdin and prints only the extended ASCII characters:
while (<>) {
    while (/(.)/g) {
        print($1) if (ord($1) >= 128);
    }
}
Alternatively, unpack together with chr will also work. Example:
while (<>) {
    foreach (unpack("C*", $_)) {
        print(chr($_)) if ($_ >= 128);
    }
}
(I'm sure some Perl guru can condense both of these to two one-liners...)
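In the spirit of that remark, here is a minimal sketch that collects the high characters with a single global match instead of a character-by-character loop. The helper name `high_chars` and the sample string are made up for illustration, and the sketch assumes the input is a byte string (for example Latin-1 text read without any decoding layer).

```perl
use strict;
use warnings;

# Return every character of $text whose code is 128 or higher.
# In list context, a /g match without capture groups returns all matches.
sub high_chars {
    my ($text) = @_;
    return $text =~ /[^\x00-\x7F]/g;
}

# Hypothetical sample: "caf\xE9 na\xEFve" is "café naïve" in Latin-1.
my @found = high_chars("caf\xE9 na\xEFve");
print join(",", map { sprintf("0x%02X", ord($_)) } @found), "\n";
```

On the command line, the same match can probably be written as `perl -ne 'print /[^\x00-\x7F]/g' file`, which prints the matched characters of each line back to back.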
To print the line numbers instead, you can use the following (this does not remove duplicates, so a line is reported once per matching character, and it will behave oddly when Unicode is passed):
while (<>) {
    while (/(.)/g) {
        print($. . "\n") if (ord($1) >= 128);
    }
}
(Thanks Yaakov Belch for the $. tip.)
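If each line number should appear only once, one option is to test the whole line with a single match rather than looping over its characters. The following is a sketch under that assumption; `lines_with_high_chars` is a hypothetical helper that takes the lines as a list so it can be tried without a file.

```perl
use strict;
use warnings;

# Return the 1-based numbers of the lines that contain at least one
# character with a code of 128 or higher. Each line is reported once.
sub lines_with_high_chars {
    my (@lines) = @_;
    my @hits;
    for my $i (0 .. $#lines) {
        push @hits, $i + 1 if $lines[$i] =~ /[^\x00-\x7F]/;
    }
    return @hits;
}

# Hypothetical sample input: lines 2 and 4 contain high bytes.
my @sample = ("abc\n", "\xE9\xE9\n", "xyz\n", "na\xEFve\n");
print join(",", lines_with_high_chars(@sample)), "\n";
```

When reading from a file handle, the same idea reduces to `print "$.\n" if /[^\x00-\x7F]/;` inside the `while (<>)` loop.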
Answer 2:
The first printable ASCII character is space (32). The last printable ASCII character is ~ (126). So I'd probably use
while (<>) {
    print "$.\n" if /[^ -~]/;
}
although it will, admittedly, also display lines containing control characters as well as extended ASCII.
Edit: Changed to print the line number rather than the line itself.
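Because that character class also flags control characters (a tab, for instance), it can help to tell the two cases apart. Below is a rough sketch of one way to do so; `classify_line` and its three labels are invented for this example, and it assumes byte-oriented input.

```perl
use strict;
use warnings;

# Classify a line as "extended" (contains a byte >= 0x80),
# "control" (contains an ASCII control character other than the
# trailing newline), or "plain" (printable ASCII only).
sub classify_line {
    my ($line) = @_;
    chomp $line;    # works on the local copy; drops the trailing newline
    return "extended" if $line =~ /[\x80-\xFF]/;
    return "control"  if $line =~ /[\x00-\x1F\x7F]/;
    return "plain";
}

# Hypothetical sample lines.
for my $line ("hello\n", "a\tb\n", "caf\xE9\n") {
    print classify_line($line), "\n";
}
```

The extended check comes first, so a line containing both kinds of character is reported as "extended".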
Answer 3:
One-liner:
perl -nE'say$.if/[\xE0-\xFF]/'
For older Perl versions (before 5.10, which lack -E and say):
perl -lne'print$.if/[\xE0-\xFF]/'
Answer 4:
A crucial question is whether the use bytes; pragma should be in effect. The poster should decide that. For picking characters with codes greater than 127, the following will suffice:
print grep 127 < ord, split // while <>;
or
print grep /[^[:ascii:]]/, split // while <>;
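To see why the bytes-versus-characters question matters, here is a small sketch that applies the same `ord` test to one piece of text in two forms: raw UTF-8 octets and a decoded character string. It uses the core Encode module rather than the pragma itself, and `count_high` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Count the elements of $text whose ord() is above 127.
# grep in scalar context returns the number of matching elements.
sub count_high {
    my ($text) = @_;
    return scalar grep { ord($_) > 127 } split //, $text;
}

# "café" as raw UTF-8 octets, as it would arrive from a file read
# without any decoding layer, and as a decoded character string.
my $bytes = encode("UTF-8", "caf\x{E9}");
my $chars = decode("UTF-8", $bytes);

printf "bytes: %d, characters: %d\n", count_high($bytes), count_high($chars);
```

The single accented letter counts twice in the byte string, because UTF-8 encodes it as two octets, and once in the decoded string.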
Answer 5:
Hynek -Pichi- Vychodil's answer:
perl -nE'say$.if/[\xE0-\xFF]/'
only tests a limited part of the extended range (0xE0 to 0xFF); it should presumably be
perl -nE'say$.if/[\x80-\xFF]/'
instead.
Answer 6:
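What about grep? Assuming GNU grep, the -P option is needed for the \x escapes, the pattern has to be quoted so the shell leaves it alone, and the C locale makes grep match individual bytes:
LC_ALL=C grep -P '[\x00-\x1F\x7F-\xFF]' *
Note that this also matches control characters such as tabs.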
Source: https://stackoverflow.com/questions/881931/how-can-i-find-extended-ascii-characters-in-a-file-using-perl