How do I remove all non-ASCII characters with regex and Notepad++?

爱⌒轻易说出口 submitted on 2020-01-08 17:22:11

Question


I searched a lot, but could not find anywhere how to remove non-ASCII characters in Notepad++.

I need to know what to enter in Find and Replace (a screenshot would be great).

  • What if I want to build a white-list and bookmark all the ASCII words/lines, so that the non-ASCII lines stay unmarked?

  • What if the file is quite large, so I can't select all the ASCII lines and just want to select the lines containing non-ASCII characters?


Answer 1:


This expression will search for non-ASCII values:

[^\x00-\x7F]+

Select 'Search Mode = Regular expression', and click Find Next.

Source: Regex any ASCII character
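
Outside Notepad++, the same pattern can be tried in a script. This is only a minimal Python sketch, not part of the original answer; the function name `strip_non_ascii` and the sample string are made up for illustration.

```python
import re

# Same pattern as above: one or more characters outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def strip_non_ascii(text: str) -> str:
    """Return text with every run of non-ASCII characters removed."""
    return NON_ASCII.sub("", text)


sample = "naïve café"
print(strip_non_ascii(sample))  # prints "nave caf"
```

Replacing with an empty string corresponds to leaving the "Replace with" field empty and pressing Replace All.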




Answer 2:


In Notepad++, go to the menu Search → Find characters in range → Non-ASCII Characters (128-255). You can then step through the document to each non-ASCII character.

Be sure to tick "Wrap around" if you want the search to loop through the whole document.




Answer 3:


In addition to the answer by ProGM: if you see characters shown in boxes, like NUL or ACK, and want to get rid of them, those are ASCII control characters (0 to 31). You can find them with the following expression and remove them:

[\x00-\x1F]+

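To remove all non-ASCII characters and all ASCII control characters in one pass, remove everything matching this regex:

[^\x20-\x7E]+

The kept range runs from \x20 (space) to \x7E (~), because \x1F and \x7F (DEL) are themselves control characters. Note that tabs and line breaks are control characters too, so this expression removes them as well.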
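
To check what each expression really matches, here is a small Python sketch. It is not from the original answer: the names are hypothetical, and the second pattern assumes that printable ASCII, \x20 (space) through \x7E (~), is what should survive.

```python
import re

# ASCII control characters 0-31, as in the first expression above.
CONTROL = re.compile(r"[\x00-\x1F]+")

# Assumption: keep only printable ASCII, i.e. \x20 (space) through \x7E (~).
NOT_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]+")


def remove_control(text: str) -> str:
    """Drop ASCII control characters 0-31 (this includes tab, LF and CR)."""
    return CONTROL.sub("", text)


def keep_printable_ascii(text: str) -> str:
    """Drop non-ASCII characters and every ASCII control character, DEL included."""
    return NOT_PRINTABLE_ASCII.sub("", text)


print(remove_control("a\x00b\x06c"))               # prints "abc"
print(keep_printable_ascii("a\x00b\x1fc\x7fd\u00e9"))  # prints "abcd"
```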



Answer 4:


To remove all non-ASCII characters, search for the following regex and replace it with nothing: [^\x00-\x7F]+

To highlight the characters instead, I recommend the Mark tab of the search window: with "Bookmark line" ticked, it highlights the non-ASCII characters and puts a bookmark on every line containing one of them.

If you want to highlight and bookmark the ASCII characters instead, use the regex [\x00-\x7F].
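
For a file too large to work through by hand, the same "which lines contain non-ASCII" question can be answered with a script. A rough Python sketch follows; the helper name `lines_with_non_ascii` is hypothetical and this is not a Notepad++ feature.

```python
import re

# A single non-ASCII character is enough to flag a line.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def lines_with_non_ascii(text: str) -> list[int]:
    """Return the 1-based numbers of lines that contain a non-ASCII character."""
    # Split on "\n" only, so exotic Unicode line separators stay inside their line.
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if NON_ASCII.search(line)
    ]


print(lines_with_non_ascii("abc\ncafé\nxyz\n日本"))  # prints [2, 4]
```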

Cheers




Answer 5:


To keep new lines:

  1. First pick a placeholder character for the newline, one that does not already occur in the file. I used #.
  2. In the Replace dialog, set Search Mode to Extended.
  3. Find \n and replace it with #.
  4. Hit Replace All.

Next:

  1. Set Search Mode to Regular expression.
  2. Find this: [^\x20-\x7E]+
  3. Leave "Replace with" empty.
  4. Hit Replace All.

Finally, set Search Mode back to Extended and replace # with \n.

:) Now you have a clean ASCII file ;)
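
The three passes above can be sketched in Python as follows. This is an illustration under assumptions, not the answerer's code: the function name is made up, and the check that the placeholder is unused is an extra safeguard the steps do not mention.

```python
import re


def clean_keep_newlines(text: str, placeholder: str = "#") -> str:
    """Strip everything except printable ASCII while preserving line breaks."""
    # Added safeguard (assumption): an existing "#" would later turn into a newline.
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text; pick another one")
    # Pass 1: protect newlines behind the placeholder.
    protected = text.replace("\n", placeholder)
    # Pass 2: remove everything outside \x20-\x7E (this also drops \r and tabs).
    cleaned = re.sub(r"[^\x20-\x7E]+", "", protected)
    # Pass 3: restore the newlines.
    return cleaned.replace(placeholder, "\n")


print(repr(clean_keep_newlines("caf\u00e9\r\nna\u00efve\tx\n")))  # prints 'caf\nnavex\n'
```

Because \r is removed in the second pass, a file with Windows line endings comes out with plain \n line endings.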




Answer 6:


Another good trick is to switch your editor to UTF-8 mode so that you can actually see these funny characters and delete them yourself.




Answer 7:


Another way...

  1. Install the TextFX plugin if you don't have it already.
  2. Go to the TextFX menu option -> zap all non printable characters to #. It will replace each invalid character with three # symbols.
  3. Go to Find/Replace and look for ###. Replace it with a space.

This is nice if you can't remember the regex or don't care to look it up. But the regex mentioned by others is a nice solution as well.



Source: https://stackoverflow.com/questions/20889996/how-do-i-remove-all-non-ascii-characters-with-regex-and-notepad
