How to remove special characters except spaces from a file in Python?

误落风尘 2021-02-19 03:49

I have a huge corpus of text, stored line by line, and I want to remove special characters while preserving the spaces and the line structure of the text. Example input:

hello? there A-Z-R_T(,**)

5 answers
  •  既然无缘
    2021-02-19 04:33

    A more elegant solution would be

    import re
    print(re.sub(r"\W+|_", " ", string))  # string holds the text to clean

    Output:

    hello there A Z R T world welcome to python this should the next line followed by another million like this
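The question is about a huge file processed line by line, so here is a minimal sketch that applies the same pattern to one line at a time. Each line is cleaned separately because \W also matches the newline character; running re.sub over the whole file at once would join every line into one. The helper names clean_line and clean_file, the file paths, and the UTF-8 encoding are assumptions for illustration, not part of the original answer. Streaming the file keeps memory use flat regardless of corpus size, and every input line (including blank ones) produces exactly one output line.

```python
import re

# Same pattern as the answer: a run of non-word characters, or a single underscore.
SPECIAL = re.compile(r"\W+|_")


def clean_line(line: str) -> str:
    # Replace special characters with spaces. The trailing newline is also a
    # non-word character, so it becomes a space; strip() then removes it along
    # with any leading or trailing spaces left by the substitution.
    return SPECIAL.sub(" ", line).strip()


def clean_file(src_path: str, dst_path: str) -> None:
    # Hypothetical helper: read src_path line by line (assumed UTF-8) so the
    # corpus is never fully loaded, and write one cleaned line per input line.
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_line(line) + "\n")


if __name__ == "__main__":
    print(clean_line("hello? there A-Z-R_T(,**)\n"))  # hello there A Z R T
```

Usage, with made-up file names: clean_file("corpus.txt", "corpus_clean.txt").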

    Here, re is Python's regular expression module.

    re.sub(pattern, repl, string) replaces every match of the pattern with repl, here a single space " ".

    r'' makes the pattern a raw string literal: Python does not process backslash escapes in it, so \W reaches the regex engine exactly as written.

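    \W matches any non-word character, i.e. anything other than a letter, digit, or underscore. That covers special characters such as *&^%$ and also whitespace (spaces, tabs, newlines), but not the underscore _.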

    + matches one or more occurrences of the preceding token, so a run such as (,**) becomes a single space; * would instead match zero or more.

    | is alternation (logical OR): the regex matches either \W+ or _.

    _ matches a literal underscore, which \W alone would leave in place.
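Each piece of the pattern explained above can be checked in a Python 3 interpreter with the short sketch below. The sample strings are made up for illustration. The last two lines compare the answer's pattern with the character class [\W_]+, a variant that is not part of the original answer: with alternation, each piece of a mixed run like -_- becomes its own space, while the character class collapses the whole run into one space.

```python
import re

# r"..." is a raw string literal: the backslash is kept, not treated as an escape.
assert r"\W" == "\\W"

# \W+ matches runs of non-word characters; the underscore is a word character,
# so it is not matched here.
print(re.findall(r"\W+", "A-Z-R_T(,**)"))     # ['-', '-', '(,**)']

# + needs at least one character, while * also accepts an empty match.
print(re.fullmatch(r"\W+", "") is None)       # True
print(re.fullmatch(r"\W*", "") is not None)   # True

# \W also matches newlines, so the substitution joins lines together.
print(re.sub(r"\W+|_", " ", "one\ntwo"))      # one two

# Alternation replaces each piece of "-_-" separately; [\W_]+ collapses the run.
print(repr(re.sub(r"\W+|_", " ", "a-_-b")))   # 'a   b'
print(repr(re.sub(r"[\W_]+", " ", "a-_-b")))  # 'a b'
```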
