Replace special characters in a string (Python)

感动是毒 2020-12-04 20:06

I am using urllib to get a string of html from a website and need to put each word in the html document into a list.

Here is the code I have so far. I keep getting a

5 answers
  • 2020-12-04 20:22

    You can replace the special characters with the desired characters as follows (Python 3; in Python 2 the helper was string.maketrans):

    specialCharacterText = "H#y #@w @re &*)?"
    inCharSet = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+\""
    outCharSet = " " * len(inCharSet)  # one replacement character per character in inCharSet
    # str.maketrans needs both arguments to have the same length
    splCharReplaceList = str.maketrans(inCharSet, outCharSet)
    splCharFreeString = specialCharacterText.translate(splCharReplaceList)
    
  • 2020-12-04 20:28

    replace operates on a specific string, so you need to call it on that string, like this:

    removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
    

    But this is probably not what you need, since it looks for a single substring containing all those characters in that exact order. You can do it with a regexp instead, as Danny Michaud pointed out.
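To make the difference concrete, here is a minimal sketch, assuming Python 3. The names `SPECIAL`, `replace_each` and the sample text are made up for illustration and do not come from the question.

```python
SPECIAL = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+"

def replace_each(text, chars=SPECIAL, repl=" "):
    """Replace every character of `chars` found in `text`, one character at a time."""
    for ch in chars:
        text = text.replace(ch, repl)
    return text

sample = "H#y, h@w are y*u?"

# str.replace searches for the whole 30-character run as one substring,
# which never occurs in the sample, so nothing changes.
whole = sample.replace(SPECIAL, " ")

# Replacing character by character does what the question wants.
each = replace_each(sample)
```

Looping over `str.replace` is fine for a handful of characters. For long texts, a regular expression or `str.translate` does the same work in a single pass.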

    As a side note, you might want to look at BeautifulSoup, a library for parsing messy HTML like what you usually get from scraping websites.
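If installing BeautifulSoup is not an option, a rough standard-library stand-in might look like the sketch below. It uses `html.parser.HTMLParser`, not BeautifulSoup, and is far less forgiving of broken markup. `TextExtractor`, `html_to_words` and the sample page are hypothetical names for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_words(html):
    """Return the whitespace-separated tokens of the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with a space so text from adjacent elements does not run together.
    return " ".join(parser.parts).split()

page = ("<html><head><style>p {color: red}</style></head>"
        "<body><p>Hello, <b>world</b>!</p></body></html>")
words = html_to_words(page)
```

The tokens still carry punctuation, so one of the character-replacement approaches from the other answers would be applied to the extracted text afterwards.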

  • 2020-12-04 20:38

    One way, and the one I prefer, is to use re.sub:

    import re
    my_str = "hey th~!ere"
    # remove everything except letters, digits, spaces, newlines and dots
    my_new_string = re.sub(r'[^a-zA-Z0-9 \n.]', '', my_str)
    print(my_new_string)
    

    Output:

    hey there
    

    Another way is to use re.escape:

    import string
    import re
    
    my_str = "hey th~!ere"
    
    chars = re.escape(string.punctuation)
    print(re.sub('[' + chars + ']', '', my_str))
    

    Output:

    hey there
    

    A small tip about naming style in Python: according to PEP 8, the variable should be named remove_special_chars, not removeSpecialChars.

    Also, the space inside [^a-zA-Z0-9 \n\.] is what keeps spaces in the result. If you want to strip spaces as well, change it to [^a-zA-Z0-9\n\.]
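A negated class such as `[^...]` removes everything that is not listed, so whether spaces survive depends on whether the class lists a space. The short sketch below shows both variants, assuming Python 3; the sample text and variable names are made up for illustration.

```python
import re

text = "hey th~!ere, it's 3.5\n"

# The space is listed in the class, so spaces are kept.
keep_spaces = re.sub(r"[^a-zA-Z0-9 \n.]", "", text)

# The space is not listed, so spaces are removed as well.
drop_spaces = re.sub(r"[^a-zA-Z0-9\n.]", "", text)
```

Here `keep_spaces` still splits cleanly into words with `.split()`, while `drop_spaces` has lost the word boundaries.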

  • 2020-12-04 20:40

    You need to call replace on z and not on str, since the characters you want to replace are in the string variable z:

    removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
    

    But this will not work, because replace looks for the whole argument as a single substring. You will most likely need the regular expression module re and its sub function:

    import re
    removeSpecialChars = re.sub(r"[!@#$%^&*()\[\]{};:,./<>?\\|`~\-=_+]", " ", z)
    

    Don't forget the enclosing [], which marks this as a set of characters to be replaced. Inside the set, ], \ and - must be escaped with a backslash so that they are taken literally.
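Escaping a character set by hand is easy to get wrong, so one option is to let `re.escape` do it. The following is a minimal sketch under that assumption (Python 3); `SPECIAL`, `SPECIAL_RE` and `words_without_specials` are illustrative names, not part of the question.

```python
import re

SPECIAL = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+"

# re.escape backslash-escapes the regex metacharacters, so every
# character is taken literally inside the [...] set.
SPECIAL_RE = re.compile("[" + re.escape(SPECIAL) + "]")

def words_without_specials(text):
    """Replace each special character with a space, then split on whitespace."""
    return SPECIAL_RE.sub(" ", text).split()

result = words_without_specials("<p>Hello, wor[l]d! a-b</p>")
```

Because the tag delimiters `<`, `>` and `/` are in the set, tag names such as `p` end up in the word list too, which is one reason to run an HTML parser before this step.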

  • 2020-12-04 20:43

    str.replace is the wrong function for what you want to do (apart from it being used incorrectly). You want to replace any character of a set with a space, not the whole set with a single space (the latter is what replace does). You can use translate like this:

    removeSpecialChars = z.translate({ord(c): " " for c in "!@#$%^&*()[]{};:,./<>?\\|`~-=_+"})
    

    This creates a mapping which maps every character in your list of special characters to a space, then calls translate() on the string, replacing every single character in the set of special characters with a space.
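The same mapping idea can be sketched end to end, including the final split into a word list that the question asks for. This assumes Python 3; `SPECIAL`, `TO_SPACE`, `TO_NOTHING` and the sample text are illustrative names.

```python
SPECIAL = "!@#$%^&*()[]{};:,./<>?\\|`~-=_+"

# Code point -> replacement string.
TO_SPACE = {ord(c): " " for c in SPECIAL}

# Code point -> None tells translate() to delete the character instead.
TO_NOTHING = dict.fromkeys(map(ord, SPECIAL))

text = "hey th~!ere, wor-ld"
spaced = text.translate(TO_SPACE)
deleted = text.translate(TO_NOTHING)
words = spaced.split()
```

Mapping to a space splits a token wherever a special character sat, while mapping to `None` glues the remaining pieces together, so the choice depends on whether `wor-ld` should count as one word or two.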
