Python: Getting rid of \u200b from a string using regular expressions

清酒与你 2021-02-08 15:17

I have a web scraper that takes forum questions, splits them into individual words, and writes them to a text file. The words are stored in a list of tuples. Each tuple contains

2 answers
  •  北海茫月
    2021-02-08 15:23

    You can open the file C:\Users\SOMEN\AppData\Local\Programs\Python\Python37-32\lib\encodings\cp1252.py (that is the path in my case, but it should be the same on your machine).

    decoding_table = (
    '\x00'     #  0x00 -> NULL
    '\x01'     #  0x01 -> START OF HEADING
    '\x02'     #  0x02 -> START OF TEXT
    '\x03'     #  0x03 -> END OF TEXT
    '\x04'     #  0x04 -> END OF TRANSMISSION
    '\x05'     #  0x05 -> ENQUIRY
    '\x06'     #  0x06 -> ACKNOWLEDGE
    '\x07'     #  0x07 -> BELL
    '\x08'     #  0x08 -> BACKSPACE
    '\t'       #  0x09 -> HORIZONTAL TABULATION
    '\n'       #  0x0A -> LINE FEED
    '\x0b'     #  0x0B -> VERTICAL TABULATION
    '\x0c'     #  0x0C -> FORM FEED
    '\r'       #  0x0D -> CARRIAGE RETURN
    '\x0e'     #  0x0E -> SHIFT OUT
    '\x0f'     #  0x0F -> SHIFT IN
    '\x10'     #  0x10 -> DATA LINK ESCAPE
    '\x11'     #  0x11 -> DE
    # add the character code here
    '\u200b'   # add this line to the file and save it
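
For the regular-expression route named in the title, here is a minimal sketch that removes the character from the strings themselves and leaves the standard library untouched. The helper name `strip_zero_width`, the extra zero-width characters in the pattern, and the `(word, count)` tuple shape are assumptions made for illustration; the question only mentions `\u200b` and a list of tuples.

```python
import re

# Zero-width characters that can leak in from scraped HTML.
# Only \u200b comes from the question; the other three are an assumption.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")


def strip_zero_width(text: str) -> str:
    """Return text with the zero-width characters above removed."""
    return ZERO_WIDTH.sub("", text)


# Hypothetical sample data shaped like a list of (word, count) tuples.
words = [("hello\u200b", 3), ("\u200bworld", 1)]

# Clean only the string part of each tuple and keep the count as it is.
cleaned = [(strip_zero_width(word), count) for word, count in words]
print(cleaned)
```

If only `\u200b` has to go, `text.replace("\u200b", "")` does the same job without a regular expression. If the real problem is an encoding error while writing the file, passing `encoding="utf-8"` to `open()` may be enough on its own.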
    
