I have a web scraper that takes forum questions, splits them into individual words, and writes them to a text file. The words are stored in a list of tuples. Each tuple contains
You can open the file C:\Users\SOMEN\AppData\Local\Programs\Python\Python37-32\lib\encodings\cp1252.py (that is the path in my case, but yours should be similar).
decoding_table = (
'\x00' # 0x00 -> NULL
'\x01' # 0x01 -> START OF HEADING
'\x02' # 0x02 -> START OF TEXT
'\x03' # 0x03 -> END OF TEXT
'\x04' # 0x04 -> END OF TRANSMISSION
'\x05' # 0x05 -> ENQUIRY
'\x06' # 0x06 -> ACKNOWLEDGE
'\x07' # 0x07 -> BELL
'\x08' # 0x08 -> BACKSPACE
'\t' # 0x09 -> HORIZONTAL TABULATION
'\n' # 0x0A -> LINE FEED
'\x0b' # 0x0B -> VERTICAL TABULATION
'\x0c' # 0x0C -> FORM FEED
'\r' # 0x0D -> CARRIAGE RETURN
'\x0e' # 0x0E -> SHIFT OUT
'\x0f' # 0x0F -> SHIFT IN
'\x10' # 0x10 -> DATA LINK ESCAPE
'\x11' # 0x11 -> DE
# add the character code here
'\u200b' # add this to the file and save it
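For context, here is a minimal sketch of what goes wrong with the unmodified codec, assuming the scraped word contains a zero-width space (U+200B). The sample word and variable names are made up for illustration. It also shows that encoding with UTF-8 avoids the error without touching the standard library.

```python
# Assumption: the scraped word ends with a zero-width space (U+200B).
word = u'used\u200b'

# cp1252 has no mapping for U+200B, so a strict encode raises an error.
try:
    word.encode('cp1252')
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# errors='ignore' silently drops characters the codec cannot represent.
dropped = word.encode('cp1252', errors='ignore')

# UTF-8 can represent every Unicode code point, including U+200B.
utf8_bytes = word.encode('utf-8')

print(cp1252_ok, dropped, utf8_bytes)
```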
I tested this with Python 2.7. replace works as expected:
>>> u'used\u200b'.replace(u'\u200b', '*')
u'used*'
and so does strip:
>>> u'used\u200b'.strip(u'\u200b')
u'used'
Just remember that the arguments to those functions have to be Unicode literals. It should be u'\u200b', not '\u200b'. Notice the u at the beginning.
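Applied to a whole list of words, this might look like the sketch below. The helper name clean_word and the sample words are hypothetical. Note that strip only removes the character from the ends of a string, while replace removes it everywhere.

```python
ZERO_WIDTH_SPACE = u'\u200b'


def clean_word(word):
    """Remove every zero-width space from a word (hypothetical helper)."""
    return word.replace(ZERO_WIDTH_SPACE, u'')


# Sample words with the character at the end, start, and middle.
words = [u'used\u200b', u'\u200bzero', u'wi\u200bdth']

# replace removes the character wherever it occurs.
cleaned = [clean_word(w) for w in words]

# strip only trims it from both ends, so 'wi\u200bdth' keeps its inner one.
stripped = [w.strip(ZERO_WIDTH_SPACE) for w in words]
```

If the character can appear inside a word, replace is the safer choice.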
And actually, writing that character to a file works just fine.
>>> import codecs
>>> f = codecs.open('a.txt', encoding='utf-8', mode='w')
>>> f.write(u'used\u200bZero')
>>> f.close()
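The same idea can be sketched with io.open, which exists in both Python 2.7 and Python 3, plus a read-back to confirm the character survives. The temporary directory is only there to keep the example self-contained; the file name is taken from the snippet above.

```python
import io
import os
import tempfile

# Write into a scratch directory so the example leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), 'a.txt')
text = u'used\u200bZero'

# An explicit UTF-8 encoding avoids falling back to the platform default
# (cp1252 on many Windows setups), which cannot represent U+200B.
with io.open(path, mode='w', encoding='utf-8') as f:
    f.write(text)

# Read the file back with the same encoding to check the round trip.
with io.open(path, mode='r', encoding='utf-8') as f:
    read_back = f.read()
```

The with block closes the file automatically, so no explicit close call is needed.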