python-unicode

Python: Traceback codecs.charmap_decode(input,self.errors,decoding_table)[0]

隐身守侯 submitted on 2019-12-23 12:16:03
Question: The following is sample code whose aim is just to merge text files from a given folder and its subfolders. I am getting a traceback occasionally, so I am not sure where to look. I also need some help enhancing the code to prevent blank lines from being merged and to display the number of lines in the merged/master file. It is probably a good idea to perform some cleanup before merging the files, or just to ignore blank lines during the merging process. Each text file in the folder has no more than 1000 lines, but the aggregate master file could cross
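
The asker's code is not shown in this excerpt, so the following is only a minimal Python 3 sketch of the task described: merge text files from a folder tree, skip blank lines, and report the line count. The function name `merge_text_files`, the `.txt` filter, and the UTF-8 default are assumptions for illustration. Passing an explicit `encoding` avoids falling back to the platform's charmap codec, which is the usual source of `charmap_decode` tracebacks on Windows.

```python
import os


def merge_text_files(root, out_path, encoding="utf-8"):
    """Merge all .txt files under root into out_path, skipping blank lines.

    Returns the number of lines written. Name, filter and default
    encoding are hypothetical choices, not the asker's code.
    """
    written = 0
    out_abs = os.path.abspath(out_path)
    with open(out_path, "w", encoding=encoding) as out:
        for folder, _dirs, files in os.walk(root):
            for name in sorted(files):
                path = os.path.join(folder, name)
                # Skip non-text files and the master file itself.
                if not name.endswith(".txt") or os.path.abspath(path) == out_abs:
                    continue
                # errors="replace" keeps going on undecodable bytes.
                with open(path, "r", encoding=encoding, errors="replace") as src:
                    for line in src:
                        if line.strip():  # ignore blank lines
                            out.write(line.rstrip("\n") + "\n")
                            written += 1
    return written
```

The return value can be printed to show how many lines ended up in the master file.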

Python: Why am I getting a UnicodeDecodeError?

纵饮孤独 submitted on 2019-12-23 03:15:05
Question: I have the following code that searches through files using regular expressions and, if any matches are found, moves the file into a different directory.
    import os
    import gzip
    import re
    import shutil
    def regEx1():
        os.chdir("C:/Users/David/myfiles")
        files = os.listdir(".")
        os.mkdir("C:/Users/David/NewFiles")
        regex_txt = input("Please enter the string your are looking for:")
        for x in (files):
            inputFile = open((x), "r")
            content = inputFile.read()
            inputFile.close()
            regex = re.compile(regex_txt, re.IGNORECASE)
            if
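
A `UnicodeDecodeError` in code like this typically comes from `open(x, "r")` decoding with the platform default encoding. A hedged sketch of one way around it, assuming Python 3: try a short list of candidate encodings and fall back to latin-1, which accepts any byte. The helper names `read_text` and `file_matches` and the candidate list are assumptions, not part of the question.

```python
import re


def read_text(path, encodings=("utf-8", "cp1252")):
    """Return the file's text, trying each candidate encoding in turn."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as fh:
                return fh.read()
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this never fails.
    with open(path, "r", encoding="latin-1") as fh:
        return fh.read()


def file_matches(path, pattern):
    """True if the case-insensitive pattern occurs anywhere in the file."""
    regex = re.compile(pattern, re.IGNORECASE)
    return regex.search(read_text(path)) is not None
```

The fallback trades accuracy for robustness: text decoded as latin-1 may show the wrong characters, but ASCII search strings still match.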

decoding and encoding Hebrew string in Python

ぃ、小莉子 submitted on 2019-12-22 01:26:23
Question: I am trying to encode and decode the Hebrew string "שלום". However, after encoding, I get gibberish:
    >>> word = "שלום"
    >>> word = word.decode('UTF-8')
    >>> word
    u'\u05e9\u05dc\u05d5\u05dd'
    >>> print word
    שלום
    >>> word = word.encode('UTF-8')
    >>> word
    '\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
    >>> print word
    ׳©׳׳•׳
How should I do it properly? Thanks.
Answer 1: You'll have to make sure you have the right encoding in your environment (shell or script). If you're using a script, include the following: #!/usr/bin
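
For comparison, a small Python 3 sketch of the same round trip. The encoded bytes are correct; the gibberish appears only when something reads those UTF-8 bytes with a different codec. Using cp1255 as that other codec is an assumption here, chosen because it is a common Hebrew Windows code page, not something the question states.

```python
word = "שלום"

# Encoding turns text into bytes; these match the bytes in the question.
data = word.encode("utf-8")
assert data == b"\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d"

# Decoding with the same codec restores the original text.
restored = data.decode("utf-8")

# Decoding with a different codec (cp1255 is an assumed example)
# yields mojibake instead of the original word.
garbled = data.decode("cp1255", errors="replace")
```

The fix is therefore on the display side: the terminal or script encoding has to agree with the bytes being printed.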

Unicode in python

放肆的年华 submitted on 2019-12-22 01:07:40
Question: I currently use Elixir with my MySQL database and redis-py with Redis, and I have selected UTF-8 everywhere. I want to get some data written in Chinese, like {'Info':'8折','Name':'家乐福'}, but what I get looks like this: {'Info': u'8\u6298', 'Name': u'\u5bb6\u4e50\u798f'} and after I store this dict in Redis and read it back with redis-py it becomes: {"Info": "8\u6298", "Name": "\u5bb6\u4e50\u798f"} I know that if I add u' before 8\u6298 and print it, it will show me "8折", but is there a function or another solution
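
One possible reading, offered as an assumption: the value coming back from Redis is a JSON string whose non-ASCII characters were escaped on the way in. If so, the standard `json` module can decode the escapes, and `ensure_ascii=False` keeps them readable when serializing again. A minimal Python 3 sketch (the variable names are illustrative):

```python
import json

# The string as it appears after the Redis round trip (assumed to be JSON).
stored = '{"Info": "8\\u6298", "Name": "\\u5bb6\\u4e50\\u798f"}'

# json.loads turns the \uXXXX escapes back into real characters.
record = json.loads(stored)

# ensure_ascii=False keeps the characters readable when dumping again.
readable = json.dumps(record, ensure_ascii=False)
```

The escaped and unescaped forms describe the same data; only the display differs.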

Will a UNICODE string just containing ASCII characters always be equal to the ASCII string?

让人想犯罪 __ submitted on 2019-12-21 06:59:56
Question: I noticed the following holds:
    >>> u'abc' == 'abc'
    True
    >>> 'abc' == u'abc'
    True
Will this always be true, or could it possibly depend on the system locale? (It seems strings are unicode in Python 3, e.g. this question, but bytes in 2.x.)
Answer 1: Python 2 coerces between unicode and str using the ASCII codec when comparing the two types. So yes, this is always true. That is to say, unless you mess up your Python installation and use sys.setdefaultencoding() to change that default. You cannot do
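
The implicit coercion described in the answer is specific to Python 2. As a contrast, a short sketch of the Python 3 behavior, where text and bytes never compare equal until one side is converted explicitly:

```python
text = "abc"    # str: a sequence of Unicode code points
data = b"abc"   # bytes: a sequence of raw byte values

# Python 3 does no implicit coercion, so this comparison is False.
same_without_decoding = (text == data)

# After an explicit decode both sides are str and compare equal.
same_after_decoding = (text == data.decode("ascii"))
```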

Reading russian language data from csv

依然范特西╮ submitted on 2019-12-21 06:06:20
Question: I have some data in a CSV file that is in Russian:
    2-комнатная квартира РДТ', мкр Тастак-3, Аносова — Толе би;Алматы
    2-комнатная квартира БГР', мкр Таугуль, Дулати (Навои) — Токтабаева;Алматы
    2-комнатная квартира ЦФМ', мкр Тастак-2, Тлендиева — Райымбека;Алматы
The delimiter is the ; symbol. I want to read the data and put it into an array. I tried to read this data using this code:
    def loadCsv(filename):
        lines = csv.reader(open(filename, "rb"),delimiter=";" )
        dataset = list(lines)
        for i in range(len(dataset
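
The code in the question opens the file in binary mode, which is the Python 2 idiom. A hedged Python 3 sketch of the same read follows. It assumes the file is UTF-8 (a Russian file could equally be cp1251, in which case pass that as `encoding`); `load_csv` is an illustrative name.

```python
import csv


def load_csv(filename, encoding="utf-8", delimiter=";"):
    """Read a delimited text file into a list of rows (lists of str).

    The UTF-8 default is an assumption; pass the file's real encoding.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(filename, "r", encoding=encoding, newline="") as fh:
        return list(csv.reader(fh, delimiter=delimiter))
```

Each row comes back as a list of already-decoded strings, so no per-cell decoding loop is needed.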

Unicode Encode Error when writing pandas df to csv

£可爱£侵袭症+ submitted on 2019-12-20 10:33:35
Question: I cleaned 400 Excel files, read them into Python using pandas, and appended all the raw data into one big DataFrame. Then, when I try to export it to a CSV file:
    df.to_csv("path",header=True,index=False)
I get this error:
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xc7' in position 20: ordinal not in range(128)
Can someone suggest a way to fix this and explain what it means? Thanks.
Answer 1: You have unicode values in your DataFrame. Files store bytes, which means all unicode values have to be encoded into
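
The point of the answer can be reproduced without pandas. The sketch below, standard library only, shows that the character from the error message cannot be encoded as ASCII but can be encoded as UTF-8, and then writes rows with an explicit encoding. `write_rows` is a hypothetical helper; with pandas, `DataFrame.to_csv` accepts an `encoding` argument that plays the same role.

```python
import csv

value = "\xc7"  # the character named in the error message

# ASCII only covers code points below 128, so this encode fails.
try:
    value.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode character.
utf8_bytes = value.encode("utf-8")


def write_rows(path, rows, encoding="utf-8"):
    """Write rows to a CSV file using an explicit text encoding."""
    with open(path, "w", encoding=encoding, newline="") as fh:
        csv.writer(fh).writerows(rows)
```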

How can I adapt my code to make it compatible to Microsoft Excel?

∥☆過路亽.° submitted on 2019-12-20 06:28:52
Question: Problem: I was trying to implement a web API (based on Flask) that would query the database given some specific conditions, reconstruct the data, and finally export the result to a .csv file. Since the amount of data is really huge, I cannot construct the whole dataset and generate the .csv file all at once (e.g. create a DataFrame using pandas and finally call df.to_csv()), because that would cause a slow query and the HTTP connection might time out. So I
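
The excerpt stops before the asker's approach, so what follows is only a sketch of one common way to export a large result incrementally: a generator that yields CSV text one row at a time. The names `iter_csv` and `_drain` are hypothetical, and the leading byte order mark is an assumption, included because Excel commonly relies on it to recognize UTF-8. It uses only the standard library; a generator like this could be passed to a streaming response in a web framework.

```python
import csv
import io


def _drain(buffer):
    """Return the buffer's current text and reset it for reuse."""
    text = buffer.getvalue()
    buffer.seek(0)
    buffer.truncate(0)
    return text


def iter_csv(rows, header, bom=True):
    """Yield CSV text chunk by chunk instead of building it all at once."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    if bom:
        # Assumption: a leading BOM helps Excel detect UTF-8.
        yield "\ufeff"
    writer.writerow(header)
    yield _drain(buffer)
    for row in rows:
        writer.writerow(row)
        yield _drain(buffer)
```

Because `rows` can itself be a generator over a database cursor, memory use stays flat regardless of the result size.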

UnicodeDecodeError when using Python 2.x unicodecsv

廉价感情. submitted on 2019-12-19 16:52:27
Question: I'm trying to write out a CSV file with Unicode characters, so I'm using the unicodecsv package. Unfortunately, I'm still getting UnicodeDecodeErrors:
    # -*- coding: utf-8 -*-
    import codecs
    import unicodecsv
    raw_contents = 'He observes an “Oversized Gorilla” near Ashford'
    encoded_contents = unicode(raw_contents, errors='replace')
    with codecs.open('test.csv', 'w', 'UTF-8') as f:
        w = unicodecsv.writer(f, encoding='UTF-8')
        w.writerow(["1", encoded_contents])
This is the traceback: Traceback (most
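
The code above stacks two encoding layers (`codecs.open` and the `encoding` argument of the writer). A hedged Python 3 sketch of the single-layer alternative: the `csv` writer emits text, and exactly one wrapper turns that text into bytes. `rows_to_csv_bytes` is an illustrative name and the in-memory buffer stands in for a real file.

```python
import csv
import io


def rows_to_csv_bytes(rows, encoding="utf-8"):
    """Serialize rows to CSV and return the encoded bytes."""
    raw = io.BytesIO()
    # The text wrapper is the only place where encoding happens.
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    csv.writer(text).writerows(rows)
    text.flush()
    data = raw.getvalue()
    text.detach()  # keep the wrapper from closing the byte buffer
    return data
```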

Python3: UnicodeEncodeError: 'ascii' codec can't encode character '\xfc'

青春壹個敷衍的年華 submitted on 2019-12-19 14:03:14
Question: I am trying to get a very simple example running on OS X with Python 3.5.1, but I'm really stuck. I have read many articles that deal with similar problems, but I cannot fix this by myself. Do you have any hints on how to resolve this issue? I would like to get the correctly encoded latin-1 output as defined in mylist without any errors. My code:
    # coding=<latin-1>
    mylist = [u'Glück', u'Spaß', u'Ähre',]
    print(mylist)
The error:
    Traceback (most recent call last):
      File "/Users/abc/test.py", line 4
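
An error like this is raised when the output stream, not the source file, uses the ASCII codec. A small sketch that makes the stream encoding explicit, assuming Python 3; `render` is a hypothetical helper and the in-memory buffer stands in for standard output.

```python
import io

mylist = ["Glück", "Spaß", "Ähre"]


def render(items, encoding):
    """Print items to an in-memory stream and return the encoded bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(items, file=stream)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep the wrapper from closing the byte buffer
    return data


# A latin-1 stream can represent all three words.
latin1_bytes = render(mylist, "latin-1")

# An ASCII stream cannot, which reproduces the reported error type.
try:
    render(mylist, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

In a real script the equivalent step is making sure standard output is opened with the intended encoding, for example through the environment's locale settings.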