python-unicode

How to work with UTF-16 in python ctypes?

喜欢而已 posted on 2021-02-07 10:07:23
Question: I have a foreign C library that uses UTF-16 throughout its API: in function arguments, return values and structure members. On Windows this works with ctypes.c_wchar_p, but on OS X ctypes uses UTF-32 (UCS-4) for c_wchar, and I could not find a way to support UTF-16. Here is my research: Subclassing _SimpleCData to redefine _check_retval_ allows a transparent conversion of UTF-16 to a Python string, and the type can be placed as a C structure member. But it doesn't allow handling strings as arguments; its from_param()
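One possible workaround, sketched here under assumptions rather than taken from the question: declare the string parameters as raw pointers (for example ctypes.c_void_p) and convert by hand. The helper names to_utf16_buffer and from_utf16_pointer are hypothetical, and the sketch assumes the library expects NUL-terminated little-endian UTF-16.

```python
import ctypes


def to_utf16_buffer(text):
    """Return a ctypes array of 16-bit units holding text as NUL-terminated UTF-16-LE."""
    data = text.encode("utf-16-le") + b"\x00\x00"
    array_type = ctypes.c_uint16 * (len(data) // 2)
    return array_type.from_buffer_copy(data)


def from_utf16_pointer(address):
    """Decode the NUL-terminated UTF-16-LE string that starts at address."""
    units = ctypes.cast(address, ctypes.POINTER(ctypes.c_uint16))
    length = 0
    # Count 16-bit code units up to (not including) the terminating zero unit.
    while units[length] != 0:
        length += 1
    return ctypes.string_at(address, length * 2).decode("utf-16-le")


# Round trip without any foreign library: encode, then decode from the address.
buffer = to_utf16_buffer("héllo")
round_trip = from_utf16_pointer(ctypes.addressof(buffer))
```

The array returned by to_utf16_buffer has to stay referenced for as long as the C side may read it; whether this fits the library's ownership rules is not something the excerpt states.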

Writing out results from python to csv file [UnicodeEncodeError: 'charmap' codec can't encode character

六眼飞鱼酱① posted on 2021-02-07 08:12:31
Question: I've been trying to write a script that scrapes the list of usernames from the comments section of a given YouTube video and writes those usernames to a .csv file. Here's the script:
    from selenium import webdriver
    import time
    import csv
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup as soup
    driver=webdriver.Chrome()
    driver.get('https://www.youtube.com/watch?v=VIDEOURL')
    time.sleep(5)
    driver.execute_script("window.scrollTo(0, 500)")
    time.sleep
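A 'charmap' UnicodeEncodeError on Windows usually means the file was opened with the platform's default code page. A minimal sketch of one common fix, assuming the usernames are already collected in a list: open the CSV file with an explicit UTF-8 encoding. The function name write_usernames and the "username" header are hypothetical.

```python
import csv


def write_usernames(path, usernames):
    """Write one username per row to a CSV file encoded as UTF-8."""
    # An explicit encoding avoids falling back to the locale's code page;
    # newline="" is what the csv module expects for files it writes.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["username"])
        for name in usernames:
            writer.writerow([name])
```

If the file is meant to be opened in Excel, encoding="utf-8-sig" is an alternative worth trying, since it writes a byte order mark.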

Transform Pandas string column containing unicodes to ascii to load urls

孤街浪徒 posted on 2021-02-05 12:12:31
Question: I have a pandas DataFrame with a column of Wikipedia URLs that I want to load. However, some strings won't load because they contain percent-encoded Unicode characters. For example, 'Kruskal %E2%80%93 Wallis_one-way_analysis_of_variance' raises the following PageError: Page id "Cauchy%E2%80%93Schwarz_inequality" does not match any pages. Try another id! Is there a way to turn all of these Unicode sequences into ASCII? In this case, I need a function that creates a new column: old column new column Cauchy%E2%80%93Schwarz
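The %E2%80%93 sequence is a percent-encoded UTF-8 en dash, so one approach is to decode it rather than force the title into ASCII. A small sketch using the standard library; decode_title is a hypothetical name, and whether the page lookup accepts the decoded title is an assumption.

```python
from urllib.parse import unquote


def decode_title(title):
    """Turn percent-encoded UTF-8 sequences back into the characters they stand for."""
    return unquote(title)


# The three escaped bytes become a single en dash (U+2013).
decoded = decode_title("Cauchy%E2%80%93Schwarz_inequality")
```

For a DataFrame, the same function could be applied column-wise, for example df["new column"] = df["old column"].map(unquote), using the column names from the question.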

Convert in utf16

痞子三分冷 posted on 2021-01-29 06:06:18
Question: I am crawling several websites and extracting the product names. Some names contain errors like this: Malecon 12 Jahre 0,05 ltr.<br>Reserva Superior Bols Watermelon Lik\u00f6r 0,7l Hayman\u00b4s Sloe Gin Ron Zacapa Edici\u00f3n Negra Havana Club A\u00f1ejo Especial Caol Ila 13 Jahre (G&M Discovery) How can I fix that? I am using XPath and re.search to get the names. Every Python file starts with this line: # -*- coding: utf-8 -*- Edit: This is the source code showing how I get the
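Names such as Lik\u00f6r suggest that the scraped text holds literal backslash-u escape sequences, as happens when a string is cut out of JSON with a regular expression. Assuming that is the cause, a sketch like the following replaces each escape with its character; decode_u_escapes is a hypothetical helper name.

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Replace literal \\uXXXX sequences in text with the characters they encode."""
    return _U_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


fixed = decode_u_escapes("Bols Watermelon Lik\\u00f6r 0,7l")
```

This sketch does not join surrogate pairs. If the page source is JSON, parsing it with json.loads instead of re.search would handle the escapes without extra code.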

How to fix “latin-1 codec can't encode characters in position” in requests

拥有回忆 posted on 2020-12-10 07:44:20
Question: I am having trouble with encoding in Python 3. When testing on my PC I get no errors:
    Python 3.7.3 (default, Jun 24 2019, 04:54:02)
    [GCC 9.1.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import requests
    >>> print(requests.get('https://www.kinopoisk.ru').text)
Everything is good. But when I run this code on my VPS, I get the following error:
    Python 3.7.3 (default, Apr 3 2019, 19:16:38)
    [GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
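The excerpt cuts off before the traceback, so the following rests on an assumption: that the error is raised by print() because standard output on the VPS uses the Latin-1 encoding. Under that assumption, the sketch below reproduces the difference on in-memory streams; write_text is a hypothetical helper.

```python
import io


def write_text(binary_stream, text, encoding):
    """Write text to a binary stream through a text layer using the given encoding."""
    wrapper = io.TextIOWrapper(binary_stream, encoding=encoding)
    try:
        # Raises UnicodeEncodeError if the encoding cannot represent the text.
        wrapper.write(text)
        wrapper.flush()
    finally:
        # Detach so the underlying binary stream stays open for the caller.
        wrapper.detach()


# UTF-8 can represent Cyrillic text; passing "latin-1" instead would raise
# UnicodeEncodeError, which mirrors the error named in the title.
target = io.BytesIO()
write_text(target, "Кинопоиск", "utf-8")
```

In a real script on Python 3.7 or later, calling sys.stdout.reconfigure(encoding="utf-8") before printing, or setting the PYTHONIOENCODING environment variable to utf-8, has a comparable effect on standard output.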

How can I handle these weird special characters messing my print formatting?

梦想与她 posted on 2020-12-09 16:32:52
Question: I am printing a formatted table, but some user-generated characters take up more than one character cell, which breaks the formatting, as you can see in the screenshot below... The "title" column is formatted to be 68 characters wide, but these "special characters" each occupy more than one column while being counted as a single character. This pushes the column past its bounds. print('{0:16s}{3:<18s}{1:68s}{2:>8n}'.format(( ' ' + streamer['user_name'][:12] + '..') if
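String formatting pads by character count, not by terminal columns. A rough sketch of padding by display width with the standard unicodedata module; display_width and pad_to_width are hypothetical names, and the result is an approximation because terminals differ, especially for emoji.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns text occupies."""
    width = 0
    for char in text:
        if unicodedata.combining(char):
            # Combining marks are drawn on top of the previous character.
            continue
        # Wide (W) and fullwidth (F) characters take two columns.
        width += 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1
    return width


def pad_to_width(text, width):
    """Right-pad text with spaces until it fills width display columns."""
    return text + " " * max(0, width - display_width(text))
```

A cell padded with pad_to_width(title, 68) can then be concatenated into the row in place of the {1:68s} field.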
