unicode-escapes | 易学教程

Python unicode escape for RethinkDB match (regex) query

Question: I am trying to perform a RethinkDB match query with an escaped, user-provided Unicode search parameter:

    import re
    from rethinkdb import RethinkDB

    r = RethinkDB()
    search_value = u"\u05e5"  # provided by user via flask
    search_value_escaped = re.escape(search_value)  # results in u'\\\u05e5' ->
    # when encoded with "utf-8" gives "\ץ" as expected.
    conn = rethinkdb.connect(...)
    results_cursor_a = r.db(...).table(...).order_by(index="id").filter(
        lambda doc: doc.coerce_to("string").match(search_value)
    )
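As the comment in the question shows, the Python version in use puts a backslash in front of the non-ASCII letter as well (Python versions before 3.7 backslash-escape nearly every non-alphanumeric character in `re.escape`). A minimal sketch of an alternative, assuming the extra backslash is the source of the trouble: the hypothetical helper `escape_regex_meta` escapes only ASCII regex metacharacters and leaves every other character untouched. Whether the regex engine behind `match` rejects a backslash before a non-ASCII letter is not established by the excerpt, so treat this as one thing to try rather than a confirmed fix.

```python
# Hypothetical helper (not part of re or rethinkdb): backslash-escape only
# the ASCII characters that have a special meaning in a regular expression.
_REGEX_META = set("\\.^$*+?{}[]|()")


def escape_regex_meta(text):
    """Return text with ASCII regex metacharacters escaped and nothing else."""
    return "".join("\\" + ch if ch in _REGEX_META else ch for ch in text)


search_value = u"\u05e5"  # same sample value as in the question
escaped_value = escape_regex_meta(search_value)
print(repr(escaped_value))
print(repr(escape_regex_meta(u"a.b*\u05e5")))
```

The escaped string could then be passed to `match` in place of the raw user input.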

Error reading csv file unicodeescape

Question: I have this program:

    import csv

    with open("C:\Users\frederic\Desktop\WinPython-64bit-3.4.4.3Qt5\notebooks\scores.txt","r") as scoreFile:
        # write = w, read = r, append = a
        scoreFileReader = csv.reader(scoreFile)
        scoreList = []
        for row in scoreFileReader:
            if len (row) != 0:
                scoreList = scoreList + [row]
    scoreFile.close()
    print(scoreList)

Why do I get this error?

    with open("C:\Users\frederic\Desktop\WinPython-64bit-3.4.4.3Qt5\notebooks\scores.txt","r") as scoreFile:
    ^
    SyntaxError: (unicode error
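In a normal Python 3 string literal, `\U` starts an eight-hex-digit Unicode escape, so `"C:\Users..."` fails to compile before the program even runs. A minimal sketch of one common fix, a raw string literal, follows. The `read_rows` helper and the temporary demo file are assumptions added so the example runs on any machine; they are not from the question.

```python
import csv
import os
import tempfile

# A raw string (r"...") keeps every backslash literal, so "\U" and "\n"
# are not read as escape sequences.
windows_path = r"C:\Users\frederic\Desktop\WinPython-64bit-3.4.4.3Qt5\notebooks\scores.txt"
print(windows_path)


def read_rows(path):
    """Return the non-empty rows of a CSV file (hypothetical helper)."""
    # The with-block closes the file, so no explicit close() is needed.
    with open(path, "r", newline="") as score_file:
        return [row for row in csv.reader(score_file) if row]


# Demo on a temporary file so the sketch does not depend on the path above.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "scores.txt")
    with open(demo_path, "w", newline="") as handle:
        handle.write("alice,10\n\nbob,7\n")
    rows = read_rows(demo_path)

print(rows)
```

Forward slashes (`"C:/Users/..."`) or doubled backslashes (`"C:\\Users\\..."`) avoid the same problem.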

Unicode escape sequence in command line MySQL

Question: Short version: What kind of escape sequence can one use to search for Unicode characters in command-line MySQL?

Long version: I'm looking for a way to search a column for records containing a Unicode sequence, U+200B, in MySQL from the command line. I can't figure out which kind of escape to use. I've tried \u200B, x200B and other variants. I finally found one blog that suggested the _utf8 syntax. This will produce the character on the command line:

    select _utf8 x'200B';

Now I'm stuck trying to get
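One detail worth checking: a hex literal such as `x'200B'` holds raw bytes (0x20 and 0x0B), whereas the UTF-8 encoding of U+200B is three bytes. The sketch below computes those bytes in Python so they can be pasted into a `_utf8 x'...'` literal. The table and column names in the generated query are hypothetical, and the `LIKE` form is only one assumed way of using the literal.

```python
def utf8_hex(char):
    """Return the UTF-8 bytes of char as an uppercase hex string."""
    return char.encode("utf-8").hex().upper()


zero_width_space = "\u200b"
hex_bytes = utf8_hex(zero_width_space)
print(hex_bytes)  # E2808B

# Hypothetical table and column names, only to show where the literal goes.
query = (
    "SELECT * FROM my_table "
    "WHERE my_column LIKE CONCAT('%', _utf8 x'{}', '%');".format(hex_bytes)
)
print(query)
```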

sed: matching unicode blocks with

Question: I am desperately trying to replace certain Unicode characters (graphemes) in a file using sed. However, I keep failing for some of them, namely the ones from these Unicode blocks:

    \p{InHigh_Surrogates}: U+D800–U+DB7F
    \p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
    \p{InLow_Surrogates}: U+DC00–U+DFFF

I tried (in a sed config file loaded via the -f switch):

    s/\p{InHigh_Surrogates}/###/ --> no effect at all
    s/\\p\{InHigh_Surrogates\}/###_D-NON-UTF8_###/ -> error message 'Invalid content of \{\}'

Does python re (regex) have an alternative to \u unicode escape sequences?

Question: Python treats \uxxxx as a Unicode character escape inside a string literal (e.g. u"\u2014" is interpreted as the Unicode character U+2014). But I just discovered (Python 2.7) that the standard regex module doesn't treat \uxxxx as a Unicode character. Example:

    codepoint = 2014  # Say I got this dynamically from somewhere
    test = u"This string ends with \u2014"
    pattern = r"\u%s$" % codepoint
    assert(pattern[-5:] == "2014$")  # Ends with an escape sequence for U+2014
    assert(re.search(pattern, test) !=
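A possible way around the limitation, sketched under the assumption that the code point arrives as a string of hex digits: convert the digits to the character itself and put that character, escaped, into the pattern, so the regex engine never needs to understand `\u`. The variable names are illustrative.

```python
import re

codepoint_hex = "2014"  # hex digits of the code point, held as text (assumption)
test = u"This string ends with \u2014"

# Turn the hex digits into the character itself, then escape it so it is
# matched literally whatever character it happens to be.
char = chr(int(codepoint_hex, 16))  # on Python 2.7 this would be unichr(...)
pattern = re.escape(char) + "$"

match = re.search(pattern, test)
print(match is not None)
```

On Python 3.3 and later the `re` module also accepts `\uXXXX` escapes in string patterns, so the original approach works there without changes.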

How to decode a string containing backslash-encoded Unicode characters?

Question: I have a string stored as a:

    a := `M\u00fcnchen`
    fmt.Println(a) // prints "M\u00fcnchen"
    b := "M\u00fcnchen"
    fmt.Println(b) // prints "München"

Is there a way I can convert a into b?

Answer 1: You can use strconv.Unquote for this:

    u := `M\u00fcnchen`
    s, err := strconv.Unquote(`"` + u + `"`)
    if err != nil {
        // ..
    }
    fmt.Printf("%v\n", s)

Outputs:

    München

Source: https://stackoverflow.com/questions/35519106/how-to-decode-a-string-containing-backslash-encoded-unicode-characters

Why do I need to escape unicode in java source files?

Question: Please note that I'm not asking how but why. And I don't know if it's an RCP-specific problem or something inherent to Java. My Java source files are encoded in UTF-8. If I define my literal strings like this:

    new Language("fr", "Français"),
    new Language("zh", "中文")

It works as I expect when I use the string in the application by launching it from Eclipse as an Eclipse application. But it fails when I launch the .exe built by the "Eclipse Product Export Wizard". The solution I use

random text from /dev/random raising an error in lxml: All strings must be XML compatible: Unicode or ASCII, no NULL bytes

I am, for the sake of testing my web app, pasting some random characters from /dev/random into my web frontend. This line throws an error:

    print repr(comment)
    import html5lib
    print html5lib.parse(comment, treebuilder="lxml")

    'a\xef\xbf\xbd\xef\xbf\xbd\xc9\xb6E\xef\xbf\xbd\xef\xbf\xbd`\xef\xbf\xbd]\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd2 \x14\xef\xbf\xbd\xc7\xbe\xef\xbf\xbdy\xcb\x9c\xef\xbf\xbdi1O\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbdZ\xef\xbf\xbd.\xef\xbf\xbd\x17^C'
    Unhandled Error
    Traceback (most recent call last):
    File "/usr/lib/python2.6/dist-packages/twisted/internet
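The printed value is a byte string that contains both non-ASCII bytes and control characters such as `\x14` and `\x17`, and either of these may be what lxml objects to. A minimal sketch of a cleanup step, assuming the input is UTF-8: decode the bytes to text, then drop the characters that XML 1.0 does not allow. The helper name `strip_xml_illegal` and the shortened sample bytes are illustrative, not taken from the question.

```python
import re

# Characters that XML 1.0 does not allow: C0 control characters other than
# tab, newline and carriage return, plus surrogates, U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_xml_illegal(text):
    """Remove characters that are not valid in an XML 1.0 document."""
    return _XML_ILLEGAL.sub("", text)


# Shortened, illustrative sample in the spirit of the data in the question.
raw = b"a\xef\xbf\xbd2 \x14\xc7\xbey\x17^C"
comment = raw.decode("utf-8", errors="replace")  # bytes -> text
clean = strip_xml_illegal(comment)
print(repr(clean))
```

The cleaned text, rather than the raw bytes, would then be handed to the parser.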

Json escape unicode in SQL Server

I have a JSON string with escaped Unicode symbols:

    \u041e\u043f\u043e\u0440\u0430 \u0448\u0430\u0440\u043e\u0432\u0430\u044f VW GOLF

I know that the 4 digits after \u are the hex code of a Unicode character. So here's how I decoded those strings:

    ALTER FUNCTION dbo.Json_Unicode_Decode(@escapedString VARCHAR(MAX))
    RETURNS VARCHAR(MAX)
    AS
    BEGIN
        DECLARE @pos INT = 0,
                @char CHAR,
                @escapeLen TINYINT = 2,
                @hexDigits TINYINT = 4
        SET @pos = CHARINDEX('\u', @escapedString, @pos)
        WHILE @pos > 0
        BEGIN
            SET @char = NCHAR(CONVERT(varbinary(8), '0x' + SUBSTRING(@escapedString, @pos + @escapeLen, @hexDigits), 1))
            SET
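The same decoding idea (find `\u`, read four hex digits, convert them to a character) can be sketched outside SQL Server, which may help when checking the expected output of the function above. This Python version is only an illustration; the helper name `decode_u_escapes` is hypothetical, and the sketch does not combine surrogate pairs into a single character.

```python
import re

_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Replace each \\uXXXX escape with the character it encodes."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


escaped = r"\u041e\u043f\u043e\u0440\u0430 \u0448\u0430\u0440\u043e\u0432\u0430\u044f VW GOLF"
decoded = decode_u_escapes(escaped)
print(decoded)
```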

PHP: How to match a range of unicode paired surrogates emoticons/emoji?

Question: anubhava's answer about matching ranges of Unicode characters led me to the regex to use for cleaning up a specific range of single-code-point characters. With it, I can now match all the miscellaneous symbols in this list (which includes emoticons) with this simple expression:

    preg_replace('/[\x{2600}-\x{26FF}]/u', '', $str);

However, I also want to match those in this list of paired/double surrogates emoji, but as nhahtdh explained in a comment: There is a range from d800 to dfff to specify
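A surrogate pair is the UTF-16 encoding of a single code point above U+FFFF, so a regex engine that works on code points can match such characters by their real code point range instead of by surrogate values. The sketch below shows the pair-to-code-point arithmetic and a range match in Python; the Emoticons block (U+1F600 to U+1F64F) is used as an assumed example range, and the helper name is hypothetical.

```python
import re


def surrogate_pair_to_code_point(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = surrogate_pair_to_code_point(0xD83D, 0xDE00)
print(hex(code_point))  # 0x1f600

# Match by code point range; no surrogate values appear in the pattern.
EMOTICONS = re.compile("[\U0001F600-\U0001F64F]")

text = "hi \U0001F600 there \u2600"
cleaned = EMOTICONS.sub("", text)
print(cleaned)
```

The character U+2600 is left in place here because it lies outside the example range.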