问题
I am trying to perform a rethinkdb match query with an escaped unicode user provided search param:
import re
from rethinkdb import RethinkDB
r = RethinkDB()
search_value = u"\u05e5" # provided by user via flask
search_value_escaped = re.escape(search_value) # results in u'\\\u05e5' ->
# when encoded with "utf-8" gives "\ץ" as expected.
conn = rethinkdb.connect(...)
results_cursor_a = r.db(...).table(...).order_by(index="id").filter(
lambda doc: doc.coerce_to("string").match(search_value)
).run(conn) # search_value works fine
results_cursor_b = r.db(...).table(...).order_by(index="id").filter(
lambda doc: doc.coerce_to("string").match(search_value_escaped)
).run(conn) # search_value_escaped spits an error
The error for search_value_escaped is the following:
ReqlQueryLogicError: Error in regexp `\ץ` (portion `\ץ`): invalid escape sequence: \ץ in:
r.db(...).table(...).order_by(index="id").filter(lambda var_1: var_1.coerce_to('string').match(u'\\\u05e5m'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I tried encoding with "utf-8" before/after re.escape() but same results with different errors. What am I messing? Is it something in my code or some kind of a bug?
EDIT: .coerce_to('string') converts the document to "utf-8" encoded string. RethinkDB also converts the query to "utf-8" and then it matches them hence the first query works even though it looks like a unicde match inside a string.
回答1:
From what it looks like RethinkDB rejects escaped unicode characters so I wrote a simple workaround with a custom escape without implementing my own logic of replacing characters (in fear that I must miss one and create a security issue).
import re
def no_unicode_escape(u):
escaped_list = []
for i in u:
if ord(i) < 128:
escaped_list.append(re.escape(i))
else:
escaped_list.append(i)
rv = "".join(escaped_list)
return rv
or a one-liner:
import re
def no_unicode_escape(u):
return "".join(re.escape(i) if ord(i) < 128 else i for i in u)
Which yields the required result of escaping "dangerous" characters and works with RethinkDB as I wanted.
来源:https://stackoverflow.com/questions/55433603/python-unicode-escape-for-rethinkdb-match-regex-query