Python unicode escape for RethinkDB match (regex) query

Posted by 坚强是说给别人听的谎言 on 2019-12-13 18:16:44

Question


I am trying to run a RethinkDB match query with an escaped Unicode search parameter provided by the user:

import re
from rethinkdb import RethinkDB

r = RethinkDB()

search_value = u"\u05e5"  # provided by user via flask
search_value_escaped = re.escape(search_value)  # results in u'\\\u05e5' ->
    # when encoded with "utf-8" gives "\ץ" as expected.

conn = r.connect(...)  # connect through the RethinkDB() instance created above

results_cursor_a = r.db(...).table(...).order_by(index="id").filter(
    lambda doc: doc.coerce_to("string").match(search_value)
).run(conn)  # search_value works fine

results_cursor_b = r.db(...).table(...).order_by(index="id").filter(
    lambda doc: doc.coerce_to("string").match(search_value_escaped)
).run(conn)  # search_value_escaped spits an error

The error for search_value_escaped is the following:

ReqlQueryLogicError: Error in regexp `\ץ` (portion `\ץ`): invalid escape sequence: \ץ in:
r.db(...).table(...).order_by(index="id").filter(lambda var_1: var_1.coerce_to('string').match(u'\\\u05e5m'))
                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^         

I tried encoding with "utf-8" before and after re.escape(), but that only produced different errors. What am I missing? Is it something in my code, or some kind of bug?

EDIT: .coerce_to('string') converts the document to a UTF-8 encoded string. RethinkDB also converts the query to UTF-8 and then matches the two, which is why the first query works even though it looks like a Unicode match inside a string.


Answer 1:


From what I can tell, RethinkDB rejects backslash-escaped Unicode characters, so I wrote a simple workaround: a custom escape function that still delegates to re.escape() for ASCII characters instead of implementing my own character-replacement logic (for fear of missing one and creating a security issue).

import re

def no_unicode_escape(u):
    escaped_list = []

    for i in u:
        if ord(i) < 128:
            escaped_list.append(re.escape(i))
        else:
            escaped_list.append(i)

    rv = "".join(escaped_list)
    return rv

or a one-liner:

import re

def no_unicode_escape(u):
    return "".join(re.escape(i) if ord(i) < 128 else i for i in u)

This yields the required result: "dangerous" characters are escaped, and the pattern works with RethinkDB as I wanted.
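As a quick sanity check, here is a minimal sketch of how the helper behaves on a couple of inputs. The sample strings are made up for illustration, and the check uses Python's own re module rather than RethinkDB, so it only shows what the escaped pattern looks like, not how the server will treat it.

```python
import re

def no_unicode_escape(u):
    # Escape ASCII characters with re.escape(); pass non-ASCII characters through untouched.
    return "".join(re.escape(i) if ord(i) < 128 else i for i in u)

# Hypothetical user inputs, for illustration only.
hebrew = u"\u05e5"
mixed = u"a.b*" + u"\u05e5"

escaped_hebrew = no_unicode_escape(hebrew)
escaped_mixed = no_unicode_escape(mixed)

# The non-ASCII character is left as-is, with no backslash in front of it.
print(escaped_hebrew == hebrew)

# The ASCII metacharacters "." and "*" are backslash-escaped.
print(escaped_mixed)

# Checked locally with Python's re, the escaped pattern matches the literal input only.
print(re.search(escaped_mixed, u"xx a.b*\u05e5 yy") is not None)
print(re.search(escaped_mixed, u"xx aXb\u05e5 yy") is None)
```

As far as I know, on Python 3.7 and later re.escape() only escapes characters that have special meaning in a regular expression, so it should leave non-ASCII characters alone by itself; the helper mainly matters on older versions, where re.escape() puts a backslash in front of every non-alphanumeric ASCII character and every non-ASCII one.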



Source: https://stackoverflow.com/questions/55433603/python-unicode-escape-for-rethinkdb-match-regex-query
