Pyspark: filter dataframe by regex with string formatting?

长发绾君心 2021-02-01 06:35

I've read several posts on using the "like" operator to filter a Spark dataframe by the condition of containing a string/expression, but was wondering if the following is a ...

3 Answers
  •  北恋 (original poster)
    2021-02-01 07:01

    From neeraj's hint, it seems like the correct way to do this in pyspark is:

    expr = "Arizona.*hot"
    dk = dx.filter(dx["keyword"].rlike(expr))
    

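    Note that dx.filter($"keyword" ...) did not work, since my version of PySpark didn't seem to support the $ notation out of the box. The $"..." column syntax comes from Spark's Scala API; in Python, refer to the column as dx["keyword"] instead.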
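
Since the title asks about string formatting, here is a minimal sketch of building the pattern from variables rather than hard-coding it. The helper names build_pattern and filter_by_keywords are hypothetical, not part of PySpark, and using re.escape on the keywords is an assumption that they should match literally. Note that rlike succeeds when the regex is found anywhere in the value, so no leading or trailing ".*" is needed.

```python
import re


def build_pattern(first, second):
    # Hypothetical helper: escape each keyword so characters such as "."
    # or "+" match literally, then allow any characters in between.
    return "{}.*{}".format(re.escape(first), re.escape(second))


def filter_by_keywords(df, column, first, second):
    # Hypothetical helper: df is assumed to be a PySpark DataFrame.
    # Column.rlike keeps rows where the regex is found anywhere in the value.
    return df.filter(df[column].rlike(build_pattern(first, second)))


expr = build_pattern("Arizona", "hot")
print(expr)  # Arizona.*hot
```

With the dataframe from the answer above, this would be called as dk = filter_by_keywords(dx, "keyword", "Arizona", "hot").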
