USING LIKE inside pandas.query()

后端 未结 6 567
庸人自扰
庸人自扰 2020-12-13 18:22

I have been using Pandas for more than 3 months and I have an fair idea about the dataframes accessing and querying etc.

I have got an requirement wherein I wanted t

相关标签:
6条回答
  • 2020-12-13 18:39

    @volodymyr is right, but the thing he forgets is that you need to set engine='python' to expression to work.

    Example: >>> pd_df.query('column_name.str.contains("abc")', engine='python')

    Here is more information on default engine ('numexpr') and 'python' engine. Also, have in mind that 'python' is slower on big data.

    0 讨论(0)
  • 2020-12-13 18:39

    A trick I just came up with for "starts with":

    pandas.query('"abc" <= column_name <= "abc~"')
    

    Explanation: pandas accepts "greater" and "less than" statements for strings in a query, so anything starting with "abc" will be greater or equal to "abc" in the lexicographic order. The tilde (~) is the largest character in the ASCII table, so anything starting with "abc" will be less than or equal to "abc~".

    A few things to take into consideration:

    • This is of course case sensitive. All lower case characters come after all upper cases characters in the ASCII table.
    • This won't work fully for Unicode strings, but the general principle should be the same.
    • I couldn't come up with parallel tricks for "contains" or "ends with".
    0 讨论(0)
  • 2020-12-13 18:39

    I know this is a pretty old post but I'm just going to leave this here for those who are looking for answers.

    df.query('column_name == "value"')
    

    This worked for me when I needed to query the dataframe for matching string.

    0 讨论(0)
  • 2020-12-13 18:42

    If you have to use df.query(), the correct syntax is:

    df.query('column_name.str.contains("abc")', engine='python')
    

    You can easily combine this with other conditions:

    df.query('column_a.str.contains("abc") or column_b.str.contains("xyz") and column_c>100', engine='python')
    

    It is not a full equivalent of SQL Like, however, but can be useful nevertheless.

    0 讨论(0)
  • 2020-12-13 18:42

    Not using query(), but this will give you what you're looking for:

    df[df.col_name.str.startswith('abc')]
    
    
    df
    Out[93]: 
      col_name
    0     this
    1     that
    2     abcd
    
    df[df.col_name.str.startswith('abc')]
    Out[94]: 
      col_name
    2     abcd
    

    Query uses the pandas eval() and is limited in what you can use within it. If you want to use pure SQL you could consider pandasql where the following statement would work for you:

    sqldf("select col_name from df where col_name like 'abc%';", locals())
    

    Or alternately if your problem with the pandas str methods was that your column wasn't entirely of string type you could do the following:

    df[df.col_name.str.startswith('abc').fillna(False)]
    
    0 讨论(0)
  • 2020-12-13 18:56

    Super late to this post, but for anyone that comes across it. You can use boolean indexing by making your search criteria based on a string method check str.contains.

    Example:

    dataframe[dataframe.summary.str.contains('Windows Failed Login', case=False)]
    

    In the code above, the snippet inside the brackets refers to the summary column of the dataframe and uses the .str.contains method to search for 'Windows Failed Login' within every value of that Series. Case sensitive can be set to true or false. This will return boolean index which is then used to return the dataframe your looking for. You can use .fillna() with this in the brackets as well if you run into any Nan errors.

    Hope this helps!

    0 讨论(0)
提交回复
热议问题