I have a DataFrame with four fields. One of the fields is named Status, and I am trying to use an OR condition in .filter on the DataFrame. I tried the queries below, but no luck:

df2 = df1.filter("Status=2" || "Status=3")

df2 = df1.filter("Status=2")
         .filter("Status=3");

One way is to use the expr function with a where clause; it works the same as filter:

import org.apache.spark.sql.functions.expr
df2 = df1.where(expr("col1 = 'value1' and col2 = 'value2'"))
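Applied to the question's Status column, the same approach handles OR directly. A minimal sketch, assuming df1 is the question's DataFrame and Status is numeric:

import org.apache.spark.sql.functions.expr
// expr parses any SQL boolean expression, so "or" is fine here
val df2 = df1.where(expr("Status = 2 or Status = 3"))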
You can filter with a whole collection of values, such as a list or a set. Note that isin takes varargs rather than a collection, so in Java pass the collection as an array:

ds = ds.filter(functions.col(COL_NAME).isin(myList.toArray()));

or, as @Tony Fraser suggested, with a Seq of objects in Scala, spread it into the varargs:

ds = ds.filter(functions.col(COL_NAME).isin(mySeq: _*));
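For the question's Status column, a minimal self-contained sketch in Scala (df1 and the status values come from the question; keeping them in one collection makes the list easy to extend):

import org.apache.spark.sql.functions.col

val statuses = Seq(2, 3)
// isin builds the OR of all equality checks: Status = 2 OR Status = 3
val df2 = df1.filter(col("Status").isin(statuses: _*))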
All the answers are correct, but most of them do not represent good coding style. Also, you should always plan for a variable number of filter values, even if they are static at a certain point in time.
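A minimal sketch of that idea using varargs; the helper name filterByStatus and the assumption that Status is an integer column are mine:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: accepts any number of statuses and ORs them via isin
def filterByStatus(df: DataFrame, statuses: Int*): DataFrame =
  df.filter(col("Status").isin(statuses: _*))

val df2 = filterByStatus(df1, 2, 3)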
If we want a partial match, just like contains, we can build one contains filter per name and OR them all together:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, lower}

def getSelectedTablesRows2(allTablesInfoDF: DataFrame, tableNames: Seq[String]): DataFrame = {
  // one case-insensitive contains filter per table name
  val tableFilters = tableNames.map(_.toLowerCase()).map(name => lower(col("table_name")).contains(name))
  // OR the filters together, starting from false so an empty list matches nothing
  val finalFilter = tableFilters.fold(lit(false))((accu, newTableFilter) => accu or newTableFilter)
  allTablesInfoDF.where(finalFilter)
}
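A hypothetical call, assuming allTablesDF has a table_name column:

// rows whose table_name contains "orders" or "customers", case-insensitively
val selected = getSelectedTablesRows2(allTablesDF, Seq("Orders", "customers"))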
Instead of:

df2 = df1.filter("Status=2" || "Status=3")

Try:

df2 = df1.filter($"Status" === 2 || $"Status" === 3)

(with import spark.implicits._ in scope for the $ syntax). The original version does not compile because || is not defined on two Strings, and chaining .filter("Status=2").filter("Status=3") ANDs the conditions together, which is why it returned no rows.
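Equivalently, the whole condition can go into a single SQL expression string, since filter parses a String argument as a SQL predicate:

// "or" works inside the string form, no $ syntax needed
df2 = df1.filter("Status = 2 or Status = 3")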