Question
I would like to know exactly how this works:
df = sqlContext.read \
.format("org.apache.phoenix.spark") \
.option("table", "TABLE") \
.option("zkUrl", "10.0.0.11:2181:/hbase-unsecure") \
.load()
Does this load the whole table, or does it delay loading until it knows whether a filter will be applied?
In the first case, how do I tell Phoenix to filter the table before loading it into the Spark DataFrame?
Thanks
Answer 1:
Data is not loaded until you execute an action that requires it. Any filter applied in between, for example:
df.where($"foo" === "bar").count
is pushed down by Spark to the data source (here Phoenix) when possible. You can check the result of predicate pushdown by running explain().
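To make the "nothing is loaded until an action" idea concrete, here is a toy, pure-Python model of lazy evaluation with predicate pushdown. It is only an illustration under stated assumptions: `ToySource`, `LazyFrame` and `rows_returned` are hypothetical names, equality is the only supported predicate, and this is not how Spark or the Phoenix connector is actually implemented. It shows that `where` merely records a filter, and that only the action (`count`) triggers a scan, with the recorded filters applied at the source so non-matching rows never leave it.

```python
# Toy model of lazy evaluation + predicate pushdown.
# Hypothetical classes for illustration only; NOT Spark's or Phoenix's implementation.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Dict[str, object]
Filter = Tuple[str, object]  # (column, value) equality predicate


class ToySource:
    """Stands in for a Phoenix table; counts rows it hands back to the caller."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.rows_returned = 0  # rows "shipped" from the source to the engine

    def scan(self, filters: List[Filter]) -> List[Row]:
        # Filters are evaluated at the source, so only matching rows are returned.
        matched = [r for r in self._rows if all(r.get(c) == v for c, v in filters)]
        self.rows_returned += len(matched)
        return matched


@dataclass
class LazyFrame:
    source: ToySource
    filters: List[Filter] = field(default_factory=list)

    def where(self, column: str, value: object) -> "LazyFrame":
        # Transformation: just records the predicate; no data is read here.
        return LazyFrame(self.source, self.filters + [(column, value)])

    def explain(self) -> str:
        # Describes the plan without executing it.
        return f"Scan ToySource PushedFilters: {self.filters}"

    def count(self) -> int:
        # Action: only now is the source scanned, with all filters pushed down.
        return len(self.source.scan(self.filters))
```

In real PySpark the answer's Scala line would read `df.where(df["foo"] == "bar").count()`, and `df.where(df["foo"] == "bar").explain()` prints the physical plan; a filter Spark managed to hand to the source is typically listed under `PushedFilters` in the scan node.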
Source: https://stackoverflow.com/questions/40870475/filtering-from-phoenix-when-loading-a-table