We use a broadcast hash join in Spark when one of the dataframes is small enough to fit into memory. When the size of the small dataframe is below spark.sql.autoBroadcastJo
The idea here is to create the broadcast explicitly before the join so that you control it. Without it you cannot control this behaviour: Spark decides for you.
Example:
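from pyspark.sql.functions import broadcast
# Mark the small dataframe sdf2 for broadcasting to every executor
sdf2_bd = broadcast(sdf2)
# Join the large dataframe sdf1 against the broadcast copy of sdf2
sdf1.join(sdf2_bd, sdf1.id == sdf2_bd.id)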
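To make the mechanics behind such a join concrete, here is a minimal pure-Python sketch of the broadcast hash join idea. It is an illustration only, not Spark's implementation: the function `broadcast_hash_join` and its parameters are hypothetical, and plain lists of dicts stand in for dataframe partitions. The small side is hashed once on the join key, and every partition of the large side probes that same table, so the large side never has to be shuffled.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of a large table against a small table.

    Hypothetical sketch: lists of dicts stand in for Spark partitions/rows.
    """
    # Build phase: hash the small table on the join key, once.
    hash_table = defaultdict(list)
    for row in small_rows:
        hash_table[row[key]].append(row)

    results = []
    # Probe phase: each partition uses its own "copy" of the hash table.
    # In Spark the table would be serialized and sent to each executor;
    # here every partition simply reuses the same dict.
    for partition in large_partitions:
        local_table = hash_table
        for row in partition:
            # .get() does not trigger the defaultdict factory, so
            # unmatched keys just yield an empty list (inner join).
            for match in local_table.get(row[key], []):
                results.append((row, match))
    return results


# Usage: two partitions of a "large" table, one small lookup table.
large = [
    [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    [{"id": 1, "v": "c"}, {"id": 3, "v": "d"}],
]
small = [{"id": 1, "name": "one"}, {"id": 2, "name": "two"}]
joined = broadcast_hash_join(large, small, "id")
```

The row with id 3 has no match in the small table, so it is dropped, exactly as in an inner join.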
The following rules apply to all broadcast variables (whether created automatically in joins or by hand):