Which one will perform better, broadcast variable or broadcast join?
Question: I am using Spark 2.4.1 with Java 8 in my project. I have a scenario where I need to look up another table/dataset that has two fields, country-name and country-code. A separate stream of data has a country-code column, and I need to map each code to its country-name in the target/result DataFrame. As far as I know, this can be done in two ways: with a join (a broadcast join), or with a broadcast variable used as a lookup. From a performance point of view, which one is better here? What is the Spark standard to handle this
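For reference, here is a minimal sketch of the two options being compared, not a recommendation of either. It assumes the Spark 2.4 Java API on the classpath and uses small in-memory batch DataFrames in place of the real stream. The class name `CountryLookupSketch`, the column names `country_code`, `country_name`, and `event_id`, the UDF name `countryName`, and the sample rows are all hypothetical.

```java
import static org.apache.spark.sql.functions.broadcast;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class CountryLookupSketch {

    // Option 1: broadcast join. The broadcast() hint asks Spark to ship the
    // small lookup table to every executor instead of shuffling both sides.
    static Dataset<Row> withBroadcastJoin(Dataset<Row> events, Dataset<Row> countries) {
        return events
            .join(broadcast(countries),
                  events.col("country_code").equalTo(countries.col("country_code")),
                  "left")
            // Drop the duplicate key column that came from the lookup side.
            .drop(countries.col("country_code"));
    }

    // Option 2: broadcast variable. The lookup table is collected to the
    // driver as a Map, broadcast once, and read inside a UDF.
    static Dataset<Row> withBroadcastVariable(SparkSession spark,
                                              Dataset<Row> events,
                                              Dataset<Row> countries) {
        Map<String, String> codeToName = new HashMap<>();
        for (Row r : countries.collectAsList()) {
            codeToName.put(r.<String>getAs("country_code"), r.<String>getAs("country_name"));
        }
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        Broadcast<Map<String, String>> lookup = jsc.broadcast(codeToName);

        // Hypothetical UDF name; returns null for codes missing from the map.
        spark.udf().register("countryName",
            (UDF1<String, String>) code -> lookup.value().get(code),
            DataTypes.StringType);
        return events.withColumn("country_name", callUDF("countryName", col("country_code")));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("country-lookup-sketch")
            .getOrCreate();

        StructType countrySchema = new StructType()
            .add("country_code", DataTypes.StringType)
            .add("country_name", DataTypes.StringType);
        StructType eventSchema = new StructType()
            .add("event_id", DataTypes.IntegerType)
            .add("country_code", DataTypes.StringType);

        // Made-up sample data standing in for the lookup table and the stream.
        Dataset<Row> countries = spark.createDataFrame(Arrays.asList(
            RowFactory.create("IN", "India"),
            RowFactory.create("US", "United States")), countrySchema);
        Dataset<Row> events = spark.createDataFrame(Arrays.asList(
            RowFactory.create(1, "IN"),
            RowFactory.create(2, "US")), eventSchema);

        withBroadcastJoin(events, countries).show();
        withBroadcastVariable(spark, events, countries).show();

        spark.stop();
    }
}
```

Both methods add a `country_name` column to the events. The first leaves the lookup to Spark's join machinery, while the second does it row by row inside a UDF against a map held on each executor.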