I have constructed two dataframes. How can we join multiple Spark dataframes?
For example:
PersonDf, ProfileDf, with a common column.
Inner join with Scala:
val joinedDataFrame = PersonDf.join(ProfileDf, "personId")
joinedDataFrame.show
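Since the question asks about joining more than two dataframes, note that joins can simply be chained. Below is a minimal sketch: AddressDf and the sample data are hypothetical and only stand in for your real dataframes; PersonDf, ProfileDf and the personId key come from the question.

import org.apache.spark.sql.SparkSession

// Sketch: chaining joins on a shared key (run e.g. in spark-shell).
val spark = SparkSession.builder().appName("multi-join-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the real dataframes.
val PersonDf  = Seq((1, "Alice"), (2, "Bob")).toDF("personId", "name")
val ProfileDf = Seq((1, "admin"), (2, "user")).toDF("personId", "role")
val AddressDf = Seq((1, "Berlin"), (2, "Oslo")).toDF("personId", "city")  // assumed third dataframe

val multiJoined = PersonDf
  .join(ProfileDf, "personId")   // inner join by default; personId kept once
  .join(AddressDf, "personId")   // chain another join on the same key
multiJoined.show()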
Posting a Java-based solution, in case your team only uses Java. The "inner" keyword
ensures that only matching rows are present in the final dataframe.
Dataset<Row> joined = PersonDf.join(ProfileDf,
        PersonDf.col("personId").equalTo(ProfileDf.col("personId")),
        "inner");
joined.show();