First of all, thank you for taking the time to read my question.
My question is the following: in Spark with Java, I load the data of two CSV files into two DataFrames.
You can use the join method with a column name to join two DataFrames, e.g.:
Dataset<Row> dfairport = Load.Csv(sqlContext, data_airport);
Dataset<Row> dfairport_city_state = Load.Csv(sqlContext, data_airport_city_state);
Dataset<Row> joined = dfairport.join(dfairport_city_state, dfairport_city_state("City"));
There is also an overloaded version that allows you to specify the join type as the third argument, e.g.:
Dataset <Row> joined = dfairport.join(dfairport_city_state, dfairport_city_state("City"), "left_outer");
Here's more on joins.
First, thank you very much for your response.
I have tried both of your solutions, but neither of them works; I get the following error: The method dfairport_city_state(String) is undefined for the type ETL_Airport
I cannot access a specific column of the DataFrame for the join.
EDIT: I already managed to do the join; I am posting the solution here in case it helps someone else ;)
Thanks for everything and best regards.
// Join the tables on the city they share
Dataset<Row> joined = dfairport.join(dfairport_city_state, dfairport.col("leg_city").equalTo(dfairport_city_state.col("city")));
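For reference, here is a minimal, self-contained sketch of the same join in Java. It assumes the CSV files have a header row with leg_city and city columns; the file paths are placeholders, and the standard spark.read().csv() loader stands in for the custom Load.Csv helper used above:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AirportJoinExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("AirportJoinExample")
                .master("local[*]")
                .getOrCreate();

        // Placeholder paths; replace with the actual CSV locations
        Dataset<Row> dfairport = spark.read()
                .option("header", "true")
                .csv("data/airport.csv");
        Dataset<Row> dfairport_city_state = spark.read()
                .option("header", "true")
                .csv("data/airport_city_state.csv");

        // In the Java API, columns are referenced with col(), not with df("name")
        Dataset<Row> joined = dfairport.join(
                dfairport_city_state,
                dfairport.col("leg_city").equalTo(dfairport_city_state.col("city")));

        // The three-argument overload additionally takes the join type,
        // e.g. a left outer join that keeps airports with no matching city/state row
        Dataset<Row> leftJoined = dfairport.join(
                dfairport_city_state,
                dfairport.col("leg_city").equalTo(dfairport_city_state.col("city")),
                "left_outer");

        joined.show();
        leftJoined.show();
        spark.stop();
    }
}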