join in a dataframe spark java

长发绾君心 2021-02-09 11:57

First of all, thank you for taking the time to read my question.

My question is the following: in Spark with Java, I load the data of two CSV files into two DataFrames.

2 answers
  • 2021-02-09 12:41

    You can join two DataFrames with the join method, passing a condition on the join column, e.g.:

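    // Load.Csv: presumably the asker's own CSV-loading helper (not part of the Spark API)
    Dataset<Row> dfairport = Load.Csv(sqlContext, data_airport);
    Dataset<Row> dfairport_city_state = Load.Csv(sqlContext, data_airport_city_state);
    
    // In Java, reference a column with df.col("City"); the Scala-style df("City") does not compile
    Dataset<Row> joined = dfairport.join(dfairport_city_state,
            dfairport.col("City").equalTo(dfairport_city_state.col("City")));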
    

    There is also an overloaded version that takes the join type as the third argument, e.g.:

    Dataset<Row> joined = dfairport.join(dfairport_city_state,
            dfairport.col("City").equalTo(dfairport_city_state.col("City")), "left_outer");

    Here's more on joins.
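    The practical difference between the two calls is which rows survive. As a rough illustration only, here is a plain-Java sketch (no Spark involved) of an equi-join that models rows as maps from column name to value. The class JoinTypeSketch, its equiJoin method and the ATL/XYZ sample rows are hypothetical; the sketch mimics only the result of "inner" versus "left_outer", not how Spark actually executes a join.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JoinTypeSketch {
    // Equi-join of two row lists on left[leftKey] == right[rightKey].
    // Rows are hypothetical column-name -> value maps standing in for Spark Rows.
    // leftOuter == false models "inner"; leftOuter == true models "left_outer".
    static List<Map<String, String>> equiJoin(List<Map<String, String>> left, String leftKey,
                                              List<Map<String, String>> right, String rightKey,
                                              boolean leftOuter) {
        // Index the right side by its join key; null keys never match, as in SQL.
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : right) {
            String key = r.get(rightKey);
            if (key != null) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
            }
        }
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            String key = l.get(leftKey);
            List<Map<String, String>> matches = Collections.emptyList();
            if (key != null) {
                matches = index.getOrDefault(key, Collections.emptyList());
            }
            // One output row per matching (left, right) pair.
            for (Map<String, String> r : matches) {
                Map<String, String> row = new LinkedHashMap<>(l);
                row.putAll(r); // simplification: Spark would keep both same-named columns
                out.add(row);
            }
            // "left_outer" also keeps left rows without a match (Spark fills the right columns with null).
            if (leftOuter && matches.isEmpty()) {
                out.add(new LinkedHashMap<>(l));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> airports = List.of(
                Map.of("airport", "ATL", "City", "Atlanta"),
                Map.of("airport", "XYZ", "City", "Nowhere"));
        List<Map<String, String>> cityState = List.of(Map.of("City", "Atlanta", "State", "GA"));
        System.out.println(equiJoin(airports, "City", cityState, "City", false).size()); // 1 (inner)
        System.out.println(equiJoin(airports, "City", cityState, "City", true).size());  // 2 (left_outer)
    }
}
```

    With this sample data the XYZ airport has no matching city, so the inner join drops it while the left outer join keeps it.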

  • 2021-02-09 12:42

    First, thank you very much for your response.

    I have tried both of your solutions, but neither of them works; I get the following error: The method dfairport_city_state (String) is undefined for the type ETL_Airport

    I cannot access a specific column of the DataFrame for the join.

    EDIT: I've managed to do the join; I'm posting the solution here in case it helps someone else ;)

    Thanks for everything and best regards.

    // Join the tables on the city they share
    Dataset<Row> joined = dfairport.join(dfairport_city_state,
            dfairport.col("leg_city").equalTo(dfairport_city_state.col("city")));
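    The join condition in this solution is simply a predicate on pairs of rows, which is why the two join columns (leg_city and city) may have different names. A minimal conceptual sketch in plain Java, assuming made-up Airport and CityState row classes: a nested-loop join that keeps every pair for which the condition holds. Spark picks its own physical strategy for such joins, so treat this only as a model of the result, not of Spark's implementation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class ConditionJoinSketch {
    // Hypothetical row types standing in for rows of the two CSV files.
    static final class Airport {
        final String code;
        final String legCity;
        Airport(String code, String legCity) { this.code = code; this.legCity = legCity; }
    }

    static final class CityState {
        final String city;
        final String state;
        CityState(String city, String state) { this.city = city; this.state = state; }
    }

    // Conceptual meaning of left.join(right, condition): keep every (left, right) pair
    // for which the condition is true. Nested loops are used only for clarity.
    static <L, R> List<Map.Entry<L, R>> join(List<L> left, List<R> right, BiPredicate<L, R> condition) {
        List<Map.Entry<L, R>> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                if (condition.test(l, r)) {
                    out.add(new AbstractMap.SimpleEntry<>(l, r));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Airport> airports = List.of(new Airport("ATL", "Atlanta"), new Airport("XYZ", "Nowhere"));
        List<CityState> cities = List.of(new CityState("Atlanta", "GA"));
        // Mirrors dfairport.col("leg_city").equalTo(dfairport_city_state.col("city"));
        // like SQL equality, a null key never matches.
        List<Map.Entry<Airport, CityState>> joined =
                join(airports, cities, (a, c) -> a.legCity != null && a.legCity.equals(c.city));
        for (Map.Entry<Airport, CityState> e : joined) {
            System.out.println(e.getKey().code + " -> " + e.getValue().state);
        }
    }
}
```

    Running main prints ATL -> GA; the XYZ airport has no matching city and is dropped, as in an inner join.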
    