How to convert List to JavaRDD

爱一瞬间的悲伤 2021-02-11 15:54

We know that in Spark there is a method rdd.collect() which converts an RDD into a List:

List<String> f = rdd.collect();
String[] array = f.toArray(new String[f.size()]);

Is there a method to do the reverse, i.e. convert a List back into a JavaRDD?

4 answers
  • 2021-02-11 16:29

    You're looking for JavaSparkContext.parallelize(List) and similar. This is just like in the Scala API.
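
    A minimal, self-contained sketch of that approach (the sample data, the local master setting, and the class name are assumptions for illustration, not part of the original answer):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ListToRddExample {
        public static void main(String[] args) {
            // Local Spark context just for the example (assumed setup)
            SparkConf conf = new SparkConf().setAppName("list-to-rdd").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            List<String> data = Arrays.asList("a", "b", "c");

            // parallelize distributes a local java.util.List as a JavaRDD
            JavaRDD<String> rdd = sc.parallelize(data);

            System.out.println(rdd.count()); // prints 3

            sc.close();
        }
    }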

  • 2021-02-11 16:33
    List<StructField> fields = new ArrayList<>();
    fields.add(DataTypes.createStructField("fieldx1", DataTypes.StringType, true));
    fields.add(DataTypes.createStructField("fieldx2", DataTypes.StringType, true));
    fields.add(DataTypes.createStructField("fieldx3", DataTypes.LongType, true));

    // Build a StructType schema from the field list
    StructType schema = DataTypes.createStructType(fields);

    List<Row> data = new ArrayList<>();
    // The third value must be a Long to match fieldx3's LongType
    data.add(RowFactory.create("", "", 0L));

    // 'spark' is an existing SparkSession
    Dataset<Row> rawDataSet = spark.createDataFrame(data, schema);
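
    Note that createDataFrame gives you a Dataset<Row> rather than a JavaRDD. If a JavaRDD is what you are after, the Dataset API exposes toJavaRDD(), so something like rawDataSet.toJavaRDD() should yield a JavaRDD<Row>.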
    
  • 2021-02-11 16:38

    There are two ways to convert a collection to an RDD:

    1) sc.parallelize(collection)
    2) sc.makeRDD(collection)

    Both methods are equivalent (for a plain collection, makeRDD simply delegates to parallelize), so you can use either. Note that makeRDD exists only on the Scala SparkContext; from Java, JavaSparkContext exposes parallelize, as in the sketch below.
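
    A minimal Java sketch of the parallelize route, here with an explicit partition count (the sample data, the local master setting, and the class name are assumptions for illustration):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ParallelizeExample {
        public static void main(String[] args) {
            // Local Spark context just for the example (assumed setup)
            SparkConf conf = new SparkConf().setAppName("parallelize-example").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

            // The optional second argument fixes the number of partitions
            JavaRDD<Integer> rdd = sc.parallelize(numbers, 3);

            System.out.println(rdd.getNumPartitions()); // prints 3

            sc.close();
        }
    }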

  • 2021-02-11 16:48

    Adding to Sean Owen's and the other solutions:

    You can use JavaSparkContext#parallelizePairs for a List of Tuple2:

    List<Tuple2<Integer, Integer>> pairs = new ArrayList<>();
    pairs.add(new Tuple2<>(0, 5));
    pairs.add(new Tuple2<>(1, 3));

    // The no-arg constructor loads Spark settings from system properties
    JavaSparkContext sc = new JavaSparkContext();

    // parallelizePairs turns the local list of tuples into a key-value RDD
    JavaPairRDD<Integer, Integer> rdd = sc.parallelizePairs(pairs);
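
    The resulting JavaPairRDD gives you key-based operations such as reduceByKey and join, which a plain JavaRDD<Tuple2<Integer, Integer>> obtained from parallelize would not expose directly.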
    