We know that in Spark there is a method rdd.collect which converts an RDD to a list:

List<String> f = rdd.collect();
String[] array = f.toArray(new String[f.size()]);

Is there a way to do the reverse, i.e. convert a list (or array) back to an RDD?
You're looking for JavaSparkContext.parallelize(List) and similar. This is just like in the Scala API.
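A minimal sketch of that call, assuming a local context created just for illustration (the app name and master are placeholders, not part of the original answer):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ListToRdd {
    public static void main(String[] args) {
        // Local context for this sketch; in a real job you would reuse
        // the context from your SparkSession instead.
        SparkConf conf = new SparkConf().setAppName("list-to-rdd").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<String> list = Arrays.asList("a", "b", "c");

            // parallelize turns a java.util.List into a distributed RDD
            JavaRDD<String> rdd = sc.parallelize(list);

            // Round trip back to a driver-side list to inspect the contents
            List<String> collected = rdd.collect();
            System.out.println(collected);
        }
    }
}
```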
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("fieldx1", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("fieldx2", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("fieldx3", DataTypes.LongType, true));
StructType schema = DataTypes.createStructType(fields);

List<Row> data = new ArrayList<>();
data.add(RowFactory.create("", "", 0L)); // fieldx3 is LongType, so the value must be a Long
Dataset<Row> rawDataSet = spark.createDataFrame(data, schema);
There are two ways to convert a collection to an RDD:

1) sc.parallelize(collection)
2) sc.makeRDD(collection)

Both methods behave identically (makeRDD simply delegates to parallelize), so you can use either. Note that makeRDD exists only on the Scala SparkContext; JavaSparkContext exposes parallelize.
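For completeness, parallelize also accepts an explicit slice (partition) count in the Java API. A small sketch, with arbitrary example values:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeSlices {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("slices").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5, 6);

            // The second argument is the number of partitions
            // the collection is split into.
            JavaRDD<Integer> rdd = sc.parallelize(nums, 3);

            System.out.println(rdd.getNumPartitions());
        }
    }
}
```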
Adding to Sean Owen's and others' solutions: you can use JavaSparkContext#parallelizePairs for a List of Tuple2:
JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setAppName("pairs").setMaster("local[*]"));

List<Tuple2<Integer, Integer>> pairs = new ArrayList<>();
pairs.add(new Tuple2<>(0, 5));
pairs.add(new Tuple2<>(1, 3));

JavaPairRDD<Integer, Integer> rdd = sc.parallelizePairs(pairs);