Creating a simple 1-row Spark DataFrame with Java API

无人及你 2021-02-04 14:54

In Scala, I can create a single-row DataFrame from an in-memory string like so:

val stringAsList = List("buzz")
val df = sqlContext.sparkContext.parallelize(js         


        
2 Answers
  • 2021-02-04 15:04

    You can do this by converting the List into an RDD of Row objects and then creating a schema that specifies the column name.

    There may be other ways as well; this is just one of them.

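        import java.util.ArrayList;
        import java.util.List;
        import org.apache.spark.api.java.JavaRDD;
        import org.apache.spark.sql.DataFrame;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.RowFactory;
        import org.apache.spark.sql.types.DataTypes;
        import org.apache.spark.sql.types.StructField;
        import org.apache.spark.sql.types.StructType;

        // sparkContext is assumed to be a JavaSparkContext and sqlContext a SQLContext (Spark 1.x).
        List<String> stringAsList = new ArrayList<String>();
        stringAsList.add("buzz");

        // Wrap each string in a single-column Row.
        JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList).map((String row) -> RowFactory.create(row));

        // Schema with one non-nullable string column named "fizz".
        StructType schema = DataTypes.createStructType(
                new StructField[] { DataTypes.createStructField("fizz", DataTypes.StringType, false) });

        DataFrame df = sqlContext.createDataFrame(rowRDD, schema);
        df.show();

        // +----+
        // |fizz|
        // +----+
        // |buzz|
        // +----+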
    
  • 2021-02-04 15:14

    I have created 2 examples for Spark 2 if you need to upgrade:

    Simple Fizz/Buzz (or foe/bar - old generation :) ):

        SparkSession spark = SparkSession.builder()
                .appName("Build a DataFrame from Scratch")
                .master("local[*]")
                .getOrCreate();

        List<String> stringAsList = new ArrayList<>();
        stringAsList.add("bar");

        // Wrap the session's SparkContext so that a Java List can be parallelized.
        JavaSparkContext sparkContext = new JavaSparkContext(spark.sparkContext());

        // Turn each string into a one-column Row.
        JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList).map((String row) -> RowFactory.create(row));

        // Creates schema
        StructType schema = DataTypes.createStructType(
                new StructField[] { DataTypes.createStructField("foe", DataTypes.StringType, false) });

        // SparkSession builds the DataFrame directly; sqlContext() and toDF() are not needed.
        Dataset<Row> df = spark.createDataFrame(rowRDD, schema);
    

    2x2 data:

        SparkSession spark = SparkSession.builder()
                .appName("Build a DataFrame from Scratch")
                .master("local[*]")
                .getOrCreate();

        // Each String[] holds the two column values of one row.
        List<String[]> stringAsList = new ArrayList<>();
        stringAsList.add(new String[] { "bar1.1", "bar2.1" });
        stringAsList.add(new String[] { "bar1.2", "bar2.2" });

        JavaSparkContext sparkContext = new JavaSparkContext(spark.sparkContext());

        // The cast makes it explicit that the array elements are the varargs values (one per column).
        JavaRDD<Row> rowRDD = sparkContext.parallelize(stringAsList)
                .map((String[] row) -> RowFactory.create((Object[]) row));

        // Creates schema
        StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("foe1", DataTypes.StringType, false),
                DataTypes.createStructField("foe2", DataTypes.StringType, false) });

        // SparkSession builds the DataFrame directly; sqlContext() and toDF() are not needed.
        Dataset<Row> df = spark.createDataFrame(rowRDD, schema);
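
    A note on why the 2x2 example produces two columns: each String[] is handed to RowFactory.create, which is declared with an Object... varargs parameter, so the array's elements become the row's values. The sketch below illustrates that Java rule without Spark. It uses a hypothetical stand-in method, countValues, which is not part of any Spark API and only reports how many values it received.

```java
public class VarargsDemo {
    // Hypothetical stand-in for a varargs factory such as RowFactory.create(Object... values).
    // It only reports how many values were received.
    public static int countValues(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        String[] pair = { "bar1.1", "bar2.1" };

        // Cast to Object[]: the array itself is used as the varargs array, so two values.
        System.out.println(countValues((Object[]) pair)); // 2

        // Cast to Object: the whole array is wrapped as one single value.
        System.out.println(countValues((Object) pair)); // 1

        // Listing the elements explicitly is equivalent to the first call.
        System.out.println(countValues(pair[0], pair[1])); // 2
    }
}
```

    Passing the String[] without any cast behaves like the first call, but javac typically warns about the inexact argument type, so an explicit (Object[]) cast keeps the intent clear.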
    

    Code can be downloaded from: https://github.com/jgperrin/net.jgp.labs.spark.
