DataFrame from List in Java

盖世英雄少女心 2021-01-15 17:23
  • Spark Version : 1.6.2
  • Java Version: 7

I have my data in a List, something like:

[[dev, engg, 10000], [kar

3 answers
  •  醉梦人生
    2021-01-15 17:44

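    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.ml.feature.NGram;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.Metadata;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    // Assumes jsc is an existing JavaSparkContext field of the enclosing class.
    DataFrame createNGramDataFrame(JavaRDD<String> lines) {
        // Split each line on whitespace and wrap the words in a one-column Row.
        JavaRDD<Row> rows = lines.map(new Function<String, Row>() {
            private static final long serialVersionUID = -4332903997027358601L;

            @Override
            public Row call(String line) throws Exception {
                // Cast to Object so the String[] is one array-valued column
                // instead of being spread across the Object... varargs.
                return RowFactory.create((Object) line.split("\\s+"));
            }
        });
        // Schema: a single non-nullable column "words" of type array<string>.
        StructType schema = new StructType(new StructField[] {
                new StructField("words",
                        DataTypes.createArrayType(DataTypes.StringType), false,
                        Metadata.empty()) });
        DataFrame wordDF = new SQLContext(jsc).createDataFrame(rows, schema);
        // build a bigram language model
        NGram transformer = new NGram().setInputCol("words")
                .setOutputCol("ngrams").setN(2);
        DataFrame ngramDF = transformer.transform(wordDF);
        ngramDF.show(10, false);
        return ngramDF;
    }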
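
A note on the row construction in the snippet above: `RowFactory.create` takes `Object...` varargs, so the way the split words are handed to it decides whether the row ends up with one array-valued column (which is what the `words` schema describes) or one column per word. The plain-Java sketch below illustrates that behavior without Spark. `VarargsRowDemo` and `columnCount` are hypothetical stand-ins made up for this illustration and are not part of the Spark API; the code sticks to Java 7 syntax.

```java
import java.util.Arrays;

public class VarargsRowDemo {

    // Hypothetical stand-in for a varargs factory such as
    // RowFactory.create(Object... values): it only reports how many
    // values (columns) it received.
    static int columnCount(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        String[] words = "build a bigram model".split("\\s+");

        // A String[] is also an Object[], so it is used as the varargs
        // array itself: every word becomes its own value.
        System.out.println(columnCount(words));                // 4

        // Casting to Object makes the whole array a single value.
        System.out.println(columnCount((Object) words));       // 1

        // Wrapping in a List also yields a single value.
        System.out.println(columnCount(Arrays.asList(words))); // 1
    }
}
```

With a schema that declares exactly one column, the first form would produce rows whose width does not match the schema, while the second and third forms produce one value per row.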
    
