How to transform the dataframe into label feature vector?

小鲜肉 2021-01-22 00:56

I am running a logistic regression model in Scala and I have a DataFrame like the one below:

df

+-----------+------------+
|x          |y           |
+----------         


        
1 Answer
  • 2021-01-22 01:11

    Given a DataFrame such as

    +---+---+
    |x  |y  |
    +---+---+
    |0  |0  |
    |0  |33 |
    |0  |58 |
    |0  |96 |
    |0  |1  |
    |1  |21 |
    |0  |10 |
    |0  |65 |
    |1  |7  |
    |1  |28 |
    +---+---+
    

    assemble x and y into a single vector column and cast x to a double to serve as the label:

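    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.types.DoubleType
    import spark.implicits._ // enables $"col"; assumes a SparkSession named spark

    val assembler = new VectorAssembler()
      .setInputCols(Array("x", "y"))
      .setOutputCol("features")

    // Keep x (cast to Double) as the label, next to the assembled features.
    val output = assembler.transform(df).select($"x".cast(DoubleType).as("label"), $"features")
    output.show(false)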
    

    This gives the following result:

    +-----+----------+
    |label|features  |
    +-----+----------+
    |0.0  |(2,[],[]) |
    |0.0  |[0.0,33.0]|
    |0.0  |[0.0,58.0]|
    |0.0  |[0.0,96.0]|
    |0.0  |[0.0,1.0] |
    |1.0  |[1.0,21.0]|
    |0.0  |[0.0,10.0]|
    |0.0  |[0.0,65.0]|
    |1.0  |[1.0,7.0] |
    |1.0  |[1.0,28.0]|
    +-----+----------+
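
The first row prints as (2,[],[]) rather than [0.0,0.0] because it is shown in sparse form: a vector of size 2 with no non-zero entries. The plain-Scala sketch below illustrates how such a (size, indices, values) triple expands to a dense array. It does not use Spark, and the object and method names are hypothetical, chosen only for this illustration.

```scala
object SparseVectorSketch {
  // Expand a sparse (size, indices, values) triple into a dense array.
  // Positions not listed in `indices` are treated as 0.0.
  def toDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    require(indices.length == values.length, "indices and values must have the same length")
    val dense = Array.fill(size)(0.0)
    indices.zip(values).foreach { case (i, v) => dense(i) = v }
    dense
  }

  def main(args: Array[String]): Unit = {
    // First row of the table above: (2,[],[])
    val firstRow = toDense(2, Array.empty[Int], Array.empty[Double])
    println(firstRow.mkString("[", ",", "]")) // prints [0.0,0.0]
  }
}
```

Both forms describe the same feature values, so the first row needs no special handling before fitting.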
    

    Fitting a LogisticRegression on this output is then straightforward:

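    import org.apache.spark.ml.classification.LogisticRegression

    // By default the estimator reads the "label" and "features" columns.
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)

    // Fit on the assembled DataFrame and print the learned parameters.
    val lrModel = lr.fit(output)
    println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")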
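
The values passed to setRegParam and setElasticNetParam combine into an elastic-net penalty, λ(α‖w‖₁ + (1 − α)/2 · ‖w‖₂²), with λ = 0.3 and α = 0.8 here. The sketch below only evaluates that formula in plain Scala. It does not reproduce how Spark scales features or optimizes internally, and the names are hypothetical.

```scala
object ElasticNetPenaltySketch {
  // lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)
  // alpha = 1.0 is a pure L1 (lasso) penalty, alpha = 0.0 a pure L2 (ridge) penalty.
  def penalty(w: Array[Double], lambda: Double, alpha: Double): Double = {
    val l1 = w.map(c => math.abs(c)).sum
    val l2Squared = w.map(c => c * c).sum
    lambda * (alpha * l1 + (1.0 - alpha) / 2.0 * l2Squared)
  }

  def main(args: Array[String]): Unit = {
    // Settings from the answer: regParam = 0.3, elasticNetParam = 0.8.
    val w = Array(1.5672602877378823, 0.0)
    println(penalty(w, lambda = 0.3, alpha = 0.8))
  }
}
```

With α = 0.8 most of the penalty is L1, which tends to push weak coefficients to exactly zero. That is consistent with a coefficient of exactly 0.0 in the printed result.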
    

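    Because x is both the label and one of the assembled features in this example, the fit puts all of its weight on x. For a real model, leave the label column out of setInputCols. The printed output is: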

    Coefficients: [1.5672602877378823,0.0] Intercept: -1.4055020984891717
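
To see what these numbers mean, a binary logistic regression turns the weighted sum of the features plus the intercept into a probability through the sigmoid function. The plain-Scala sketch below applies the printed coefficients by hand. It is an illustration only, not a replacement for lrModel.transform, and the object and method names are hypothetical.

```scala
object LogisticPredictionSketch {
  // Values copied from the printed model output above.
  val coefficients: Array[Double] = Array(1.5672602877378823, 0.0)
  val intercept: Double = -1.4055020984891717

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability of label 1.0 for a feature vector [x, y].
  def probability(features: Array[Double]): Double = {
    require(features.length == coefficients.length, "unexpected number of features")
    val margin = coefficients.zip(features).map { case (w, f) => w * f }.sum + intercept
    sigmoid(margin)
  }

  def main(args: Array[String]): Unit = {
    println(s"x=0, y=33 -> ${probability(Array(0.0, 33.0))}")
    println(s"x=1, y=21 -> ${probability(Array(1.0, 21.0))}")
  }
}
```

Rows with x = 0 come out below 0.5 and rows with x = 1 come out above it, while y has no effect because its coefficient is 0.0.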
    