Spark: Hypothesis Testing with the Chi-Square Distribution

Anonymous (unverified), submitted 2019-12-03 00:22:01
from pyspark.ml.linalg import Vectors
from pyspark.ml.stat import ChiSquareTest
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session.
spark = SparkSession \
    .builder \
    .appName("dataFrame") \
    .getOrCreate()

# Each row is (label, features); both are treated as categorical values.
data = [(0.0, Vectors.dense(0.5, 10.0)),
        (0.0, Vectors.dense(1.5, 20.0)),
        (1.0, Vectors.dense(1.5, 30.0)),
        (0.0, Vectors.dense(3.5, 30.0)),
        (0.0, Vectors.dense(3.5, 40.0)),
        (1.0, Vectors.dense(3.5, 40.0))]
df = spark.createDataFrame(data, ["label", "features"])

# Run Pearson's chi-square independence test of every feature against the label.
r = ChiSquareTest.test(df, "features", "label").head()
print("pValues: " + str(r.pValues))
print("degreesOfFreedom: " + str(r.degreesOfFreedom))
print("statistics: " + str(r.statistics))
pValues: [0.6872892787909721,0.6822703303362126]
degreesOfFreedom: [2, 3]
statistics: [0.75,1.5]
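As a cross-check on the output above, the following is a minimal pure-Python sketch of Pearson's chi-square statistic computed from a contingency table of feature value against label. The helper name chi_square_independence is hypothetical and not part of Spark. It assumes each distinct feature value is treated as its own category, which is consistent with the reported degrees of freedom (2 and 3). It does not compute p-values.

```python
from collections import Counter


def chi_square_independence(feature_values, labels):
    """Return (statistic, degrees_of_freedom) for one categorical feature.

    Hypothetical helper for illustration only; not a Spark API.
    """
    n = len(labels)
    # Observed counts for every (feature value, label) pair; missing pairs count as 0.
    observed = Counter(zip(feature_values, labels))
    row_totals = Counter(feature_values)
    col_totals = Counter(labels)

    statistic = 0.0
    for value, row_total in row_totals.items():
        for label, col_total in col_totals.items():
            # Expected count under independence of feature and label.
            expected = row_total * col_total / n
            statistic += (observed[(value, label)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# The same six rows as in the DataFrame, split into columns.
labels = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
feature_1 = [0.5, 1.5, 1.5, 3.5, 3.5, 3.5]
feature_2 = [10.0, 20.0, 30.0, 30.0, 40.0, 40.0]

for name, values in [("feature 1", feature_1), ("feature 2", feature_2)]:
    stat, dof = chi_square_independence(values, labels)
    print(name, "statistic:", round(stat, 6), "degreesOfFreedom:", dof)
```

For these six rows the sketch reproduces the statistics 0.75 and 1.5 with 2 and 3 degrees of freedom. Both p-values reported by Spark are about 0.68, far above a conventional 0.05 threshold, so this sample gives no evidence that either feature depends on the label.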