Apache Spark: pyspark crash for large dataset

清歌不尽 2021-01-01 20:48

I am new to Spark. I have an input file with training data of size 4000x1800. When I try to train on this data (written in Python), I get the following error:

  1. 14/11/15 22:39:13

5 Answers
  •  有刺的猬
    2021-01-01 21:07

    I had a similar problem. I tried something like:

        numPartitions = 10  # or 100, depending on the size of the data
        data = sc.textFile("myfile.txt", numPartitions)

    Inspired by: How to repartition evenly in Spark? and by this page: https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/how_many_partitions_does_an_rdd_have.html
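
    As a fuller illustration, here is a minimal self-contained sketch of the same idea (the file name myfile.txt, the app name, and the partition count are placeholders, not taken from the question). It also shows how to check how many partitions an RDD actually ended up with, which is what the linked Databricks page discusses:

        from pyspark import SparkContext

        # Create a SparkContext (in a pyspark shell, `sc` already exists).
        sc = SparkContext(appName="RepartitionExample")

        # Read the file into more partitions than the default so each
        # task processes a smaller chunk of the training data.
        numPartitions = 100  # placeholder value; tune for your data and cluster
        data = sc.textFile("myfile.txt", numPartitions)

        # Verify how many partitions the RDD actually has.
        print(data.getNumPartitions())

        # An RDD that is already loaded can also be repartitioned explicitly.
        data = data.repartition(numPartitions)

    Increasing the number of partitions makes each task smaller, which often avoids the out-of-memory crashes people hit when loading large files with the default partitioning.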
