Apache Spark: pyspark crash for large dataset

清歌不尽 2021-01-01 20:48

I am new to Spark. I have an input file with training data of size 4000x1800. When I try to train on this data (written in Python), I get the following error:

  1. 14/11/15 22:39:13

5 Answers
  •  有刺的猬
    2021-01-01 21:07

    I had a similar problem. I tried something like:

        numPartitions = 10  # or 100, depending on the size of the data
        data = sc.textFile("myfile.txt", numPartitions)

    Inspired by: How to repartition evenly in Spark? and by this page: https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/how_many_partitions_does_an_rdd_have.html
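
    As a fuller illustration, here is a minimal self-contained sketch of the same idea (the file name myfile.txt, the app name, and the partition count are placeholders, not taken from the question). It also shows how to check how many partitions an RDD actually ended up with, which is what the linked Databricks page discusses:

        from pyspark import SparkContext

        # Create a SparkContext (in a pyspark shell, `sc` already exists).
        sc = SparkContext(appName="RepartitionExample")

        # Read the file into more partitions than the default so each
        # task processes a smaller chunk of the training data.
        numPartitions = 100  # placeholder value; tune for your data and cluster
        data = sc.textFile("myfile.txt", numPartitions)

        # Verify how many partitions the RDD actually has.
        print(data.getNumPartitions())

        # An RDD that is already loaded can also be repartitioned explicitly.
        data = data.repartition(numPartitions)

    Increasing the number of partitions makes each task smaller, which often avoids the out-of-memory crashes people hit when loading large files with the default partitioning.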
