I am new to Spark, and I have an input file with training data of size 4000x1800. When I try to train on this data (written in Python), I get the following error:
14/11/15 22:39:13
I had a similar problem. Passing an explicit number of partitions to textFile helped, something like:

    numPartitions = 10  # or 100, for example
    data = sc.textFile("myfile.txt", numPartitions)
Inspired by: "How to repartition evenly in Spark?" and this Databricks knowledge-base article: https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/how_many_partitions_does_an_rdd_have.html