Now I am learning how to use spark.I have a piece of code which can invert a matrix and it works when the order of the matrix is small like 100.But when the order of the mat
This is because Spark create some temp shuffle files under /tmp directory of you local system.You can avoid this issue by setting below properties in your spark conf files.
Set this property in spark-env.sh.
SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark,/mnt2/spark -Dhadoop.tmp.dir=/mnt/ephemeral-hdfs"
export SPARK_JAVA_OPTS
According to the Error message
you have provided, your situation is no disk space left on your hard-drive. However, it's not caused by RDD persistency, but shuffle which you implicitly required when calling reduce
.
Therefore, you should clear your drive and make more spaces for your tmp folder
As a complementary, to specify default folder for you shuffle tmp files, you can add below line to $SPARK_HOME/conf/spark-defaults.conf
:
spark.local.dir /mnt/nvme/local-dir,/mnt/nvme/local-dir2