Executor out of disk error
By far the most common cause of executor out of disk errors is a misconfiguration of Spark's temporary directories. You should set spark.local.dir to a directory with lots of local storage available. Note that on YARN this setting is overridden by the LOCAL_DIRS environment variable on the workers.
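As a minimal sketch, here is one way to set this when building the session (the mount path is an assumption; substitute a volume on your workers with plenty of free space, and note that a cluster manager may override it as described above):

```scala
import org.apache.spark.sql.SparkSession

// Point Spark's scratch space at a mount with plenty of free disk.
// /mnt/large-disk/spark-tmp is illustrative; multiple directories
// can be given as a comma-separated list.
val spark = SparkSession.builder()
  .appName("large-local-dir")
  .config("spark.local.dir", "/mnt/large-disk/spark-tmp")
  .getOrCreate()
```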
Kubernetes users may wish to mount a large emptyDir volume for Spark to use for temporary storage.
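A sketch of this, assuming Spark 3.x on Kubernetes, where volumes whose names start with spark-local-dir- are picked up as scratch space (the volume name, mount path, and size limit below are all illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Request an emptyDir volume on each executor and let Spark use it for
// local storage. These settings must be in place before executors are
// requested, e.g. passed here or via spark-submit --conf.
val spark = SparkSession.builder()
  .config("spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.mount.path",
          "/tmp/spark-local")
  .config("spark.kubernetes.executor.volumes.emptyDir.spark-local-dir-1.options.sizeLimit",
          "100Gi")
  .getOrCreate()
```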
Another common cause is keeping no-longer-needed RDDs/DataFrames/Datasets in scope. This tends to happen more often with notebooks, since more objects live in the global scope where they are not automatically cleaned up. One solution is to break your code into more functions so that references go out of scope; another is to explicitly set no-longer-needed RDDs/DataFrames/Datasets to None/null.
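A rough sketch of the function approach (processBatch and the path are hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Wrapping intermediate results in a function means the references go out
// of scope when it returns, instead of living forever in a notebook's
// global scope.
def processBatch(spark: SparkSession, path: String): Long = {
  val df: DataFrame = spark.read.parquet(path).cache()
  val count = df.count()
  df.unpersist() // proactively drop cached blocks (memory and local disk)
  count
}

// In a notebook, you can also drop a global reference explicitly:
//   var df = spark.read.parquet("...")
//   ... use df ...
//   df = null // lets Spark's cleanup reclaim the associated storage
```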
On the other hand, if you have an iterative algorithm, you should investigate whether your DAG has grown too big.
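A long lineage can also pin old shuffle files on local disk, since they are only cleaned up once nothing references them. One common remedy is checkpointing, which materializes the data to reliable storage and truncates the lineage; a minimal sketch (the checkpoint directory, input path, and step transformation are placeholders):

```scala
import org.apache.spark.sql.DataFrame

// Assumes an existing SparkSession named `spark` (as in spark-shell or a
// notebook).
spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints")

def step(df: DataFrame): DataFrame = df // stand-in for one iteration's transformations

var df = spark.read.parquet("/data/start")
for (i <- 1 to 100) {
  df = step(df)
  if (i % 10 == 0) df = df.checkpoint() // materialize and cut the lineage every 10 steps
}
```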