Container OOMs

Container OOMs can be difficult to debug as the container running the problematic code is killed, and sometimes not all of the log information is available.

Non-JVM language users (such as Python) are most likely to encounter issues with container OOMs. This is because the JVM is generally configured to not use more memory than the container it is running in.

Everything which isn't inside the JVM is considered "overhead", so Tensorflow, Python, bash, etc. A first step with a container OOM is often increasing spark.executor.memoryOverhead and spark.driver.memoryOverhead to leave more memory for non-Java processes.

Python users can set spark.executor.pyspark.memory to limit the Python VM to a certain amount of memory. This amount of memory is then added to the overhead.

Python users performing aggregations in Python should also check out the PyUDFOOM page.