I am trying to join two large Spark DataFrames and keep running into this error:
Container killed by YARN for exceeding memory limits. 24 GB of 22 GB physical memory used.
The overhead options are nicely explained in the Spark configuration documentation:
This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
This also includes user objects if you use one of the non-JVM guest languages (Python, R, etc.).
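Not a definitive fix, but here is a minimal sketch of how one might raise the overhead allotment when building the session, assuming PySpark and that the memory values below are placeholders to tune for your cluster (on Spark 2.3+ the setting is `spark.executor.memoryOverhead`; older releases use `spark.yarn.executor.memoryOverhead`):

```python
from pyspark.sql import SparkSession

# Sketch: increase the per-executor overhead so the container total
# (executor memory + overhead) stays under YARN's physical memory limit.
# Values are illustrative, not recommendations.
spark = (
    SparkSession.builder
    .appName("large-join")  # hypothetical app name
    .config("spark.executor.memory", "20g")
    # Spark >= 2.3 name; use spark.yarn.executor.memoryOverhead on older versions
    .config("spark.executor.memoryOverhead", "4g")
    .getOrCreate()
)

# The join itself is unchanged; only the container sizing differs, e.g.:
# joined = df_a.join(df_b, on="key", how="inner")
```

The same settings can also be passed on the command line, e.g. `spark-submit --conf spark.executor.memoryOverhead=4g ...`.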