I want to understand more precisely how the cache method works for DataFrames in PySpark.
When I run df.cache(), it returns a DataFrame.
Therefore, if I assign the result to another variable, which one is actually cached: the original DataFrame or the returned one?
I found the source code of DataFrame.cache:
def cache(self):
    """Persists the :class:`DataFrame` with the default storage level (`MEMORY_AND_DISK`).

    .. note:: The default storage level has changed to `MEMORY_AND_DISK` to match Scala in 2.0.
    """
    self.is_cached = True
    self._jdf.cache()
    return self
Therefore, the answer is: both. Since cache() returns self, the original DataFrame and the returned one are the same object, so caching applies to both.
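To confirm this, here is a minimal sketch; the local SparkSession setup and the example DataFrame built with spark.range are illustrative assumptions, not part of the original question:

from pyspark.sql import SparkSession

# Illustrative setup: a local session and a small example DataFrame.
spark = SparkSession.builder.master("local[*]").appName("cache-demo").getOrCreate()
df = spark.range(10)

cached = df.cache()      # marks the DataFrame for caching (lazy until an action runs)

print(cached is df)      # True: cache() returns self, so both names point to the same object
print(df.is_cached)      # True: the flag set in cache() is visible through either name

df.count()               # running an action materializes the cached data

So whether you keep using df or the variable you assigned the result to, you are working with one and the same cached DataFrame.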