I have the PySpark code below as part of an ETL pipeline. I'm currently running it in AWS Glue, but I'm trying to rewrite it as pure PySpark. The data is first imported