I have a PySpark dataframe that looks like:
df2 = spark.createDataFrame([('101', '1', ['a','aa'], ['aa', 'bb']), ('1